New file format helps researchers reduce DNA analysis time
The University of New South Wales and the Garvan Institute of Medical Research have developed a new computer file format to speed up nanopore sequencing analysis and improve specialist treatments for patients with cancer and other diseases.
Published in Nature Biotechnology, the research indicates that the new SLOW5 format can process complex nanopore DNA sequencing data “more than 30 times faster” than the previous file format, ironically named FAST5.
Nanopore sequencing is used to identify a range of diseases and help healthcare professionals analyze DNA samples in detail so they can provide tailored treatments for cancer patients.
The data produced by this process has routinely been saved in the FAST5 format, which produces large files of approximately 1.3 terabytes, equivalent to about 650 hours of high-definition video. Because of this size, processing FAST5 files could take a computer two weeks, the researchers said.
However, lead author Hasindu Gamaarachchi, a genomic computing systems engineer at the Garvan Institute, said that SLOW5 reduces the time needed to process human genome data to half a day.
He explained that, unlike FAST5, the SLOW5 format supports parallel computing, in which multiple processors simultaneously run many smaller analyses split off from a larger, complex, complete dataset.
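To illustrate the general idea of splitting one large dataset into independent pieces that several workers analyse at the same time, here is a minimal Python sketch. It is not SLOW5's code or API: the record layout (a read ID plus a list of signal samples), the toy "analysis" (mean signal per read), and the function names are all hypothetical, chosen only to show the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical stand-in for raw nanopore reads: (read_id, signal samples).
Record = tuple[str, list[int]]


def analyse_chunk(chunk: list[Record]) -> dict[str, float]:
    """Toy per-read 'analysis' (hypothetical): mean signal level of each read."""
    return {read_id: mean(signal) for read_id, signal in chunk}


def split(records: list[Record], n_chunks: int) -> list[list[Record]]:
    """Decompose the full dataset into roughly equal, independent chunks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def analyse_parallel(records: list[Record], workers: int = 4) -> dict[str, float]:
    """Give each worker its own chunk (its own 'shovel'), then merge results."""
    results: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(analyse_chunk, split(records, workers)):
            results.update(partial)
    return results


reads = [(f"read{i}", [i, i + 2, i + 4]) for i in range(10)]
parallel_result = analyse_parallel(reads)
```

The parallel result matches running `analyse_chunk` over the whole list in one go, because each read is analysed independently. Threads keep the sketch simple; in CPython, CPU-heavy work would normally use `ProcessPoolExecutor` instead to run on several cores at once.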
“You can think of it like trying to dig a really big hole with 10 people, but there’s only one shovel they have to share. That was how it was with FAST5,” he said.
“But with SLOW5, everyone has their own shovel, and they can all dig at the same time and get the job done much faster.
“The FAST5 format is slow because the data cannot be accessed in parallel. It is based on the Hierarchical Data Format, which was designed in the 1990s to work on machines that at the time had only one processor, rather than on modern machines with multiple processors.
“The Hierarchical Data Format is also generic, whereas SLOW5 is purpose-built. So in terms of the digging analogy, it’s like we’re also providing a purpose-built shovel for the type of soil. And because SLOW5 can be accessed in parallel by several processors at the same time, the processing time has been cut by a factor of 30.”