Today, completely sequencing the genome of an individual human being is still time-consuming and cost-intensive. However, with technology developed by a team of researchers from Eurac Research and the Medical University of Innsbruck, DNA analyses have been drastically simplified and their costs reduced by a factor of 50 to 100. The world's largest sequencing study is currently using this technology, which was born in Bolzano.
The genomes of more than 53,000 people were sequenced by the TOPMed research consortium, based in Washington, USA. The study participants were drawn from the general American population, with a special focus on minorities. Their health status varied, and some of them had diseases. Sequencing a genome means decoding all three billion of its positions. The large-scale study, involving more than 30 research groups, is currently investigating the genetic causes of heart, lung, blood and sleep disorders. Initial results have shown that while 2.6 billion positions are identical across people's genomes, the researchers identified around 400 million positions where genomes differ, the so-called variants. This data set now forms the basis for future in-depth research into the extent to which these variants are associated with certain diseases. If these genetic risk factors are known, the diagnosis, treatment and prevention of the corresponding diseases can be significantly improved.
A fundamental innovation in bioinformatics, called "imputation", now makes it possible to examine the genetic data of large numbers of people in a relatively short time. With this mathematical procedure, only part of the genome needs to be measured; the remaining positions are "filled in", that is, inferred statistically. The underlying algorithm was developed by South Tyrolean bioinformatician Christian Fuchsberger, who works at the Institute of Biomedicine at Eurac Research. The new method drastically simplifies research involving the enormous amounts of data that sequencing produces and significantly reduces the cost of DNA analysis. Ten years ago, completely sequencing a single human genome took twenty days; today it can be done within a day, but it still costs around 1,000 euros per genome. Because of the large amount of data, decoding remains costly and computationally intensive, and it requires specialized technologies such as Next Generation Sequencing, which sequences very large numbers of DNA molecules simultaneously. Although the result of imputation is not identical to that of sequencing, all positions of the genome can be determined with very high probability, at a cost 50 to 100 times lower than that of complete sequencing.
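To make the idea of "filling in" more concrete, here is a deliberately simplified sketch in Python. It assumes a small reference panel of fully sequenced haplotypes (rows of alleles coded 0 or 1) and a target in which only some positions were measured, with `None` marking an unmeasured position. The hypothetical function `impute` copies each missing allele from the reference haplotype that best agrees with the measured positions. This nearest-match rule is a stand-in chosen for brevity and is not the algorithm described in this article; real imputation software typically relies on probabilistic models, such as hidden Markov models, that combine information from many reference haplotypes and report a probability for each filled-in position.

```python
from typing import List, Optional

# Assumption: alleles are coded 0 (reference) or 1 (alternative),
# one value per genome position; None marks an unmeasured position.
Haplotype = List[int]


def impute(target: List[Optional[int]], panel: List[Haplotype]) -> Haplotype:
    """Toy imputation: copy missing alleles from the closest reference haplotype.

    Real imputation tools use probabilistic models instead of a single best match.
    """
    measured = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref: Haplotype) -> int:
        # Count disagreements only at positions that were actually measured.
        return sum(ref[i] != target[i] for i in measured)

    best = min(panel, key=mismatches)  # ties: first haplotype in the panel wins
    # Keep measured alleles as-is; fill every gap from the best-matching haplotype.
    return [allele if allele is not None else best[i] for i, allele in enumerate(target)]


# Hypothetical reference panel: three fully sequenced haplotypes, six positions each.
panel = [
    [0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]
target = [0, 1, None, None, 0, None]  # only positions 0, 1 and 4 were measured
print(impute(target, panel))  # [0, 1, 1, 0, 0, 1]
```

Here the measured alleles at positions 0, 1 and 4 match the third reference haplotype exactly, so its alleles fill the three gaps. Larger reference panels make it more likely that a close match exists for any given target.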
To advance research on genetic risk factors, the TOPMed consortium turned to the Eurac Research Institute of Biomedicine and the Medical University of Innsbruck’s Institute of Genetic Epidemiology. Both institutes are experts in "Big Data": for years they have operated one of the largest bioinformatics web services, based on the algorithm developed in Bolzano, and their latest contributions to the software have enabled the consortium to make its data collection available to other research groups. Christian Fuchsberger is also in charge of the technological development of the web service. "From a technological point of view, imputing is a very computationally intensive step that was previously not possible with a normal computer. We have now refined the technology so that researchers can access the service from their own computers in the simplest possible way," says Fuchsberger, who has been part of the success story from the very beginning, from the creation of the algorithm to the software and on to the sought-after web service. "The development of our software based on the algorithm quickly attracted international interest. Shortly after the paper was published, the operators of the world's most widely used programmes for DNA analysis got in touch and asked if they could integrate our software into their programmes," he says. Together with colleagues from the Institute of Genetic Epidemiology at the Medical University of Innsbruck, Fuchsberger has worked continuously over the past few years to improve the web service and to meet ongoing requests from international users and research groups.
The web service is linked to the databases of numerous large sequencing studies, which together hold the genetic data of 97,000 subjects; the TOPMed study is currently the largest of them. During imputation, the software can access all of these databases and complete the fragmentary genome data that users feed in. More and more research consortia are turning to Fuchsberger and his team to integrate their data. Data protection and data security are particularly important, and here too the team has found a way for the research community to benefit from the databases while keeping individual data confidential. "We have further developed the web service so that users upload only their own data and get their specific results back; no one can access the raw data from other studies collected in the databases," explains Fuchsberger.
The web service currently runs on a server with more than 1,500 processors, on which the imputations are executed in parallel. So far, 1,200 users from the research community have benefited from the web service, which hosts the genomic information of more than 13 million people.
Link to the TOPMed Imputation-Server: https://imputation.biodatacatalyst.nhlbi.nih.gov/#!