S-MIG++

S-MIG++ is a sampling-based, memory- and runtime-efficient algorithm for whole-genome haplotype block recognition based on linkage disequilibrium (LD). It uses the haplotype block definition proposed by Gabriel et al. (2002), which is the most commonly used definition and is implemented in software such as Haploview (Barrett et al. 2005) and PLINK (Purcell et al. 2007).
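
To make the LD measure concrete: the block definition of Gabriel et al. is built on the pairwise statistic D' between SNPs, more precisely on confidence bounds for D'. The following is a minimal sketch in Python of the D' point estimate only, assuming phased haplotypes encoded as 0/1 alleles. The function name and the encoding are illustrative assumptions and are not taken from the S-MIG++ source code; the confidence interval computation is omitted.

```python
from typing import Sequence


def d_prime(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Return |D'| for two biallelic SNPs.

    hap_a[k] and hap_b[k] are the alleles (0 or 1) carried by phased
    haplotype k at SNP A and SNP B. Hypothetical helper for illustration.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("both SNPs need the same non-zero number of haplotypes")

    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D

    # D is normalised by its largest attainable magnitude given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        # At least one SNP is monomorphic, so D' is undefined;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return abs(d) / d_max
```

For example, two SNPs whose alleles always co-occur across haplotypes give |D'| = 1, while two SNPs whose alleles are independent give |D'| = 0.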

The S-MIG++ algorithm is significantly faster than its predecessor MIG++, which was implemented in the LDExplorer R package. It is specifically designed to process huge datasets with millions of SNPs and/or thousands of samples. 

The integrated support for distributed computations in S-MIG++ makes this algorithm especially scalable.

The runtime efficiency and scalability of S-MIG++ are achieved with a two-step approach:

  1. sample a small proportion of SNP pairs within a chromosome and estimate upper limits for haplotype block boundaries (Figure 1);
  2. refine the exact haplotype block boundaries within their estimated upper limits (Figure 2).
                                                                                                                                         
 Figure 1. Sampled SNP pairs and estimated haplotype block boundaries (gray line). Red, green, and blue colors reflect strong, moderate, and low LD between SNPs, respectively.

 Figure 2. Refined exact haplotype block boundaries (black line). Red, green, and blue colors reflect strong, moderate, and low LD between SNPs, respectively.


Our experiments showed that it is sufficient to sample only 1%-5% of all SNP pairs within a chromosome. The probability of error in the estimates is proven to be at most 0.01 and in practice is very close to 0.
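
To give a feel for what sampling a small proportion of SNP pairs means, here is a minimal sketch in Python that draws a fixed fraction of all unordered SNP index pairs uniformly at random. This is only an illustration under stated assumptions: the page does not describe the sampling scheme S-MIG++ actually uses, and the function name, the `fraction` and `seed` parameters, and the uniform scheme itself are hypothetical.

```python
import random
from typing import List, Tuple


def sample_snp_pairs(n_snps: int, fraction: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Uniformly sample a fraction of all unordered SNP index pairs (i < j).

    Hypothetical helper: a plain uniform sample, not the S-MIG++ scheme.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if n_snps < 2:
        return []

    total_pairs = n_snps * (n_snps - 1) // 2
    target = max(1, round(total_pairs * fraction))

    rng = random.Random(seed)  # seeded for reproducible samples
    pairs = set()
    # Rejection sampling: draw two distinct SNP indices and discard
    # repeats. This is cheap when the fraction is small (e.g. 1%-5%).
    while len(pairs) < target:
        i, j = rng.sample(range(n_snps), 2)
        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)
```

With 200 SNPs there are 19,900 pairs, so `sample_snp_pairs(200, 0.05)` returns 995 of them. Each sampled pair would then be evaluated for LD, leaving the remaining pairs untouched unless the refinement step needs them.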

Downloads

The source code of the S-MIG++ algorithm is available below:

Compilation

To compile the S-MIG++ algorithm:

 1) decompress the SMIGPP_X.Y.Z.tar.gz archive (or SMIGPP_X.Y.Z_MPI.tar.gz for the MPI version);

 2) run the make command.

Usage

The software accepts both HapMap II and VCF format files with phased genotypes.

A detailed description of the command line arguments and the output format can be obtained by running:

./smigpp --help

or

./smigppMPI --help

Requirements

S-MIG++ requires the following for compilation and use:

  • Linux operating system.
  • C++ compiler with C++11 support, preferably the GNU Compiler Collection (GCC) version 4.9.1 or higher.
  • GNU Scientific Library (GSL).
  • zlib compression library.
  • Open MPI (for distributed computations only). 

GPL License

Copyright © 2014 by Daniel Taliun, Johann Gamper and Cristian Pattaro. All rights reserved.

S-MIG++ is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. 

S-MIG++ is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 

You should have received a copy of the GNU General Public License along with S-MIG++. If not, see http://www.gnu.org/licenses/.

Contact Us

For questions, comments, or any other help regarding S-MIG++, please contact us by email at daniel.taliun@eurac.edu.


CONTACT
Galvanistraße 31/Via Galvani 31
39100 BOZEN-BOLZANO
Tel. +39 0471 055 500
Fax. +39 0471 055 599