What is snpQT?
snpQT (pronounced snip-cutie) makes your single-nucleotide polymorphisms cute. It also supports processing human genomic variants, performing:
- human genome build conversion
- sample quality control
- population stratification
- variant quality control
- pre-imputation quality control
- local imputation
- post-imputation quality control
- genome-wide association studies
within an automated nextflow pipeline. We run a collection of versioned bioinformatics software in Singularity and Docker containers or Anaconda and Environment Modules environments to improve reliability and reproducibility.
Who is snpQT for?
snpQT might be useful for you if:
- you want a clean genomic dataset using a reproducible, fast and comprehensive pipeline
- you are interested in identifying significant SNP associations with a trait
- you want to identify and remove outliers based on their ancestry
- you wish to perform imputation locally
- you wish to prepare your genomic dataset for imputation on an external server (following comprehensive QC and pre-imputation QC preparation)
What do you need to get started?
- you have already called your variants using human genome build 37 or 38
- your variants are in VCF or plink bfile format
- your variants have "rs" ids
- your samples have either a binary or a quantitative phenotype
If this sounds like you, check out the installation guide.
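The "rs" id requirement above can be checked before installing anything. The following is a minimal sketch, not part of snpQT: the function names `summarise_vcf_ids` and `open_vcf` are hypothetical, and the check assumes a standard tab-delimited VCF whose third column holds the variant ID (possibly several, separated by `;`).

```python
import gzip
import io
import re

# An rs ID is "rs" followed by digits, e.g. rs12345 (assumed dbSNP-style naming).
RS_ID = re.compile(r"^rs\d+$")


def summarise_vcf_ids(handle):
    """Return (total_records, records_with_rs_id) for an open VCF text handle."""
    total = 0
    with_rs = 0
    for line in handle:
        # Skip blank lines, meta-information lines (##) and the header line (#CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a well-formed variant record
        total += 1
        # The ID column is the third field; multiple IDs are separated by ';'.
        if any(RS_ID.match(variant_id) for variant_id in fields[2].split(";")):
            with_rs += 1
    return total, with_rs


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


if __name__ == "__main__":
    # Tiny in-memory VCF used purely for illustration.
    example = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "1\t1000\trs111\tA\tG\t.\tPASS\t.\n"
        "1\t2000\t.\tC\tT\t.\tPASS\t.\n"
        "2\t3000\trs222;custom_id\tG\tA\t.\tPASS\t.\n"
    )
    total, with_rs = summarise_vcf_ids(example)
    print(f"{with_rs} of {total} variants have an rs id")
```

To check a real file, pass `open_vcf("your_data.vcf.gz")` (the path is a placeholder) to `summarise_vcf_ids`. If many records lack an rs id, the variants probably need annotating before they are used as input.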
snpQT definitely won't be useful for you if:
- you want to do quality control on raw sequence reads
- you want to call variants from raw sequence reads
- you are working on family GWAS data
- you're not working with human genomic data
Citation
If you find snpQT useful, please cite:
Vasilopoulou C, Wingfield B, Morris AP and Duddy W. snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data [version 1; peer review: 2 approved with reservations]. F1000Research 2021, 10:567 https://doi.org/10.12688/f1000research.53821.1
License and third-party software
snpQT is distributed under an MIT license. Our pipeline wouldn't be possible without the following amazing third-party software:
Software | Version | Reference | License |
---|---|---|---|
EIGENSOFT | 7.2.1 | Price, Alkes L., et al. "Principal components analysis corrects for stratification in genome-wide association studies." Nature genetics 38.8 (2006): 904-909. | Custom open source |
impute5 | 1.1.4 | Rubinacci, Simone, Olivier Delaneau, and Jonathan Marchini. "Genotype imputation using the positional burrows wheeler transform." PLoS Genetics 16.11 (2020): e1009049. | Academic use only |
nextflow | 21.04.3 | Di Tommaso, Paolo, et al. "Nextflow enables reproducible computational workflows." Nature biotechnology 35.4 (2017): 316-319. | GPL3 |
picard | 2.24.0 | | MIT |
PLINK | 1.90b6.18 | Purcell, Shaun, et al. "PLINK: a tool set for whole-genome association and population-based linkage analyses." The American journal of human genetics 81.3 (2007): 559-575. | GPL3 |
PLINK2 | 2.00a2.3 | Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4. | GPL3 |
samtools | 1.11 | Danecek, Petr et al. "Twelve years of SAMtools and BCFtools." GigaScience, 10(2), 1-4, 2021 | MIT |
bcftools | 1.9 | Danecek, Petr et al. "Twelve years of SAMtools and BCFtools." GigaScience, 10(2), 1-4, 2021 | MIT |
shapeit4 | 4.1.3 | Delaneau, Olivier, et al. "Accurate, scalable and integrative haplotype estimation." Nature communications 10.1 (2019): 1-10. | MIT |
snpflip | 0.0.6 | https://github.com/biocore-ntnu/snpflip | MIT |
We also use countless other bits of software like R, the R tidyverse, etc.