What is snpQT?

snpQT (pronounced snip-cutie) makes your single-nucleotide polymorphisms cute. It also processes human genomic variants, performing:

human genome build conversion
sample quality control
population stratification
variant quality control
pre-imputation quality control
local imputation
post-imputation quality control
genome-wide association studies

within an automated Nextflow pipeline. snpQT runs a collection of versioned bioinformatics software in Singularity or Docker containers, or in Anaconda or Environment Modules environments, to improve reliability and reproducibility.

Who is snpQT for?

snpQT might be useful for you if:

you want a clean genomic dataset using a reproducible, fast and comprehensive pipeline
you are interested in identifying significant SNP associations with a trait
you want to identify and remove outliers based on their ancestry
you wish to perform imputation locally
you wish to prepare your genomic dataset for imputation on an external server (after comprehensive QC and pre-imputation QC)

What do you need to get started?

you have already called your variants using human genome build 37 or 38
your variants are in VCF or plink bfile format
your variants have "rs" ids
your samples have either a binary or a quantitative phenotype

If this sounds like you, check out the installation guide.
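Before installing, it can help to confirm the "rs" ID requirement on your own data. The snippet below is a minimal sketch, not part of snpQT: the function name `rs_id_summary` and the example records are made up for illustration. It assumes the standard PLINK `.bim` layout, where the second whitespace-separated column holds the variant ID.

```python
import io


def rs_id_summary(bim_lines):
    """Return (total variants, variants whose ID starts with 'rs').

    Hypothetical helper, not part of snpQT. Expects lines in PLINK .bim
    layout: chromosome, variant ID, cM position, bp position, allele 1, allele 2.
    """
    total = 0
    with_rs = 0
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        if fields[1].startswith("rs"):
            with_rs += 1
    return total, with_rs


# Made-up records for illustration only; replace with open("your_data.bim")
example = io.StringIO(
    "1\trs12345\t0\t10583\tA\tG\n"
    "1\t1:10611\t0\t10611\tG\tC\n"
    "2\trs67890\t0\t20100\tT\tC\n"
)
total, with_rs = rs_id_summary(example)
print(f"{with_rs} of {total} variants have rs IDs")
```

If a large share of your variants lack "rs" IDs, you will probably need to annotate them before running the pipeline.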

snpQT definitely won't be useful for you if:

you want to do quality control on raw sequence reads
you want to call variants from raw sequence reads
you are working with family-based GWAS data
you're not working with human genomic data

Citation

If you find snpQT useful please cite:

Vasilopoulou C, Wingfield B, Morris AP and Duddy W. snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data [version 1; peer review: 2 approved with reservations]. F1000Research 2021, 10:567 https://doi.org/10.12688/f1000research.53821.1

License and third-party software

snpQT is distributed under an MIT license. Our pipeline wouldn't be possible without the following amazing third-party software:

Software	Version	Reference	License
EIGENSOFT	7.2.1	Price, Alkes L., et al. "Principal components analysis corrects for stratification in genome-wide association studies." Nature genetics 38.8 (2006): 904-909.	Custom open source
impute5	1.1.4	Rubinacci, Simone, Olivier Delaneau, and Jonathan Marchini. "Genotype imputation using the positional Burrows Wheeler transform." PLoS Genetics 16.11 (2020): e1009049.	Academic use only
nextflow	21.04.3	Di Tommaso, Paolo, et al. "Nextflow enables reproducible computational workflows." Nature biotechnology 35.4 (2017): 316-319.	GPL3
picard	2.24.0		MIT
PLINK	1.90b6.18	Purcell, Shaun, et al. "PLINK: a tool set for whole-genome association and population-based linkage analyses." The American journal of human genetics 81.3 (2007): 559-575.	GPL3
PLINK2	2.00a2.3	Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4.	GPL3
samtools	1.11	Danecek, Petr et al. "Twelve years of SAMtools and BCFtools." GigaScience, 10(2), 1-4, 2021	MIT
bcftools	1.9	Danecek, Petr et al. "Twelve years of SAMtools and BCFtools." GigaScience, 10(2), 1-4, 2021	MIT
shapeit4	4.1.3	Delaneau, Olivier, et al. "Accurate, scalable and integrative haplotype estimation." Nature communications 10.1 (2019): 1-10.	MIT
snpflip	0.0.6	https://github.com/biocore-ntnu/snpflip	MIT

We also use countless other bits of software like R, the R tidyverse, etc.