Objective: To investigate the effects of pre-processing and variant-filtering methods on variant recognition in the analysis of whole exome sequencing data. Methods: Using whole exome sequencing data from two test samples, we calculated depth of coverage (DP), number of variants, transition/transversion ratio and non-reference concordance to compare the effects on variant calling of different pre-processing methods (FASTX-Toolkit, Trimmomatic and no treatment), of including single-end (SE) reads, and of two variant-filtering strategies ('Hard' filtering and variant quality score recalibration (VQSR)). Results: Reads pre-processed with Trimmomatic showed a DP similar to that of reads without pre-processing, but significantly higher than that of reads pre-processed with FASTX-Toolkit. With DP ≥10× and genotype quality (GQ) ≥20, the number of single nucleotide variants (SNVs) called after Trimmomatic pre-processing was greater than that after FASTX-Toolkit pre-processing, but similar to that without pre-processing. When SE reads were included, the number of variants increased considerably more for FASTX-Toolkit pre-processing (28%) than for Trimmomatic pre-processing (5%). In all settings, 'Hard' filtering removed fewer SNVs than VQSR filtering with a small sample size. Conclusions: Sequence reads are trimmed and/or filtered moderately by Trimmomatic, whereas they appear to be over-filtered by FASTX-Toolkit. Keeping the SE reads benefits variant recognition in the downstream analysis. 'Hard' filtering was more tolerant than VQSR filtering.
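As an illustration of two of the quantities named in the Methods and Results, the sketch below applies the DP ≥10 and GQ ≥20 thresholds to a handful of SNV calls and then computes a transition/transversion ratio. It is a minimal sketch, not the authors' pipeline: the function names and the toy call list are hypothetical, and each call is assumed to be a simple (ref, alt, DP, GQ) tuple with ref different from alt.

```python
# Minimal sketch (hypothetical helper names, toy data): threshold filtering
# on DP/GQ followed by a transition/transversion (Ti/Tv) ratio.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if ref>alt stays within purines or within pyrimidines.

    Assumes ref != alt (a real SNV)."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def passes_thresholds(dp, gq, min_dp=10, min_gq=20):
    """Keep a call only if depth and genotype quality meet both cutoffs."""
    return dp >= min_dp and gq >= min_gq


def titv_ratio(snvs):
    """Ti/Tv over an iterable of (ref, alt) pairs; NaN if no transversions."""
    snvs = list(snvs)
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("nan")


# Hypothetical calls: (ref, alt, DP, GQ). Not data from the study.
calls = [
    ("A", "G", 35, 99),
    ("C", "T", 12, 45),
    ("G", "T", 8, 60),    # dropped: DP below 10
    ("A", "C", 20, 15),   # dropped: GQ below 20
    ("T", "C", 10, 20),
    ("G", "C", 50, 99),
]

kept = [(ref, alt) for ref, alt, dp, gq in calls if passes_thresholds(dp, gq)]
ratio = titv_ratio(kept)
```

In practice the DP and GQ values would be read from the per-sample fields of a VCF file; the tuple form is used here only to keep the sketch self-contained.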
YAN Jin, PAN Qi, REN Hong. Comparison of methods of pre-processing and variant filtering in analyzing whole exome sequencing data[J]. Journal of Chongqing Medical University, 2013, (12): 1397-1404.