Comparison of methods of pre-processing and variant filtering in analyzing whole exome sequencing data

    Abstract:

    Objective: To compare the effects of different pre-processing methods and variant filtering methods on variant calling in the analysis of whole exome sequencing data. Methods: Using whole exome sequencing data from two samples, we compared three aspects of the analysis: the pre-processing method (FASTX-Toolkit, Trimmomatic, or no pre-processing), the strategy for keeping or discarding reads left unpaired after trimming (single-end reads, SE), and the variant filtering method ('Hard' filtering or variant quality score recalibration, VQSR). Their effects on variant calling were assessed by depth of coverage (DP), number of called variants, transition/transversion ratio, and genotype concordance (non-reference concordance). Results: Reads pre-processed with Trimmomatic had a DP similar to that of the raw reads without pre-processing, and markedly higher than that of reads pre-processed with FASTX-Toolkit. With DP ≥ 10× and genotype quality score (GQ) ≥ 20, more single nucleotide variants (SNVs) were called after Trimmomatic pre-processing than after FASTX-Toolkit pre-processing, and the number was close to that obtained without pre-processing. Including SE reads increased the number of called SNVs more in the FASTX-Toolkit group (28%) than in the Trimmomatic group (5%). With a small sample size, 'Hard' filtering removed fewer SNVs than VQSR in all experimental groups. Conclusions: Trimmomatic trims and/or filters the raw reads more moderately, whereas FASTX-Toolkit may over-filter the raw data. Retaining SE reads benefits downstream variant calling. 'Hard' filtering was more tolerant than VQSR.
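
    The DP ≥ 10× and GQ ≥ 20 thresholds and the transition/transversion ratio named in the abstract can be illustrated with a minimal sketch. The abstract does not state which tool computed these metrics, so the `SnvCall` record, the function names, and the toy calls below are all hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


@dataclass(frozen=True)
class SnvCall:
    """Hypothetical record for one SNV genotype call (not from the paper)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    dp: int  # read depth at the site
    gq: int  # genotype quality score


def passes_thresholds(call, min_dp=10, min_gq=20):
    """Keep a call only if DP >= min_dp and GQ >= min_gq."""
    return call.dp >= min_dp and call.gq >= min_gq


def is_transition(ref, alt):
    """A transition is purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(calls):
    """Transition/transversion ratio; NaN when there are no transversions."""
    ti = tv = 0
    for c in calls:
        if c.ref.upper() == c.alt.upper():
            continue  # not a substitution, skip
        if is_transition(c.ref, c.alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


# Toy calls, invented for illustration only.
calls = [
    SnvCall("chr1", 100, "A", "G", dp=35, gq=99),  # transition, passes
    SnvCall("chr1", 200, "C", "T", dp=12, gq=40),  # transition, passes
    SnvCall("chr2", 300, "A", "C", dp=50, gq=60),  # transversion, passes
    SnvCall("chr2", 400, "G", "T", dp=8, gq=99),   # fails DP threshold
    SnvCall("chr3", 500, "T", "C", dp=20, gq=15),  # fails GQ threshold
]

kept = [c for c in calls if passes_thresholds(c)]
print(len(kept), ti_tv_ratio(kept))
```

    In practice these fields would be read from a VCF file rather than typed in by hand; the sketch only shows how the two thresholds gate which calls enter the count and the ratio.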

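Cite this article

Yan Jin, Pan Qi, Ren Hong. Comparison of methods of pre-processing and variant filtering in analyzing whole exome sequencing data[J]. Journal of Chongqing Medical University, 2013, (12): 1397-1404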

History
  • Published online: 2014-10-15