
Comparison of long- and short-read metagenomic assembly for low-abundance species and resistance genes

Posted by: 抗性基因网 Date: 2023-06-09

Abstract
Recent technological and computational advances have made metagenomic assembly a viable approach to achieving high-resolution views of complex microbial communities. In previous benchmarking, short-read (SR) metagenomic assemblers had the highest accuracy, long-read (LR) assemblers generated the most contiguous sequences and hybrid (HY) assemblers balanced length and accuracy. However, no assessments have specifically compared the performance of these assemblers on low-abundance species, which include clinically relevant organisms in the gut. We generated semi-synthetic LR and SR datasets by spiking small and increasing amounts of Escherichia coli isolate reads into fecal metagenomes and, using different assemblers, examined E. coli contigs and the presence of antibiotic resistance genes (ARGs). For ARG assembly, although SR assemblers recovered more ARGs with high accuracy, even at low coverages, LR assemblies allowed for the placement of ARGs within longer, E. coli-specific contigs, thus pinpointing their taxonomic origin. HY assemblies identified resistance genes with high accuracy and had lower contiguity than LR assemblies. Each assembler type’s strengths were maintained even when our isolate was spiked in with a competing strain, which fragmented and reduced the accuracy of all assemblies. For strain characterization and determining gene context, LR assembly is optimal, while for base-accurate gene identification, SR assemblers outperform other options. HY assembly offers contiguity and base accuracy, but requires generating data on multiple platforms, and may suffer high misassembly rates when strain diversity exists. Our results highlight the trade-offs associated with each approach for recovering low-abundance taxa, and that the optimal approach is goal-dependent.
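The spike-in design described in the abstract can be illustrated with a short sketch. It estimates how many isolate reads give a chosen mean coverage, then mixes that many reads into a background read set. The genome size, read lengths, function names and toy reads are illustrative assumptions, not values or code from the study, and the authors' actual subsampling procedure is not described here.

```python
import math
import random


def reads_for_coverage(target_coverage, genome_size_bp, mean_read_length_bp):
    """Smallest read count whose total bases reach the target mean coverage.

    Uses coverage = (number of reads * mean read length) / genome size.
    """
    return math.ceil(target_coverage * genome_size_bp / mean_read_length_bp)


def spike_in(background_reads, isolate_reads, n_spike, seed=0):
    """Return the background reads plus n_spike randomly chosen isolate reads.

    Sampling is without replacement; the combined list is shuffled so the
    spiked reads are not grouped at the end.
    """
    if n_spike > len(isolate_reads):
        raise ValueError("not enough isolate reads for the requested spike-in")
    rng = random.Random(seed)
    mixed = list(background_reads) + rng.sample(list(isolate_reads), n_spike)
    rng.shuffle(mixed)
    return mixed


# Assumed values for illustration only: a 5 Mb genome, 150 bp short reads,
# and 10 kb long reads.
ASSUMED_GENOME_SIZE_BP = 5_000_000
short_reads_needed = reads_for_coverage(1, ASSUMED_GENOME_SIZE_BP, 150)
long_reads_needed = reads_for_coverage(10, ASSUMED_GENOME_SIZE_BP, 10_000)

# Toy read identifiers standing in for real sequencing reads.
mixed = spike_in(["bg1", "bg2", "bg3"], ["iso1", "iso2", "iso3", "iso4"], 2)
```

Repeating the mixing step with increasing values of `n_spike` gives a series of datasets in which the isolate's coverage rises while the background stays fixed, which is the general shape of the experiment the abstract describes.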

https://academic.oup.com/bib/article-abstract/24/2/bbad050/7048897?login=false