
DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data

Published by: 抗性基因网  Date: 2020-03-24

       Abstract



BACKGROUND:

Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advanced methods for monitoring environmental media (e.g., wastewater, agricultural waste, food, and water) are especially needed to identify potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access to, and profiling of, the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach that takes into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively.
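
To make the "best hit" idea concrete, here is a minimal sketch of how such an annotation step might look, and why it can miss divergent ARGs. It is not DeepARG code or any specific tool's pipeline: the `Hit` record, the `best_hit_annotate` function, the 80% identity cutoff, and all hit values are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a read against a reference ARG database (hypothetical record)."""
    read_id: str
    arg_category: str
    identity: float   # percent identity of the alignment
    bitscore: float


def best_hit_annotate(hits, min_identity=80.0):
    """Keep the highest-bitscore hit per read, then apply an identity cutoff.

    The 80% default is an assumed, illustrative threshold.
    """
    best = {}
    for hit in hits:
        current = best.get(hit.read_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.read_id] = hit
    # Reads whose best hit falls below the cutoff receive no annotation at all.
    return {
        read_id: hit.arg_category
        for read_id, hit in best.items()
        if hit.identity >= min_identity
    }


# Made-up hits: read2 is assumed to be a true but divergent ARG.
hits = [
    Hit("read1", "beta_lactam", 98.0, 210.0),
    Hit("read1", "tetracycline", 41.0, 35.0),
    Hit("read2", "aminoglycoside", 55.0, 90.0),
]

annotations = best_hit_annotate(hits)
print(annotations)  # {'read1': 'beta_lactam'}
```

In this toy input, `read2` is discarded because its best hit has only 55% identity, so if it really is an ARG it becomes a false negative. Lowering `min_identity` recovers it but would also admit weaker, possibly spurious hits, which is the trade-off that motivates a learned model over a fixed cutoff.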

RESULTS:

Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories.
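
The link between false negatives and recall can be checked with a few lines of arithmetic. The sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The counts are invented for illustration and are not results from the paper; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented counts for 100 true ARG sequences under two hypothetical annotators.
strict_cutoff = precision_recall(tp=60, fp=1, fn=40)   # many true ARGs missed
fewer_misses = precision_recall(tp=92, fp=2, fn=8)     # fewer false negatives

print(f"strict cutoff: precision={strict_cutoff[0]:.3f}, recall={strict_cutoff[1]:.3f}")
print(f"fewer misses:  precision={fewer_misses[0]:.3f}, recall={fewer_misses[1]:.3f}")
```

With the number of true ARGs held fixed, every false negative removed becomes a true positive, so recall rises directly as the false negative count falls, while precision depends only on how many false positives are admitted along the way.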

CONCLUSIONS:

The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg .

       https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5796597/