Prediction of Outer Membrane Proteins Using Support Vector Machine with Combined Features

Authors: ZOU Lingyun, WANG Zhengzhi, WANG Yongxian

Fund Project: Supported by the National Natural Science Foundation of China (No. 60603054).

    Abstract:

    Outer membrane proteins (OMPs) are embedded in the outer membranes of Gram-negative bacteria, mitochondria, and chloroplasts. Their cellular location and functional diversity make them an important protein class with key biological functions. Predicting OMPs by bioinformatics methods can help identify new OMPs in genomic sequences and support the prediction of their secondary and tertiary structures. In this paper, three feature classes were calculated from protein sequences: amino acid composition, dipeptide composition, and weighted multi-order amino acid index correlation coefficients. The three feature classes were then combined and fed into a support vector machine (SVM) classifier to discriminate OMPs from proteins of other folding types. Discrimination results were computed for several combined feature sets built from four amino acid index categories, and the influence of the order and weight of the correlation coefficients on prediction performance was discussed. In ten-fold cross-validation and independent tests on a dataset of 1087 proteins covering all types of globular and membrane proteins, the combined-feature method discriminated OMPs from non-OMPs with overall accuracies of up to 96.96% and 97.33%, respectively, outperforming several existing methods in the literature. When the method was applied to identify OMPs in five bacterial genomes, it showed high specificity, and over 99% of the OMPs with known three-dimensional structures in the PDB database were correctly identified. These results indicate that the method is an effective tool for screening OMPs in genomes.
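The first two feature classes named in the abstract are standard sequence statistics. The sketch below is a minimal illustration, assuming that "composition" means relative frequency over the 20 standard residues and that residue pairs containing a non-standard symbol (for example X) are skipped; the abstract does not say how the paper handles such residues, and the function names are illustrative.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq: str) -> list[float]:
    """Relative frequency of each standard amino acid (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Relative frequency of each adjacent residue pair (400 values)."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    # Assumption: pairs that contain a non-standard residue are ignored.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(pairs)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    counts = {d: 0 for d in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [counts[d] / total for d in DIPEPTIDES]
```

Concatenating `aa_composition(seq) + dipeptide_composition(seq)` yields a 420-dimensional vector (20 + 400), to which the correlation features are appended to form the combined input.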
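The abstract does not give the formula for the weighted multi-order amino acid index correlation coefficients. The sketch below assumes a form similar to Chou's pseudo amino acid composition: each residue index is standardized over the 20 residues, the order-λ coefficient is the mean squared difference of index values between residues λ positions apart, and every coefficient is multiplied by a weight. The Kyte-Doolittle hydropathy scale is only a stand-in for one of the paper's four index categories, which are not named here; `max_order` and `weight` play the role of the order and weight whose influence the authors discuss.

```python
import math

# Kyte-Doolittle hydropathy values, used here only as an example index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index: dict[str, float]) -> dict[str, float]:
    """Rescale an index to zero mean and unit population std over its residues."""
    values = list(index.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (v - mean) / std for aa, v in index.items()}


def weighted_correlation_features(seq, index, max_order, weight):
    """Return [weight * theta_1, ..., weight * theta_max_order].

    theta_lag is the mean squared difference of the standardized index
    between residues that are `lag` positions apart (assumed Chou-style form).
    """
    h = standardize(index)
    values = [h[r] for r in seq.upper() if r in h]
    n = len(values)
    if max_order >= n:
        raise ValueError("max_order must be smaller than the sequence length")
    features = []
    for lag in range(1, max_order + 1):
        diffs = [(values[i + lag] - values[i]) ** 2 for i in range(n - lag)]
        features.append(weight * sum(diffs) / len(diffs))
    return features
```

With several indices, the per-index vectors would simply be concatenated; the number of orders and the weight are free parameters to be tuned, for example by cross-validation.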
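The combined feature vector is then classified by an SVM. The abstract does not state the kernel or the software used; kernel SVMs from LIBSVM or scikit-learn's `sklearn.svm.SVC` are common choices. To keep the example dependency-free, the sketch below swaps in a linear SVM trained with the Pegasos stochastic subgradient method, which minimizes an L2-regularized hinge loss like a linear soft-margin SVM. It is not the authors' implementation. Labels are assumed to be +1 (OMP) and -1 (non-OMP), and `lam` and `epochs` are illustrative defaults, not values from the paper.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM.

    X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 is appended to each vector so the last weight acts as a
    bias; this bias is regularized too, a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size 1 / (lambda * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (subgradient of the L2 regularizer) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and step along the hinge-loss subgradient if the margin is violated.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Because composition and correlation features span different ranges, they would typically be scaled to a common range before training; a kernel SVM would replace the dot product with a kernel function.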
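The reported accuracies come from ten-fold cross-validation and an independent test. A minimal sketch of both evaluations follows, assuming the standard definitions accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); the paper may define specificity differently for genome screening. `train_fn` and `predict_fn` are hypothetical placeholders for any classifier, such as an SVM.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k folds of nearly equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Fold i takes every k-th shuffled index starting at position i.
    return [order[i::k] for i in range(k)]


def cross_validate(X, y, train_fn, predict_fn, k=10, seed=0):
    """Pool out-of-fold labels and predictions from k-fold cross-validation."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = train_fn(train_X, train_y)
        for i in fold:
            y_true.append(y[i])
            y_pred.append(predict_fn(model, X[i]))
    return y_true, y_pred


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

For an independent test, the classifier is trained once on the training set and the held-out labels and predictions are passed directly to `binary_metrics`.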

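Cite this article:

ZOU Lingyun, WANG Zhengzhi, WANG Yongxian. Prediction of outer membrane proteins using support vector machine with combined features[J]. Chinese Journal of Biotechnology, 2008, 24(4): 651-658.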

History
  • Received: 2007-08-15

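Chinese Journal of Biotechnology ® 2024. All rights reserved.

Mailing address: Institute of Microbiology, Chinese Academy of Sciences    Postcode: 100101

Tel: 010-64807509   E-mail: cjb@im.ac.cn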
