PIELG: A Protein Interaction Extraction System using a Link Grammar Parser from Biomedical Abstracts

Rania A. Abul Seoud; Nahed H. Solouma; Abou-Baker M. Youssef; Yasser M. Kadah

Abstract:

Due to the ever-growing number of publications on protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents PIELG, a Protein Interaction Extraction System that applies a Link Grammar Parser to biomedical abstracts. PIELG uses the linkage produced by the Link Grammar Parser to start a case-based analysis of the contents of various syntactic roles, as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome problems with preposition combinations. It achieves a recall of 74.4% and a precision of 62.65%. Experimental comparison with two other state-of-the-art extraction systems indicates that PIELG achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The results show that the performance is promising.
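The recall and precision figures quoted above are the usual set-based metrics for extraction systems. The following is a minimal sketch of how such figures are typically computed and combined into an F1 score. The function names, the protein pair representation, and the toy data are assumptions made for illustration only; they are not taken from the PIELG implementation, and the paper's own evaluation procedure may differ.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for two sets of extracted interactions.

    Each interaction is represented here (by assumption) as a frozenset of
    two protein names, so that the pair (A, B) equals the pair (B, A).
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four predicted pairs, five gold-standard pairs.
predicted = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]}
gold = {frozenset(p) for p in [("B", "A"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]}

toy_precision, toy_recall = precision_recall(predicted, gold)

# F1 derived from the precision and recall reported in the abstract.
reported_f1 = f1_score(precision=0.6265, recall=0.744)
print(f"toy precision={toy_precision:.2f}, toy recall={toy_recall:.2f}")
print(f"F1 from reported figures: {reported_f1:.4f}")
```

Under these definitions, the reported recall of 74.4% and precision of 62.65% correspond to an F1 score of roughly 68%. The abstract itself does not state an F1 value, so this figure is a derived quantity rather than a reported result.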

Keywords: Link Grammar Parser, interaction extraction, protein-protein interaction, natural language processing.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060042
