PIELG: A Protein Interaction Extraction System using a Link Grammar Parser from Biomedical Abstracts

Authors: Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, Yasser M. Kadah


Due to the ever-growing number of publications on protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents PIELG, a Protein Interaction Extraction System that uses a Link Grammar Parser on biomedical abstracts. PIELG uses the linkage produced by the Link Grammar Parser to start a case-based analysis of the contents of the various syntactic roles, as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome problems with preposition combinations. Recall and precision are 74.4% and 62.65%, respectively. Experimental comparison with two other state-of-the-art extraction systems indicates that PIELG achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The results show that the performance is remarkably promising.
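
To make the idea of phrasal-prepositional verb patterns concrete, the following is a minimal sketch and not the PIELG implementation. It assumes a small, hypothetical table of interaction verbs with the prepositions they combine with, and it matches on the surface string with regular expressions rather than on Link Grammar linkages. The names `PATTERNS` and `extract_interactions` and the example protein names are illustrative only.

```python
import re

# Hypothetical pattern table (an assumption, not taken from the paper):
# interaction verb -> prepositions it may combine with.
PATTERNS = {
    "interacts": ("with",),
    "binds": ("to", "with"),
    "associates": ("with",),
}


def extract_interactions(sentence, proteins):
    """Return (agent, verb phrase, target) triples found in one sentence.

    `proteins` is a collection of already-recognized protein names; this
    sketch does no name recognition or syntactic parsing of its own.
    """
    # Longest names first so that a longer name wins over its prefix.
    names = "|".join(
        re.escape(p) for p in sorted(proteins, key=len, reverse=True)
    )
    triples = []
    for verb, preps in PATTERNS.items():
        prep_alt = "|".join(preps)
        pattern = re.compile(
            rf"\b({names})\s+{verb}\s+({prep_alt})\s+({names})\b"
        )
        for m in pattern.finditer(sentence):
            triples.append((m.group(1), f"{verb} {m.group(2)}", m.group(3)))
    return triples


if __name__ == "__main__":
    print(extract_interactions("RAD53 interacts with DBF4 in vivo.",
                               {"RAD53", "DBF4"}))
```

Treating the verb and its preposition as a single unit ("binds to", "interacts with") is what lets the pattern identify the second protein as the interaction target. A parser-based system would obtain the same roles from the linkage instead of from word adjacency, which is why this sketch misses sentences where other words separate the verb from its arguments.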

Keywords: Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.


