Informatics and Applications
2016, Volume 10, Issue 1, pp 119-128
BioNLP ONTOLOGY EXTRACTION FROM A RESTRICTED LANGUAGE CORPUS WITH CONTEXT-FREE GRAMMARS
Abstract
BioNLP is an emerging area of NLP that brings challenging new objects of study to language processing and valuable new resources to bioinformatics and medicine. One notable task in BioNLP is the de novo creation of ontologies. This is generally a tedious process; in some cases, however, it can be partly automated. One such case arises when a corpus of texts written in a restricted subset of natural language is available. This paper presents a simple approach to automating ontology creation in such cases. The approach aims to simplify the mapping of entities in natural-language texts to predefined ontologies wherever possible. The paper also discusses which properties of the corpus make the approach applicable.
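As a rough illustration of the general idea (not the paper's actual grammar, corpus, or implementation, none of which appear in this abstract), the sketch below parses sentences of a toy restricted language with a small context-free grammar and turns each successful parse into subject-relation-object triples. The grammar rules, the vocabulary, the entry identifier `P12345`, and the function names are all hypothetical and chosen only for the example.

```python
# Hypothetical toy grammar for a restricted language (an assumption, not from the paper).
# Keys are nonterminals; any symbol that is not a key is treated as a terminal token.
GRAMMAR = {
    "S": [("VERB", "NP")],
    "NP": [("NOUN", "and", "NP"), ("NOUN",)],
    "VERB": [("binds",), ("inhibits",)],
    "NOUN": [("atp",), ("dna",), ("zinc",)],
}


def parse(symbol, tokens, pos=0):
    """Yield (tree, next_pos) for every way `symbol` derives a prefix of tokens[pos:].

    Plain top-down parsing; it terminates because the grammar has no left recursion.
    """
    if symbol not in GRAMMAR:  # terminal: must match the current token exactly
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:
        partial = [([], pos)]  # (children built so far, position reached)
        for part in production:
            partial = [
                (children + [subtree], nxt)
                for children, cur in partial
                for subtree, nxt in parse(part, tokens, cur)
            ]
        for children, end in partial:
            yield (symbol, children), end


def leaves(tree, label):
    """Collect the terminal under every subtree labelled `label`."""
    if isinstance(tree, str):
        return []
    name, children = tree
    if name == label:
        return [children[0]]
    return [leaf for child in children for leaf in leaves(child, label)]


def extract_triples(subject, sentence):
    """Return (subject, relation, object) triples, or [] if the sentence does not parse."""
    # Tokens are lowercased, so the extracted terms lose their original case.
    tokens = sentence.lower().rstrip(".").split()
    for tree, end in parse("S", tokens):
        if end == len(tokens):  # accept only parses that consume the whole sentence
            relation = leaves(tree, "VERB")[0]
            return [(subject, relation, obj) for obj in leaves(tree, "NOUN")]
    return []


if __name__ == "__main__":
    print(extract_triples("P12345", "Binds ATP and zinc."))
    print(extract_triples("P12345", "Probably binds ATP."))
```

Sentences outside the grammar yield no triples, which reflects why such an approach depends on the corpus being written in a sufficiently restricted language: the less regular the phrasing, the smaller the share of sentences a compact grammar can cover.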
About this article
Title
BioNLP ONTOLOGY EXTRACTION FROM A RESTRICTED LANGUAGE CORPUS WITH CONTEXT-FREE GRAMMARS
Journal
Informatics and Applications
2016, Volume 10, Issue 1, pp 119-128
Cover Date
2016-01-30
DOI
10.14357/19922264160111
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
BioNLP; ontology creation; context-free grammar
Authors
D. A. Alexeyevsky
Author Affiliations
National Research University Higher School of Economics; 20 Myasnitskaya Str., Moscow 101000, Russian Federation