An Information Retrieval System for Extracting Bacteria Names in Biomedical Documents
An information retrieval system that provides a structured and comprehensive classification of the species mentioned in documents and articles is crucial for bioinformatics studies. Because this information is scattered across scientific articles and web pages, bacteria entities need to be detected in text automatically, semantically tagged with a taxonomy, and finally assigned their classification. These are the challenges set forth by the Bacteria Biotopes Task of the BioNLP Shared Task 2016. This paper describes a system for normalizing bacteria entities to the NCBI taxonomy. The system, which obtained promising results on the shared task data set, combines basic information retrieval techniques with a rule-based approach.
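The abstract does not spell out the rules the system uses. As a rough illustration of what normalizing a bacteria mention to the NCBI taxonomy with dictionary lookup plus hand-written rules can look like, here is a minimal sketch. It assumes a tiny in-memory name-to-taxon-ID table (a real system would load the full NCBI Taxonomy name list) and three illustrative rules: exact match after normalization, expansion of an abbreviated genus such as "E. coli" when the expansion is unambiguous, and a fallback to the genus. The names `TAXONOMY`, `normalize`, and `link_mention`, and the rules themselves, are hypothetical and are not the authors' method.

```python
import re
from typing import Dict, Optional

# Hypothetical toy excerpt of an NCBI Taxonomy name -> taxon ID table.
# Assumption: a real system would load the full list of names from the
# NCBI taxonomy dump rather than hard-coding a few entries.
TAXONOMY: Dict[str, int] = {
    "escherichia coli": 562,
    "escherichia": 561,
    "staphylococcus aureus": 1280,
    "staphylococcus": 1279,
}


def normalize(mention: str) -> str:
    """Lowercase, replace punctuation other than periods with spaces, collapse whitespace."""
    text = mention.lower()
    text = re.sub(r"[^a-z0-9.\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def link_mention(mention: str, taxonomy: Dict[str, int] = TAXONOMY) -> Optional[int]:
    """Map a bacteria mention to a taxon ID using illustrative rules; None if no rule fires."""
    name = normalize(mention)
    # Rule 1 (assumed): exact match on the normalized name.
    if name in taxonomy:
        return taxonomy[name]
    tokens = name.split()
    if not tokens:
        return None
    # Rule 2 (assumed): expand an abbreviated genus such as "e. coli",
    # accepting the expansion only when exactly one taxonomy name fits.
    if len(tokens) >= 2 and re.fullmatch(r"[a-z]\.?", tokens[0]):
        initial, rest = tokens[0][0], " ".join(tokens[1:])
        candidates = [
            taxon_id
            for key, taxon_id in taxonomy.items()
            if key.startswith(initial) and key.split(" ", 1)[1:] == [rest]
        ]
        if len(candidates) == 1:
            return candidates[0]
    # Rule 3 (assumed): fall back to the genus (first token) if it is a known name.
    return taxonomy.get(tokens[0])


if __name__ == "__main__":
    for text in ["E. coli", "Staphylococcus sp.", "Bacillus subtilis"]:
        print(text, "->", link_mention(text))
```

Ordering the rules from most to least specific means that a precise match always wins over the coarser genus fallback. Rejecting ambiguous abbreviation expansions trades some recall for precision, which is a common choice in rule-based normalization.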