The biomedical literature represents a rich resource of biomarker information. However, both the size of literature databases and their lack of standardization hamper the automatic exploitation of the information contained in these resources. Text mining approaches have proven useful for exploiting the information contained in scientific publications. We have developed a knowledge-driven text mining approach that can exploit a large literature database to extract a dataset of biomarkers related to diseases covering all therapeutic areas. Our methodology takes advantage of the annotation of MEDLINE publications pertaining to biomarkers with MeSH terms, narrowing the search to specific publications and, therefore, minimizing the false positive rate. It relies on a dictionary-based Named Entity Recognition system and a relation extraction module. The application of this methodology resulted in the identification of 131,012 disease-biomarker associations between 2,803 genes and 2,751 diseases, and represents a valuable knowledge base for those interested in disease-related biomarkers.
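To illustrate the two components named above, the following is a minimal sketch in Python of a dictionary-based entity recognizer followed by a very simple relation extraction step. It is not the implementation described in the article: the toy dictionaries, the identifiers, the function names `find_entities` and `extract_associations`, and the use of sentence-level co-occurrence as the relation criterion are all assumptions made for illustration.

```python
import re
from itertools import product

# Hypothetical toy dictionaries mapping surface terms to identifiers.
# A real system would load full gene and disease vocabularies instead.
GENE_TERMS = {"BRCA1": "GENE:BRCA1", "TP53": "GENE:TP53"}
DISEASE_TERMS = {
    "breast cancer": "DIS:breast_cancer",
    "lung cancer": "DIS:lung_cancer",
}


def find_entities(sentence, dictionary):
    """Return the sorted identifiers of dictionary terms found in the sentence."""
    found = set()
    for term, identifier in dictionary.items():
        # Whole-word, case-insensitive match of the dictionary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(identifier)
    return sorted(found)


def extract_associations(text):
    """Pair every gene with every disease mentioned in the same sentence.

    Sentence-level co-occurrence is an assumed stand-in for a real
    relation extraction module.
    """
    associations = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        genes = find_entities(sentence, GENE_TERMS)
        diseases = find_entities(sentence, DISEASE_TERMS)
        associations.update(product(genes, diseases))
    return sorted(associations)


if __name__ == "__main__":
    abstract = (
        "BRCA1 is a biomarker for breast cancer. "
        "TP53 mutations were studied in lung cancer."
    )
    for gene, disease in extract_associations(abstract):
        print(gene, disease)
```

Restricting pairs to the same sentence keeps the sketch short. It also shows why narrowing the input to biomarker-annotated publications matters: plain co-occurrence on unfiltered text would produce many spurious pairs.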
This work is described in the following article:
À. Bravo, M. Cases, N. Queralt-Rosinach, F. Sanz, L.I. Furlong, "A Knowledge-Driven Approach to Extract Disease-Related Biomarkers from the Literature," BioMed Research International, vol. 2014, Article ID 253128, 11 pages, 2014. doi:10.1155/2014/253128. (Article for the "Big Data and Network Biology" special issue of BioMed Research International)
To browse the results click on the following links:
Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics (GRIB) IMIM-UPF
Please send questions or comments on the Database of disease-related biomarkers to: lfurlong(at)imim(dot)es
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreements n°  (eTOX) and n°  (Open PHACTS), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. À.B. and L.I.F. received support from Instituto de Salud Carlos III Fondo Europeo de Desarrollo Regional (CP10/00524). The Research Unit on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).