BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature

Henning Schäfer, Ahmad Idrissi-Yaghir, Kamyar Arzideh, Hendrik Damm, Tabea Pakull, Cynthia Schmidt, Mikel Bahn, Georg C. Lodde, Elisabeth Livingstone, Dirk Schadendorf, Felix Nensa, Peter Horn, Christoph M. Friedrich

October 2024

Abstract

Background: The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction using large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs.

Methods: The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract and normalize biomedical concepts from PubMed, mapping them to the Unified Medical Language System (UMLS). Extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships formed using terminologies like SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models.

Results: BioKGrapher effectively aligns generated concepts with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-scores of up to 0.6. In multi-label classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved micro F1-scores by up to 0.89 percentage points over a non-specific KG and 2.16 points over base models across three BERT variants. The drug-disease extraction case study identified indications for Nivolumab and Rituximab.

Conclusion: BioKGrapher is a tool for automatic KG construction, aligning with the GGPO and enhancing downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing.
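The concept weighting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each concept is scored by its pointwise contribution to the Kullback-Leibler divergence between a condition-specific (foreground) corpus and a general (background) corpus, with add-one smoothing. The function name `kl_concept_weights`, the smoothing choice, and the toy concept lists are hypothetical; the abstract does not give the exact formulation, and the local frequency balancing step is not modeled here.

```python
import math
from collections import Counter


def kl_concept_weights(foreground, background, smoothing=1.0):
    """Score concepts by their pointwise contribution to KL(P_fg || P_bg).

    foreground, background: lists of concept identifiers (one per mention).
    smoothing: additive constant so unseen concepts never get zero probability.
    This is an illustrative assumption, not BioKGrapher's actual implementation.
    """
    fg = Counter(foreground)
    bg = Counter(background)
    vocab = set(fg) | set(bg)

    # Smoothed totals so both distributions sum to 1 over the shared vocabulary.
    fg_total = sum(fg.values()) + smoothing * len(vocab)
    bg_total = sum(bg.values()) + smoothing * len(vocab)

    weights = {}
    for concept in vocab:
        p = (fg[concept] + smoothing) / fg_total
        q = (bg[concept] + smoothing) / bg_total
        # Positive when the concept is over-represented in the foreground.
        weights[concept] = p * math.log(p / q)
    return weights


# Toy mentions; a real pipeline would use linked UMLS concept identifiers.
foreground = ["melanoma"] * 6 + ["nivolumab"] * 3 + ["patient"] * 1
background = ["patient"] * 8 + ["melanoma"] * 1 + ["nivolumab"] * 1

weights = kl_concept_weights(foreground, background)
ranked = sorted(weights, key=weights.get, reverse=True)
```

In this toy setup, concepts that are frequent in the condition-specific corpus but rare in the background rank first, while generic concepts such as `patient` receive a negative score and fall to the bottom of the ranking.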

Type

Journal article

Henning Schäfer

Researcher in the first cohort

Ahmad Idrissi-Yaghir

Researcher in the first cohort

Kamyar Arzideh

Associated Researcher

Hendrik Damm

Researcher in the second cohort

Tabea Pakull

Researcher in the second cohort

Mikel Bahn

Researcher in the second cohort

Georg C. Lodde

Principal Investigator

Elisabeth Livingstone

Principal Investigator

Dirk Schadendorf

Principal Investigator

Felix Nensa

Speaker

Peter Horn

Principal Investigator

Christoph M. Friedrich

Co-Speaker
