BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature

Abstract

Background: The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction using large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs.

Methods: The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract and normalize biomedical concepts from PubMed, mapping them to the Unified Medical Language System (UMLS). Extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships formed using terminologies like SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models.

Results: BioKGrapher effectively aligns generated concepts with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-scores of up to 0.6. In multi-label classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved micro F1-scores by up to 0.89 percentage points over a non-specific KG and 2.16 points over base models across three BERT variants. The drug-disease extraction case study identified indications for Nivolumab and Rituximab.

Conclusion: BioKGrapher is a tool for automatic KG construction, aligning with the GGPO and enhancing downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing.
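The abstract mentions weighting and re-ranking concepts with Kullback-Leibler divergence but does not give the exact formulation. As a rough illustration only, the sketch below scores each concept by its pointwise contribution p * log(p / q), where p is the concept's relative frequency in a condition-specific corpus and q is its relative frequency in a background corpus. The function names, the additive smoothing, and the toy counts are all assumptions made for this example and are not taken from BioKGrapher. The local frequency balancing step is not covered.

```python
import math


def kl_concept_weights(domain_counts, background_counts, smoothing=1.0):
    """Score each concept by its pointwise KL contribution p * log(p / q).

    p: smoothed relative frequency in the condition-specific corpus
    q: smoothed relative frequency in the background corpus
    Additive smoothing is an assumption; it avoids division by zero for
    concepts that are missing from one of the two corpora.
    """
    vocab = set(domain_counts) | set(background_counts)
    domain_total = sum(domain_counts.values()) + smoothing * len(vocab)
    background_total = sum(background_counts.values()) + smoothing * len(vocab)

    weights = {}
    for concept in vocab:
        p = (domain_counts.get(concept, 0) + smoothing) / domain_total
        q = (background_counts.get(concept, 0) + smoothing) / background_total
        weights[concept] = p * math.log(p / q)
    return weights


def rank_concepts(weights):
    """Return concepts ordered from most to least condition-specific."""
    return sorted(weights, key=weights.get, reverse=True)


# Toy concept counts (hypothetical, for illustration only).
domain = {"melanoma": 50, "nivolumab": 20, "patient": 30}
background = {"melanoma": 5, "nivolumab": 2, "patient": 300, "diabetes": 100}

weights = kl_concept_weights(domain, background)
ranked = rank_concepts(weights)
print(ranked)
```

In this toy setup, concepts that are frequent in the condition-specific counts but rare in the background rise to the top, while generic terms such as "patient" receive a negative score. The sum of all scores equals the KL divergence between the two smoothed distributions.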

Henning Schäfer
Researcher in the first cohort

My research interests include Deep Learning, Computer Vision, Radiomics, and Explainable AI.

Ahmad Idrissi-Yaghir
Researcher in the first cohort

My research interests include Deep Learning, Natural Language Processing, and Information Retrieval.

Kamyar Arzideh
Associated Researcher

My research interests include NLP.

Hendrik Damm
Researcher in the second cohort

My research interests include Deep Learning, Natural Language Processing, and Information Retrieval.

Tabea Pakull
Researcher in the second cohort

My research interests include Deep Learning, Natural Language Processing, Lay Summarization and Explainable AI.

Mikel Bahn
Researcher in the second cohort

My research interests include Graph Representation Learning, Deep Learning, Natural Language Processing, and Large Language Models.

Georg C. Lodde
Clinician Scientist

My research interests include Dermatology, Medical Research, and Digitalization.

Elisabeth Livingstone
Principal Investigator

My research interests include Medical Research, Dermatology, and Digitalization.

Dirk Schadendorf
Principal Investigator

My research interests include Dermatology, Medical Research, and Digitalization.

Felix Nensa
Speaker

My research interests include Medical Digitalization, Computer Vision, and Radiology.

Peter Horn
Principal Investigator

My research interests include Transfusion Medicine, Immunology, and Bioinformatics.
