Cross-Language Transfer of High-Quality Annotations: Combining Neural Machine Translation with Cross-Linguistic Span Alignment to Apply NER to Clinical Texts in a Low-Resource Language

Abstract

In this work, cross-linguistic span prediction based on contextualized word embedding models is combined with neural machine translation (NMT) to apply state-of-the-art natural language processing (NLP) models to a clinical corpus in a low-resource language. Two directions are evaluated: (a) English models are applied to translated texts, and the predicted annotations are then transferred back to the source language; (b) existing high-quality annotations are carried over through translation and then used to train NLP models in the target language. Effectiveness and transfer loss are evaluated on the German Berlin-Tübingen-Oncology Corpus (BRONCO), using external data transferred from the NCBI disease, SemEval-2013 drug-drug interaction (DDI), and i2b2/VA 2010 datasets. Applying English models to translated clinical texts is motivated by the resources available for English, such as large pre-trained biomedical word embeddings. To support further work in this area, we provide a general-purpose pipeline for transferring annotated datasets in BRAT or CoNLL format to various target languages. For the entity class medication, good results were obtained, with an F1-score of 0.806 after re-alignment. Results for the diagnosis and treatment classes were limited, at just below 0.5 F1-score, owing to differences in annotation guidelines.
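Direction (a) can be illustrated with a minimal sketch of the final step, where entity spans predicted on the English translation are projected back onto the source text. This is not the pipeline from the paper: it assumes a precomputed token-level alignment given as index pairs, used here as a simple stand-in for the span-prediction model named in the abstract. The function name `project_spans` and the example sentence are hypothetical.

```python
from typing import Dict, List, Tuple

# (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def project_spans(spans: List[Span], alignment: List[Tuple[int, int]]) -> List[Span]:
    """Project token spans from one text to another via a token alignment.

    `alignment` holds (from_index, to_index) pairs. It is assumed to be
    given; producing it is the job of the alignment model.
    """
    mapping: Dict[int, List[int]] = {}
    for src, tgt in alignment:
        mapping.setdefault(src, []).append(tgt)

    projected: List[Span] = []
    for start, end, label in spans:
        targets = [t for s in range(start, end) for t in mapping.get(s, [])]
        if not targets:
            # No aligned token: the annotation is lost in transfer.
            continue
        # Use the smallest contiguous window covering all aligned tokens.
        projected.append((min(targets), max(targets) + 1, label))
    return projected


# Hypothetical example: English translation -> German source.
english = ["The", "patient", "received", "cisplatin", "today"]
german = ["Der", "Patient", "erhielt", "heute", "Cisplatin"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)]

predicted = [(3, 4, "MEDICATION")]  # span predicted on the English text
print(project_spans(predicted, alignment))  # → [(4, 5, 'MEDICATION')]
```

Taking the smallest window that covers all aligned tokens keeps each projected entity contiguous, at the cost of occasionally including unaligned tokens that fall in between. Spans with no aligned token are dropped, which is one source of the transfer loss the abstract refers to.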

Publication
Proceedings of the 4th Clinical Natural Language Processing Workshop
Henning Schäfer
Researcher in the first cohort

My research interests include Deep Learning, Computer Vision, Radiomics, and Explainable AI.

Ahmad Idrissi-Yaghir
Researcher in the first cohort

My research interests include Deep Learning, Natural Language Processing, and Information Retrieval.

Peter Horn
Principal Investigator

My research interests include Transfusion Medicine, Immunology, and Bioinformatics.

Christoph M. Friedrich
Co-Speaker

My research interests include Deep Learning, Computer Vision, Radiomics, and Explainable AI.
