Data fusion techniques provide opportunities for combining information from multiple domains, such as metadata and medical report data with radiology images, yielding enriched knowledge. The objective of this paper is to fuse automatically generated image keywords with radiographs, enabling multi-modal image representations for body part and abnormality recognition. Because manual annotation is often impractical, time-consuming, and prone to errors, automatic visual recognition and annotation of radiographs is a fundamental step towards computer-aided interpretation. As the number of digital medical images acquired daily rapidly increases, systems are needed that can reliably detect and classify anatomy and abnormalities in radiology images. The Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) Show-and-Tell model is adopted for keyword generation. The presented work fuses multi-modal information by incorporating the automatically generated keywords into the radiographs via augmentation, producing enriched features with which deep learning systems are trained. The proposed approach is evaluated on the Musculoskeletal Radiographs (MURA) dataset using two classification schemes. Prediction accuracy was higher for all tested datasets using the proposed approach, reaching 95.93 % for anatomic region classification and 81.5 % for abnormality classification.
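The sketch below illustrates one plausible reading of the keyword-augmentation step described above: keywords produced by a captioning model are rendered into a reserved band of the radiograph so that a single fused image can be passed to a classifier. This is a minimal illustration under our own assumptions, not the authors' implementation; the keyword list, file names, and band height are hypothetical placeholders.

```python
# Minimal sketch (assumed interpretation, not the paper's code): fuse
# automatically generated keywords with a radiograph by rendering the
# keyword text into a band appended below the image.
from PIL import Image, ImageDraw, ImageOps

def augment_with_keywords(image_path, keywords, band_height=40):
    """Return the radiograph with keywords drawn into a bottom band,
    producing one fused image for downstream classifier training."""
    img = Image.open(image_path).convert("L")                  # grayscale radiograph
    fused = ImageOps.expand(img, border=(0, 0, 0, band_height), fill=0)
    draw = ImageDraw.Draw(fused)
    draw.text((5, img.height + 5), " ".join(keywords), fill=255)
    return fused

# Hypothetical usage: keywords stand in for the Show-and-Tell model's output.
fused_img = augment_with_keywords("XR_ELBOW_patient001.png",
                                  ["elbow", "lateral", "hardware"])
fused_img.save("XR_ELBOW_patient001_fused.png")
```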