Late fusion of deep learning and handcrafted visual features for biomedical image modality classification

Much of medical knowledge is stored in the biomedical literature, collected in archives like PubMed Central that continue to grow rapidly. A significant part of this knowledge is contained in images with limited metadata available which makes it difficult to explore the visual knowledge in the biomedical literature. Thus extraction of metadata from visual content is important. One important piece of metadata is the type of the image, which could be one of the various medical imaging modalities such as X-ray, computed tomography or magnetic resonance images and also of general graphs that are frequent in the literature. This study explores a late, score-based fusion of several deep convolutionl neural networks with a traditional hand-crafted bag of visual words classifier to classify images from the biomedical literature into image types or modalities. It achieved a classification accuracy of 85.51% on the ImageCLEF 2013 modality classification task, which is better than the best visual methods in the challenge that the data were produced for, and similar compared to mixed methods that make use of both visual and textual information It achieved similarly good results of 84.23 and 87.04% classification accuracy before and after augmentation, respectively, on the related ImageCLEF 2016 subfigure classification task.