(Accepted for IEEE QPAIN 2025 and possible inclusion in IEEE Xplore Digital Library, and indexed by Scopus and other indexing services.)
Abstract—Brain tumors pose a persistent challenge in neuro-oncology: their varied morphologies and complex growth patterns complicate both diagnosis and treatment. Against this background, this study develops a new hybrid model that combines a convolutional neural network (CNN) with a Transformer architecture to improve the classification accuracy of brain tumors from MRI scans. A ResNet50 CNN is employed for efficient feature extraction, capturing relevant local visual cues from the MRI data. While classical CNNs perform well at local feature extraction, they are limited in their ability to model global contextual information. This limitation is addressed by introducing a Transformer encoder that ingests the CNN features to capture complex dependencies across the image. The model was trained and validated on an extensive set of several thousand MRI images classified into four categories: gliomas, meningiomas, pituitary tumors, and non-tumorous images. The results are promising: the hybrid CNN-Transformer model achieved higher accuracy than existing CNN models and generalized better to unseen data. The integration of Grad-CAM heatmaps adds a further layer of explainability, offering clinically relevant visual insight into the model’s decision-making process. This research illustrates the capabilities of hybrid deep learning architectures in medical imaging and provides an essential bridge toward rapid and accurate diagnosis of brain tumors for timely and individualized therapeutic intervention.