
LPCANet: Classification of Laryngeal Cancer Histopathological Images Using a CNN with Position Attention and Channel Attention Mechanisms

Mercaldo F.; Santone A.
2021-01-01

Abstract

Laryngeal cancer is one of the most common malignant tumors in otolaryngology, and histopathological image analysis is the gold standard for its diagnosis. However, pathological diagnosis is highly subjective, which can lead to missed diagnoses and misdiagnoses. In addition, according to a literature search, no computer-aided diagnosis (CAD) algorithm has yet been applied to the classification of laryngeal cancer histopathological images. Convolutional neural networks (CNNs) are widely used in other cancer classification tasks, but they may ignore the potential global and channel relationships in images, which limits their feature representation ability. At the same time, their lack of interpretability makes the results difficult for pathologists to accept. Therefore, we propose a laryngeal cancer classification network (LPCANet) based on a CNN and attention mechanisms. First, the original histopathological images are sequentially cropped into patches. The patches are then fed into a ResNet50 backbone to extract local features. Next, a position attention module and a channel attention module are added in parallel to capture the spatial dependency and the channel dependency, respectively. The outputs of the two modules are fused into a single feature map to enhance the feature representation and improve classification performance. Moreover, the fused feature map is extracted and visually analyzed with gradient-weighted class activation mapping (Grad-CAM) to provide a degree of interpretability for the final results. The three-class classification performance of LPCANet surpasses that of five state-of-the-art classifiers (VGG16, ResNet50, InceptionV3, Xception, and DenseNet121) at both original resolutions (534 × 400 and 1067 × 800). On the 534 × 400 data, LPCANet achieved 73.18% accuracy, 74.04% precision, 73.15% recall, 72.9% F1-score, and 0.8826 AUC.
On the 1067 × 800 data, LPCANet achieved 83.15% accuracy, 83.5% precision, 83.1% recall, 83.1% F1-score, and 0.9487 AUC. The results show that LPCANet enhances the feature representation by capturing global and channel relationships and achieves better classification performance. In addition, the Grad-CAM visual analysis makes the results interpretable, so that they are easier for pathologists to accept and the method can serve as a second tool for auxiliary diagnosis. Graphical Abstract: [Figure not available: see fulltext.]
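The parallel position- and channel-attention branches described in the abstract follow the general pattern of dual self-attention over a backbone feature map. Below is a minimal NumPy sketch of that core idea: a spatial (position) affinity over all pixel locations, a channel affinity over all feature channels, and an element-wise fusion of the two branches. The learnable projection convolutions and scale parameters of the actual network are omitted, and the sum fusion rule is an assumption, not the paper's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    """Each spatial position attends to every other position."""
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)            # (C, N) with N = H*W
    energy = flat.T @ flat                   # (N, N) spatial affinity
    attn = softmax(energy, axis=-1)          # row-normalized attention map
    out = flat @ attn.T                      # aggregate features over positions
    return feat + out.reshape(C, H, W)       # residual connection

def channel_attention(feat):
    """Each channel attends to every other channel."""
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)
    energy = flat @ flat.T                   # (C, C) channel affinity
    attn = softmax(energy, axis=-1)
    out = attn @ flat                        # reweight channels
    return feat + out.reshape(C, H, W)

def fuse(feat):
    # element-wise sum of the two branches (assumed fusion rule)
    return position_attention(feat) + channel_attention(feat)

# toy backbone output: 8 channels on a 4x4 spatial grid
x = np.random.rand(8, 4, 4).astype(np.float32)
y = fuse(x)
print(y.shape)  # fused map keeps the input shape: (8, 4, 4)
```

In a real implementation the affinities would be computed from learned query/key projections of the ResNet50 features, but the affinity-then-aggregate structure shown here is the mechanism the abstract refers to.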
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11695/107209
Citations
  • PubMed Central: n/a
  • Scopus: 28
  • Web of Science: 19