A Weakly Supervised Approach for HPV Status Prediction in Oropharyngeal Carcinoma from H&E-Stained Slides

Merolla, Francesco
2025-01-01

Abstract

Background: Human papillomavirus (HPV) plays a crucial role in the pathogenesis of oropharyngeal squamous cell carcinoma (OPSCC), and accurate HPV status classification is essential for therapeutic stratification. p16 immunohistochemistry (IHC) is the clinically used surrogate marker, but it has limited specificity. Methods: We implemented a weakly supervised deep learning approach based on the Clustering-constrained Attention Multiple-Instance Learning (CLAM) framework to predict HPV status directly from hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of OPSCC. A total of 123 WSIs from two cohorts were used: The Cancer Genome Atlas (TCGA) cohort and an OPSCC cohort from the University of Naples Federico II (OPSCC-UNINA). Results: Attention heatmaps showed that the model focused predominantly on tumor-rich regions. Misclassifications occurred mainly in slides with discordant p16 and in situ hybridization (ISH) results or with suboptimal slide quality. Morphological analysis of high-attention patches showed that cellular features extracted from correctly classified slides are consistent with HPV status, with a Random Forest classifier achieving 83% accuracy at the cell level. Conclusions: This work supports the feasibility of deep learning-based HPV prediction from routine H&E slides, with potential clinical implications for streamlined, cost-effective diagnostics.
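To make the attention heatmaps and high-attention patches mentioned above more concrete, here is a minimal, simplified sketch of gated attention pooling for multiple-instance learning (Ilse et al., 2018), the mechanism that CLAM's attention branch builds on. It is not the authors' implementation: the weights are random, the sizes (`n_patches`, `d`, `d_hidden`) and the linear head `W_cls` are hypothetical, and CLAM's instance-level clustering loss and training loop are omitted. Each patch embedding receives one attention weight, the slide embedding is the attention-weighted average of the patches, and the top-weighted patches are the ones that would be selected for downstream morphological analysis.

```python
import numpy as np

def gated_attention_pool(h, V, U, w):
    """Simplified gated attention MIL pooling (Ilse et al., 2018).

    h: (n_patches, d) patch embeddings (assumed to come from a frozen feature extractor)
    V, U: (d, d_hidden) projection matrices; w: (d_hidden,) scoring vector
    Returns softmax-normalized attention weights (n_patches,) and the slide embedding (d,).
    """
    gate = 1.0 / (1.0 + np.exp(-(h @ U)))    # sigmoid gating branch
    scores = (np.tanh(h @ V) * gate) @ w     # one raw attention score per patch
    scores = scores - scores.max()           # shift for a numerically stable softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    slide_embedding = attn @ h               # attention-weighted average of patch embeddings
    return attn, slide_embedding

def top_k_patches(attn, k):
    """Indices of the k highest-attention patches, highest first."""
    return np.argsort(attn)[::-1][:k]

rng = np.random.default_rng(0)
n_patches, d, d_hidden = 500, 1024, 256      # hypothetical sizes for illustration only
h = rng.normal(size=(n_patches, d))          # stand-in for real patch features
V = rng.normal(scale=0.02, size=(d, d_hidden))
U = rng.normal(scale=0.02, size=(d, d_hidden))
w = rng.normal(scale=0.02, size=d_hidden)
W_cls = rng.normal(scale=0.02, size=(d, 2))  # hypothetical head: HPV-negative vs HPV-positive

attn, z = gated_attention_pool(h, V, U, w)
logits = z @ W_cls
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
top = top_k_patches(attn, k=10)              # candidate high-attention patches
```

In a real pipeline, each attention weight would be written back to its patch's coordinates on the slide to draw the heatmap, and the selected patches would then feed cell-level feature extraction, for example for a Random Forest classifier.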

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11695/154771