DECAB-LSTM: Deep Contextualized Attentional Bidirectional LSTM for cancer hallmark classification

Mercaldo F.;Santone A.
2020-01-01

Abstract

The large number of online scientific publications on cancer research makes large-scale data mining possible. The hallmarks, or characteristics, of cancer can be used to distinguish cancerous cells from normal cells. It is therefore important to organize and categorize the scientific literature by hallmark, predicting whether or not each article contains the information of interest. Earlier work tended to employ traditional machine learning methods that rely on feature engineering. Deep learning-based methods have achieved state-of-the-art performance in a wide range of Natural Language Processing (NLP) tasks, but only a limited amount of work has focused on deep learning techniques for cancer hallmark text classification. To advance this task, we propose a novel neural architecture, DEep Contextualized Attentional Bidirectional LSTM (DECAB-LSTM), which learns to attend to the valuable information in a sentence through a contextual attention mechanism. We also investigated the effect of a good word embedding on cancer hallmark text classification. We trained our model on a benchmark dataset and reported accuracy, F-score, and AUC. Compared with several baselines, including logistic regression, Support Vector Machines, Convolutional Neural Networks, and fastText, the proposed model achieves state-of-the-art performance, demonstrating its potential for empirical application in cancer research.
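
The abstract does not spell out how the contextual attention is computed. As a rough illustration only, the sketch below shows one common way to pool a sequence of BiLSTM hidden states with a learned context vector: score each state, normalize the scores with a softmax, and take the weighted sum. The function names, the parameters `W`, `b`, and `u`, and the scoring form `u . tanh(W h + b)` are assumptions made for this example, not the authors' confirmed formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, W, b, u):
    """Pool T hidden states (each of dimension d) into one vector.

    Assumed (hypothetical) parameters:
      W: k x d projection matrix, b: bias of length k,
      u: context vector of length k.
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in hidden_states:
        # Project the hidden state and squash it with tanh.
        proj = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        # Score the projection against the context vector.
        scores.append(sum(u_i * p_i for u_i, p_i in zip(u, proj)))

    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states.
    pooled = [
        sum(a * h[j] for a, h in zip(weights, hidden_states))
        for j in range(dim)
    ]
    return pooled, weights


# Toy input: three time steps with two-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
u = [1.0, 1.0]
pooled, weights = attention_pool(states, W, b, u)
```

In a full model the pooled vector would typically feed a classification layer, and `W`, `b`, and `u` would be learned jointly with the LSTM; here they are fixed by hand so the example stays self-contained.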

Use this identifier to cite or link to this document: https://hdl.handle.net/11695/95686