IRIS Institutional Research Catalogue of the University of Molise (Università degli Studi del Molise)

Using Transfer Learning for Code-Related Tasks

Mastropaolo, Antonio; Cooper, Nathan; Palacio, David Nader; Scalabrino, Simone; Poshyvanyk, Denys; Oliveto, Rocco; Bavota, Gabriele

2022-01-01

Abstract

Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved in Natural Language Processing (NLP) tasks. The basic idea behind these models is to first pre-train them on a generic dataset using a self-supervised task (e.g., filling masked words in sentences). Then, these models are fine-tuned to support specific tasks of interest (e.g., language translation). A single model can be fine-tuned to support multiple tasks, possibly exploiting the benefits of transfer learning. This means that knowledge acquired to solve a specific task (e.g., language translation) can be useful to boost performance on another task (e.g., sentiment classification). While the benefits of transfer learning have been widely studied in NLP, limited empirical evidence is available when it comes to code-related tasks. In this paper, we assess the performance of the Text-To-Text Transfer Transformer (T5) model in supporting four different code-related tasks: (i) automatic bug-fixing, (ii) injection of code mutants, (iii) generation of assert statements, and (iv) code summarization. We pay particular attention to studying the role played by pre-training and multi-task fine-tuning in the model’s performance. We show that (i) T5 can achieve better performance than state-of-the-art baselines; and (ii) while pre-training helps the model, not all tasks benefit from multi-task fine-tuning.
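The pre-train-then-fine-tune recipe summarized in the abstract can be illustrated with a minimal Python sketch. It assumes a T5-style span-corruption objective for self-supervised pre-training, where each masked span is replaced by a sentinel token (`<extra_id_0>`, `<extra_id_1>`, ...) and the model learns to emit the hidden tokens, plus a multi-task fine-tuning mixture in which every example is cast to text-to-text form through a task prefix. The helper names (`span_corrupt`, `build_multitask_mixture`) and the prefixes in `TASK_PREFIXES` are hypothetical illustrations, not the authors' code or configuration, and tokenization is naive whitespace splitting rather than a trained subword tokenizer.

```python
import random
from typing import Dict, List, Tuple


def sentinel(i: int) -> str:
    """T5-style sentinel token marking the i-th masked span."""
    return f"<extra_id_{i}>"


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Build one self-supervised (input, target) pair by masking token spans.

    `spans` are half-open [start, end) ranges, sorted and non-overlapping.
    Each span is replaced by one sentinel in the input; the target lists each
    sentinel followed by the tokens it hides, and ends with a closing sentinel.
    """
    inp: List[str] = []
    tgt: List[str] = []
    pos = 0
    for i, (start, end) in enumerate(spans):
        if not (pos <= start < end <= len(tokens)):
            raise ValueError("spans must be sorted, non-empty, non-overlapping and in range")
        inp.extend(tokens[pos:start])   # keep the unmasked prefix
        inp.append(sentinel(i))         # the whole span collapses to one sentinel
        tgt.append(sentinel(i))
        tgt.extend(tokens[start:end])   # the model must reconstruct these tokens
        pos = end
    inp.extend(tokens[pos:])
    tgt.append(sentinel(len(spans)))
    return " ".join(inp), " ".join(tgt)


# Hypothetical task prefixes for the four tasks; NOT the prefixes used in the paper.
TASK_PREFIXES: Dict[str, str] = {
    "bug_fixing": "fix bug: ",
    "mutant_injection": "inject mutant: ",
    "assert_generation": "generate assert: ",
    "code_summarization": "summarize: ",
}


def build_multitask_mixture(
    datasets: Dict[str, List[Tuple[str, str]]], seed: int = 0
) -> List[Tuple[str, str]]:
    """Cast every (source, target) pair to text-to-text form and shuffle them together."""
    mixture = [
        (TASK_PREFIXES[task] + source, target)
        for task, pairs in datasets.items()
        for source, target in pairs
    ]
    random.Random(seed).shuffle(mixture)  # one training stream shared by all tasks
    return mixture


if __name__ == "__main__":
    code = "public int sum ( int a , int b ) { return a + b ; }".split()
    inp, tgt = span_corrupt(code, [(1, 2), (11, 13)])
    print(inp)  # public <extra_id_0> sum ( int a , int b ) { <extra_id_1> + b ; }
    print(tgt)  # <extra_id_0> int <extra_id_1> return a <extra_id_2>
```

Because every task shares the same string-in, string-out format, one model can be fine-tuned on the mixed list; whether a given task actually gains from being mixed with the others is the empirical question the paper investigates.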

DOI: https://dx.doi.org/10.1109/TSE.2022.3183297
Scopus ID: 2-s2.0-85141383902
URL: https://ieeexplore.ieee.org/abstract/document/9797060
Appears in types: 1.1 Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11695/117111
