The rapid adoption of blockchain technology highlighted the importance of ensuring the security of smart contracts due to their critical role in automated business logic execution on blockchain platforms. This paper provides an empirical evaluation of automated vulnerability analysis tools specifically designed for Solidity smart contracts. Leveraging the extensive SmartBugs 2.0 framework, which includes 20 analysis tools, we conducted a comprehensive assessment using an annotated dataset of 2,182 instances, manually labeled at the line level with vulnerability labels. Our evaluation highlights the detection effectiveness of these tools in detecting various types of vulnerabilities, as categorized by the DASP TOP 10 taxonomy. We evaluated the efficacy of a Large Language Model-based detection method on two popular datasets. In this case, we obtained inconsistent results with the two datasets, showing unreliable detection when analyzing real-world smart contracts. Our study identifies significant variations in the accuracy and reliability of different tools and demonstrates the advantages of combining multiple detection methods to improve vulnerability identification. We identified a set of 3 tools that, combined, achieve up to 76.78% found vulnerabilities, taking less than one minute to run, on average. This study contributes to the field by releasing the largest dataset of manually analyzed smart contracts with line-level vulnerability annotations and by conducting the largest empirical evaluation of tools to date.

An empirical analysis of vulnerability detection tools for solidity smart contracts

Scalabrino S.;Oliveto R.;Pareschi R.
2026-01-01

Abstract

The rapid adoption of blockchain technology highlighted the importance of ensuring the security of smart contracts due to their critical role in automated business logic execution on blockchain platforms. This paper provides an empirical evaluation of automated vulnerability analysis tools specifically designed for Solidity smart contracts. Leveraging the extensive SmartBugs 2.0 framework, which includes 20 analysis tools, we conducted a comprehensive assessment using an annotated dataset of 2,182 instances, manually labeled at the line level with vulnerability labels. Our evaluation highlights the detection effectiveness of these tools in detecting various types of vulnerabilities, as categorized by the DASP TOP 10 taxonomy. We evaluated the efficacy of a Large Language Model-based detection method on two popular datasets. In this case, we obtained inconsistent results with the two datasets, showing unreliable detection when analyzing real-world smart contracts. Our study identifies significant variations in the accuracy and reliability of different tools and demonstrates the advantages of combining multiple detection methods to improve vulnerability identification. We identified a set of 3 tools that, combined, achieve up to 76.78% found vulnerabilities, taking less than one minute to run, on average. This study contributes to the field by releasing the largest dataset of manually analyzed smart contracts with line-level vulnerability annotations and by conducting the largest empirical evaluation of tools to date.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11695/159329
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact