A survey on automated and AI-based tools for patent retrieval with a special focus on the life sciences domain

Poce S.; Cerro G.
2026-01-01

Abstract

Patent retrieval is fundamental to research and development in both industry and academia. It is performed for several reasons: to understand technological advances within a specific field, as a step before filing a patent application, or to detect intellectual property violations. However, the huge and steadily growing number of patents, together with the complexity of patent documents, poses serious challenges to the patent system and its stakeholders. Classical search is time-consuming and is quickly becoming inefficient, so automated techniques are strongly needed. This work aims to provide a comprehensive review of existing techniques, from query expansion to deep learning approaches, highlighting their advantages and limitations. A specific focus is reserved for the life sciences domain, where particular needs and issues must be addressed, mainly nomenclature ambiguities, polysemy of terms, and a high dependency on visual representations (e.g., chemical structures). The review shows that current automated methods, especially in the life sciences domain, suffer from suboptimal recall and from fragmentation of information across databases, highlighting the need for multimodal methods that integrate multiple kinds of data from different sources.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11695/156812