Computational genomics and a machine learning approach: new perspectives in brown trout and honey bee biodiversity

SALVATORE, GIOVANNA
2023-10-13

Abstract

Conservation biology recognizes the importance of genetics in understanding the evolutionary context of endangered species and in developing effective management and preservation strategies. Characterizing the genetic diversity of threatened species is crucial for reconstructing their evolutionary history, assessing their current status, and predicting their future prospects, so as to safeguard their genetic and cultural heritage from further loss. Recent advances in genotyping technologies have brought unprecedented opportunities to implement effective conservation programs. However, these technologies also pose new challenges, which require novel methodologies and computational resources. One primary challenge is data management: the sheer volume of data requires robust computational infrastructure, including high-performance computing systems and efficient data storage solutions. Another is data analysis: genomic data are highly complex, and traditional statistical methods may not be sufficient to analyse them properly. Specialized computational tools and algorithms have been developed to extract and interpret the biological information hidden within large and complex genomic datasets. Among these, Machine Learning algorithms have emerged as promising methods, notably due to their ability to analyse large and complex datasets and identify patterns that traditional statistical methods may not detect. The aim of the thesis is to explore the genetic diversity of two ecologically and economically important species: brown trout inhabiting two main rivers in the Molise region in Italy, and two local subspecies of honey bee in Central-North Algeria. Both species are at risk from introgression by alien species, which might produce hybrids and thereby threaten the genetic makeup of the native populations. SNP variants were detected from genotyping-array and sequencing data, and population structure was inferred.
Genetic analyses of brown trout detected hybridization with alien specimens, while Algerian honey bees did not show any hybridization with European honey bee subspecies. In the latter case, although two subspecies were expected to be present in Algeria, it was not possible to differentiate between them genetically. Moreover, in both cases, a Machine Learning approach, in combination with other statistical methods, was employed to identify reduced panels of SNPs capable of distinguishing between different subspecies, even when hybridized. These panels can serve as an essential input for conservation purposes. The results of this study provide valuable information on the genetic diversity of these species, which can be used to develop effective conservation strategies and policies. Additionally, this study highlights the potential of bioinformatics in advancing genomic research and its applications in the conservation field. A better understanding of the genetic diversity of endangered species can lead to improved protection and management for future generations.
13 Oct 2023
Genetic diversity; Machine learning; Biodiversity preservation; Mediterranean trout; Apis mellifera
Files in this item:
Tesi_G_Salvatore.pdf (doctoral thesis; Adobe PDF, 3.72 MB; embargoed until 13 April 2025)


Use this identifier to cite or link to this document: https://hdl.handle.net/11695/127390