The European GLOMICAVE project will develop a new digital platform that uses Big Data and Artificial Intelligence to process large-scale omics datasets, leveraging pre-existing data to enhance our understanding of biological systems as a whole.
The project, which has €6,372,583 in funding, addresses the need for systems that can relate genotypes, i.e. the genetic content of an organism, to phenotypes, i.e. the visible characteristics of the organism that result from the interaction between its genotype and the environment. It will do so by integrating experimental omics datasets with data available in public repositories and the scientific literature.
The project “will provide scientific and industrial experts and non-experts alike with a tool which will help them to pinpoint and understand new links between animal, plant and environmental genotypes and phenotypes,” says GLOMICAVE project coordinator Biotza Gutierrez, public programme manager at the Eurecat technology centre.
“Multi-omics integration enriched with automated extraction and interpretation of knowledge in scientific literature will enhance our understanding of biological systems and enable more accurate genotypic-phenotypic associations while streamlining the experimental design of new studies,” argues Núria Canela, the director of Eurecat’s Omics Sciences Unit.
In GLOMICAVE, “the scientific literature, made by humans for humans, will be transformed into a computational knowledge base, i.e. a knowledge graph,” she adds. “This will allow users to run queries to find unknown genotype-phenotype links not explicitly described in the literature and also to integrate this information with new experimental datasets in scientific studies.”
To this end, data mining strategies will be used to compile and extract information from the scientific literature, and Natural Language Processing (NLP) will be used to interpret that information and integrate it into the knowledge base. State-of-the-art information extraction methods for biomedical text will be adapted and extended to create new knowledge graphs and populate existing ones representing biological entities and concepts. The focus will be on methods which deliver interpretable and verifiable extractions.
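The idea of querying a knowledge graph for links that the literature does not state explicitly can be illustrated with a minimal Python sketch. It is not the GLOMICAVE implementation: the triples, entity names (gene_A, metabolite_X, trait_T) and the functions build_graph and implicit_links are invented placeholders, and the sketch assumes that extracted statements are stored as simple (subject, relation, object) triples.

```python
from collections import defaultdict

# Hypothetical triples, as a text-extraction step might emit them.
# All entity and relation names are placeholders, not real findings.
TRIPLES = [
    ("gene_A", "regulates", "metabolite_X"),
    ("metabolite_X", "associated_with", "trait_T"),
    ("gene_B", "associated_with", "trait_T"),
    ("gene_A", "expressed_in", "tissue_M"),
]


def build_graph(triples):
    """Index triples by subject: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def implicit_links(graph, source, target):
    """Return two-hop paths from source to target.

    Returns an empty list when a direct edge already exists, because the
    link is then stated explicitly rather than inferred.
    """
    direct_objects = {obj for _, obj in graph.get(source, [])}
    if target in direct_objects:
        return []
    paths = []
    for rel1, mid in graph.get(source, []):
        for rel2, obj in graph.get(mid, []):
            if obj == target:
                paths.append((source, rel1, mid, rel2, target))
    return paths


graph = build_graph(TRIPLES)
# gene_A has no direct edge to trait_T, but is connected via metabolite_X.
print(implicit_links(graph, "gene_A", "trait_T"))
```

Each returned path keeps the intermediate entity and both relations, so a candidate link can be traced back to the statements that support it, in line with the stated focus on interpretable and verifiable extractions.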
The platform developed by the GLOMICAVE project will be validated in the livestock, agro-biotechnology and environmental sectors. It will address specific challenges in six business cases related to animal breeding technology, meat quality, fruit growth and quality, plant growth, phosphorus removal and recovery, and bioenergy production in the urban water cycle.
The GLOMICAVE project is funded by the European Union’s Horizon 2020 programme and has a consortium of 15 partners from Spain, France, Germany, Portugal, Belgium and Denmark. The consortium comprises five technology and research centres, namely Eurecat (the project coordinator), SERIDA, INRAE, ASINCAR and Forschungszentrum Jülich; three universities, namely Aalborg University, the University of Minho and Katholieke Universiteit Leuven; three SMEs, namely TREE Technology, Allice and AkiNaO; two large corporations, namely NEC Laboratories Europe and Aguas do Norte; the ASEAVA animal cluster; and UNE as a standardisation body.