Computational and Structural Biotechnology Journal (April 2022)
by João C. Sequeira; Miguel Rocha; M. Madalena Alves; Andreia F. Salvador (CEB – Centre of Biological Engineering, University of Minho; LABBELS – Associate Laboratory)
Omics and meta-omics technologies are powerful approaches for exploring the functions of microorganisms, but the sheer size and complexity of omics datasets often make the analysis challenging. Software developed for omics and meta-omics analyses and knowledgebases holding information on genes, proteins, and taxonomic and functional annotation, among other information, are valuable resources for analyzing omics data. Although several bioinformatics resources are available for meta-omics analyses, many require significant computational expertise. Web interfaces are more user-friendly but often struggle to handle large data files, such as those obtained in metagenomics, metatranscriptomics, or metaproteomics experiments.
In this work, we present three novel bioinformatics tools that are available through user-friendly command-line interfaces, can be run sequentially or stand-alone, and combine popular resources for functional annotation. UPIMAPI performs sequence homology-based annotation and obtains data from UniProtKB (e.g., protein names, EC numbers, Gene Ontology, Taxonomy, cross-references to external databases). reCOGnizer performs multithreaded domain homology-based annotation of protein sequences with several functional databases (i.e., CDD, NCBIfam, Pfam, Protein Clusters, SMART, TIGRFAM, COG and KOG) and also retrieves domain names, domain descriptions, and EC numbers. KEGGCharter represents omics results, including differential gene expression, in KEGG metabolic pathways. It also shows the taxonomic assignment of the enzymes represented, which is particularly useful in metagenomics studies, where several microorganisms are present.
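As a rough illustration of what sequence homology-based annotation involves, the sketch below keeps the best-scoring alignment hit for each query protein and joins it with reference metadata such as a protein name and EC number. It is not the UPIMAPI implementation: the four-column hit table, the `REF_*` accessions, the metadata values, and the function names `best_hits` and `annotate` are all hypothetical and chosen only for this example.

```python
import csv
import io

# Hypothetical tabular alignment output (assumption for this sketch):
# query, subject accession, percent identity, bit score.
HITS_TSV = (
    "prot_1\tREF_A\t98.2\t1050\n"
    "prot_1\tREF_B\t61.0\t540\n"
    "prot_2\tREF_C\t45.3\t210\n"
    "prot_2\tREF_D\t72.9\t388\n"
)

# Hypothetical reference metadata keyed by subject accession.
METADATA = {
    "REF_A": {"protein_name": "hypothetical protein A", "ec": ""},
    "REF_D": {"protein_name": "hypothetical hydrolase D", "ec": "3.1.1.1"},
}


def best_hits(tsv_text):
    """Return {query: (subject, bit_score)}, keeping the highest bit score per query."""
    best = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for query, subject, _identity, bit_score in reader:
        score = float(bit_score)
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def annotate(best, metadata):
    """Join each query's best hit with the metadata of the matched reference entry."""
    rows = []
    for query in sorted(best):
        subject, _score = best[query]
        info = metadata.get(subject, {})
        rows.append({
            "query": query,
            "subject": subject,
            "protein_name": info.get("protein_name", ""),
            "ec": info.get("ec", ""),
        })
    return rows


if __name__ == "__main__":
    for row in annotate(best_hits(HITS_TSV), METADATA):
        print(row)
```

Selecting by bit score is only one possible criterion; an e-value or identity threshold could be applied before the join, depending on how conservative the annotation should be.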
reCOGnizer, UPIMAPI and KEGGCharter together provide a comprehensive functional characterization of large datasets, facilitating the interpretation of microbial activities in nature and in biotechnological processes.