The landscape of bioinformatics tools for reads-based taxonomic and functional analysis of metagenomic datasets, as well as for their assembly and for the management of genomic data, is highly fragmented. Moreover, a large part of the available software requires millions of reads as input and relies on approximations in data analysis in order to reduce computing times. This results in sub-optimal quality of results in terms of accuracy, sensitivity and specificity, whether the software is used to reconstruct taxonomic or functional profiles through analysis of reads or to analyse genomes reconstructed by metagenomic assembly. Furthermore, the recent introduction of novel long-read sequencing technologies, such as Nanopore and PacBio, represents a valuable resource that still lacks dedicated tools for integrated hybrid analysis alongside short-read data.
In order to overcome these limitations, we have developed a comprehensive bioinformatic platform, METAnnotatorX2, which aims to provide an optimized, user-friendly resource that maximizes the quality of results while allowing easy customization of the pipeline and straightforward integrated analysis of both short- and long-read data.
Its functionalities encompass reads-based analyses for taxonomic profile reconstruction and functional assessment of metagenomes. Moreover, short and long reads can be used as input for unmixed or hybrid assembly, and the resulting metagenomic contigs can be taxonomically classified and processed to obtain species-specific GenBank files with predicted and functionally annotated genes, ready for downstream genomic analyses.
In the framework of this study, we also developed a set of pre-processed databases specific for viruses, prokaryotes, and fungi and protists, designed to allow accurate taxonomic assignment of reads and contigs from complex microbial communities.