Once you have a BlobDir dataset (see Creating a dataset), further data can be added by parsing analysis output files into one or more fields using the blobtools add command. Dedicated parsers are available for a range of analysis types, assigning values to each contig in an assembly: BLAST or Diamond hits provide taxonomic assignments; read mapping files provide base and read coverage; BUSCO results show completeness metrics for full and filtered assemblies; and fields from generic text files can be imported for maximum flexibility.
blobtools add options can be passed directly to blobtools create to allow dataset creation and analysis import in a single step.
The BlobTools approach uses BLAST hits to provide taxonomic annotation for each sequence in an assembly. When run using the BlobToolKit Pipeline, a wrapper script breaks long sequences into chunks to obtain a distribution of BLAST hits, and closely related taxa can be automatically filtered out. This filtering step is particularly important for publicly available assemblies, as the BLAST databases may already contain the query sequence. Read more…
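The chunking and filtering idea described above can be illustrated with a minimal Python sketch. This is not the pipeline's wrapper script: the function names, the chunk ID format, the default chunk size and the use of summed bitscores are all assumptions made for illustration.

```python
from collections import defaultdict


def chunk_sequence(seq_id, sequence, chunk_size=100_000):
    """Yield (chunk_id, chunk) pairs that cover the sequence without overlap.

    The chunk ID format and default chunk size are hypothetical.
    """
    for start in range(0, len(sequence), chunk_size):
        end = min(start + chunk_size, len(sequence))
        yield f"{seq_id}:{start}-{end}", sequence[start:end]


def summarise_hits(chunk_hits, exclude_taxa=frozenset()):
    """Sum bitscores per taxon across all chunks of one sequence.

    chunk_hits maps chunk_id -> list of (taxon, bitscore) tuples.
    Taxa listed in exclude_taxa (e.g. close relatives of the query)
    are skipped before summing.
    """
    totals = defaultdict(float)
    for hits in chunk_hits.values():
        for taxon, bitscore in hits:
            if taxon not in exclude_taxa:
                totals[taxon] += bitscore
    return dict(totals)


# A 12-base toy sequence split into chunks of 5 bases.
chunks = list(chunk_sequence("contig1", "ACGTACGTACGT", chunk_size=5))
chunk_ids = [chunk_id for chunk_id, _ in chunks]
# chunk_ids == ['contig1:0-5', 'contig1:5-10', 'contig1:10-12']

# Made-up hits for each chunk; Arthropoda is filtered out.
example_hits = {
    "contig1:0-5": [("Nematoda", 100.0), ("Arthropoda", 50.0)],
    "contig1:5-10": [("Nematoda", 80.0)],
    "contig1:10-12": [],
}
totals = summarise_hits(example_hits, exclude_taxa={"Arthropoda"})
# totals == {'Nematoda': 180.0}
```

Searching each chunk separately gives several independent hits per long sequence, so a single strong local match is less likely to decide the annotation for the whole contig.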
Alongside --hits, the final data required to generate a standard blob plot is coverage information, obtained by mapping sequencing reads back to the assembly. BlobTools2 parses sorted BAM/SAM/CRAM format files to calculate coverage information using the PySAM library. Read more…
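To make the distinction between base and read coverage concrete, here is a small pure-Python sketch. It does not use PySAM and is not how BlobTools2 computes coverage; it assumes the alignments have already been reduced to (contig, start, end) intervals, and the function and field names are hypothetical.

```python
from collections import defaultdict


def coverage_per_contig(alignments, contig_lengths):
    """Return mean base coverage and read count for each contig.

    alignments: iterable of (contig, start, end) tuples using 0-based,
    half-open coordinates (an assumed, simplified stand-in for the
    records a BAM/SAM/CRAM parser would supply).
    contig_lengths: dict mapping contig -> length in bases (> 0).
    """
    aligned_bases = defaultdict(int)
    read_counts = defaultdict(int)
    for contig, start, end in alignments:
        aligned_bases[contig] += end - start
        read_counts[contig] += 1
    return {
        contig: {
            # Mean depth: total aligned bases divided by contig length.
            "base_cov": aligned_bases[contig] / length,
            # Number of reads aligned to the contig.
            "read_cov": read_counts[contig],
        }
        for contig, length in contig_lengths.items()
    }


example_alignments = [("c1", 0, 100), ("c1", 50, 150), ("c2", 0, 50)]
example_lengths = {"c1": 200, "c2": 100, "c3": 100}
cov = coverage_per_contig(example_alignments, example_lengths)
# cov["c1"] == {"base_cov": 1.0, "read_cov": 2}
# cov["c3"] == {"base_cov": 0.0, "read_cov": 0}
```

Contigs with no mapped reads still receive a value of zero, so every sequence in the assembly has a coverage value that can be plotted.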
Adding text files
Fields can be imported from generic text files, with the ability to specify column separators and map columns to field names. This provides flexibility to view and filter a wide range of additional analysis outputs beyond those with dedicated parsers. Read more…
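The idea of choosing a separator and mapping columns to field names can be sketched with Python's standard csv module. This is an illustration of the concept only, not the blobtools parser: the function name, its parameters, and the example column and field names are hypothetical.

```python
import csv
import io


def import_text_fields(handle, id_column, field_map, delimiter="\t"):
    """Read a delimited text file with a header row into named fields.

    id_column: name of the column holding sequence identifiers.
    field_map: dict mapping source column name -> field name to create.
    Returns {field_name: {sequence_id: value}}; values are kept as the
    strings found in the file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    fields = {field: {} for field in field_map.values()}
    for row in reader:
        seq_id = row[id_column]
        for column, field in field_map.items():
            fields[field][seq_id] = row[column]
    return fields


# A made-up tab-separated file; only the "length" column is imported.
example_text = "contig_id\tlength\tgc\nc1\t1200\t0.41\nc2\t800\t0.38\n"
fields = import_text_fields(
    io.StringIO(example_text),
    id_column="contig_id",
    field_map={"length": "examplelen"},
)
# fields == {'examplelen': {'c1': '1200', 'c2': '800'}}
```

Columns that are not listed in the mapping are ignored, so a single file can hold more columns than are imported.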