BlobTools2 – BlobToolKit

BlobTools2 is a reimplementation of the original BlobTools, written in Python 3 with a fully modular design that makes it easier to create new datasets and add new analysis types. Notable differences include a new blobtools add command, which allows data to be added to an existing dataset, and the ability to apply any filter from the interactive BlobToolKit Viewer when using the blobtools filter and blobtools view commands. We use BlobTools2 as the name of the package, but the command-line executable is still called blobtools. When you run it, this is what you will see:

BlobTools2 - assembly exploration, QC and filtering.

usage: blobtools [<command>] [<args>...] [-h|--help] [--version]

commands:
    add             add data to a BlobDir
    create          create a new BlobDir
    filter          filter a BlobDir
    host            host interactive view of all BlobDirs in a directory
    replace         call blobtools add with --replace flag
    view            generate plots using BlobToolKit Viewer
    -h, --help      show this
    -v, --version   show version number
See 'blobtools <command> --help' for more information on a specific command.

`blobtools create`

The minimum requirement to create a new dataset with BlobTools2 is an assembly FASTA file. The underlying data structure has been updated from a single JSON format BlobDB file to a collection of JSON files in a BlobDir directory. This change makes it trivial to separate dataset creation from the subsequent addition of analyses and presents the data in a format that can be efficiently processed for interactive visualisation with the BlobToolKit Viewer.
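
As a rough illustration of the BlobDir layout described above, the following sketch lists and loads the JSON files in a directory. It assumes that each field is stored as an uncompressed file named `<field_id>.json`; the helper names `list_fields` and `load_field` are hypothetical and are not part of the BlobTools2 API.

```python
import json
from pathlib import Path


def list_fields(blobdir):
    """Return the sorted names of the JSON files in a BlobDir, without extension.

    Assumption: one uncompressed `<field_id>.json` file per field.
    """
    return sorted(path.stem for path in Path(blobdir).glob("*.json"))


def load_field(blobdir, field_id):
    """Load a single field's JSON file and return the parsed content."""
    with open(Path(blobdir) / f"{field_id}.json", encoding="utf-8") as handle:
        return json.load(handle)
```

Because each field lives in its own file, a script (or the Viewer) can read only the fields it needs rather than parsing one large file.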

`blobtools add`

Additional data can be added to an existing BlobDir by parsing analysis output files into one or more fields using the blobtools add command. This command can also be used to add metadata, including links to external resources and full taxonomic information, to a dataset. Currently supported analysis outputs include BLAST/Diamond sequence similarity searches, BAM/SAM/CRAM read mappings and BUSCO genome completeness assessments. Parsers are implemented as Python modules that convert the data to one of several generic datatypes (identifier, variable, category, array, array of arrays), so new analyses can be supported by adding an appropriate parser. The blobtools replace command calls blobtools add with a --replace flag to allow fields to be updated.
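
To give a feel for what such a parser does, here is a minimal sketch that converts a two-column, tab-separated file (identifier, value) into a "variable" field. The function name `parse_variable`, the input format and the returned dictionary keys are all assumptions made for illustration; the real BlobTools2 parser modules and field schema may differ.

```python
import csv
import io


def parse_variable(handle, identifiers, field_id, missing=0.0):
    """Parse identifier/value TSV rows into a hypothetical 'variable' field.

    Values are ordered to match `identifiers` so the field lines up with
    every other field in the dataset; sequences absent from the file
    receive the `missing` value.
    """
    by_id = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        by_id[row[0]] = float(row[1])
    return {
        "id": field_id,          # hypothetical key names, not the real schema
        "type": "variable",
        "values": [by_id.get(seq_id, missing) for seq_id in identifiers],
    }


# Usage: rows may arrive in any order and may omit some sequences.
example = io.StringIO("contig2\t12.5\ncontig1\t3\n")
cov_field = parse_variable(example, ["contig1", "contig2", "contig3"], "cov")
```

Keeping values in identifier order is the design choice that lets any number of independently parsed fields be combined without further joins.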

`blobtools filter`

Datasets can be filtered based on the values in any variable or category field, or using a list of identifiers. Filters may be applied to a complete dataset, producing a reduced dataset without repeating analyses, or to assembly FASTA and read FASTQ files to allow for reassembly and reanalysis. All filter parameters are shared between BlobTools2 and the BlobToolKit Viewer, allowing interactive sessions to be reproduced on the command line.
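
The three kinds of filter described above (variable ranges, category exclusions and identifier lists) can be sketched as follows. This is an illustrative outline only: `filter_identifiers` and its argument layout are hypothetical and do not reflect how blobtools filter is implemented or how its parameters are written.

```python
def filter_identifiers(identifiers, variables=None, categories=None, keep_ids=None):
    """Return the identifiers that pass every supplied filter.

    variables:  {name: (values, minimum, maximum)}; use None for an open bound
    categories: {name: (values, excluded_set)}
    keep_ids:   optional collection of identifiers to retain
    All `values` lists are assumed to be in the same order as `identifiers`.
    """
    variables = variables or {}
    categories = categories or {}
    keep = set(keep_ids) if keep_ids is not None else None
    kept = []
    for index, seq_id in enumerate(identifiers):
        if keep is not None and seq_id not in keep:
            continue
        # Every variable filter must hold: value within [minimum, maximum].
        in_range = all(
            (low is None or values[index] >= low)
            and (high is None or values[index] <= high)
            for values, low, high in variables.values()
        )
        # Every category filter must hold: value not in the excluded set.
        allowed = all(
            values[index] not in excluded
            for values, excluded in categories.values()
        )
        if in_range and allowed:
            kept.append(seq_id)
    return kept
```

The resulting identifier list could then be used to subset every field in a dataset, or to select records from FASTA and FASTQ files.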

`blobtools host`

Assuming the BlobToolKit Viewer code and dependencies are available, the blobtools host command provides a convenient way to start the Viewer to begin interactive exploration of all BlobDir datasets in a directory.

`blobtools view`

The blobtools view command provides options to generate views by running the BlobToolKit Viewer from the command line.

See the BlobTools2 Tutorials for more information on how to use BlobTools2 on your own datasets, or check out our open-source code on GitHub to get started.