BlobTools2

BlobTools2 is a reimplementation of BlobTools, written in Python 3 with a fully modular design that makes it easier to create new datasets and to add further analysis types. Notable differences from BlobTools include a new blobtools add command, which allows data to be added to an existing dataset, and the absence of a blobtools view command, because dataset visualisation is handled by the BlobToolKit Viewer.

BlobTools2 - assembly exploration, QC and filtering.

usage: blobtools [<command>] [<args>...] [-h|--help] [--version]

commands:
    add             add data to a BlobDir
    create          create a new BlobDir
    filter          filter a BlobDir
    host            host interactive view of all BlobDirs in a directory
    replace         call blobtools add with --replace flag
    -h, --help      show this
    -v, --version   show version number
See 'blobtools <command> --help' for more information on a specific command.

blobtools create

The minimum requirement to create a new dataset with BlobTools2 is an assembly FASTA file. The underlying data structure has changed from a single JSON-format BlobDB file to a collection of JSON files in a BlobDir directory. This change makes it simple to separate dataset creation from the subsequent addition of analyses, and presents the data in a format that can be processed efficiently for interactive visualisation with the BlobToolKit Viewer.
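To illustrate the idea of a directory of per-field JSON files, here is a minimal sketch in Python. It is not the BlobTools2 implementation: the function names, file names and JSON keys (create_dataset, write_field, "id", "values") are assumptions made for this example and do not describe the actual BlobDir layout.

```python
import json
import tempfile
from pathlib import Path


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].strip()
            # Use the first whitespace-delimited token as the identifier.
            seq_id, chunks = (header.split()[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return round((upper.count("G") + upper.count("C")) / len(seq), 4)


def write_field(directory, field_id, values):
    """Write one field to its own JSON file (hypothetical layout)."""
    path = Path(directory) / f"{field_id}.json"
    path.write_text(json.dumps({"id": field_id, "values": values}))
    return path


def create_dataset(fasta_text, directory):
    """Create one JSON file per field from an assembly; return file names."""
    records = list(parse_fasta(fasta_text))
    write_field(directory, "identifiers", [seq_id for seq_id, _ in records])
    write_field(directory, "length", [len(seq) for _, seq in records])
    write_field(directory, "gc", [gc_fraction(seq) for _, seq in records])
    return sorted(p.name for p in Path(directory).glob("*.json"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_dataset(">contig1 example\nACGT\nGG\n>contig2\nAATT\n", tmp))
```

Because each field lives in its own file, a later analysis can add a new file to the directory without rewriting the existing ones, which is the separation of creation from addition described above.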

blobtools add

Additional data can be added to an existing BlobDir by parsing analysis output files into one or more fields using the blobtools add command. This command can also be used to add metadata, including links to external resources and full taxonomic information, to a dataset. Currently supported analysis outputs include BLAST/Diamond sequence similarity searches, BAM/SAM/CRAM read mappings and BUSCO genome completeness assessments. Parsers are implemented as Python modules that convert the data to one of several generic datatypes (identifier, variable, category, array, array of arrays), so new analyses can be supported by adding an appropriate parser. The blobtools replace command calls blobtools add with a --replace flag to allow existing fields to be updated.
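The parser approach can be sketched as follows. This is an illustration under stated assumptions, not the BlobTools2 parser API: the Variable and Category classes, the parse_variable and parse_category functions, the two-column tab-separated input and the "no-hit" default label are all hypothetical. The sketch only shows the general pattern of converting an analysis output into a generic datatype whose values are aligned to the dataset's identifiers.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Numeric field: one value per sequence identifier."""
    field_id: str
    values: list


@dataclass
class Category:
    """Categorical field: labels stored once, values index into keys."""
    field_id: str
    keys: list
    values: list


def read_two_columns(lines):
    """Map sequence identifier -> raw value from tab-separated lines."""
    lookup = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        seq_id, value = line.split("\t")[:2]
        lookup[seq_id] = value
    return lookup


def parse_variable(lines, identifiers, field_id, default=0.0):
    """Convert tabular output to a Variable aligned to identifiers."""
    lookup = read_two_columns(lines)
    values = [float(lookup[i]) if i in lookup else default for i in identifiers]
    return Variable(field_id, values)


def parse_category(lines, identifiers, field_id, default="no-hit"):
    """Convert tabular output to a Category aligned to identifiers."""
    lookup = read_two_columns(lines)
    keys, index, values = [], {}, []
    for seq_id in identifiers:
        label = lookup.get(seq_id, default)
        if label not in index:
            index[label] = len(keys)
            keys.append(label)
        values.append(index[label])
    return Category(field_id, keys, values)


if __name__ == "__main__":
    ids = ["contig1", "contig2", "contig3"]
    print(parse_variable(["contig1\t12.5", "contig3\t3"], ids, "coverage"))
    print(parse_category(["contig1\tNematoda", "contig3\tNematoda"], ids, "phylum"))
```

Supporting a new analysis in this sketch means writing one more function that returns a Variable or Category; nothing else needs to know about the input file format.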

blobtools filter

Datasets can be filtered based on the values in any variable or category field, or using a list of identifiers. Filters may be applied to a complete dataset, producing a reduced dataset without repeating analyses, or to assembly FASTA and read FASTQ files to allow for reassembly and reanalysis. All filter parameters are shared between BlobTools2 and the BlobToolKit Viewer, so interactive sessions can be reproduced on the command line.
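The three kinds of filter described above (variable ranges, category labels and identifier lists) can be sketched in a few lines of Python. The select function, its parameter names and the in-memory dictionary representation are assumptions made for this illustration. They are not the blobtools filter interface or its parameter syntax.

```python
def select(dataset, minimum=None, maximum=None, exclude=None, keep_ids=None):
    """Return a reduced copy of a dataset.

    dataset  -- dict mapping field name -> list of values, all the same
                length, including an "identifiers" field
    minimum  -- dict of variable field -> smallest value to keep
    maximum  -- dict of variable field -> largest value to keep
    exclude  -- dict of category field -> set of labels to drop
    keep_ids -- optional collection of identifiers to retain
    """
    minimum = minimum or {}
    maximum = maximum or {}
    exclude = exclude or {}
    keep = []
    for i, seq_id in enumerate(dataset["identifiers"]):
        if keep_ids is not None and seq_id not in keep_ids:
            continue
        if any(dataset[name][i] < bound for name, bound in minimum.items()):
            continue
        if any(dataset[name][i] > bound for name, bound in maximum.items()):
            continue
        if any(dataset[name][i] in labels for name, labels in exclude.items()):
            continue
        keep.append(i)
    # Apply the same row selection to every field so they stay aligned.
    return {name: [values[i] for i in keep] for name, values in dataset.items()}


if __name__ == "__main__":
    data = {
        "identifiers": ["contig1", "contig2", "contig3"],
        "length": [500, 6000, 9000],
        "phylum": ["Nematoda", "Proteobacteria", "Nematoda"],
    }
    print(select(data, minimum={"length": 1000}, exclude={"phylum": {"Proteobacteria"}}))
```

The same list of retained identifiers could then be used to subset an assembly FASTA file, which corresponds to the second use of filters mentioned above.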

blobtools host

Assuming the BlobToolKit Viewer code and dependencies are available, the blobtools host command provides a convenient way to start the Viewer to begin interactive exploration of all BlobDir datasets in a directory.

See the BlobTools2 Tutorials for more information on how to use BlobTools2 on your own datasets or check out our open-source code on GitHub to get started.