BlobTools2 is a reimplementation of BlobTools, written in Python 3 with a fully modular design that makes it even easier to create new datasets and add new analysis types. Notable differences from BlobTools include the addition of a blobtools add command, which allows new data to be added to an existing dataset, and the absence of a blobtools view command, as dataset visualisation is handled by the BlobToolKit Viewer.
BlobTools2 - assembly exploration, QC and filtering.

usage: blobtools [<command>] [<args>...] [-h|--help] [--version]

commands:
    add            add data to a BlobDir
    create         create a new BlobDir
    filter         filter a BlobDir
    host           host interactive view of all BlobDirs in a directory
    replace        call blobtools add with --replace flag
    -h, --help     show this
    -v, --version  show version number

See 'blobtools <command> --help' for more information on a specific command.
The minimum requirement to create a new dataset with BlobTools2 is an assembly FASTA file. The underlying data structure has been updated from a single JSON format BlobDB file to a collection of JSON files in a BlobDir directory. This change makes it trivial to separate dataset creation from the subsequent addition of analyses and presents the data in a format that can be efficiently processed for interactive visualisation with the BlobToolKit Viewer.
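The move from one BlobDB file to a directory of per-field JSON files can be illustrated with a minimal sketch. It reads an assembly FASTA and writes one JSON file for each of three basic fields (identifiers, length, GC proportion), plus a meta.json index. The file names, field names and JSON layout are assumptions made for illustration and do not reproduce the real BlobDir schema. The functions read_fasta, gc_proportion and create_blobdir are hypothetical.

```python
"""Minimal sketch: build a BlobDir-style directory from an assembly FASTA.

Assumption: one JSON file per field plus a meta.json index. The field names
and JSON layout are illustrative, not the exact BlobToolKit schema.
"""
import json
from pathlib import Path


def read_fasta(path):
    """Yield (identifier, sequence) pairs; the identifier is the first word of the header."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_proportion(seq):
    """GC proportion over unambiguous bases (A/C/G/T only), rounded to 4 d.p."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return round((seq.count("G") + seq.count("C")) / acgt, 4) if acgt else 0.0


def create_blobdir(fasta, outdir):
    """Write one JSON file per field plus a meta.json index (hypothetical layout)."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    records = list(read_fasta(fasta))
    fields = {
        "identifiers": {"values": [name for name, _ in records]},
        "length": {"values": [len(seq) for _, seq in records]},
        "gc": {"values": [gc_proportion(seq) for _, seq in records]},
    }
    # Separate files mean later analyses can add new fields without
    # rewriting the existing ones.
    for field_id, data in fields.items():
        (outdir / f"{field_id}.json").write_text(json.dumps(data))
    meta = {"assembly": {"file": str(fasta)}, "records": len(records),
            "fields": sorted(fields)}
    (outdir / "meta.json").write_text(json.dumps(meta, indent=2))
    return meta
```

Because each field lives in its own file, a viewer can load only the fields a plot actually needs. This is one plausible reason a directory layout suits interactive use better than a single large file.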
Additional data can be added to an existing BlobDir with the blobtools add command, which parses analysis output files into one or more fields. This command can also add metadata, including links to external resources and full taxonomic information, to a dataset. Currently supported analysis outputs include BLAST/Diamond sequence similarity searches, BAM/SAM/CRAM read mappings and BUSCO genome completeness assessments. Parsers are implemented as Python modules that convert the data to one of several generic datatypes (identifier, variable, category, array, array of arrays), so support for a new analysis can be added by writing an appropriate parser. The blobtools replace command calls blobtools add with the --replace flag, allowing existing fields to be updated.
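The parser idea is to turn analysis-specific output into a generic field aligned with the dataset's identifiers. Here is a hedged sketch of that idea for the "variable" datatype, using a simple tab-separated file with the identifier in the first column as a stand-in for something like per-sequence coverage. It also includes an add_field helper that refuses to overwrite an existing field unless replace=True, which mirrors the add/replace distinction described above. parse_tsv_variable, add_field and the JSON layout are hypothetical and are not the BlobTools2 API. Real BLAST/Diamond, BAM or BUSCO parsers would be more involved.

```python
"""Sketch of the parser idea: analysis output -> generic field -> BlobDir.

All names (parse_tsv_variable, add_field) and the JSON layout are
hypothetical illustrations, not the BlobTools2 API.
"""
import csv
import json
from pathlib import Path


def parse_tsv_variable(path, identifiers, column, default=0.0):
    """Parse a TSV (identifier in column 0) into a 'variable' field:
    one float per identifier, in the dataset's identifier order."""
    by_id = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and '#' comment/header lines.
            if row and not row[0].startswith("#"):
                by_id[row[0]] = float(row[column])
    # Identifiers missing from the analysis output get a default value.
    values = [by_id.get(name, default) for name in identifiers]
    return {"type": "variable", "values": values,
            "range": [min(values), max(values)] if values else [0.0, 0.0]}


def add_field(blobdir, field_id, field, replace=False):
    """Write a field file and register it in meta.json.
    Refuses to overwrite an existing field unless replace=True."""
    blobdir = Path(blobdir)
    target = blobdir / f"{field_id}.json"
    if target.exists() and not replace:
        raise FileExistsError(f"field '{field_id}' exists; use replace=True")
    target.write_text(json.dumps(field))
    meta_path = blobdir / "meta.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    meta["fields"] = sorted(set(meta.get("fields", [])) | {field_id})
    meta_path.write_text(json.dumps(meta, indent=2))
```

Keeping parsing, which is analysis-specific, separate from writing, which is generic, means a new analysis type only needs a new parse function. The storage and replace logic stay unchanged.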
Datasets can be filtered based on the values in any variable or category field, or using a list of identifiers. Filters may be applied to a complete dataset, producing a reduced dataset without repeating analyses, or applied to assembly FASTA and read FASTQ files to allow for reassembly and reanalysis. All filter parameters are shared between BlobTools2 and the BlobToolKit Viewer, so interactive sessions can be reproduced on the command line.
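The three filter types can be combined as a logical AND over sequences: a min/max range on a variable field, membership in an allowed set for a category field, and an explicit identifier list. Here is a minimal sketch under that assumption, which then applies the result to FASTA-style records. The parameter structure (dicts of ranges and allowed values) is a hypothetical simplification and is not the parameter syntax shared by BlobTools2 and the Viewer.

```python
"""Sketch of dataset filtering by variable range, category membership or an
identifier list. The parameter structure is a hypothetical simplification,
not the BlobTools2 / Viewer parameter syntax.
"""


def filter_identifiers(identifiers, fields, min_max=None, categories=None,
                       keep_list=None):
    """Return identifiers passing every supplied filter (logical AND).

    min_max:    {field_id: (min or None, max or None)} for variable fields
    categories: {field_id: set of allowed values} for category fields
    keep_list:  optional iterable of identifiers to keep
    """
    min_max = min_max or {}
    categories = categories or {}
    keep = set(keep_list) if keep_list is not None else None

    def passes(i, name):
        if keep is not None and name not in keep:
            return False
        in_range = all(
            (lo is None or fields[f][i] >= lo) and (hi is None or fields[f][i] <= hi)
            for f, (lo, hi) in min_max.items()
        )
        in_category = all(fields[f][i] in allowed for f, allowed in categories.items())
        return in_range and in_category

    return [name for i, name in enumerate(identifiers) if passes(i, name)]


def filter_fasta(records, kept):
    """Keep (identifier, sequence) records whose identifier passed the filter."""
    kept = set(kept)
    return [(name, seq) for name, seq in records if name in kept]


if __name__ == "__main__":
    ids = ["ctg1", "ctg2", "ctg3"]
    fields = {"length": [6000, 800, 12000],
              "phylum": ["Arthropoda", "Arthropoda", "Proteobacteria"]}
    kept = filter_identifiers(ids, fields, min_max={"length": (1000, None)},
                              categories={"phylum": {"Arthropoda"}})
    print(kept)  # ['ctg1']
```

Because the filter returns a list of identifiers rather than modified data, the same result can be reused to subset the dataset, an assembly FASTA or read files.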
Assuming the BlobToolKit Viewer code and its dependencies are available, the blobtools host command provides a convenient way to start the Viewer and begin interactive exploration of all BlobDir datasets in a directory.