Adding coverage

After loading --hits, the final data required to generate a standard blob plot is coverage information from mapping sequencing reads back to the assembly. BlobTools2 parses sorted BAM/SAM/CRAM format files to calculate coverage information using the PySAM library.

Example

Due to the size of read alignment files, BAM files for analysed datasets on the public Viewer instance are not retained. To add coverage information for Strongyloides venezuelensis without recomputing the alignment, it is possible to import coverage from a BlobDir dataset. While this is not the typical usage, the command remains the same as the filetype (in this case JSON) is inferred from the file extension.

Fetch and extract coverage data for Strongyloides venezuelensis from the BlobToolKit downloads site:

curl -L https://blobtoolkit.genomehubs.org/download/ALIAS/S_v/S_venezuelensis_HH1/S_venezuelensis_HH1.blobdir.tar.gz | tar -xJf - S_venezuelensis_HH1/DRR008460_cov.json

This file can be imported to add coverage information to the assembly contigs:

~/blobtoolkit/blobtools2/blobtools add \
--cov ~/BTK_TUTORIAL/FILES/S_venezuelensis_HH1/DRR008460_cov.json \
~/BTK_TUTORIAL/DATASETS/ASSEMBLY_NAME

Configuration

There are 3 configuration options when adding coverage using blobtools add:

  • --cov – BAM/SAM/CRAM read alignment file to import. [Required]
  • --threads – Number of threads to use when parsing read alignment files. [Default: 1]
  • --replace – Replace existing fields if present. [Default: false]

--cov

In the more typical usage, one or more BAM, SAM or CRAM format read mapping files can be specified using the --cov flag. The BlobToolKit Pipeline uses minimap2 to generate these files, with commands similar to:

minimap2 -ax sr \
         -t 16 ASSEMBLY_NAME.fasta \
         DRR008460_1.fastq.gz DRR008460_2.fastq.gz \
| samtools sort -@16 -O BAM -o ASSEMBLY_NAME.DRR008460.bam -

For PacBio or Oxford NanoPore reads, substitute -ax srfor -ax map-pbor -ax map-ont, respectively.

When importing, the file basename is used to determine the resulting field names so ASSEMBLY_NAME.DRR008460.bam would generate a base coverage field named ASSEMBLY_NAME.DRR008460_cov and a read coverage field named ASSEMBLY_NAME.DRR008460_read_cov. To set a different name, use the syntax --cov FILE_NAME.bam=FIELD_NAME to add FIELD_NAME_cov and FIELD_NAME_read_cov fields.

Usage:

blobtools add \
    --cov ASSEMBLY_NAME.DRR008460.bam \
    --cov ASSEMBLY_NAME.DRR008461.bam=DRR008461 \
    ...

--threads

BlobTools2 can use multiple threads when parsing read coverage files. To use more than one thread, set --threads to an appropriate value.

Usage:

blobtools add \
    ...
    --threads 8 \ 
    ...

--replace

If a blobtools add command would overwrite an existing field, the default behaviour is to issue a warning and not replace the existing field. To change this behaviour and allow existing fields to be overwritten, set the --replace flag.

Usage:

blobtools add \
    ...
    --replace \ 
    ...