Install

All BlobToolKit components (Viewer, BlobTools2, Pipeline and Specification) can be run in Docker containers via the BlobToolKit Docker image. Alternatively, they can be installed manually using the instructions below.

Dependencies

To install BlobToolKit without using Docker, you will need Conda, Firefox and X11.

  • Firefox – required to open plots from the command line.
  • Conda – allows specific software versions to be set up in a discrete environment.
  • X11 – an X window system, also required to open plots from the command line (available by default on Linux Desktop).

Install Firefox and X11

On Linux Desktop or Mac OS X, Firefox can be installed using the links at www.mozilla.org/en-GB/firefox/new/.

On Linux Server (Debian or Ubuntu, where apt-get is available), Firefox and the required X11 components can be installed with the command:

sudo apt-get update && sudo apt-get -y install firefox xvfb

For Mac OS X only, see www.xquartz.org to install X11 via XQuartz.

Install Conda

Conda can be installed on the command line using the Miniconda installer (the 64-bit Linux installer is used here):

curl -L https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh > Miniconda3.sh;
chmod +x Miniconda3.sh;
./Miniconda3.sh;
# You must open a new terminal window before using conda as the commands will not be available in the current session

Use Conda to install remaining dependencies

conda create -n btk_env -c conda-forge -y python=3.6 docopt pyyaml ujson tqdm nodejs=10 yq;
conda activate btk_env;
conda install -c bioconda -y pysam seqtk;
conda install -c conda-forge -y geckodriver selenium pyvirtualdisplay;
pip install fastjsonschema;
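
Before moving on, it can help to confirm that the tools installed above are on the PATH of the active environment. The following is a minimal sketch and not part of BlobToolKit: check_tools is a hypothetical helper name, and the executable names in the usage note are an assumption about what the packages above provide.

```shell
# Hypothetical helper: report any of the given commands that are not on PATH.
# Prints one missing command per line and returns non-zero if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, with btk_env active, running check_tools python node npm seqtk geckodriver firefox should print nothing if every command was found.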

Fetch BlobToolKit code

mkdir -p ~/blobtoolkit;
cd ~/blobtoolkit;
git clone https://github.com/blobtoolkit/blobtools2;
git clone https://github.com/blobtoolkit/viewer;
git clone https://github.com/blobtoolkit/specification;
git clone https://github.com/blobtoolkit/insdc-pipeline;

Install Viewer packages

cd viewer;
npm install;
cd ..;

Databases

A local copy of the NCBI taxdump is required for any features that use taxonomy information. Typical usage also requires copies of the NCBI nucleotide (nt) and UniProt databases. These can all be fetched automatically when running the BlobToolKit Pipeline. Alternatively, use the commands below to fetch copies for standalone use.

Fetch the NCBI Taxdump

mkdir -p taxdump;
cd taxdump;
curl -L ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz | tar xzf -;
cd ..;
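
Because the download is piped straight into tar, a failed or truncated transfer can leave an incomplete directory behind. The sketch below checks for the files a later step depends on. check_taxdump is a hypothetical helper name; nodes.dmp is the file passed to diamond makedb further down, and including names.dmp is an assumption based on it shipping in the same archive.

```shell
# Hypothetical helper: check that a taxdump directory holds non-empty copies
# of the expected files. Defaults to ./taxdump when no directory is given.
check_taxdump() {
    dir=${1:-taxdump}
    status=0
    for f in nodes.dmp names.dmp; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_taxdump taxdump from the directory that contains taxdump returns zero only when both files are present and non-empty.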

Fetch the nt database

mkdir -p nt_v5
wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/v5/nt_v5.??.tar.gz" -P nt_v5/ && \
        for file in nt_v5/*.tar.gz; \
            do tar xf "$file" -C nt_v5 && rm "$file"; \
        done


Fetch and format the UniProt database

mkdir -p uniprot
wget -q -O uniprot/reference_proteomes.tar.gz \
 ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/$(curl \
     -vs ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/ 2>&1 | \
     awk '/tar.gz/ {print $9}')
cd uniprot
tar xf reference_proteomes.tar.gz

touch reference_proteomes.fasta.gz
find . -mindepth 2 | grep "fasta.gz" | grep -v 'DNA' | grep -v 'additional' | xargs cat >> reference_proteomes.fasta.gz

printf 'accession\taccession.version\ttaxid\tgi\n' > reference_proteomes.taxid_map
find . -mindepth 2 -name "*.idmapping.gz" | xargs zcat | grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t" 0}' >> reference_proteomes.taxid_map

diamond makedb -p 16 --in reference_proteomes.fasta.gz --taxonmap reference_proteomes.taxid_map --taxonnodes ../taxdump/nodes.dmp -d reference_proteomes.dmnd
cd -
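
To make the taxid map step above easier to follow, here is a small sketch of the row transformation on its own. It assumes the idmapping files are tab-separated with three columns (accession, ID type, value), and it folds the grep filter into awk rather than reproducing the commands above exactly. idmapping_to_taxid_map is a hypothetical name and the sample rows are made up for illustration.

```shell
# Hypothetical helper: read idmapping rows (accession, ID type, value) from
# stdin and print the four-column rows used in reference_proteomes.taxid_map:
# accession, accession.version (the accession repeated), taxid, gi (always 0).
idmapping_to_taxid_map() {
    awk -F '\t' '$2 == "NCBI_TaxID" {print $1 "\t" $1 "\t" $3 "\t" 0}'
}

# Made-up sample rows; only the NCBI_TaxID row should be kept.
printf 'A0A000\tNCBI_TaxID\t9606\nA0A000\tGeneID\t12345\n' | idmapping_to_taxid_map
```

Only rows whose second column is NCBI_TaxID are emitted, so other identifier types in the same file do not end up in the map.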

Fetch any BUSCO lineages that you plan to use

mkdir -p busco
wget -q -O eukaryota_odb9.gz "https://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz" \
        && tar xf eukaryota_odb9.gz -C busco
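
If several lineages are needed, the same download can be wrapped in a loop. This is a sketch under the assumption that other lineage archives follow the same URL pattern as eukaryota_odb9. busco_url and fetch_busco_lineages are hypothetical helper names, and any lineage name other than eukaryota_odb9 should be checked against the BUSCO site first.

```shell
# Hypothetical helper: build the download URL for a lineage, assuming the
# same pattern as the eukaryota_odb9 example above.
busco_url() {
    echo "https://busco.ezlab.org/datasets/$1.tar.gz"
}

# Hypothetical helper: download and unpack each named lineage into busco/,
# removing the archive once it has been extracted.
fetch_busco_lineages() {
    mkdir -p busco
    for lineage in "$@"; do
        wget -q -O "$lineage.tar.gz" "$(busco_url "$lineage")" \
            && tar xf "$lineage.tar.gz" -C busco \
            && rm "$lineage.tar.gz"
    done
}
```

Calling fetch_busco_lineages eukaryota_odb9 performs the same download as the command above; further lineage names can be appended as extra arguments.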