Fields can be imported from generic text files by specifying the column separator and mapping columns to field names. This makes it possible to view and filter a wide range of analysis outputs beyond those with dedicated parsers.
Example
Any text file can be imported. This example uses the GC proportion and length of the 10 longest contigs in the Strongyloides venezuelensis assembly:
cd ~/BTK_TUTORIAL/FILES
nano ASSEMBLY_NAME.values.csv

"id","gc","length"
"LM524968.1",0.2553,5851329
"LM524969.1",0.2523,3462877
"LM524970.1",0.2523,2731838
"LM524971.1",0.2513,2360143
"LM524972.1",0.2711,1484661
"LM524973.1",0.2619,1243793
"LM524974.1",0.2591,1114134
"LM524975.1",0.2558,1106993
"LM524976.1",0.2374,1082826
"LM524977.1",0.2577,1069219
This file can be imported to add two new fields to the dataset:
~/blobtoolkit/blobtools2/blobtools add \
--text ASSEMBLY_NAME.values.csv \
--text-delimiter ',' \
--text-cols id=identifiers,gc=gc_proportion,length=contig_length \
--text-header \
~/BTK_TUTORIAL/DATASETS/ASSEMBLY_NAME
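Before running an import like the one above, it can help to confirm that every row of the file has the expected number of columns. The following is a minimal sketch using awk, not part of blobtools itself. The file name example.values.csv and the temporary directory are placeholders, only the first two data rows of the example are reproduced, and the check assumes that no quoted value contains a comma.

```shell
# Work in a throwaway directory so no tutorial files are touched
workdir=$(mktemp -d)

# Placeholder file holding the header and first two rows of the example
cat > "$workdir/example.values.csv" <<'EOF'
"id","gc","length"
"LM524968.1",0.2553,5851329
"LM524969.1",0.2523,3462877
EOF

# Count rows whose number of comma-separated fields is not 3
# (assumes no quoted value contains a comma)
bad_rows=$(awk -F',' 'NF != 3 { n++ } END { print n + 0 }' "$workdir/example.values.csv")
echo "rows with unexpected column count: $bad_rows"
```

A non-zero count suggests a malformed row or a delimiter that does not match the file.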
Configuration
There are several configuration options when adding text files using blobtools add:
--text – Text file to import. [Required]
--text-delimiter – Text file delimiter. [Default: whitespace]
--text-cols – Comma separated list of index[=field_name] or header[=field_name]. [Required]
--text-header – Flag to indicate if first row of text file contains field names. [Default: false]
--text-no-array – Flag to prevent fields in files with duplicate identifiers being loaded as array fields. [Default: false]
--replace – Replace existing fields if present. [Default: false]
--text
A text file can be specified using the --text flag. As with other parsers, multiple files may be specified. Because field names are mapped to columns, the different files must then use unique headers to specify field names. Alternatively, add =FIELDNAME after the filename to load all specified columns into a single array or multiarray type field, with multiple values (or multiple sets of values) per identifier.
Usage:
blobtools add \
--text ASSEMBLY_NAME.results.txt \
--text ASSEMBLY_NAME.results2.txt=results2 \
...
--text-delimiter
The text file column separator can be specified using the --text-delimiter flag. The default value is whitespace, which splits rows on any whitespace character (e.g. SPACE or TAB). Alternative delimiters can use Python regular expression syntax, e.g. use --text-delimiter '\t' for tab delimited files.
Usage:
blobtools add \
...
--text-delimiter '\t' \
...
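The difference between the default whitespace splitting and an explicit tab delimiter matters when a value itself contains a space. The sketch below uses awk as a stand-in to illustrate the two splitting modes; it is not how blobtools parses files internally, and the sample line with the value "some label" is made up for illustration.

```shell
# A made-up tab-delimited line whose second value contains a space
line=$(printf 'contig_1\tsome label\t0.25')

# awk's default mode splits on any run of whitespace, like the default delimiter
ws_fields=$(printf '%s\n' "$line" | awk '{ print NF }')

# Splitting on tabs only keeps "some label" together as one value
tab_fields=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')

echo "whitespace split: $ws_fields fields, tab split: $tab_fields fields"
```

With whitespace splitting, the line yields four fields rather than three, so column indices after the affected value would shift by one.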
--text-cols
For any text file, it is necessary to specify which column contains the contig identifiers (which must match those already imported into the dataset from sequence headers) and at least one column containing values to be assigned to a dataset field. Column specification can be based on column indices or (if --text-header is set) column names. Field names can be set by adding =FIELD_NAME after the column index/header (field names must be set this way if --text-header is not set). The datatype for each field (category or variable) will be detected automatically during import. If the file contains duplicate identifiers and --text-no-array is not set, array fields will be created with multiple values per identifier.
Usage:
blobtools add \
--text-cols 1=identifiers,2,3=score,total=total_score \
...
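Because identifiers in the text file must match those already in the dataset, a quick comparison before import can catch mismatches. The following is a sketch under stated assumptions: dataset_ids.txt is a hypothetical plain list of contig identifiers (one per line) prepared by the user, values.csv is a placeholder file, and LM524999.9 is an invented identifier included to show a mismatch.

```shell
workdir=$(mktemp -d)

# Hypothetical list of contig identifiers already present in the dataset
printf '%s\n' LM524968.1 LM524969.1 > "$workdir/dataset_ids.txt"

# Placeholder text file; LM524999.9 is an invented identifier
cat > "$workdir/values.csv" <<'EOF'
"id","gc"
"LM524968.1",0.2553
"LM524999.9",0.2500
EOF

# Take column 1, skipping the header row and stripping the surrounding quotes
tail -n +2 "$workdir/values.csv" | cut -d',' -f1 | tr -d '"' > "$workdir/file_ids.txt"

# List identifiers in the text file that are absent from the dataset list
unmatched=$(grep -vxFf "$workdir/dataset_ids.txt" "$workdir/file_ids.txt" || true)
echo "unmatched identifiers: $unmatched"
```

An empty result means every identifier in the text file has a counterpart in the list.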
--text-header
Set the --text-header flag if a text file contains a header row. If present, the header row can be used to determine field names.
Usage:
blobtools add \
--text-header \
...
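When writing a column specification for a file with a header row, it can be convenient to list each header name next to its position. This is a small sketch using awk, assuming a comma-delimited header with quoted names as in the example file, and assuming that column indices start at 1 as the usage example for --text-cols suggests.

```shell
# Header row from the example file
header='"id","gc","length"'

# Print each column as index=name, with the surrounding quotes removed
cols=$(printf '%s\n' "$header" |
  awk -F',' '{ for (i = 1; i <= NF; i++) { gsub(/"/, "", $i); print i "=" $i } }')

echo "$cols"
```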
--text-no-array
Set the --text-no-array flag to prevent files with duplicated identifiers being loaded as array fields. Typically used when a text file is not expected to contain duplicate identifiers.
Usage:
blobtools add \
--text-no-array \
...
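To find out whether a file contains duplicate identifiers before deciding on this flag, the identifier column can be checked with standard tools. The sketch below assumes a whitespace-delimited file with identifiers in the first column; results.txt and its contents are made up for illustration.

```shell
workdir=$(mktemp -d)

# Made-up whitespace-delimited file in which contig_1 appears twice
printf '%s\n' 'contig_1 0.5' 'contig_2 0.7' 'contig_1 0.9' > "$workdir/results.txt"

# Extract the identifier column, then report each value that occurs more than once
dupes=$(awk '{ print $1 }' "$workdir/results.txt" | sort | uniq -d)

echo "duplicated identifiers: $dupes"
```

An empty result indicates that every identifier is unique.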
--replace
If a blobtools add command would overwrite an existing field, the default behaviour is to issue a warning and not replace the existing field. To change this behaviour and allow existing fields to be overwritten, set the --replace flag.
Usage:
blobtools add \
...
--replace \
...