Pre-processing

Note

Most of routines specified in this section will output processed clonotype tables and re-normalize individual clonotype frequencies by dividing their read count by the total read count in resulting (filtered/processed) sample. For some of the routines this behavior can be disabled with --save-freqs option. In this case original clonotype frequencies will be carried over from input samples and they will likely not sum to 1.0 in the resulting clonotype table.

Correct

Performs frequency-based correction to eliminate erroneous clonotypes. Searches the sample for clonotype pairs that differ by one, two … (up to specified depth) mismatches. In case the ratio of smallest to largest clonotype sizes is lower than the threshold specified as ratio ^ number_of_mismatches correction is performed. Largest clonotype in pair increases its size by the read count of the smaller one and the smaller one is discarded. Note that the original sample is not changed during correction, so all comparisons are performed with original count values and erroneous clonotypes are only removed after search procedure is finished. It is also possible to restrict correction to clonotypes with identical V/J segments using -a option.

Command line usage

$VDJTOOLS Correct \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-d --depth 1+ Maximum number of mismatches allowed between clonotypes being compared. Default is 2
-r --ratio [0, 1) Child-to-parent clonotype size ratio threshold under which child clonotype is considered erroneous. Default is 0.05
-a --match-segment   Check for erroneous clonotypes only among those that have identical V and J assignments
-c --compress   Compress output sample files
-h --help   Display help message

Tabular output

Outputs corrected samples to the path specified by output prefix and creates a corresponding metadata file. Will also append corr:[-d option value]:[-r option value]:['vjmatch' or 'all' based on -a option] to ..filter.. metadata column.

Graphical output

none


Decontaminate

Cross-sample contamination can occur at library prep stage, for example sample barcode swithing resulting from PCR chimeras. Those could lead to a high number of artificial shared clonotypes for samples sequenced in the same batch. If no sophisticated library prep method (e.g. paired-end barcoding) is applied, it is highly recommended to filter those before performing any kind of cross-sample analysis.

This routine filters out all clonotypes that have a matching clonotype in a different sample which is -r times more abundant. Clonotype fractions within samples are considered, which is good for dealing with FACS-related contaminations. In case of dealing with cross-sample contaminations in samples coming from the same sequencing lane use --read-based option that tells the routine to compare read counts.

Command line usage

$VDJTOOLS Decontaminate \
[options] [sample1.txt sample2.txt ... if -m is not specified] filter_sample output_prefix

Parameters

Shorthand Long name Argument Description
-S --software string Input format. See Common parameters
  --read-based string If set will compare clonotype read counts. Clonotype fractions in corresponding samples are compared by default.
-m --metadata path Path to metadata file. See Common parameters
-r --ratio numeric Parent-to-child clonotype frequency ratio for contamination filtering. Defaults to 20
-c --compress   Compress output sample files
-h --help   Display help message

Tabular output

Outputs filtered samples to the path specified by output prefix and creates a corresponding metadata file. Will also append dec:[-r value] to ..filter.. metadata column.

Graphical output

none


DownSample

Down-samples a list of clonotype abundance tables by randomly selecting a pre-defined number of reads or clonotypes. This routine could be useful for

  • normalizing samples to remove certain biases for depth-dependent statistics
  • speeding up computation / decreasing file size and memory footprint.

Command line usage

$VDJTOOLS DownSample \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-x --size integer Number of reads/clonotypes to take. Required
-u --unweighted   Will not weight clonotypes by frequency
-c --compress   Compress output sample files
-h --help   Display help message

Tabular output

Outputs sub-samples to the path specified by output prefix and creates a corresponding metadata file. Will also append ds:[-x value] to ..filter.. metadata column.

Graphical output

none


FilterNonFunctional

Filters non-functional (non-coding) clonotypes, i.e. the ones that contain a stop codon or frameshift in their receptor sequence. Those clonotypes do not have any functional role, but they are useful for dissecting and studying the V-(D)-J recombination machinery as they do not pass thymic selection.

Command line usage

$VDJTOOLS FilterNonFunctional \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-e --negative   Negative filtering, i.e. only non-functional clonotypes are retained
-e --negative   Negative filtering, i.e. only non-functional clonotypes are retained
  --save-freqs   Don’t re-calculate clonotype frequencies and use those from original sample (no re-normalization)
-h --help   Display help message

Tabular output

Outputs filtered samples to the path specified by output prefix and creates a corresponding metadata file. Will also append ncfilter:[retain or remove based on -e option] to ..filter.. metadata column.

Creates a filter summary file with a ncfilter.summary.txt suffix containing info on the number of unique clonotypes that passed the filtering process, their total frequency and count.

Graphical output

none


SelectTop

Selects top N clonotypes from the sample. Useful for studying exapanded clonotypes and clonotypes with strong convergent recombination bias, as well as robust computing of unweighted statistics.

Command line usage

$VDJTOOLS SelectTop \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-x --top integer Number of top clonotypes to take. Required
  --save-freqs   Don’t re-calculate clonotype frequencies and use those from original sample (no re-normalization)
-c --compress   Compress output sample files
-h --help   Display help message

Tabular output

Outputs sub-samples to the path specified by output prefix and creates a corresponding metadata file. Will also append top:[-x value] to ..filter.. metadata column.

Graphical output

none


FilterByFrequency

Selects clonotypes that either have a frequency above the specified threshold and/or constitute more than a specified percent of reads (e.g. quantile threshold of 0.25 will top N clonotypes that in total contain 25% of reads in the sample). Those two filters can be used together or separately by setting either frequency threshold to 0 or quantile threshold to 1.

Command line usage

$VDJTOOLS FilterByFrequency \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-f --freq-threshold 0.0-1.0 Clonotype frequency threshold. Default is 0.01
-q --quantile-threshold 0.0-1.0 Quantile threshold. Will retain a set of top N clonotypes so that their total frequency is equal or less to the specified threshold. Default is 0.25
  --save-freqs   Don’t re-calculate clonotype frequencies and use those from original sample (no re-normalization)
-c --compress   Compress output sample files
-h --help   Display help message

Tabular output

Outputs filtered samples to the path specified by output prefix and creates a corresponding metadata file. Will also append freqfilter:[-f value]:[-q value] to ..filter.. metadata column.

Graphical output

none


ApplySampleAsFilter

Retains/filters out all clonotypes found in a given sample S from other samples. Useful when S contains some specific cells of interest e.g. tumor-infiltrating T-cells or sorted tetramer+ T-cells.

Command line usage

$VDJTOOLS ApplySampleAsFilter \
[options] [sample1.txt sample2.txt ... if -m is not specified] filter_sample output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-i --intersect-type string Sample intersection rule. Defaults to strict. See Common parameters
-e --negative   Negative filtering, i.e. only clonotypes absent in sample S are retained
  --save-freqs   Don’t re-calculate clonotype frequencies and use those from original sample (no re-normalization)
-c --compress   Compress output sample files
-h --help   Display help message

Tabular output

Outputs filtered samples to the path specified by output prefix and creates a corresponding metadata file. Will also append asaf:[- if -e, + otherwise]:[-i value] to ..filter.. metadata column.

Graphical output

none


FilterBySegment

Filters clonotypes that have V/D/J segments that match a specified segment set.

Command line usage

$VDJTOOLS FilterBySegment \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-n --negative   Retain only clonotypes that lack specified V/D/J segments.
-v --v-segments v1,v2,… A comma-separated list of Variable segment names. Non-matching incomplete names will be partially matched.
-d --d-segments d1,d2,… A comma-separated list of Diversity segment names. Non-matching incomplete names will be partially matched.
-j --j-segments j1,j2,… A comma-separated list of Joining segment names. Non-matching incomplete names will be partially matched.
  --save-freqs   Don’t re-calculate clonotype frequencies and use those from original sample (no re-normalization)
-c --compress   Compress output sample files
-h --help   Display help message

Tabular output

Outputs filtered samples to the path specified by output prefix and creates a corresponding metadata file. Will also append segfilter:[retain or remove based on -e option]:[-v value]:[-d value]:[-j value] to ..filter.. metadata column.

Creates a filter summary file with a segfilter.summary.txt suffix containing info on the number of unique clonotypes that passed the filtering process, their total frequency and count.

Graphical output

none