Basic analysis

CalcBasicStats

This routine computes a set of basic sample statistics, such as read counts, number of clonotypes, etc.

Command line usage

$VDJTOOLS CalcBasicStats \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand  Long name     Argument  Description
-m         --metadata    path      Path to metadata file. See Common parameters
-u         --unweighted            If not set, all statistics will be weighted by clonotype frequency
-h         --help                  Display help message

Tabular output

The following table with the .basicstats.txt suffix is generated:

Column              Description
sample_id           Sample unique identifier
...                 Metadata columns. See Metadata section
count               Number of reads in a given sample
diversity           Number of clonotypes in a given sample
mean_frequency      Mean clonotype frequency
geomean_frequency   Geometric mean of clonotype frequency
nc_diversity        Number of non-coding clonotypes
nc_frequency        Frequency of reads that belong to non-coding clonotypes
mean_cdr3nt_length  Mean length of CDR3 nucleotide sequence. Weighted by clonotype frequency
mean_insert_size    Mean number of inserted random nucleotides in CDR3 sequence: the V-J insert size for receptor chains without a D segment, otherwise the sum of V-D and D-J insert sizes
mean_ndn_size       Mean number of nucleotides that lie between V and J segment sequences in CDR3
convergence         Mean number of unique CDR3 nucleotide sequences that code for the same CDR3 amino acid sequence
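To make the column definitions more concrete, here is a minimal Python sketch that derives several of them from an in-memory list of clonotypes. The Clonotype record, the is_noncoding rule (a stop codon "*" in the translation or a nucleotide length not divisible by 3) and the choice to weight only mean_cdr3nt_length are assumptions for illustration, not the VDJtools implementation. The insert-size columns are omitted because they require segment boundary coordinates.

```python
import math
from dataclasses import dataclass


@dataclass
class Clonotype:
    # Hypothetical, simplified clonotype record (not the VDJtools parser)
    count: int   # number of reads
    cdr3nt: str  # CDR3 nucleotide sequence
    cdr3aa: str  # CDR3 amino acid sequence


def is_noncoding(c: Clonotype) -> bool:
    # Assumed rule: stop codon in the translation or a frameshifted CDR3
    return "*" in c.cdr3aa or len(c.cdr3nt) % 3 != 0


def basic_stats(clonotypes):
    total = sum(c.count for c in clonotypes)
    freqs = [c.count / total for c in clonotypes]
    n = len(clonotypes)
    noncoding = [c for c in clonotypes if is_noncoding(c)]
    # Group unique nucleotide variants by the amino acid sequence they encode
    nt_variants = {}
    for c in clonotypes:
        nt_variants.setdefault(c.cdr3aa, set()).add(c.cdr3nt)
    return {
        "count": total,
        "diversity": n,
        # Plain (unweighted) arithmetic and geometric means over clonotypes
        "mean_frequency": sum(freqs) / n,
        "geomean_frequency": math.exp(sum(math.log(f) for f in freqs) / n),
        "nc_diversity": len(noncoding),
        "nc_frequency": sum(c.count for c in noncoding) / total,
        # Weighted by clonotype frequency, as the table states
        "mean_cdr3nt_length": sum(f * len(c.cdr3nt) for f, c in zip(freqs, clonotypes)),
        "convergence": sum(len(v) for v in nt_variants.values()) / len(nt_variants),
    }
```

With a plain mean, mean_frequency in this sketch always equals 1 / diversity; the geometric mean is more informative because it is pulled down by the many rare clonotypes.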

Graphical output

none


CalcSegmentUsage

This routine computes Variable (V) and Joining (J) segment usage vectors, i.e. the frequency of reads associated with each V/J segment present in the sample(s). If plotting is on, it also clusters V/J usage vectors and samples, similar to gene expression analysis.

Command line usage

$VDJTOOLS CalcSegmentUsage \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand  Long name     Argument  Description
-m         --metadata    path      Path to metadata file. See Common parameters
-u         --unweighted            Count the number of unique clonotypes with a given V/J segment. If not set, the number of reads is counted
-p         --plot                  Turns on plotting. See Common parameters
-f         --factor      string    Specifies plotting factor. See Common parameters
-n         --numeric               Specifies if plotting factor is numeric. See Common parameters
-l         --label       string    Specifies label used for plotting. See Common parameters
-h         --help                  Display help message

Tabular output

The following tables with the .segments.[unwt or wt depending on -u parameter].[V or J].txt suffix are generated:

Column                           Description
sample_id                        Sample unique identifier
...                              Metadata columns. See Metadata section
Segment name, e.g. TRBJ1-1       Segment frequency in a given sample
Next segment name, e.g. TRBJ1-2  ...
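A minimal Python sketch of how such usage tables could be computed, assuming each clonotype has been reduced to a (segment, read_count) pair for either V or J. The function names are hypothetical, and normalizing the unweighted (-u) counts to fractions, as done here, is an assumption.

```python
from collections import defaultdict


def segment_usage(clonotypes, weighted=True):
    """Frequency of each segment in one sample.

    clonotypes: iterable of (segment, read_count) pairs (hypothetical input).
    """
    usage = defaultdict(float)
    for segment, count in clonotypes:
        # Weighted: sum reads; unweighted (-u): count each clonotype once
        usage[segment] += count if weighted else 1
    total = sum(usage.values())
    return {seg: value / total for seg, value in usage.items()}


def usage_table(samples, weighted=True):
    """samples: dict of sample_id -> clonotypes. Returns (header, rows)."""
    per_sample = {sid: segment_usage(cl, weighted) for sid, cl in samples.items()}
    # Union of segments over all samples; absent segments get frequency 0
    segments = sorted({seg for usage in per_sample.values() for seg in usage})
    header = ["sample_id"] + segments
    rows = [[sid] + [usage.get(seg, 0.0) for seg in segments]
            for sid, usage in per_sample.items()]
    return header, rows
```

Filling absent segments with 0 keeps every row the same length, which is what a heatmap and a distance computation over samples need.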

Graphical output

If plotting is on, images with the same names as the tables, but with a .pdf extension, are created. They display a segment usage heatmap together with hierarchical clustering of samples and segments.

The figure is created using the heatmap.2 function from the gplots R package with default clustering parameters.

_images/basic-segmentusage.png

Sample clustering based on Variable segment usage. Weighted Variable segment usage profiles are used, and hierarchical clustering is performed using Euclidean distance. A continuous factor is displayed (-n -f age arguments).
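By default, heatmap.2 computes distances with R's dist (Euclidean) and clusters them with hclust, whose default method is complete linkage. The Python sketch below is a naive re-implementation of that combination for illustration only; the function name and the dict-of-vectors input are hypothetical and not part of VDJtools.

```python
import math


def complete_linkage(vectors):
    """Naive agglomerative clustering with complete linkage.

    vectors: dict of sample_id -> usage vector (lists of equal length).
    Returns merges as (cluster_a, cluster_b, distance) in merge order.
    """
    def linkage_distance(a, b):
        # Complete linkage: largest Euclidean distance between any two members
        return max(math.dist(vectors[x], vectors[y]) for x in a for y in b)

    clusters = [(sid,) for sid in vectors]
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        # Merge the closest pair of clusters
        i, j = min(pairs, key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[i], clusters[j], linkage_distance(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Clustering segments works the same way on the transposed table, with one vector per segment across samples.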


CalcSpectratype

Calculates the spectratype, that is, the histogram of read counts by CDR3 nucleotide length. The spectratype is useful for detecting pathological and highly clonal repertoires, as the spectratype of non-expanded T and B cells has a symmetric, Gaussian-like distribution.

Command line usage

$VDJTOOLS CalcSpectratype \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand  Long name     Argument  Description
-m         --metadata    path      Path to metadata file. See Common parameters
-u         --unweighted            Instead of computing read frequency, compute the number of unique clonotypes with a specific CDR3 length
-a         --amino-acid            Use CDR3 amino acid sequences for the calculation instead of nucleotide ones
-h         --help                  Display help message

Tabular output

The following table with the .spectratype.[aa or nt depending on -a parameter].[unwt or wt depending on -u parameter].txt suffix is generated:

Column                     Description
sample_id                  Sample unique identifier
...                        Metadata columns. See Metadata section
CDR3 length, e.g. 22       Frequency of reads with a given CDR3 length in a given sample
Next CDR3 length, e.g. 23  ...
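The histogram itself is simple to reproduce. The following Python sketch assumes clonotypes given as hypothetical (cdr3nt, cdr3aa, read_count) tuples and normalizes the histogram to frequencies; the weighted and amino_acid arguments mirror the -u and -a options.

```python
from collections import Counter


def spectratype(clonotypes, weighted=True, amino_acid=False):
    """Histogram of CDR3 lengths, normalized to frequencies.

    clonotypes: iterable of (cdr3nt, cdr3aa, read_count) tuples (hypothetical input).
    """
    hist = Counter()
    for cdr3nt, cdr3aa, count in clonotypes:
        length = len(cdr3aa if amino_acid else cdr3nt)
        # Weighted: reads per length; unweighted (-u): clonotypes per length
        hist[length] += count if weighted else 1
    total = sum(hist.values())
    return {length: hist[length] / total for length in sorted(hist)}
```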

Graphical output

none


PlotFancySpectratype

Plots a spectratype that also displays the CDR3 lengths of the top N clonotypes in a given sample. This plot helps to detect highly expanded clonotypes.

Command line usage

$VDJTOOLS PlotFancySpectratype [options] sample.txt output_prefix

Parameters:

Shorthand  Long name     Argument  Description
-t         --top         int       Number of top clonotypes to visualize. Should not exceed 20, default is 10
-h         --help                  Display help message

Tabular output

The following table with the .fancyspectra.txt suffix is generated:

Column                             Description
Len                                Length of CDR3 nucleotide sequence
Other                              Frequency of clonotypes with a given CDR3 length, other than the top N
Clonotype#N, e.g. CASRLLRAGSTEAFF  Frequency of the clonotype at the corresponding CDR3 length
Clonotype#N-1                      ...
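The layout of this table can be sketched as follows: each of the N most frequent clonotypes gets its own column, which is non-zero only at that clonotype's CDR3 length, and all remaining clonotypes are pooled into Other. The (cdr3aa, cdr3nt, frequency) input and the function name are hypothetical; ordering the columns from Clonotype#N down to the most abundant one follows the column list above.

```python
from collections import defaultdict


def fancy_spectratype(clonotypes, top=10):
    """clonotypes: list of (cdr3aa, cdr3nt, frequency) tuples (hypothetical input)."""
    ranked = sorted(clonotypes, key=lambda c: c[2], reverse=True)
    top_clones = ranked[:top]
    # Everything outside the top N is pooled per nucleotide CDR3 length
    other = defaultdict(float)
    for _, cdr3nt, freq in ranked[top:]:
        other[len(cdr3nt)] += freq
    # Column order as documented: Clonotype#N first, down to the most abundant
    columns = list(reversed(top_clones))
    header = ["Len", "Other"] + [cdr3aa for cdr3aa, _, _ in columns]
    rows = []
    for length in sorted({len(cdr3nt) for _, cdr3nt, _ in clonotypes}):
        row = [length, other.get(length, 0.0)]
        # A top clonotype contributes only at its own CDR3 length
        row += [freq if len(cdr3nt) == length else 0.0 for _, cdr3nt, freq in columns]
        rows.append(row)
    return header, rows
```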

Graphical output

An image file with the .fancyspectra.pdf suffix is generated:

_images/basic-fancyspectra.png

Spectratype with additional detail: the most abundant clonotypes are shown explicitly.


PlotFancyVJUsage

Plots a circos-style V-J usage plot displaying the frequency of various V-J junctions.

Command line usage

$VDJTOOLS PlotFancyVJUsage [options] sample.txt output_prefix

Parameters:

Shorthand  Long name     Argument  Description
-u         --unweighted            Instead of computing read frequency, compute the number of unique clonotypes with a specific V-J junction
-h         --help                  Display help message

Tabular output

A matrix with rows corresponding to different J segments and columns corresponding to different V segments. Each cell contains the frequency of a given V-J junction. The file has the .fancyvj.[unwt or wt depending on -u parameter].txt suffix.
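A minimal Python sketch of this matrix, assuming clonotypes reduced to hypothetical (v, j, read_count) tuples. With weighted=False each clonotype contributes 1, mirroring -u; normalizing the cells so that the whole matrix sums to 1 is an assumption.

```python
from collections import defaultdict


def vj_matrix(clonotypes, weighted=True):
    """clonotypes: iterable of (v, j, read_count) tuples (hypothetical input).

    Returns (j_segments, v_segments, matrix): rows are J, columns are V.
    """
    cells = defaultdict(float)
    for v, j, count in clonotypes:
        # Weighted: reads per V-J pair; unweighted (-u): clonotypes per pair
        cells[(j, v)] += count if weighted else 1
    total = sum(cells.values())
    js = sorted({j for j, _ in cells})
    vs = sorted({v for _, v in cells})
    # Pairs never observed get frequency 0
    matrix = [[cells.get((j, v), 0.0) / total for v in vs] for j in js]
    return js, vs, matrix
```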

Graphical output

An image with the same name as the output table, but with a .pdf extension, is generated. The plot is built using the circlize R package.

_images/basic-fancyvj.png

V-J junction circos plot for a single sample. Arcs correspond to different V and J segments, scaled to their frequency in the sample. Ribbons represent V-J pairings, and their size is scaled to the pairing frequency (weighted in this case).


PlotSpectratypeV

Plots a detailed spectratype that displays the CDR3 length distribution for clonotypes from the top N Variable segment families. This plot is useful for detecting type 1 and type 2 repertoire biases, which can arise under pathological conditions.

Command line usage

$VDJTOOLS PlotSpectratypeV [options] sample.txt output_prefix

Parameters:

Shorthand  Long name     Argument  Description
-t         --top         int       Number of top (by frequency) V segments to visualize. Should not exceed 12, default is 12
-u         --unweighted            Instead of counting read frequency, count the number of unique clonotypes
-h         --help                  Display help message

Tabular output

The following table with the .spectraV.[unwt or wt depending on -u parameter].txt suffix is generated:

Column                    Description
Len                       Length of CDR3 nucleotide sequence
Other                     Frequency of clonotypes with a given CDR3 length, having V segments other than the top N
Segment#N, e.g. TRBV10-1  Frequency of clonotypes with a given V segment at the corresponding CDR3 length
Segment#N-1               ...
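A sketch of this stacked table, assuming hypothetical (v, cdr3nt, read_count) tuples: V segments are ranked by total usage, the top N keep their own column, and clonotypes with any other V segment are pooled into Other. Ordering the columns as Other, then Segment#N down to the most frequent segment, follows the column list above; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def spectratype_by_v(clonotypes, top=12, weighted=True):
    """clonotypes: iterable of (v, cdr3nt, read_count) tuples (hypothetical input)."""
    clonotypes = list(clonotypes)

    def weight(count):
        # Weighted: reads; unweighted (-u): one per clonotype
        return count if weighted else 1

    total = sum(weight(count) for _, _, count in clonotypes)
    v_usage = Counter()
    for v, _, count in clonotypes:
        v_usage[v] += weight(count)
    top_v = [v for v, _ in v_usage.most_common(top)]
    # length -> column -> frequency
    table = defaultdict(lambda: defaultdict(float))
    for v, cdr3nt, count in clonotypes:
        column = v if v in top_v else "Other"
        table[len(cdr3nt)][column] += weight(count) / total
    columns = ["Other"] + list(reversed(top_v))
    rows = [[length] + [table[length].get(col, 0.0) for col in columns]
            for length in sorted(table)]
    return ["Len"] + columns, rows
```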

Graphical output

An image file with the .spectraV.[unwt or wt depending on -u parameter].pdf suffix is generated:

_images/basic-spectrav.png

Stacked spectratypes by Variable segment for a single sample. Most frequent Variable segments are highlighted.