Basic analysis¶

CalcBasicStats¶

This routine computes a set of basic sample statistics, such as read counts, number of clonotypes, etc.

Command line usage¶

$VDJTOOLS CalcBasicStats \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-m`	`--metadata`	path	Path to metadata file. See Common parameters
`-u`	`--unweighted`		Computes unweighted statistics. If not set, all statistics are weighted by clonotype frequency
`-h`	`--help`		Display help message

Tabular output¶

The following table with .basicstats.txt suffix is generated:

Column	Description
sample_id	Sample unique identifier
…	Metadata columns. See Metadata section
count	Number of reads in a given sample
diversity	Number of clonotypes in a given sample
mean_frequency	Mean clonotype frequency
geomean_frequency	Geometric mean of clonotype frequency
nc_diversity	Number of non-coding clonotypes
nc_frequency	Frequency of reads that belong to non-coding clonotypes
mean_cdr3nt_length	Mean length of CDR3 nucleotide sequence. Weighted by clonotype frequency
mean_insert_size	Mean number of inserted random nucleotides in CDR3 sequence. Characterizes the V-J insert for receptor chains without a D segment, or the sum of V-D and D-J insert sizes for chains with a D segment
mean_ndn_size	Mean number of nucleotides that lie between V and J segment sequences in CDR3
convergence	Mean number of unique CDR3 nucleotide sequences that code for the same CDR3 amino acid sequence
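To make the column definitions above concrete, here is a minimal Python sketch that computes a few of them for a toy sample. It is an illustration under stated assumptions, not the VDJtools implementation: the in-memory tuple layout, the function name `basic_stats` and the toy data are all hypothetical, `mean_frequency` and `convergence` are computed without weighting, and `mean_cdr3nt_length` is weighted by clonotype frequency as the table states.

```python
import math
from collections import defaultdict

# Hypothetical toy sample (not the VDJtools file format):
# (read count, CDR3 nucleotide sequence, CDR3 amino acid sequence)
CLONOTYPES = [
    (6, "TGTGCCAGCAGCTTC", "CASSF"),
    (3, "TGTGCTAGCAGCTTC", "CASSF"),
    (1, "TGTGCCAGCAGTTAC", "CASSY"),
]


def basic_stats(clonotypes):
    """Compute a subset of the basic statistics for one sample."""
    count = sum(reads for reads, _, _ in clonotypes)   # total number of reads
    diversity = len(clonotypes)                        # number of clonotypes
    freqs = [reads / count for reads, _, _ in clonotypes]

    # Assumption: plain (unweighted) arithmetic and geometric means.
    mean_frequency = sum(freqs) / diversity
    geomean_frequency = math.exp(sum(math.log(f) for f in freqs) / diversity)

    # Mean CDR3 nucleotide length, weighted by clonotype frequency.
    mean_cdr3nt_length = sum(
        f * len(cdr3nt) for f, (_, cdr3nt, _) in zip(freqs, clonotypes)
    )

    # Convergence: unique nucleotide variants per amino acid sequence, averaged.
    variants = defaultdict(set)
    for _, cdr3nt, cdr3aa in clonotypes:
        variants[cdr3aa].add(cdr3nt)
    convergence = sum(len(v) for v in variants.values()) / len(variants)

    return {
        "count": count,
        "diversity": diversity,
        "mean_frequency": mean_frequency,
        "geomean_frequency": geomean_frequency,
        "mean_cdr3nt_length": mean_cdr3nt_length,
        "convergence": convergence,
    }


stats = basic_stats(CLONOTYPES)
print(stats)
```

In this toy sample two nucleotide variants encode CASSF and one encodes CASSY, so the convergence is (2 + 1) / 2 = 1.5.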

Graphical output¶

none

CalcSegmentUsage¶

This routine computes Variable (V) and Joining (J) segment usage vectors, i.e. the frequency of associated reads for each V/J segment present in the sample(s). If plotting is on, it also clusters V/J usage vectors and samples, à la gene expression analysis.

Command line usage¶

$VDJTOOLS CalcSegmentUsage \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-m`	`--metadata`	path	Path to metadata file. See Common parameters
`-u`	`--unweighted`		Will compute the number of unique clonotypes with a given V/J segment. Counts the number of reads otherwise
`-p`	`--plot`		Turns on plotting. See Common parameters
`-f`	`--factor`	string	Specifies plotting factor. See Common parameters
`-n`	`--numeric`		Specifies if plotting factor is numeric. See Common parameters
`-l`	`--label`	string	Specifies label used for plotting. See Common parameters
`-h`	`--help`		Display help message

Tabular output¶

The following tables with .segments.[unwt or wt depending on -u parameter].[V or J].txt suffix are generated:

Column	Description
sample_id	Sample unique identifier
…	Metadata columns. See Metadata section
Segment name, e.g. TRBJ1-1	Segment frequency in a given sample
Next segment name, e.g. TRBJ1-2	…
…	…
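The difference between the weighted and unweighted usage vectors can be sketched in Python as follows. This is only an illustration: the tuple layout, the function name `segment_usage` and the toy data are hypothetical, and normalizing the unweighted clonotype counts to fractions is an assumption made here so that both modes are comparable.

```python
from collections import Counter

# Hypothetical toy sample (not the VDJtools file format):
# (read count, V segment, J segment)
CLONOTYPES = [
    (6, "TRBV10-1", "TRBJ1-1"),
    (3, "TRBV10-1", "TRBJ1-2"),
    (1, "TRBV7-2", "TRBJ1-1"),
]


def segment_usage(clonotypes, segment_index, unweighted=False):
    """Return {segment: fraction} for the V (index 1) or J (index 2) segment.

    Weighted mode sums read counts; unweighted mode counts each clonotype once.
    """
    totals = Counter()
    for clonotype in clonotypes:
        weight = 1 if unweighted else clonotype[0]
        totals[clonotype[segment_index]] += weight
    total = sum(totals.values())
    return {segment: n / total for segment, n in totals.items()}


v_weighted = segment_usage(CLONOTYPES, segment_index=1)
v_unweighted = segment_usage(CLONOTYPES, segment_index=1, unweighted=True)
print(v_weighted)
print(v_unweighted)
```

Here TRBV10-1 carries 9 of 10 reads but only 2 of 3 clonotypes, which is the kind of gap between the two modes that a single expanded clonotype produces.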

Graphical output¶

Images with the same names as the tables, but with a .pdf extension, are created if plotting is on. They display a segment usage heatmap with hierarchical clustering of samples and segments.

The figure is created using the heatmap.2 function from the gplots R package with default clustering parameters.

Sample clustering based on Variable segment usage. Weighted Variable usage profiles are used, and hierarchical clustering is performed using Euclidean distance. A continuous factor is displayed (-n -f age argument).

CalcSpectratype¶

Calculates the spectratype, that is, a histogram of read counts by CDR3 nucleotide length. The spectratype is useful for detecting pathological and highly clonal repertoires, as the spectratype of non-expanded T- and B-cells has a symmetric Gaussian-like distribution.

Command line usage¶

$VDJTOOLS CalcSpectratype \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-m`	`--metadata`	path	Path to metadata file. See Common parameters
`-u`	`--unweighted`		Instead of computing read frequency, will compute the number of unique clonotypes with a specific CDR3 length
`-a`	`--amino-acid`		Will use CDR3 amino acid sequences for calculation instead of nucleotide ones
`-h`	`--help`		Display help message

Tabular output¶

The following table with .spectratype.[aa or nt depending on -a parameter].[unwt or wt depending on -u parameter].txt suffix is generated:

Column	Description
sample_id	Sample unique identifier
…	Metadata columns. See Metadata section
CDR3 length, e.g. 22	Frequency of reads with a given CDR3 length in a given sample
Next CDR3 length, 23	…
…	…
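As a rough illustration of how such a histogram is built, the Python sketch below bins a toy sample by CDR3 length. The function name `spectratype`, its keyword arguments (which only mirror the -a and -u switches) and the tuple layout are hypothetical, and normalizing the unweighted counts to fractions is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical toy sample (not the VDJtools file format):
# (read count, CDR3 nucleotide sequence, CDR3 amino acid sequence)
CLONOTYPES = [
    (6, "TGTGCCAGCAGCTTC", "CASSF"),  # 15 nt, 5 aa
    (3, "TGTGCCAGCTTC", "CASF"),      # 12 nt, 4 aa
    (1, "TGTGCTAGCAGCTTC", "CASSF"),  # 15 nt, 5 aa
]


def spectratype(clonotypes, amino_acid=False, unweighted=False):
    """Return {CDR3 length: fraction}, ordered by length."""
    histogram = defaultdict(float)
    for reads, cdr3nt, cdr3aa in clonotypes:
        length = len(cdr3aa) if amino_acid else len(cdr3nt)
        # Weighted mode adds read counts; unweighted mode adds one per clonotype.
        histogram[length] += 1 if unweighted else reads
    total = sum(histogram.values())
    return {length: value / total for length, value in sorted(histogram.items())}


print(spectratype(CLONOTYPES))
print(spectratype(CLONOTYPES, unweighted=True))
print(spectratype(CLONOTYPES, amino_acid=True))
```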

Graphical output¶

none

PlotFancySpectratype¶

Plots a spectratype that also displays CDR3 lengths for the top N clonotypes in a given sample. This plot helps to detect highly expanded clonotypes.

Command line usage¶

$VDJTOOLS PlotFancySpectratype [options] sample.txt output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-t`	`--top`	int	Number of top clonotypes to visualize. Should not exceed 20, default is 10
`-h`	`--help`		Display help message

Tabular output¶

The following table with .fancyspectra.txt suffix is generated:

Column	Description
Len	Length of CDR3 nucleotide sequence
Other	Frequency of clonotypes with a given CDR3 length, other than top N
Clonotype#N, e.g. CASRLLRAGSTEAFF	Clonotype frequency, at the corresponding CDR3 length
Clonotype#N-1	…
…	…
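The layout of this table can be sketched in Python as below: the top N clonotypes each get their own column, and all remaining clonotypes are pooled into "Other" per CDR3 length. The function name `fancy_spectratype`, the tuple layout and the toy data are hypothetical, and ranking clonotypes by read count is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical toy sample (not the VDJtools file format):
# (read count, CDR3 nucleotide sequence, CDR3 amino acid sequence)
CLONOTYPES = [
    (50, "TGTGCCAGCAGCTTC", "CASSF"),  # 15 nt
    (30, "TGTGCCAGCTTC", "CASF"),      # 12 nt
    (15, "TGTGCTAGCAGCTAC", "CASSY"),  # 15 nt
    (5, "TGTGCCAGTTTC", "CASF"),       # 12 nt
]


def fancy_spectratype(clonotypes, top=10):
    """Return {CDR3 nt length: {column: frequency}}.

    Columns are the amino acid sequences of the top clonotypes plus "Other".
    """
    total = sum(reads for reads, _, _ in clonotypes)
    ranked = sorted(clonotypes, key=lambda c: c[0], reverse=True)
    table = defaultdict(lambda: defaultdict(float))
    for rank, (reads, cdr3nt, cdr3aa) in enumerate(ranked):
        column = cdr3aa if rank < top else "Other"
        table[len(cdr3nt)][column] += reads / total
    return {length: dict(columns) for length, columns in sorted(table.items())}


for length, columns in fancy_spectratype(CLONOTYPES, top=2).items():
    print(length, columns)
```

With `top=2`, the two largest clonotypes appear under their own names, while the remaining reads at lengths 12 and 15 end up in "Other".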

Graphical output¶

An image file with .fancyspectra.pdf suffix is generated:

Spectratype with additional detail. The most abundant clonotypes are shown explicitly.

PlotFancyVJUsage¶

Plots a circos-style V-J usage plot displaying the frequency of various V-J junctions.

Command line usage¶

$VDJTOOLS PlotFancyVJUsage [options] sample.txt output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-u`	`--unweighted`		Instead of computing read frequency, will compute the number of unique clonotypes with specific V-J junctions
`-h`	`--help`		Display help message

Tabular output¶

A matrix with rows corresponding to different J segments and columns corresponding to different V segments. Each cell contains the frequency of a given V-J junction. The file has .fancyvj.[unwt or wt depending on -u parameter].txt suffix.
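A minimal Python sketch of such a matrix, with J segments as rows and V segments as columns, is given below. The function name `vj_usage_matrix`, the tuple layout, the alphabetical ordering of segments and the normalization of unweighted counts are assumptions of this sketch rather than documented VDJtools behaviour.

```python
from collections import defaultdict

# Hypothetical toy sample (not the VDJtools file format):
# (read count, V segment, J segment)
CLONOTYPES = [
    (6, "TRBV10-1", "TRBJ1-1"),
    (3, "TRBV10-1", "TRBJ1-2"),
    (1, "TRBV7-2", "TRBJ1-1"),
]


def vj_usage_matrix(clonotypes, unweighted=False):
    """Return (V segment names, J segment names, matrix[j][v] of frequencies)."""
    weights = defaultdict(float)
    for reads, v, j in clonotypes:
        # Weighted mode adds read counts; unweighted mode adds one per clonotype.
        weights[(v, j)] += 1 if unweighted else reads
    total = sum(weights.values())
    v_segments = sorted({v for v, _ in weights})
    j_segments = sorted({j for _, j in weights})
    matrix = [
        [weights.get((v, j), 0.0) / total for v in v_segments]
        for j in j_segments
    ]
    return v_segments, j_segments, matrix


v_names, j_names, matrix = vj_usage_matrix(CLONOTYPES)
print("\t".join([""] + v_names))
for j_name, row in zip(j_names, matrix):
    print("\t".join([j_name] + [f"{value:.2f}" for value in row]))
```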

Graphical output¶

An image with the same name as the output table, but with a .pdf extension, is generated. The plot is built using the circlize R package.

V-J junction circos plot for a single sample. Arcs correspond to different V and J segments, scaled to their frequency in the sample. Ribbons represent V-J pairings, and their size is scaled to the pairing frequency (weighted in the present case).

PlotSpectratypeV¶

Plots a detailed spectratype that displays the CDR3 length distribution for clonotypes from the top N Variable segment families. This plot is useful for detecting type 1 and type 2 repertoire biases that could arise under pathological conditions.

Command line usage¶

$VDJTOOLS PlotSpectratypeV [options] sample.txt output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-t`	`--top`	int	Number of top (by frequency) V segments to visualize. Should not exceed 12, default is 12
`-u`	`--unweighted`		Instead of counting read frequency, will count the number of unique clonotypes
`-h`	`--help`		Display help message

Tabular output¶

The following table with .spectraV.[unwt or wt depending on -u parameter].txt suffix is generated:

Column	Description
Len	Length of CDR3 nucleotide sequence
Other	Frequency of clonotypes with a given CDR3 length, having V segments other than the top N
Segment#N, e.g. TRBV10-1	Frequency of clonotypes with a given V segment at the corresponding CDR3 length
Segment#N-1	…
…	…

Graphical output¶

An image file with .spectraV.[unwt or wt depending on -u parameter].pdf suffix is generated:

Stacked spectratypes by Variable segment for a single sample. Most frequent Variable segments are highlighted.