Operate on clonotype tables¶

JoinSamples¶

Joins several clonotype tables together to form a joint clonotype abundance table. Joint clonotype holds information on all clonotypes that match under a certain comparison criteria (e.g. identical CDR3nt and V segment), their samples of origin and corresponding abundances. At least two samples should be specified for this routine. For two sample case also consider using OverlapPair routine.

Attention

This is the most memory-demanding routine, especially for a large number of samples.

Command line usage¶

$VDJTOOLS JoinSamples \
[options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-m`	`--metadata`	path	Path to metadata file. See See Common parameters
`-i`	`--intersect-type`	string	Sample intersection rule. Defaults to `aa`. See Common parameters
`-x`	`--times-detected`	integer	Minimal number of samples in which a clonotype should be detected to get to the final output. Default = 2
`-p`	`--plot`		Turns on plotting. See Common parameters
`-c`	`--compress`		Compressed output for clonotype table. See Common parameters
`-h`	`--help`		Display help message

Tabular output¶

Summary table suffixed join.[value of -i argument].summary.txt is created with the following columns.

Column	Description
<first sample id >	Indicator for the first sample, either `0` or `1`
<second sample id >	Indicator for the second sample
…
clonotypes	Number of clonotypes detected in all samples that have `1` indicator in a given row.

Joint clonotype abundance table file having join.[value of -i argument].table.txt suffix that contains joint clonotypes detected in at least -x samples. Table structure is described in the section below.

Joint clonotype abundance table structure¶

First columns have the same meaning as in VDJtools format clonotype abundance table, they are computed as follows:

Normalized frequency is computed as geometric mean of clonotype frequencies that comprise a given joint clonotype in intersected samples. If clonotype is missing, its frequency is set to 1e-9.

Note

Joint clonotype is formed as a union of all clonotype variants in all samples that match under the specified -i rule.
Normalized count is calculated by scaling normalized frequencies so that the joint clonotypes with smallest frequency has a count of 1.
Clonotype signature (CDR3nt, CDR3aa, V, D and J) is taken from a representative clonotype.

Note

When several clonotype variants are present in samples that correspond to the same clonotype under -i rule (e.g. several Variable segment variants when -i nt is set), only the most abundant form is selected as a representative clonotype to final output.

Column	Description
count	Normalized clonotype count
freq	Normalized clonotype frequency
cdr3nt	Representative CDR3 nucleotide sequence
cdr3aa	Representative CDR3 amino acid sequence
v	Representative Variable segment
d	Representative Diversity segment
j	Representative Joining segment
peak	Index of a time point at which given clonotype reaches its maximum frequency
occurrences	Number of samples the joint clonotype was detected in
<sample name>	Frequency of a joint clonotype at corresponding sample
…

Graphical output¶

A Venn diagram can be found in a file having join.[value of -i argument].venn.pdf suffix. Note that if there are more than 5 samples, it will be constructed for the first 5 samples. Plotting is performed using VennDiagram R package.

Overlap of clonotype sets. See Venn diagram wiki article for the description.

PoolSamples¶

Pools clonotypes from several samples together and merges clonotypes that that match under a certain comparison criteria (e.g. identical CDR3nt and V segment). Note that this routine can be used with a single sample to aggregate the sameple, e.g. by CDR3 amino acid sequence, in this case CDR3 nucleotide sequence, V and J segments will be taken from a representative clonotype variant with the highest frequency.

Command line usage¶

$VDJTOOLS PoolSamples \
[options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand	Long name	Argument	Description
`-m`	`--metadata`	path	Path to metadata file. See Common parameters
`-i`	`--intersect-type`	string	Sample intersection rule. Defaults to `strict`. See Common parameters
`-p`	`--plot`		Turns on plotting. See Common parameters
`-c`	`--compress`		Compressed output for clonotype table. See Common parameters
`-h`	`--help`		Display help message

Tabular output¶

Summary table suffixed pool.[value of -i argument].summary.txt is created with the following columns.

Column	Description
incidence.count	Indicator for the first sample, either `0` or `1`
read.count	Total number of reads associated with a given pooled clonotype
convergence	Total number of clonotype variants that match the pooled clonotype under `-i` rule.

Pooled clonotype abundance table file having pool.[value of -i argument].summary.txt. Table structure is described in the section below.

Pooled clonotype abundance table structure¶

First columns have the same meaning as in VDJtools format clonotype abundance table, they are computed as follows:

Pooled count is computed as the total number of reads associated with clonotype variants that match under the specified -i rule.
Frequency is computed as pooled count divided by total number of reads in all samples.
Clonotype signature (CDR3nt, CDR3aa, V, D and J) is taken from a representative clonotype in the same way as described for Joint clonotype abundance table structure.

Column	Description
count	Pooled clonotype count
freq	Pooled clonotype frequency
cdr3nt	Representative CDR3 nucleotide sequence
cdr3aa	Representative CDR3 amino acid sequence
v	Representative Variable segment
d	Representative Diversity segment
j	Representative Joining segment
incidence	Number of samples containing clonotype variants that comprise a given pooled clonotype
convergence	Total number of clonotype variants that match the pooled clonotype under `-i` rule

Graphical output¶

planned