Operate on clonotype tables

JoinSamples

Joins several clonotype tables together to form a joint clonotype abundance table. A joint clonotype holds information on all clonotypes that match under a certain comparison criterion (e.g. identical CDR3nt and V segment), their samples of origin and the corresponding abundances. At least two samples should be specified for this routine. For the two-sample case, also consider using the OverlapPair routine.
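To make "matching under a comparison criterion" concrete, the following Python sketch groups clonotype variants from several samples by a key derived from the -i rule. The Clonotype record, the RULES mapping and the group_by_rule function are hypothetical illustrations, not part of VDJtools; the authoritative list of -i values and their exact semantics is given in Common parameters.

```python
from collections import defaultdict
from typing import NamedTuple


class Clonotype(NamedTuple):
    # Hypothetical record mirroring the VDJtools clonotype table columns
    count: int
    freq: float
    cdr3nt: str
    cdr3aa: str
    v: str
    d: str
    j: str


# Hypothetical key functions for a few -i rules (illustration only)
RULES = {
    "nt": lambda c: c.cdr3nt,
    "aa": lambda c: c.cdr3aa,
    "nt-v": lambda c: (c.cdr3nt, c.v),
}


def group_by_rule(samples, rule):
    """Map each comparison key to {sample index: [matching clonotype variants]}."""
    key_of = RULES[rule]
    groups = defaultdict(lambda: defaultdict(list))
    for i, sample in enumerate(samples):
        for clone in sample:
            groups[key_of(clone)][i].append(clone)
    return groups
```

Under the "aa" key, two nucleotide variants encoding the same CDR3 amino acid sequence end up in one group, which is what later becomes a single joint clonotype.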

Attention

This is the most memory-demanding routine, especially for a large number of samples.

Command line usage

$VDJTOOLS JoinSamples \
[options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-i --intersect-type string Sample intersection rule. Defaults to aa. See Common parameters
-x --times-detected integer Minimum number of samples in which a clonotype must be detected to be included in the final output. Default = 2
-p --plot   Turns on plotting. See Common parameters
-c --compress   Compressed output for clonotype table. See Common parameters
-h --help   Display help message

Tabular output

A summary table with the join.[value of -i argument].summary.txt suffix is created with the following columns:

Column Description
<first sample id> Indicator for the first sample, either 0 or 1
<second sample id> Indicator for the second sample, either 0 or 1
... Indicators for the remaining samples, one column per sample
clonotypes Number of clonotypes detected in all samples that have an indicator of 1 in a given row
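A sketch of how the indicator rows of this summary table could be derived. It assumes, hypothetically, that each joint clonotype is represented by the set of sample indices it was detected in, and it counts joint clonotypes detected in every sample flagged with 1, as the column description states. The summary_rows function is illustrative only, and skipping the all-zero row is an assumption about the real output.

```python
from itertools import product


def summary_rows(presence, n_samples):
    """Build indicator rows resembling the join summary table (a sketch).

    presence maps each joint clonotype key to the set of sample indices it
    was detected in. For every non-empty combination of indicators, count
    the joint clonotypes detected in all samples flagged with 1.
    """
    rows = []
    for flags in product((0, 1), repeat=n_samples):
        flagged = {i for i, f in enumerate(flags) if f}
        if not flagged:
            continue  # assumption: the all-zero row is not reported
        n = sum(1 for samples in presence.values() if flagged <= samples)
        rows.append((*flags, n))
    return rows
```

Counting "detected in all flagged samples" (rather than "only in the flagged samples") yields the inclusive region sizes that pairwise and triple Venn diagrams are typically drawn from.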

A joint clonotype abundance table with the join.[value of -i argument].table.txt suffix contains the joint clonotypes detected in at least -x samples. Its structure is described in the section below.

Joint clonotype abundance table structure

The first columns have the same meaning as in a VDJtools-format clonotype abundance table; they are computed as follows:

  • Normalized frequency is computed as the geometric mean of the frequencies of the clonotypes that comprise a given joint clonotype across the intersected samples. If the clonotype is missing from a sample, its frequency in that sample is set to 1e-9.

    Note

    A joint clonotype is formed as the union of all clonotype variants in all samples that match under the specified -i rule.

  • Normalized count is calculated by scaling normalized frequencies so that the joint clonotype with the smallest frequency has a count of 1.

  • Clonotype signature (CDR3nt, CDR3aa, V, D and J) is taken from a representative clonotype.

    Note

    When several clonotype variants present in the samples correspond to the same clonotype under the -i rule (e.g. several Variable segment variants when -i nt is set), only the most abundant variant is selected as the representative clonotype in the final output.
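The three computations listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the VDJtools implementation: the per-sample frequencies of a joint clonotype are assumed to be already combined over its variants, normalized counts are rounded to the nearest integer, and the representative is chosen by highest frequency. All function names are hypothetical.

```python
import math

MISSING_FREQ = 1e-9  # frequency assigned to samples where the clonotype is absent


def joint_frequency(per_sample_freqs):
    """Geometric mean of per-sample frequencies; None or 0 marks a missing clonotype."""
    freqs = [f if f else MISSING_FREQ for f in per_sample_freqs]
    # average logarithms instead of multiplying many small numbers (avoids underflow)
    return math.exp(sum(math.log(f) for f in freqs) / len(freqs))


def normalized_counts(joint_freqs):
    """Scale frequencies so the joint clonotype with the smallest frequency gets count 1.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    smallest = min(joint_freqs)
    return [round(f / smallest) for f in joint_freqs]


def representative(variants):
    """Pick the most abundant variant (here: highest frequency) as the representative."""
    return max(variants, key=lambda v: v["freq"])
```

For example, a joint clonotype seen at frequencies 0.01 and 0.04 gets a normalized frequency of 0.02, while one seen at 0.001 in one sample and missing from the other gets sqrt(0.001 * 1e-9) = 1e-6, which shows how strongly the 1e-9 placeholder penalizes clonotypes absent from some samples.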

Column Description
count Normalized clonotype count
freq Normalized clonotype frequency
cdr3nt Representative CDR3 nucleotide sequence
cdr3aa Representative CDR3 amino acid sequence
v Representative Variable segment
d Representative Diversity segment
j Representative Joining segment
peak Index of the time point at which the given clonotype reaches its maximum frequency
occurrences Number of samples the joint clonotype was detected in
<sample name> Frequency of the joint clonotype in the corresponding sample
 
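The peak and occurrences columns, together with the -x filter that decides which joint clonotypes reach this table, can be sketched as below. The functions are hypothetical illustrations, and the 0-based peak index is an assumption: the text does not state whether time points are counted from 0 or 1, nor how ties are broken (this sketch takes the first maximum).

```python
def peak_and_occurrences(per_sample_freqs):
    """Return (peak, occurrences) for one joint clonotype (a sketch).

    per_sample_freqs lists the joint clonotype's frequency in each sample,
    in sample order, with 0.0 where it was not detected.
    """
    occurrences = sum(1 for f in per_sample_freqs if f > 0)
    # 0-based index of the first sample (time point) with the maximal frequency
    peak = max(range(len(per_sample_freqs)), key=per_sample_freqs.__getitem__)
    return peak, occurrences


def passes_times_detected(per_sample_freqs, times_detected=2):
    """Mimic the -x filter: keep clonotypes detected in at least times_detected samples."""
    _, occurrences = peak_and_occurrences(per_sample_freqs)
    return occurrences >= times_detected
```

With the default of 2, a clonotype present in only one sample is dropped even if it is highly abundant there.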

Graphical output

A Venn diagram is written to a file with the join.[value of -i argument].venn.pdf suffix. If more than 5 samples are provided, the diagram is constructed for the first 5 samples only. Plotting is performed using the VennDiagram R package.


Overlap of clonotype sets. See the Venn diagram wiki article for a description.


PoolSamples

Pools clonotypes from several samples together and merges clonotypes that match under a certain comparison criterion (e.g. identical CDR3nt and V segment). This routine can also be used with a single sample to aggregate it, e.g. by CDR3 amino acid sequence; in this case the CDR3 nucleotide sequence and the V and J segments are taken from the representative clonotype variant with the highest frequency.
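A minimal sketch of the single-sample case described above, collapsing clonotypes by CDR3 amino acid sequence. Counts of matching variants are summed, the frequency is recomputed from the summed counts, and the CDR3 nucleotide sequence and V and J segments are copied from the highest-frequency variant. The dictionary-based clonotype representation, the extra convergence field and the aggregate_by_cdr3aa function are hypothetical.

```python
from collections import defaultdict


def aggregate_by_cdr3aa(sample):
    """Collapse one sample's clonotypes by CDR3 amino acid sequence (a sketch).

    Each clonotype is a dict with count, freq, cdr3nt, cdr3aa, v and j keys.
    """
    groups = defaultdict(list)
    for clone in sample:
        groups[clone["cdr3aa"]].append(clone)

    total = sum(c["count"] for c in sample)
    pooled = []
    for aa, variants in groups.items():
        best = max(variants, key=lambda c: c["freq"])  # representative variant
        count = sum(c["count"] for c in variants)
        pooled.append({
            "count": count,
            "freq": count / total,
            "cdr3nt": best["cdr3nt"],
            "cdr3aa": aa,
            "v": best["v"],
            "j": best["j"],
            "convergence": len(variants),  # number of merged variants
        })
    return sorted(pooled, key=lambda c: c["count"], reverse=True)
```

Here two nucleotide variants coding for CA (TGTGCC and TGTGCT) would merge into one row whose signature comes from the more frequent of the two.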

Command line usage

$VDJTOOLS PoolSamples \
[options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix

Parameters:

Shorthand Long name Argument Description
-m --metadata path Path to metadata file. See Common parameters
-i --intersect-type string Sample intersection rule. Defaults to strict. See Common parameters
-p --plot   Turns on plotting. See Common parameters
-c --compress   Compressed output for clonotype table. See Common parameters
-h --help   Display help message

Tabular output

A summary table with the pool.[value of -i argument].summary.txt suffix is created with the following columns:

Column Description
incidence.count Number of samples containing clonotype variants that comprise a given pooled clonotype
read.count Total number of reads associated with a given pooled clonotype
convergence Total number of clonotype variants that match the pooled clonotype under -i rule.

The pooled clonotype abundance table is written to a file with the pool.[value of -i argument].table.txt suffix. Its structure is described in the section below.

Pooled clonotype abundance table structure

The first columns have the same meaning as in a VDJtools-format clonotype abundance table; they are computed as follows:

  • Pooled count is computed as the total number of reads associated with clonotype variants that match under the specified -i rule.
  • Frequency is computed as the pooled count divided by the total number of reads in all samples.
  • Clonotype signature (CDR3nt, CDR3aa, V, D and J) is taken from the representative clonotype, selected in the same way as described in the Joint clonotype abundance table structure section.
Column Description
count Pooled clonotype count
freq Pooled clonotype frequency
cdr3nt Representative CDR3 nucleotide sequence
cdr3aa Representative CDR3 amino acid sequence
v Representative Variable segment
d Representative Diversity segment
j Representative Joining segment
incidence Number of samples containing clonotype variants that comprise a given pooled clonotype
convergence Total number of clonotype variants that match the pooled clonotype under -i rule
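The pooling rules above (summed count, frequency relative to all reads, incidence and convergence) can be sketched in Python as follows. The pool_samples function and its key argument, which stands in for the -i rule, are hypothetical. Convergence is counted here as the number of distinct clonotype signatures, which is an assumption: the text does not say whether a variant seen in two samples counts once or twice.

```python
from collections import defaultdict


def pool_samples(samples, key):
    """Pool clonotypes from several samples (a sketch, not the VDJtools code).

    samples: list of samples, each a list of clonotype dicts with count, freq,
    cdr3nt, cdr3aa, v, d and j keys. key: function mapping a clonotype to its
    comparison key, standing in for the -i rule.
    """
    total_reads = sum(c["count"] for s in samples for c in s)
    counts = defaultdict(int)
    incidence = defaultdict(set)  # sample indices per pooled clonotype
    variants = defaultdict(set)   # distinct signatures per pooled clonotype
    best = {}                     # highest-frequency variant per pooled clonotype
    for i, sample in enumerate(samples):
        for clone in sample:
            k = key(clone)
            counts[k] += clone["count"]
            incidence[k].add(i)
            variants[k].add(tuple(clone[f] for f in ("cdr3nt", "cdr3aa", "v", "d", "j")))
            if k not in best or clone["freq"] > best[k]["freq"]:
                best[k] = clone

    pooled = []
    for k, count in counts.items():
        rep = best[k]
        pooled.append({
            "count": count,
            "freq": count / total_reads,
            "cdr3nt": rep["cdr3nt"],
            "cdr3aa": rep["cdr3aa"],
            "v": rep["v"],
            "d": rep["d"],
            "j": rep["j"],
            "incidence": len(incidence[k]),
            "convergence": len(variants[k]),
        })
    return sorted(pooled, key=lambda c: c["count"], reverse=True)
```

For instance, calling pool_samples(samples, key=lambda c: c["cdr3aa"]) approximates pooling by CDR3 amino acid sequence; a stricter key such as the full (cdr3nt, v, j) tuple would keep more variants apart.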

Graphical output

Planned; not yet available.