Anatomy of a 10X Genomics 3′ scRNA cDNA Library

The diagram below represents a barcoded double-stranded cDNA molecule produced by the 10X Genomics 3′ Gene Expression assay. This 5-minute video shows how Illumina’s “sequencing by synthesis” approach works and may help explain why some of these components matter.

This diagram is a reproduction of one produced and frequently used by 10X Genomics (example here).
  1. The P5 and P7 adapter sequences are universal sequences shared by every molecule in the cDNA library. They allow library molecules to bind to the surface of the sequencing flow cell, and they are added in one of the final steps of library preparation.
  2. The i5 and i7 “dual index” sequences are 10 bp barcode sequences added along with the P5 and P7 adapter sequences in one of the final steps of library preparation. Within a single library, all of the molecules will contain the same i5 and i7 indexes, but each separate library will have its own unique combination of i5 and i7 indexes. This allows libraries to be pooled together for efficient multiplexed sequencing.
  3. Like the P5 and P7 adapters, the “Read 1” and “Read 2” sites serve as primer binding sites. These particular sites are used to initiate the “sequencing by synthesis” reactions that read the i5 and i7 indexes, the cell barcode (#4), the UMI (#5), and the cDNA insert (#7). The “Read 1” sequence is added at the beginning of library preparation, when the transcripts are first captured; the “Read 2” sequence is added during one of the later steps of the protocol.
  4. The Cell Barcode (also referred to by 10X Genomics as the 10X Barcode) is a barcode sequence used to differentiate cells from each other in a single-cell RNA-seq data set. All of the cDNA molecules generated from mRNA captured from a single cell receive the same Cell Barcode sequence.
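  5. The “unique molecular identifier” or UMI is a short random barcode sequence used to differentiate individual transcripts from each other within a single cell. Each cDNA molecule generated from a captured mRNA molecule receives its own UMI, so reads that share a cell barcode, gene, and UMI can be collapsed as PCR duplicates of one original molecule. Counting distinct UMIs therefore gives a measure of gene expression that is less distorted by amplification.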
  6. The Poly(dT) sequence is a string of T nucleotides used to capture mRNA at the 3′ end by its polyA tail.
  7. The cDNA insert is the actual sequence of the transcript being captured. The library preparation process involves fragmentation of the initially full-length cDNA into smaller pieces; because we need the P5 adapter and Read 1 sequence to the left of the poly(dT) sequence in order to actually sequence the molecule, only those fragments containing the 3′ tail end of the transcript (and all of those barcodes and primer binding sites) will actually be sequenced. In the default 10X protocol, only 90 bp of the cDNA insert is sequenced.
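The roles of the Cell Barcode (#4) and UMI (#5) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not how any particular pipeline does it: it assumes a 16 bp cell barcode followed by a 12 bp UMI at the start of Read 1 (the layout used by the v3 chemistry; these lengths are not given in the list above), and it takes the gene for each read as already known rather than aligning the cDNA insert. The function names `split_read1` and `count_molecules` and all sequences are hypothetical.

```python
from collections import defaultdict

# Assumed lengths (not stated in the list above): a 16 bp cell barcode
# followed by a 12 bp UMI, as in the 10X 3' v3 chemistry.
BARCODE_LEN = 16
UMI_LEN = 12


def split_read1(read1):
    """Split a Read 1 sequence into (cell_barcode, umi)."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("Read 1 is shorter than barcode + UMI")
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_molecules(records):
    """Count distinct UMIs per (cell barcode, gene).

    `records` is an iterable of (read1_sequence, gene_name) pairs. The gene
    name stands in for the result of aligning Read 2 (the cDNA insert).
    """
    umis = defaultdict(set)
    for read1, gene in records:
        barcode, umi = split_read1(read1)
        # Reads with the same cell, gene, and UMI collapse into one molecule.
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Toy data: made-up sequences, not real 10X barcodes.
CELL_A = "AAACCCAAGAAACACT"
CELL_B = "TTTGGGTTCTTTGTGA"
records = [
    (CELL_A + "ACGTACGTACGT" + "TTTTTT", "GeneX"),  # molecule 1
    (CELL_A + "ACGTACGTACGT" + "TTTTTT", "GeneX"),  # PCR duplicate of molecule 1
    (CELL_A + "GGGTTTAAACCC" + "TTTTTT", "GeneX"),  # molecule 2
    (CELL_B + "ACGTACGTACGT" + "TTTTTT", "GeneX"),  # same UMI, different cell
]
counts = count_molecules(records)
```

Here the four reads yield two molecules of GeneX in the first cell and one in the second: the duplicated read is counted once, and an identical UMI in a different cell is still treated as a separate molecule. The sketch deliberately ignores sequencing errors in barcodes and UMIs, which real pipelines correct for.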