USDA Bioinformatics Coordination Program for Animal Genome

Terminology - New Terms

ATAC-Seq - Assay for Transposase-Accessible Chromatin with high-throughput sequencing. The ATAC-Seq method relies on next-generation sequencing (NGS) library construction using the hyperactive transposase Tn5. By sequencing regions of open chromatin, ATAC-Seq can help to assess genome-wide chromatin accessibility and thus to uncover how chromatin packaging and other factors affect gene expression.
ChIP-seq - ChIP-sequencing is a method used to analyze protein interactions with DNA. The method combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins.
CAGE - Cap analysis of gene expression is an approach to identify and monitor the activity (transcription initiation frequency) of transcription start sites (TSSs) at single base-pair resolution across the genome.
miRNA-seq - MicroRNA sequencing, a type of RNA-Seq, is the use of next-generation sequencing or massively parallel high-throughput DNA sequencing to sequence microRNAs, also called miRNAs. miRNA-seq differs from other forms of RNA-seq in that input material is often enriched for small RNAs. miRNA-seq allows researchers to examine tissue-specific expression patterns, disease associations, and isoforms of miRNAs, and to discover previously uncharacterized miRNAs.
RAD-seq - A protocol for genotyping and discovery of single-nucleotide polymorphisms (SNPs) (Baird et al., 2008). This approach is particularly useful for genotyping when a reference genome is not available, such as in ecological studies. Restriction site associated DNA (RAD) markers are a type of genetic marker useful for association mapping, QTL mapping, population genetics, ecological genetics and evolutionary genetics. The use of RAD markers for genetic mapping is often called RAD mapping. An important aspect of RAD markers and mapping is the process of isolating RAD tags, which are the DNA sequences that immediately flank each instance of a particular restriction enzyme's recognition site throughout the genome. Once RAD tags have been isolated, they can be used to identify and genotype DNA sequence polymorphisms, mainly in the form of SNPs. Polymorphisms that are identified and genotyped by isolating and analyzing RAD tags are referred to as RAD markers.
RNA-Seq - RNA sequencing is a technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, capturing the continuously changing cellular transcriptome. Specifically, RNA-Seq makes it possible to examine alternatively spliced gene transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs, changes in gene expression over time, and differences in gene expression between groups or treatments.
RACE - Rapid amplification of cDNA ends is a technique used in molecular biology to obtain the full-length sequence of an RNA transcript found within a cell.
FPKM - Fragments per kilobase of transcript per million fragments mapped. FPKM is very similar to RPKM. RPKM was made for single-end RNA-seq, where every read corresponded to a single fragment that was sequenced. FPKM was made for paired-end RNA-seq. With paired-end RNA-seq, two reads can correspond to a single fragment, or, if one read in the pair did not map, one read can correspond to a single fragment. The only difference between RPKM and FPKM is that FPKM takes into account that two reads can map to one fragment (and so it doesn’t count this fragment twice).
RPKM - Reads per kilobase of transcript per million mapped reads. RPKM is calculated as follows:

1. Count up the total reads in a sample and divide that number by 1,000,000. This is the “per million” scaling factor.
2. Divide the read counts by the “per million” scaling factor. This normalizes for sequencing depth, giving you reads per million (RPM).
3. Divide the RPM values by the length of the gene, in kilobases. This gives you RPKM.
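
The three RPKM steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `rpkm`, the gene names, the read counts and the gene lengths are all made up for the example and do not come from any particular pipeline.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Return RPKM per gene from raw read counts and gene lengths in base pairs."""
    # Step 1: "per million" scaling factor from the total reads in the sample.
    per_million = sum(read_counts.values()) / 1_000_000
    result = {}
    for gene, count in read_counts.items():
        # Step 2: normalize for sequencing depth (reads per million, RPM).
        rpm = count / per_million
        # Step 3: normalize for gene length, expressed in kilobases.
        length_kb = gene_lengths_bp[gene] / 1_000
        result[gene] = rpm / length_kb
    return result


# Hypothetical toy sample (values chosen only to keep the arithmetic easy).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 4_000}

rpkm_values = rpkm(counts, lengths)
for gene, value in rpkm_values.items():
    print(gene, round(value, 2))
```

In this toy sample geneB and geneC end up with the same RPKM: geneC has twice the reads of geneB but is also twice as long.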

TPM - Transcripts per million. TPM is very similar to RPKM and FPKM. The only difference is the order of operations: TPM normalizes for gene length first and for sequencing depth second.

1. Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).
2. Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.
3. Divide the RPK values by the “per million” scaling factor. This gives you TPM.
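
The TPM steps above can be sketched the same way. Again this is a minimal illustration under assumptions: the function name `tpm` and the toy counts and lengths are hypothetical.

```python
def tpm(read_counts, gene_lengths_bp):
    """Return TPM per gene from raw read counts and gene lengths in base pairs."""
    # Step 1: reads per kilobase (RPK) for each gene.
    rpk = {
        gene: count / (gene_lengths_bp[gene] / 1_000)
        for gene, count in read_counts.items()
    }
    # Step 2: "per million" scaling factor from the sum of all RPK values.
    per_million = sum(rpk.values()) / 1_000_000
    # Step 3: divide each RPK value by the scaling factor.
    return {gene: value / per_million for gene, value in rpk.items()}


# Hypothetical toy sample (values chosen only to keep the arithmetic easy).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 4_000}

tpm_values = tpm(counts, lengths)
for gene, value in tpm_values.items():
    print(gene, round(value, 2))
print("total:", round(sum(tpm_values.values()), 2))
```

Because the scaling factor is computed after length normalization, the TPM values of a sample always add up to one million, which makes the proportions directly comparable between samples.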

CPM - Counts per million is a basic gene expression unit that normalizes only for sequencing depth (depth-normalized counts). CPM is also known as RPM (reads per million). CPM is biased in applications where gene length influences the read count, such as RNA-seq, because longer genes collect more reads at the same expression level. CPM is calculated by dividing the mapped read count of each gene by a “per million” scaling factor, which is the total number of mapped reads divided by 1,000,000.
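
A minimal sketch of the CPM calculation, assuming the same kind of hypothetical toy counts (the function name `cpm` and the gene names are made up for illustration):

```python
def cpm(read_counts):
    """Return counts per million (CPM) per gene from mapped read counts."""
    # "Per million" scaling factor: total mapped reads divided by 1,000,000.
    per_million = sum(read_counts.values()) / 1_000_000
    # Depth normalization only; gene length is not taken into account.
    return {gene: count / per_million for gene, count in read_counts.items()}


# Hypothetical toy sample (values chosen only to keep the arithmetic easy).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}

cpm_values = cpm(counts)
for gene, value in cpm_values.items():
    print(gene, round(value, 2))
```

Here geneC gets twice the CPM of geneB simply because it has twice the reads. If geneC were also twice as long, a length-aware unit such as RPKM or TPM would rate the two genes equally.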

REFERENCES:

Guan, L., Yang, Q., Gu, M. et al. Exon expression QTL (eeQTL) analysis highlights distant genomic variations associated with splicing regulation. Quant Biol 2, 71–79 (2014). https://doi.org/10.1007/s40484-014-0031-9

RPKM, FPKM and TPM, clearly explained (in Data Normalization, Expression and Quantification, Statistical Analysis).

Last updated: May 01, 2023