What is a Genome?
A genome is a collection of an organism’s DNA.
Every living thing on earth shares one thing: DNA. The name DNA is an abbreviation of the long chemical name deoxyribonucleic acid.
DNA contains a set of instructions to make a human, an animal, a bacterium, a tree, every living thing, and viruses.
- What is a Genome?
- Common simple, incomplete or contradictory definitions of a genome
- Definition 1: Genome is the haploid set of chromosomes
- Info box: DNA
- Info box: Chromosomes
- Info box: A Schematic Paths and Fates of Chromosomes in Human Reproduction
- Three distinct haploid chromosome sets - which one is the genome?
- Definition 2: Genome contains all the inheritable traits of an organism
- Conclusions - What is a genome?
- Case A: Genome is DNA comprising a set of 23 chromosomes in humans
- Case B: Genome is all inheritable traits of an organism
- The sequence of the human genome
But this definition is simplified. The term genome is in frequent use in this sense, but other definitions include more than just DNA.
DNA alone cannot make an individual or reproduce itself, but needs to work together with the machinery of a cell and the final outcome is in the end affected by the environment.
Many widespread definitions are confusing and even contradictory.
If you are interested in more details, keep on reading. I try to avoid using too many scientific terms.
Common simple, incomplete or contradictory definitions of a genome
Let’s look at a few general definitions of the word genome. Don’t worry about the details for now; we will unravel each of the definitions in turn.
- 1. The word genome was first coined by Hans Winkler, a German botany professor in 1920 to mean "the haploid chromosome set, which, together with the pertinent protoplasm, specifies the material foundations of the species."
- 2. Oxford Dictionary (Online): "1. The haploid set of chromosomes in a gamete or microorganism, or in each cell of a multicellular organism. and 1.1. The complete set of genes or genetic material present in a cell or organism."
- 3. Dictionary.com: "a full set of chromosomes; all the inheritable traits of an organism."
- 4. Genetics Home Reference website, the National Institutes of Health (NIH) definition (Accessed March 9th, 2019): "A genome is an organism’s complete set of DNA, including all of its genes. Each genome contains all of the information needed to build and maintain that organism. In humans, a copy of the entire genome—more than 3 billion DNA base pairs—is contained in all cells that have a nucleus."
These definitions contain many concepts, but I will explain each term and explore how it relates to the definitions.
I try to keep the descriptions as simple as possible, and you should be able to follow without a degree in biology or genetics.
Definition 1: Genome is the haploid set of chromosomes
This definition appears in Hans Winkler's original definition (1) and in the Oxford Dictionary (2).
Haploid means a half set of paired chromosomes. In humans, that is 23 chromosomes (Figure 2).
Non-sex cells in the human body contain 23 chromosome pairs, making the total chromosome count 46. Human sex cells don't contain paired chromosomes but a single set, a haploid set; thus, the chromosome count in sex cells is 23. Each chromosome set is also unique.
Given the above knowledge, the question is: does "the haploid set of chromosomes" refer to a particular set of chromosomes or to any of them?
A related question arises from the announcement of the completion of the human genome in 2003: "The international effort to sequence the 3 billion DNA letters in the human genome..."  What did they sequence?
The haploid set of human chromosomes (23) contains three billion letters, or base pairs. So the reference should be to one of the haploid sets, but which one?
The answer is surprising: None of them. I discuss this later in this article.
Haploid, Chromosomes, and DNA
DNA molecules comprise a pair of strands forming a twisting helix. Each strand is a chain of small molecules. These small molecules are the nucleotides adenine, guanine, cytosine, and thymine; in short, A, G, C, and T (Figure 1).
A specific order of the nucleotides in a chain is a code for, for example, information on how to make a certain protein.
We measure the length of double-stranded DNA in base pairs because the nucleotides are chemically bases and form complementary pairs between the two DNA strands. If we refer to the length of a single strand, we give the length in nucleotides.
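The pairing between the two strands can be illustrated with a small Python sketch. It relies on the standard pairing rule (A pairs with T, G pairs with C), which is not spelled out above; the example sequence and the function name are made up for illustration.

```python
# Standard base pairing: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


strand = "ATGCCGTA"  # made-up example sequence
partner = complementary_strand(strand)

# A double-stranded piece is measured in base pairs: one pair per position.
length_bp = len(strand)
# Counted strand by strand, the same piece holds twice as many nucleotides.
total_nucleotides = len(strand) + len(partner)

print(strand)
print(partner)
print(length_bp, "base pairs,", total_nucleotides, "nucleotides")
```

In this toy case the double-stranded piece is 8 base pairs long, while the two strands together hold 16 nucleotides.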
In humans, DNA is in 23 pairs of fragments, called chromosomes. Chromosome 1 is the longest, made up of about 249 million base pairs, and chromosome 21 is the shortest, comprising a chain of about 48 million base pairs (Figure 2).
We have 23 pairs of chromosomes because we inherited 23 chromosomes from our mother and 23 from our father, a total of 46 chromosomes. The total length of 23 chromosomes is about three billion base pairs, and the length of all 46 chromosomes is 2 x 3 billion = 6 billion base pairs.
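For readers who like to see the arithmetic written out, here is a minimal Python sketch. It uses only the rounded figures from the text (23 chromosomes and about three billion base pairs per single set); the variable names are my own.

```python
# Rounded figures from the text; real chromosome lengths vary.
HAPLOID_CHROMOSOMES = 23            # one set, from one parent
HAPLOID_BASE_PAIRS = 3_000_000_000  # about three billion base pairs per set

# A non-sex cell carries two sets, one from each parent.
diploid_chromosomes = 2 * HAPLOID_CHROMOSOMES
diploid_base_pairs = 2 * HAPLOID_BASE_PAIRS

print(f"{diploid_chromosomes} chromosomes, {diploid_base_pairs:,} base pairs")
```

The doubling applies to both numbers: two sets give 46 chromosomes and about six billion base pairs.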
We have approximately 30 trillion cells in our body, and we can put them into three categories:
- 1. Cells that don't contain chromosomes or DNA, e.g., red blood cells.
- 2. Cells that contain all 23 chromosome pairs, with a total length of six billion base pairs of DNA in each cell.
- 3. Cells that contain a single set of 23 chromosomes. These chromosomes are a blend of the original 23 pairs. These cells are the sex cells for reproduction, sperm and egg cells, and contain a haploid set of chromosomes; thus, the total length of DNA in each human sex cell is three billion base pairs. When the sex cells unite in reproduction, they become a single cell that again contains 46 chromosomes, i.e., 23 pairs.
A Schematic Paths and Fates of Chromosomes in Human Reproduction
The production of the sex cells starts with an exchange of some DNA among the paired chromosomes (Figure 3.1). A cell divides into two new genetically dissimilar cells (Figure 3.2).
After the division, each new cell contains only a single chromosome set (23 chromosomes). Each set is a unique mixture of the original pairs, and the cell is either an egg or a sperm (Figure 3.2). For simplicity, I show only one of the 23 chromosomes in the figure.
In reproduction, when the sperm meets the egg cell, they fuse into a single cell, which again contains 23 pairs of chromosomes (Figure 3.3).
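The steps above can be mimicked with a toy Python simulation. It is a heavily simplified sketch, not a model of real biology: each chromosome is a string of ten letters marking which grandparent a stretch came from, and I assume exactly one exchange point per pair. All names and sizes are invented for illustration.

```python
import random

N_CHROMOSOMES = 23      # human haploid count, from the text
CHROMOSOME_LENGTH = 10  # toy length; real chromosomes have millions of base pairs


def make_parent(label_a, label_b):
    """A parent's non-sex cell: 23 pairs, each chromosome a run of origin labels."""
    return [
        (label_a * CHROMOSOME_LENGTH, label_b * CHROMOSOME_LENGTH)
        for _ in range(N_CHROMOSOMES)
    ]


def make_sex_cell(pairs, rng):
    """Exchange DNA within each pair at one random point, then keep one blended copy."""
    sex_cell = []
    for first, second in pairs:
        point = rng.randint(1, CHROMOSOME_LENGTH - 1)
        blend_a = first[:point] + second[point:]
        blend_b = second[:point] + first[point:]
        sex_cell.append(rng.choice([blend_a, blend_b]))
    return sex_cell


def fertilize(egg, sperm):
    """Fuse egg and sperm: the single sets line up again as 23 pairs."""
    return list(zip(egg, sperm))


rng = random.Random(0)           # fixed seed so the run is repeatable
mother = make_parent("A", "B")   # A and B mark the mother's two inherited sets
father = make_parent("C", "D")   # C and D mark the father's two inherited sets

egg = make_sex_cell(mother, rng)
sperm = make_sex_cell(father, rng)
fertilized_egg = fertilize(egg, sperm)

print(len(egg), len(sperm), len(fertilized_egg))
```

Each sex cell ends up with 23 single chromosomes that mix both of that parent's sets, and the fertilized egg again holds 23 pairs.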
Let's break down the definitions.
Three distinct haploid chromosome sets - which one is the genome?
So, we have three distinct kinds of haploid chromosome sets: one inherited from the father, one inherited from the mother, and the new ones in the sex cells. Which one of them is the genome?
The Oxford Dictionary
The Oxford Dictionary states "1. The haploid set of chromosomes in a gamete or microorganism, or in each cell of a multicellular organism."
According to the Oxford Dictionary, any haploid chromosome set is a genome. A gamete is a sex cell containing a haploid set of chromosomes, and microorganisms such as bacteria typically have only a single chromosome.
Dictionary.com states: "a full set of chromosomes; all the inheritable traits of an organism."
This definition does not differentiate between sex cells, eggs, sperm (gametes), and other cells in a body (somatic cells).
If we interpret the definition only to include chromosomes containing every inheritable trait, then this definition means that the genome only comprises the chromosomes in the sex cells (gametes).
The chromosomes in the sex cells are a mixture of the chromosome pairs in the non-sex cells (somatic cells); so, not all of a parent’s traits end up in each sex cell, and thus not all of them are inherited at each reproduction (Figure 3).
Besides, if you are male, your Y chromosome originates from your father. Women don’t have a Y chromosome; thus, the Y chromosome is sometimes part of the genome and sometimes not.
Genetics Home Reference website (NIH)
Genetics Home Reference website (NIH) says: "A genome is an organism’s complete set of DNA, including all of its genes..."
Again, the definition is vague. It doesn’t state which set. We now know that the cells contain at least three distinct sets of chromosomes, each comprising DNA (and proteins). The phrase "...including all of its genes..." is redundant because DNA encodes the genes.
But cells contain more DNA than what is in the 23 chromosomes. For example, human cells contain mitochondria, which have their own DNA, 16,569 base pairs long and encoding 37 genes.
Because many scientific publications refer to mitochondrial DNA as the mitochondrial genome, should we define the genome to include mitochondrial DNA?
Bacteria can also have DNA besides their single large chromosome. These extra pieces, called plasmids, are short DNA molecules. They can contain, for example, genes that confer resistance to antibiotics.
Hans Winkler: "the haploid chromosome set,..."
Hans Winkler suggested the term genome in 1920, and at that time nobody knew how inheritable traits passed on to the next generation.
So, we understand that his definition of the term genome couldn’t contain as many details as we might demand today.
However, the term 'haploid' is part of many of today's definitions, but not all of them.
Definition 2: Genome contains all the inheritable traits of an organism
Some explanations incorporate "all the inheritable traits" in the term’s definition; some do not.
The Oxford Dictionary
Oxford Dictionary: "...1.1. The complete set of genes or genetic material present in a cell or organism."
In humans, "The complete set of genes" includes the genes present in each chromosome of every pair in non-sex cells, the genes in the sex cells, and the genes in mitochondria.
Protein-coding sequences in humans make up only about one percent of the three billion base pairs present in the chromosomes of a sex cell, or in either chromosome set inherited from the mother or father. So, read as "the complete set of genes," this statement excludes about 99% of the DNA from the genome.
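To put the one percent figure into numbers, here is a small Python sketch. It simply takes the rounded values from the text (three billion base pairs, about one percent) at face value; the real proportion depends on what is counted as a gene.

```python
# Rounded figures from the text, taken at face value.
HAPLOID_BASE_PAIRS = 3_000_000_000
GENE_PERCENT = 1  # "about one percent"

# Integer arithmetic keeps the result exact.
gene_base_pairs = HAPLOID_BASE_PAIRS * GENE_PERCENT // 100
other_base_pairs = HAPLOID_BASE_PAIRS - gene_base_pairs

print(f"{gene_base_pairs:,} base pairs counted as genes")
print(f"{other_base_pairs:,} base pairs left out")
```

Under these assumptions, about 30 million base pairs would count as the genome and nearly three billion would be left out.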
We may understand “genetic material” to mean everything related to genes or heredity, and a genome to be everything that is inheritable.
Dictionary.com: "... all the inheritable traits of an organism."
"...all inheritable traits of an organism,“ may refer to the chromosomes in the sex cells, that are a blend of the chromosomes we inherit from our mother and father.
But the traits that end up in an offspring may not always be identical to the blend of the parents’ chromosomes, because sex cell production may introduce mutations into the sequences.
If we interpret the description to mean possibly inheritable traits, then it defines the genome to mean the chromosomes inherited from the parents, including unknown mutations that may have occurred during sex cell generation.
DNA carries information, but it requires many other components to be present in a cell for it to function, such as proteins, enzymes, and RNAs. It is well known that they can, for example, switch inherited genes on or off.
The environment also influences the resulting, visible trait.
In addition, we don’t yet know all the detailed mechanisms and components of all the extra-chromosomal inheritable traits.
Genetics Home Reference website (NIH)
Genetics Home Reference website (NIH): "... Each genome contains all of the information needed to build and maintain that organism. In humans, a copy of the entire genome—more than 3 billion DNA base pairs—is contained in all cells that have a nucleus."
Apart from the extra-chromosomal information, “all of the information needed to build and maintain that organism,” requires 46 chromosomes in humans. Their total length is about six billion base pairs.
Each non-sex cell in our bodies, with a few exceptions, has a copy of the chromosome sets we inherited from each of our parents. That is 2 x 23 = 46 chromosomes, and the total length is 2 x 3 billion = 6 billion base pairs. We need both chromosome sets to function.
Sex cells are the only cells that contain a single chromosome set, and their sole purpose is reproduction. These cells do not maintain us.
To be clear, disregarding extra-chromosomal information, it takes six billion DNA base pairs to make you and me. Three billion base pairs are not enough.
Hans Winkler, 1920
Hans Winkler: "... the haploid chromosome set, which, together with the pertinent protoplasm, specifies the material foundations of the species."
About a century ago Hans Winkler coined the term genome and defined it to include the haploid chromosome set and the pertinent protoplasm. At his time it was unknown how traits flowed from generation to generation, let alone what the pertinent protoplasm contains. The pertinent protoplasm loosely means the relevant cell content.
However, one or both of the terms, haploid chromosome set and relevant cell content, still seem to be part of the definitions today.
The expression 'relevant cell content' gives a lot of leeway for interpretation.
He may have included it because he knew that the gooey substance, DNA, included proteins or because he was an exceptional visionary and could envision future scientific breakthrough discoveries decades ahead of his time.
When we explore something new, almost without exception we learn something new. However, dwelling on the reasons why Hans Winkler included ‘relevant cell content’ into his description a century ago, is unlikely to help us a lot.
Conclusions - What is a genome?
In this blog post, I have concentrated the discussion on the human genome, but what exactly a genome incorporates depends on the organism. For example, some viruses don't have DNA but have RNA instead.
From the definitions, we can identify that the term genome may refer to A: DNA comprising a set of 23 chromosomes in humans, B: all inheritable traits of an organism.
Case A: Genome is DNA comprising a set of 23 chromosomes in humans
In this case, the definition is clear cut, and it is possible to make it precise. For example, both human sperm and egg cells each contain a single genome comprising 23 chromosomes (a haploid set of chromosomes).
Each chromosome comprises DNA wrapped around proteins (histones). The total length of the DNA of the 23 chromosomes combined is approximately three billion base pairs, which stretched end to end would measure about one meter (about two meters for all 46 chromosomes).
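A rough check of the physical length is easy to do in Python. The sketch assumes the commonly cited textbook spacing of about 0.34 nanometers between neighboring base pairs, a value that does not appear in the text above, together with the rounded three billion base pairs per single set.

```python
# Assumption: textbook spacing between neighboring base pairs in the double helix.
RISE_PER_BASE_PAIR_NM = 0.34
HAPLOID_BASE_PAIRS = 3_000_000_000  # rounded figure from the text
NM_PER_METER = 1_000_000_000

# Length of one set of 23 chromosomes, stretched end to end.
haploid_length_m = HAPLOID_BASE_PAIRS * RISE_PER_BASE_PAIR_NM / NM_PER_METER
# Length of both sets, as found in a non-sex cell.
diploid_length_m = 2 * haploid_length_m

print(f"one set: about {haploid_length_m:.2f} m")
print(f"two sets: about {diploid_length_m:.2f} m")
```

With these assumptions, one set of 23 chromosomes comes to roughly one meter of DNA and the 46 chromosomes of a non-sex cell to roughly two meters.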
In reproduction, the egg and sperm cells fuse into a single cell. This single cell will then contain two genomes in 46 chromosomes or 23 pairs, i.e., two haploid sets of chromosomes.
After the sperm and egg cells fuse, the resulting single cell first copies all of its 23 chromosome pairs and then divides into two daughter cells. This way, each daughter cell has an identical set of 23 chromosome pairs.
The daughter cells continue dividing so that each new cell will contain an identical copy of the original 23 chromosome pairs.
The process will continue and at some point, result in a grown-up human being. This new human being will again produce eggs or sperm, and the whole process will repeat itself.
We can then ask: what is my genome? Each egg or sperm we produce is unique, containing 23 chromosomes that are a blend of the DNA sequences in the 23 pairs we inherited from our parents, plus a few mutations.
However, the inherited 23 pairs in non-sex cells are the ones orchestrating the functions of each cell and our body, not the 23 chromosomes present in our sex cells.
We can revise the question and ask: what or where is the DNA that makes me? The answer is the DNA in the 23 chromosome pairs I inherited from my parents, plus the DNA in the mitochondria I inherited from my mother.
If we define a genome to be a haploid set of chromosomes in sex cells, then non-sex cells contain two haploid genomes, one from each parent.
Case B: Genome is all inheritable traits of an organism
Recent evidence suggests that modifications of the RNA components of sperm influence traits in offspring. In addition, several studies have provided evidence that non-sequence-based modifications, mediated by proteins, are inheritable.
The environment in which an organism lives can cause many of the non-sequence-based heritable modifications, and thus it may not be possible to define the traits in their entirety.
Inheritable traits also include the traits in the chromosomes we inherited from our parents and the traits encoded in the chromosomes in our sex cells. But the chromosomes in the sex cells are a mixture of the parents’ chromosomes with added mutations.
Chromosomes in each sex cell are unique, and we can’t predict which one of them produces an offspring. So, does a genome encompass every chromosome in every sex cell?
Mitochondria contain their own DNA, and only women pass it on to the next generation.
The question is then: should the genome contain all of the above, plus as-yet-unknown mediators?
Perhaps we should aim to be pragmatic and seek a definition that is useful to us. In the wake of new discoveries, scientists invent new names for novel concepts. We could treat “all inheritable traits” as a purely information-based concept. Is it time for a new definition?
The ones who sequenced the human genome might know what the genome is.
The sequence of the human genome
In 2001, Craig Venter et al. and the International Human Genome Sequencing Consortium concurrently published their first sequences of the human genome; the completion of the Human Genome Project was announced in 2003.
They sequenced DNA from the 23 pairs of chromosomes present in human non-sex cells. The resulting sequence of about three billion base pairs is a blend of sequences from both chromosomes of each pair and from many different individuals.
To protect the donors' identities, several anonymous volunteers contributed their DNA to the project.
So, if the genome refers to one of the sets of 23 chromosomes we inherit from our parents, the completed human genome comprises a mosaic of many individual parents’ genomes.
Biologists refer to a sequence comprising one chromosome from each of the 23 pairs as a haploid genome sequence, regardless of whether it is a mosaic or not.
Winkler, H. L. (1920). Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche. Jena: Verlag Fischer.
Lederberg J, McCray AT (2001). 'Ome Sweet 'Omics: A Genealogical Treasury of Words. The Scientist. 2001;15:8.
Federico Abascal, David Juan, Irwin Jungreis, Laura Martinez, Maria Rigau, Jose Manuel Rodriguez, Jesus Vazquez, Michael L Tress (Aug 2018). "Loose ends: almost one in five human genes still have unresolved coding status." Nucleic Acids Research, Volume 46, Issue 14, 21 August 2018, Pages 7070–7084, https://doi.org/10.1093/nar/gky587
Daniel Schott, Itai Yanai, Craig P. Hunter (Dec 2014). "Natural RNA interference directs a heritable response to the environment." Sci Rep. 2014; 4: 7387. doi: 10.1038/srep07387
J. N. Tedeschi, W. J. Kennington, J. L. Tomkins, O. Berry, S. Whiting, M. G. Meekan, N. J. Mitchell (Jan 2016). "Heritable variation in heat shock gene expression: a potential mechanism for adaptation to thermal stress in embryos of sea turtles." Proc Biol Sci. 2016 Jan 13; 283(1822): 20152320. doi: 10.1098/rspb.2015.2320
Robersy Sanchez, Sally A. Mackenzie (Jun 2016). "Genome-Wide Discriminatory Information Patterns of Cytosine DNA Methylation." Int J Mol Sci. 2016 Jun; 17(6): 938. Published online 2016 Jun 17. doi: 10.3390/ijms17060938
Gibson DG, Glass JI, Lartigue C, Noskov VN, Chuang RY, Algire MA, Benders GA, Montague MG, Ma L, Moodie MM, Merryman C, Vashee S, Krishnakumar R, Assad-Garcia N, Andrews-Pfannkoch C, Denisova EA, Young L, Qi ZQ, Segall-Shapiro TH, Calvey CH, Parmar PP, Hutchison CA 3rd, Smith HO, Venter JC. (Jul 2010). "Creation of a bacterial cell controlled by a chemically synthesized genome." Science. 2010 Jul 2;329(5987):52-6.
Agustina D’Urso and Jason H. Brickner (Jun 2014). Trends Genet. 2014 Jun; 30(6): 230–236. PMC
Siklenka K, Erkek S, Godmann M, Lambrot R, McGraw S, Lafleur C, Cohen T, Xia J, Suderman M, Hallett M, Trasler J, Peters AH, Kimmins S. (Nov 2015). "Disruption of histone methylation in developing sperm impairs offspring health transgenerationally." Science. 2015 Nov 6;350(6261):aab2006. PubMed
Minoo Rassoulzadegan, Valérie Grandjean, Pierre Gounon, Stéphane Vincent, Isabelle Gillot and François Cuzin (May 2006). "RNA-mediated non-mendelian inheritance of an epigenetic change in the mouse." Nature volume 441, pages 469–474 (25 May 2006)
Greg Gibson (Jan 2012). "Rare and Common Variants: Twenty arguments." Nat Rev Genet. 2012 Feb; 13(2): 135–145. PMC
Ali B. Rodgers, Christopher P. Morgan, N. Adrian Leu, and Tracy L. Bale (Nov 2015). Proc Natl Acad Sci U S A. 2015 Nov 3; 112(44): 13699–13704. PMC
D. L. NANNEY (Jul 1966). "Corticotype Transmission in Tetrahymena." Genetics. 1966 Oct; 54(4): 955–968. PMC
Zachary H. Harvey, Yiwen Chen, Daniel F. Jarosz (Jan 2018). "Protein-based inheritance: Epigenetics beyond the chromosome." Mol Cell. 2018 Jan 18; 69(2): 195–202. PMC
Qi Chen, Wei Yan, Enkui Duan (Oct 2016). "Epigenetic inheritance of acquired traits through sperm RNAs and sperm RNA modifications." PMC
J. Craig Venter, Mark D. Adams, Eugene W. Myers ... (Feb 2001). "The Sequence of the Human Genome." Science 16 Feb 2001: Vol. 291, Issue 5507, pp. 1304-1351.
International Human Genome Sequencing Consortium (Feb 2001). "Initial sequencing and analysis of the human genome." Nature volume 409, pages 860–921.
Huddleston J, Chaisson MJP, Steinberg KM, Warren W, Hoekzema K, Gordon D, Graves-Lindsay TA, Munson KM, Kronenberg ZN, Vives L, Peluso P, Boitano M, Chin CS, Korlach J, Wilson RK, Eichler EE (May 2017). "Discovery and genotyping of structural variation from long-read haploid genome sequence data." Genome Res. 2017 May;27(5):677-685. doi: 10.1101/gr.214007.116. PubMed
Sagi I, Chia G, Golan-Lev T, Peretz M, Weissbein U, Sui L, Sauer MV, Yanuka O, Egli D, Benvenisty N (Apr 2016). "Derivation and differentiation of haploid human embryonic stem cells." Nature. 2016 Apr 7;532(7597):107-11. doi: 10.1038/nature17408. PubMed
Yilmaz A, Peretz M, Sagi I, Benvenisty N (Nov 2016). "Haploid Human Embryonic Stem Cells: Half the Genome, Double the Value." Cell Stem Cell. 2016 Nov 3;19(5):569-572. doi: 10.1016/j.stem.2016.10.009. PubMed