Sunday, December 26, 2010

Human Genome (DNA) by Numbers (statistics)

Cells in the human body - 75-100 trillion
Base pairs in the haploid genome (one set of 23 chromosomes) - 3.1 billion
Base pairs in the largest human gene (dystrophin) - 2.4 million
Genes in the human genome - 28,000-35,000
Chromosomes in each cell - 46

Human DNA measurements:
DNA helix diameter - 20-26 Ångström (2.0-2.6 nanometers)
Distance between base pairs - 3.3-3.4 Ångström (0.33-0.34 nanometers)
Length of one helix turn - 33-34 Ångström (3.3-3.4 nanometers)
Number of base pairs in one helix turn - 10
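These measurements can be tied together by multiplying a base-pair count by the distance between base pairs. The sketch below assumes a straight, fully extended B-form helix with 0.34 nm per base pair (the upper value listed above) and the 3.1 billion base pairs of the haploid genome; the function names are illustrative only.

```python
# Assumption: straight, fully extended B-form DNA with 0.34 nm rise per base pair.
RISE_NM = 0.34
HAPLOID_BP = 3.1e9  # base pairs in one haploid human genome (from the list above)


def dna_length_m(base_pairs: float, rise_nm: float = RISE_NM) -> float:
    """Contour length of a double helix in metres: base pairs x rise per pair."""
    return base_pairs * rise_nm * 1e-9


def helix_turn_nm(bp_per_turn: float = 10, rise_nm: float = RISE_NM) -> float:
    """Length of one helix turn in nanometres: base pairs per turn x rise."""
    return bp_per_turn * rise_nm


if __name__ == "__main__":
    print(f"Haploid genome: {dna_length_m(HAPLOID_BP):.2f} m")
    print(f"Diploid cell:   {dna_length_m(2 * HAPLOID_BP):.2f} m")
    print(f"One helix turn: {helix_turn_nm():.1f} nm")
```

Under these assumptions one haploid genome is about 1.05 m long and a diploid cell holds about 2.1 m of DNA, while 10 base pairs x 0.34 nm reproduces the 3.4 nm helix turn given above.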

How Human DNA compares to other species:
Below we list a number of species together with their DNA properties (number of chromosomes, genes and base pairs) and show how they differ from human DNA.

Species | Chromosomes | Genes | Base pairs
Human (Homo sapiens) | 46 (23 pairs) | 23,686 | ~3.1 billion
Mouse (Mus musculus) | 40 | 23,786 | ~2.7 billion
Pufferfish (Takifugu rubripes) | 44 | ~31,000 | ~365 million
Malaria mosquito (Anopheles gambiae) | 6 | ~14,000 | ~289 million
Sea squirt (Ciona intestinalis) | 28 | ~16,000 | ~160 million
Fruit fly (Drosophila melanogaster) | 8 | ~14,000 | ~137 million
Roundworm (Caenorhabditis elegans) | 12 | ~19,000 | ~97 million
Bacterium (Escherichia coli) | 1 (circular) | ~5,000 | ~4.1 million
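The table suggests that genome size does not scale with gene count. A minimal sketch, using only the approximate figures from the table (the dictionary keys and function names are hypothetical), computes each genome's size relative to the human genome and the average number of base pairs per gene:

```python
# Approximate (base pairs, genes) values copied from the table above.
GENOMES = {
    "Human": (3.1e9, 23_686),
    "Mouse": (2.7e9, 23_786),
    "Pufferfish": (365e6, 31_000),
    "Malaria mosquito": (289e6, 14_000),
    "Sea squirt": (160e6, 16_000),
    "Fruit fly": (137e6, 14_000),
    "Roundworm": (97e6, 19_000),
    "E. coli": (4.1e6, 5_000),
}


def relative_size(species: str, reference: str = "Human") -> float:
    """Genome size of `species` as a fraction of the reference genome."""
    return GENOMES[species][0] / GENOMES[reference][0]


def bp_per_gene(species: str) -> float:
    """Average base pairs per gene, a rough measure of gene density."""
    base_pairs, genes = GENOMES[species]
    return base_pairs / genes


if __name__ == "__main__":
    for name in GENOMES:
        print(f"{name:<17} {relative_size(name):8.2%} of human size, "
              f"{bp_per_gene(name):>9,.0f} bp per gene")
```

With these numbers the pufferfish has more genes than humans in roughly a tenth of the DNA, and E. coli packs a gene into about every 820 base pairs, compared with roughly 130,000 in humans.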


Animal | Percentage of DNA shared with humans
Chimpanzee | 96.0%
Orangutan | 96.4%
Gorilla | 97.7%
Bonobo | 98.4%
Mouse | 70-98.5%

The Human DNA Structure

Human DNA (deoxyribonucleic acid) consists of more than 3 billion base pairs built from 4 different nucleotides, which make up the genetic code of a human. This amounts to about 750 megabytes of data holding the instructions needed for the development of a complete living organism. DNA is therefore often called a "blueprint" for building the cells, organs, skin, hair, nails and other components (such as proteins and RNA) of a functioning organism like our body. This genetic information is stored in segments of the DNA called "genes", and long-term storage of genetic information is the main purpose of DNA. Human embryonic stem cells carry this complete set of instructions and can develop into every part of the human body, such as heart, liver, kidney, brain, ears, eyes, bones or skin. These cells are therefore the subject of intensive scientific research, which could open up new therapeutic treatments for a whole range of diseases that are currently considered incurable.
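The "about 750 megabytes" figure follows from the fact that each position holds one of 4 bases, which needs 2 bits. The sketch below assumes an uppercase A/C/G/T string (no ambiguity codes) and an arbitrary bit assignment; `pack`, `unpack` and `genome_megabytes` are illustrative names, not an established file format.

```python
# Assumption: 2 bits per base with an arbitrary but fixed code for A, C, G, T.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases (2 bits each) per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a final partial chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def genome_megabytes(base_pairs: float) -> float:
    """Storage in decimal megabytes at 2 bits per base pair."""
    return base_pairs * 2 / 8 / 1e6
```

For 3.1 billion base pairs this gives 775 MB (about 739 MiB), in line with the rounded figure above. Only one strand needs to be stored, because the partner strand follows from base pairing.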


DNA resides in the nucleus of each cell in the human body. The nucleus of each human body cell contains 46 chromosomes: two sets of 22 non-sex chromosomes (autosomes) plus 2 sex chromosomes, either 2 X chromosomes (XX) in a female or 1 X and 1 Y chromosome (XY) in a male. Humans are therefore considered diploid, as each body cell contains 2 homologous copies of each chromosome, normally one from the mother and one from the father. Human gametes (sperm and egg), by contrast, are haploid and contain only 23 chromosomes, a single copy of each. These 46 chromosomes are simply organized structures of coiled human DNA and associated proteins.

DNA has the shape of a long spiral, comparable to a twisted rope ladder or a spiral staircase with a fixed diameter. This spiral, called a "double helix", is chemically made up of 2 antiparallel strands with a sugar-phosphate backbone. Each backbone consists of alternating phosphate groups (from phosphoric acid, H3PO4) and sugar groups (deoxyribose, C5H10O4) joined by phosphodiester bonds. Between the 2 twisted strands, complementary purine-pyrimidine base pairs hold the strands together. Only 2 base-pair combinations are possible:


Adenine (C5H5N5) and Thymine (C5H6N2O2)
Guanine (C5H5N5O) and Cytosine (C4H5N3O)
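Because only these two pairings occur, one strand fully determines the other. A minimal sketch of that rule follows; since the strands are antiparallel, the partner strand read in its own 5'->3' direction is the reverse of the base-by-base complement. The function names are illustrative.

```python
# Watson-Crick pairing rules from the list above.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base partner of each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5'->3' (the two strands run antiparallel)."""
    return complement(strand)[::-1]
```

For example, the partner of 5'-ATGC-3' is 5'-GCAT-3', and taking the reverse complement twice returns the original strand.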


These base pairs form 2 (Adenine-Thymine) or 3 (Guanine-Cytosine) hydrogen bonds with each other, and each base is attached to the sugar of its backbone strand. A sugar-phosphate unit together with its attached base forms a "nucleotide". The chemical structures of the four bases explain this pairing: each pair joins a larger purine (Adenine or Guanine) to a smaller pyrimidine (Thymine or Cytosine), and only in these combinations do their hydrogen-bond donors and acceptors line up. The double helix model of human DNA as we know it today (the twisted ladder or spiral staircase model) was discovered in 1953 by James D. Watson and Francis Crick.
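The 2 versus 3 hydrogen bonds can be tallied for any fully paired stretch of DNA. The sketch below counts them from one strand, assuming every base is correctly paired, and also reports GC content; a higher GC content means more hydrogen bonds, which is one reason GC-rich DNA is harder to separate (base stacking also contributes). The function names are illustrative.

```python
# Hydrogen bonds contributed by each base pair, keyed by the base on one strand.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total inter-strand hydrogen bonds, assuming a fully paired duplex."""
    return sum(H_BONDS[base] for base in strand.upper())


def gc_content(strand: str) -> float:
    """Fraction of positions occupied by G or C."""
    s = strand.upper()
    return (s.count("G") + s.count("C")) / len(s)
```

For example, ATGC gives 2 + 2 + 3 + 3 = 10 hydrogen bonds at 50% GC content, while AAAA gives only 8.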
Human DNA Sequence
Every person's DNA sequence is unique, even though the DNA of any two people is about 99.9% identical. The remaining 0.1% is what distinguishes us from every other human and contributes to our individual differences. These small variations in the human genome, such as single nucleotide polymorphisms (SNPs) and variable number tandem repeats (VNTRs), can be analyzed with DNA fingerprinting (DNA profiling) techniques. The results of such analyses are used for ancestry testing, paternity testing and especially forensic criminal investigations. The latter shows the increasingly important role of genetics in investigating crimes such as rape or murder, where human DNA can serve as evidence in court.
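As a rough scale, 0.1% of about 3.1 billion base pairs is on the order of 3 million differing positions between two people. The sketch below illustrates the two kinds of variation named above on short, hypothetical sequences: counting back-to-back copies of a repeat unit (the quantity a VNTR profile compares) and listing SNP positions between two already aligned sequences. Real profiling works on amplified fragments and standardized marker sets, which this sketch does not model.

```python
def max_tandem_repeats(seq: str, unit: str) -> int:
    """Longest run of back-to-back copies of `unit` anywhere in `seq`."""
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        while seq.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


def snp_positions(a: str, b: str) -> list[int]:
    """0-based positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Hypothetical alleles of one repeat locus in two people.
    print(max_tandem_repeats("TTGATAGATAGATACC", "GATA"))
    print(max_tandem_repeats("TTGATAGATAGATAGATAGATACC", "GATA"))
    print(snp_positions("ACGTAC", "ACTTAC"))
```

Two people who carry different repeat counts at several such loci can be told apart even though the surrounding sequence is identical.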

How Can Larger DNA Fragments Be Cloned?

Both λ phage vectors and the more commonly used E. coli plasmid vectors are useful for cloning DNA fragments up to ≈20 – 25 kb. However, cloning of much larger fragments is desirable for sequencing of extremely long DNAs such as the DNA in a eukaryotic chromosome. Also, because of the common occurrence of large introns in genes from higher eukaryotes, it is often necessary to clone DNA fragments greater than 25 kb in order to include an entire gene in one clone. Consequently, additional types of cloning vectors have been developed for cloning larger fragments of DNA.
One common method for cloning larger fragments makes use of elements of both plasmid and λ phage cloning. In this method, called cosmid cloning, recombinant plasmids containing inserted fragments up to 45 kb long can be efficiently introduced into E. coli cells. A cosmid vector is produced by inserting the COS sequence from λ phage DNA into a small E. coli plasmid vector about 5 kb long. Like other plasmid vectors discussed earlier, cosmid vectors contain a replication origin (ORI), an antibiotic-resistance gene (e.g., amp^r), and a polylinker sequence containing numerous restriction-enzyme recognition sites. Next, the cosmid vector is cut with a restriction enzyme and then ligated to 35- to 45-kb restriction fragments of foreign DNA with complementary sticky ends. If the concentration of foreign DNA is high enough, the ligation reaction generates long DNA molecules containing multiple restriction fragments of the foreign DNA separated by the 5-kb cosmid DNA. These ligated DNA molecules, which resemble the concatemers that form during replication of λ phage in a host cell, can be packaged in vitro as described earlier.


General procedure for cloning DNA fragments in cosmid vectors. This procedure has the high efficiency associated with λ phage cloning and permits cloning of restriction fragments.
In the packaging reaction, the λ Nu1 and A proteins bind to COS sites in the ligated DNA and direct insertion of the DNA between two adjacent COS sites into empty phage heads. Packaging will occur so long as the distance between adjacent COS sites does not exceed about 50 kb (the approximate size of the λ genome). Phage tails then are attached to the filled heads, producing viral particles that contain a recombinant cosmid DNA molecule rather than the λ genome. When these virions are plated on a lawn of E. coli cells, they bind to phage receptors on the cell surface and inject the packaged DNA into the cells.
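The packaging rule above can be modelled as cutting the ligated concatemer at its COS sites and keeping only the pieces that fit into a phage head. This is a simplified sketch under stated assumptions: one COS site per 5-kb cosmid vector copy, and only the ≈50 kb upper limit from the text (real λ packaging also requires a minimum length, which is left out). The function names and insert sizes are hypothetical.

```python
MAX_PACKAGE_KB = 50.0  # approximate size of the lambda genome (from the text)


def cos_sites_for_concatemer(insert_lengths_kb: list[float],
                             vector_kb: float = 5.0) -> list[float]:
    """Positions (kb) of COS sites along a concatemer of vector + insert units.

    Adjacent COS sites are separated by one vector copy plus one insert.
    """
    positions = [0.0]
    for insert_kb in insert_lengths_kb:
        positions.append(positions[-1] + vector_kb + insert_kb)
    return positions


def packageable_segments(cos_positions_kb: list[float],
                         max_kb: float = MAX_PACKAGE_KB) -> list[tuple[float, float]]:
    """Spans between adjacent COS sites that are short enough to be packaged."""
    sites = sorted(cos_positions_kb)
    return [(start, end) for start, end in zip(sites, sites[1:])
            if end - start <= max_kb]


if __name__ == "__main__":
    sites = cos_sites_for_concatemer([40, 60, 35])
    print(packageable_segments(sites))  # [(0.0, 45.0), (110.0, 150.0)]
```

In this example the 40-kb and 35-kb inserts are packaged together with their vector copies, while the 65-kb span containing the 60-kb insert is too long and is left behind.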
Since the injected DNA does not encode any λ proteins, no viral particles form in infected cells and no plaques develop on the plate. Rather, the injected DNA forms a large circular plasmid, composed of the cosmid vector and an inserted DNA fragment, in each host cell. This plasmid replicates and is segregated to daughter cells like other E. coli plasmids, and the colonies that arise from transformed cells can be selected on antibiotic plates. The high efficiency of λ phage infection of E. coli cells makes cosmid cloning a practical method of generating plasmid clones carrying DNA fragments up to 45 kb long. Since many genes of higher eukaryotes are on the order of 30 – 40 kb in length, cosmid cloning increases the chances of obtaining DNA clones containing the entire sequences of genes.

cDNA Libraries Are Prepared from Isolated mRNAs

In higher eukaryotes, many genes are transcribed into mRNA only in specialized cell types. For example, mRNAs encoding globin proteins are found only in erythrocyte precursor cells, called reticulocytes. Likewise, the mRNA encoding albumin, the major protein in serum, is produced only in liver cells where albumin is synthesized. The specific DNA sequences expressed as mRNAs in a particular cell type can be cloned by synthesizing DNA copies of the mRNAs isolated from that type of cell, and then cloning the DNA copies in plasmid or bacteriophage λ vectors.
DNA copies of mRNAs are called complementary DNAs (cDNAs); clones of such DNA copies of mRNAs are called cDNA clones. In addition to representing only the sequences expressed as mRNAs in a particular cell type, cDNA clones lack the noncoding introns present in genomic DNA clones. Thus the amino acid sequence of a protein can be determined directly from the nucleotide sequence of its corresponding cDNA. Many genes in higher eukaryotes are too large to be included in a single λ clone because of their large introns. In contrast, nearly all full-length cDNAs, containing the entire protein-coding sequence, can be included in a single λ clone. However, because of methodological difficulties, not all cDNA clones are full length when initially produced; to obtain a full-length cDNA, it often is necessary to isolate several overlapping cDNA clones and then ligate them at rare restriction sites. Just as a large collection of clones containing fragments of genomic DNA representing the entire genome of a species is called a genomic library, a large collection of cDNA copies of all the mRNAs in a cell type is called a cDNA library.
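Because a cDNA has no introns, its coding sequence can be read codon by codon. A minimal sketch, assuming the standard genetic code and taking the first ATG as the start codon (real start-codon choice also depends on sequence context), translates a hypothetical cDNA until the first stop codon:

```python
# Standard genetic code, built from the compact 64-letter table in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    seq = cdna.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):  # only complete codons
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, the hypothetical cDNA GGATGGCCTGGTAAGG is read as ATG GCC TGG TAA and translates to the three-residue peptide MAW (methionine, alanine, tryptophan).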
Isolation of mRNAs and Synthesis of cDNAs
The first step in preparing a cDNA library is to isolate the total mRNA from the cell type or tissue of interest. Nature has greatly simplified the isolation of eukaryotic mRNAs: the 3′ end of nearly all eukaryotic mRNAs consists of a string of 50 – 250 adenylate residues, the poly(A) tail. Because of their poly(A) tail, mRNAs can be easily separated from the much more prevalent rRNAs and tRNAs present in a cell extract by use of a column to which short strings of thymidylate (oligo-dTs) are linked to the matrix. When a cell extract is passed through an oligo-dT column, the mRNA poly(A) tails base-pair with the oligo-dTs, binding the mRNAs to the column. Since rRNAs, tRNAs, and other molecules do not bind to the column, they can be washed away. The bound mRNAs are recovered by elution with a low-salt buffer.
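The selection step can be pictured as a filter on the 3' end of each RNA. The sketch below treats an RNA as binding the column if it ends in a poly(A) run of at least `min_tail` residues; the threshold of 20 and the example sequences are hypothetical, and real binding depends on salt concentration and temperature rather than a fixed cut-off.

```python
def poly_a_tail_length(rna: str) -> int:
    """Number of consecutive A residues at the 3' end of an RNA written 5'->3'."""
    return len(rna) - len(rna.rstrip("A"))


def oligo_dt_purify(extract: dict[str, str], min_tail: int = 20) -> dict[str, str]:
    """Keep RNAs whose poly(A) tail is long enough to base-pair with oligo-dT.

    Everything else is 'washed away'; the returned dict stands for the
    mRNAs recovered by low-salt elution.
    """
    return {name: rna for name, rna in extract.items()
            if poly_a_tail_length(rna) >= min_tail}


if __name__ == "__main__":
    extract = {  # hypothetical cell extract
        "mRNA_1": "AUGGUGCACCUGACU" + "A" * 80,
        "rRNA_fragment": "GGUCUCUCUGGUUAGACC",
        "tRNA": "GCGGAUUUAGCUCAGUUGGGAGAGCGCCA",
    }
    print(sorted(oligo_dt_purify(extract)))  # ['mRNA_1']
```

The tRNA ends in a single A, so it fails the tail test and is washed away together with the rRNA.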

Isolation of eukaryotic mRNA by oligo-dT column affinity chromatography. Isolated cytoplasmic RNA consists mostly of ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs).
The enzyme reverse transcriptase, which is found in retroviruses, is then used to synthesize a strand of DNA complementary to each mRNA molecule. This enzyme can polymerize deoxynucleoside triphosphates into a complementary DNA strand using an RNA molecule as template. Like other DNA polymerases, reverse transcriptase can add nucleotides only to the 3′ end of a preexisting primer base-paired to the template. Added free oligo-dT serves this function by hybridizing to the 3′ poly(A) tail of each mRNA template.
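At the sequence level, reverse transcription produces the antiparallel complement of the mRNA, starting from the oligo-dT primer. A minimal sketch, assuming the primer anneals at the very 3′ end of the poly(A) tail and the enzyme copies all the way to the mRNA's 5′ end (in practice many copies stop early, which is why not all cDNAs are full length); the 12-residue primer length is a hypothetical default:

```python
# Each RNA base is paired with its complementary DNA base (U in RNA pairs with A).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, primer_len: int = 12) -> str:
    """Sketch of oligo-dT-primed reverse transcription.

    `mrna` is written 5'->3' and must end in a poly(A) tail at least
    `primer_len` residues long. The returned cDNA is written 5'->3',
    so it begins with the oligo-dT primer.
    """
    if not mrna.endswith("A" * primer_len):
        raise ValueError("poly(A) tail too short for the oligo-dT primer")
    # The new strand is antiparallel: walk the template 3'->5' and complement it.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))
```

For a template 5′-AUGGCC(A)12-3′ the sketch returns 5′-(T)12GGCCAT-3′: the oligo-dT primer followed by DNA complementary to the coding sequence.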

Preparation of a bacteriophage λ cDNA library. A mixture of mRNAs, isolated by oligo-dT affinity chromatography as described above, is used to produce cDNAs corresponding to all the cellular mRNAs.

Conversion of Single-Stranded cDNA to Double-Stranded cDNA
After cDNA copies of isolated mRNAs are synthesized, the mRNAs are removed by treatment with alkali, which hydrolyzes RNA but not DNA. The single-stranded cDNAs then are converted to double-stranded DNA molecules. To do this, the 3′ end of each cDNA strand is elongated by adding several residues of a single nucleotide (e.g., dG) through the action of terminal transferase, a unique DNA polymerase that does not require a template, but simply adds deoxynucleotides to free 3′ ends. A synthetic oligo-dC primer then is hybridized to this 3′ oligo-dG. DNA polymerase, which uses the oligo-dC as a primer, then is used to synthesize a DNA strand complementary to the original cDNA strand. These reactions produce a complete double-stranded DNA molecule corresponding to each of the mRNA molecules in the original preparation. Each double-stranded DNA, also called cDNA, contains an oligo-dC – oligo-dG double-stranded region at one end and an oligo-dT – oligo-dA double-stranded region at the other end.
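The two strands that result can be followed as strings. In this sketch the first-strand cDNA is written 5′->3′ (starting with its oligo-dT), terminal transferase appends a hypothetical 10-residue dG tail, and the oligo-dC-primed second strand is the reverse complement of the tailed strand. Both returned strands are written 5′->3′; the function names and tail length are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Antiparallel partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def make_double_stranded(first_strand: str, tail_len: int = 10) -> tuple[str, str]:
    """Return (second_strand, tailed_first_strand), both written 5'->3'."""
    # Terminal transferase adds dG residues to the free 3' end (no template).
    tailed = first_strand + "G" * tail_len
    # Oligo-dC anneals to the dG tail; DNA polymerase extends it across the
    # whole first strand, giving its complete antiparallel complement.
    second = reverse_complement(tailed)
    return second, tailed
```

Starting from the first strand TTTTGGCCAT with a 4-residue tail, the second strand is CCCCATGGCCAAAA: oligo-dC at one end, the mRNA-sense sequence in the middle and oligo-dA at the other end, matching the dC–dG and dT–dA ends described above.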
Addition of Linkers and Incorporation of cDNA into a Vector
To prepare double-stranded cDNAs for cloning, short restriction-site linkers first are ligated to both ends. These are double-stranded DNA segments, usually ≈10 – 12 bp long, that contain the recognition site for a particular restriction enzyme. Restriction-site linkers are prepared by hybridizing chemically synthesized complementary oligonucleotides. The ligation reaction is carried out by DNA ligase from bacteriophage T4, which can join “blunt-ended” double-stranded DNA molecules lacking sticky ends. Although blunt-end ligation is relatively inefficient, the ligation reaction can be driven to completion by using high concentrations of linkers.
The resulting double-stranded cDNAs, which contain a restriction-site linker at each end, are treated with the restriction enzyme specific for the linker; this generates cDNA molecules with sticky ends at each end. To prevent digestion of any cDNAs that by chance have a recognition sequence for this restriction enzyme within the cDNA sequence, the mixture of double-stranded cDNAs is treated with the appropriate modification enzyme before addition of the linkers. This enzyme methylates specific bases within the restriction-site sequence, preventing the restriction enzyme from digesting the methylated sites.
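The order of steps matters: internal sites are methylated first, linkers are added afterwards, so only the linker sites remain cuttable. The sketch below models this on the top strand only. EcoRI (site GAATTC, cut after the G) is used purely as an example enzyme, and the 10-bp palindromic linker CCGAATTCGG is hypothetical; the bottom strand is cut symmetrically, which this model does not track.

```python
SITE = "GAATTC"   # EcoRI recognition sequence (example choice)
CUT_OFFSET = 1    # EcoRI cuts G^AATTC on the top strand


def find_sites(dna: str, site: str = SITE) -> list[int]:
    """Start positions of every occurrence of `site` in `dna`."""
    hits, i = [], dna.find(site)
    while i != -1:
        hits.append(i)
        i = dna.find(site, i + 1)
    return hits


def add_linkers_and_digest(cdna: str, linker: str = "CCGAATTCGG") -> list[str]:
    """Top-strand pieces after methylation, linker ligation and digestion."""
    # 1. The modification enzyme methylates sites already inside the cDNA.
    protected = {p + len(linker) for p in find_sites(cdna)}  # shifted by the left linker
    # 2. Linkers are ligated to both blunt ends (unmethylated, so cuttable).
    construct = linker + cdna + linker
    # 3. The restriction enzyme cuts only unmethylated sites.
    cuts = [p + CUT_OFFSET for p in find_sites(construct) if p not in protected]
    pieces, prev = [], 0
    for cut in cuts:
        pieces.append(construct[prev:cut])
        prev = cut
    pieces.append(construct[prev:])
    return pieces
```

For a hypothetical cDNA ATGGAATTCTAA, which contains an internal GAATTC, the middle piece is AATTCGG + ATGGAATTCTAA + CCG: the internal site survives and the cDNA now carries EcoRI sticky ends.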
The final step in construction of a cDNA library is ligation of the restriction-cleaved double-stranded cDNAs, which now have sticky ends, to plasmid or λ phage vectors that have been cut to generate complementary sticky ends. The recombinant vectors then are plated on a lawn of E. coli cells, producing a library of plasmid or λ clones. Each clone carries a cDNA derived from a single mRNA.

Sunday, December 19, 2010

How to Construct DNA Libraries?

Most DNA cloning is done with E. coli plasmid vectors because of the relative simplicity of the cloning procedure. However, the number of individual clones that can be obtained by plasmid cloning is limited by the relatively low efficiency of E. coli transformation and the small number (only a few hundred) of individual transformed colonies that can be grown on a typical culture plate. These limitations make plasmid cloning of all the genomic DNA of higher organisms impractical. For example, ≈1.5 × 10⁵ clones carrying 20-kb DNA fragments are required to represent the total human haploid genome, which contains ≈3 × 10⁹ base pairs. Fortunately, cloning vectors derived from various bacteriophages have proved to be a practical means for obtaining the required number of clones to represent large genomes. A collection of clones that includes all the DNA sequences of a given species is called a genomic DNA library, or simply genomic library. Once a genomic library is prepared, it can be screened for clones containing a sequence of interest.
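The figure of ≈1.5 × 10⁵ clones is simply genome size divided by insert size, which assumes every fragment is cloned exactly once. Because fragments are cloned at random, a common refinement (not taken from this post) is the Clarke-Carbon estimate N = ln(1 − P) / ln(1 − f), where f is the insert size as a fraction of the genome and P is the desired probability that any given sequence is present. A minimal sketch of both:

```python
import math


def clones_simple(genome_bp: float, insert_bp: float) -> float:
    """Clones needed if the genome were covered exactly once (genome / insert)."""
    return genome_bp / insert_bp


def clones_for_probability(genome_bp: float, insert_bp: float,
                           probability: float = 0.99) -> int:
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - f), with f = insert / genome."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


if __name__ == "__main__":
    for insert_kb in (20, 45):  # lambda-sized vs cosmid-sized inserts
        print(insert_kb, "kb:",
              round(clones_simple(3e9, insert_kb * 1e3)),
              clones_for_probability(3e9, insert_kb * 1e3))
```

For 20-kb inserts this reproduces the 150,000 clones quoted above, but roughly 690,000 random clones are needed for a 99% chance of including any particular sequence. Larger cosmid-sized inserts cut both numbers by more than half.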

Bacteriophage λ Can Be Modified for Use as a Cloning Vector and Assembled in Vitro
Bacteriophage λ is probably the most extensively studied bacterial virus, and a great deal is known about its molecular biology and genetics. A λ phage virion has a head, which contains the viral DNA genome, and a tail, which functions in infecting E. coli host cells. When λ DNA enters the host-cell cytoplasm following infection, it undergoes either lytic or lysogenic growth. In lytic growth, the viral DNA is replicated and assembled into more than 100 progeny virions in each infected cell, killing the cell in the process and releasing the replicated virions. In lysogenic growth, the viral DNA inserts into the bacterial chromosome, where it is passively replicated along with the host-cell chromosome as the cell grows and divides.

The bacteriophage genome. (a) Electron micrograph of bacteriophage λ virion. The genome is contained within the head. (b) Simplified map of the λ phage genome.

The λ genes encoding the head and tail proteins as well as various proteins involved in the lytic and lysogenic growth pathways are clustered in discrete regions of the ≈50-kb viral genome. When bacteriophage λ is used as a cloning vector, it must be capable of lytic growth, but other viral functions are irrelevant. Consequently, the genes involved in the lysogenic pathway and other viral genes not essential for the lytic pathway are removed from the viral DNA and replaced with the DNA to be cloned. Up to ≈25 kb of foreign DNA can be inserted into the λ genome, resulting in a recombinant DNA that can be packaged in vitro to form virions capable of replicating and forming plaques on E. coli host cells.
During the in vivo assembly of λ virions within infected host cells, viral heads and tails initially are assembled separately, from multiple copies of the various proteins that compose these complex structures. Replication of λ DNA in a host cell generates long multimeric DNA molecules, called concatemers, that consist of multiple copies of the viral genome linked end to end and separated by specific nucleotide sequences called COS sites. Two λ proteins, designated Nu1 and A, bind to COS sites and direct insertion of the DNA lying between two adjacent COS sites into a preassembled head. This process results in the packaging of a single ≈50-kb λ genome from the multimeric concatemer into each preassembled head. Host-cell chromosomal DNA is not inserted into the λ heads because it does not contain any copies of the COS sequence. Once λ DNA is inserted into a preassembled λ head, the preassembled tail is attached, producing a complete virion.