Human

Homo sapiens

Genome assembly: GRCh38 (GCA_000001405.15)

This is the preliminary display of the GRCh38 assembly of the human genome (Homo sapiens, GCA_000001405.15), produced in December 2013 by the Genome Reference Consortium. It consists of 24 chromosomes (1-22, X and Y), 127 unplaced scaffolds and 42 unlocalized scaffolds. GRCh38 contains 261 alternate loci scaffolds (including haplotypes for the MHC region on chromosome 6 and the LRC region on chromosome 19), in 35 alternate assembly units. Of these alternate loci, 72 were previously available as NOVEL patches to GRCh37.

The N50 of the contigs of the submitted assembly is 56.4 Mb and the N50 of the scaffolds is 67.8 Mb. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. Modeled centromere sequences have been incorporated.
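The N50 definition above can be illustrated with a minimal Python sketch. The function name `n50` and the toy lengths are assumptions made for illustration only; they are not taken from the GRCh38 assembly, and this is not necessarily how the submitted statistics were computed.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are visited from longest to shortest; the N50 is the length
    of the block at which the running total first reaches at least half
    of the summed length of all blocks.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running against total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not real assembly data); total is 290.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))  # prints 70, since 80 + 70 = 150 >= 145
```

In the toy example, the two longest blocks together cover more than half of the total length, so the N50 is the length of the second block.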

Gene annotation

What can I find? Protein-coding and non-coding genes, splice variants, cDNA and protein sequences, non-coding RNAs.

Preliminary transcript structures based on available human protein sequences are shown, along with structures based on projections from the Ensembl release 75 human gene set (GRCh37) and RefSeq genes from February 2014. In addition, alignments of human cDNA and EST sequences are provided, as well as ab initio predictions and alignments of sequences from UniProt, UniGene and the ENA vertebrate RNA collection.

Genome statistics

Assembly: GRCh38, Dec 2013
Database version: 75
Base Pairs: 3,381,944,039
Golden Path Length: 3,099,750,718
Preliminary transcript models: 128,417
Imported RefSeq genes: 26,670
Human cDNAs: 177,949
Human ESTs: 4,502,566
GENCODE 19 genes projected from GRCh37: 61,349
Genscan gene predictions: 50,117
Projected short variants: 67,990,369

Variation

What can I find? Short sequence variants from Ensembl release 75 projected to the GRCh38 assembly. In addition, variation consequences are provided for variants overlapping transcripts that have been projected from Ensembl release 75 to the new assembly, as well as consequences based on RefSeq transcripts. Variation data is available as GVF and VCF data dumps.
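As a rough illustration of working with a VCF dump, the sketch below reads the first five fixed columns (CHROM, POS, ID, REF, ALT) of each record using only the Python standard library. The function name `iter_vcf_records` and the in-memory sample record are hypothetical and made up for illustration; the sample is not taken from the projected variant set, and the layout of the actual dump files is not described here.

```python
import io


def iter_vcf_records(handle):
    """Yield one dict per VCF data line, skipping header lines.

    Only the first five fixed, tab-separated VCF columns are parsed:
    CHROM, POS, ID, REF and ALT.
    """
    for line in handle:
        if line.startswith("#"):
            # Meta-information (##...) and column header (#CHROM...) lines.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            # Skip blank or truncated lines rather than guessing.
            continue
        chrom, pos, var_id, ref, alt = fields[:5]
        yield {
            "chrom": chrom,
            "pos": int(pos),        # 1-based position
            "id": var_id,
            "ref": ref,
            "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        }


# Hypothetical in-memory sample; the record below is invented.
sample = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t12345\trs_example\tA\tG,T\t.\t.\t.\n"
)
records = list(iter_vcf_records(sample))
print(records[0]["chrom"], records[0]["pos"], records[0]["alt"])
```

For a real dump, the same function could be passed an open file handle (for a gzip-compressed file, one opened in text mode with `gzip.open(path, "rt")`).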