Sorex araneus

Genome assembly: SorAra2.0 (GCA 000181275.2)

This release features the second assembly of the common shrew (Sorex araneus) genome, SorAra2.0, provided by the Broad Institute in August 2012. The assembly contains 12,845 scaffolds. Each scaffold is either a set of sequence contigs that could be ordered and oriented with respect to each other, or an isolated contig that could not be linked. The total length of the assembly is 2.42Gb, of which 231Mb are gaps. The N50 of the contigs is 22.6Kb and the N50 of the scaffolds is 22.8Mb. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer.
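
The N50 definition above can be made concrete with a short calculation. The sketch below is illustrative only. It is not the Broad Institute's or Ensembl's code, and the function name `n50` and the toy lengths are hypothetical. It sorts block lengths from longest to shortest and returns the first length at which the running total reaches half of the total assembled length.

```python
def n50(lengths):
    """Return the N50 of a collection of block (contig or scaffold) lengths.

    N50 is the length L such that blocks of length >= L together
    cover at least 50% of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest block to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths, not SorAra2.0 data.
print(n50([100, 80, 50, 30, 20]))
```

For these toy lengths the total is 280. The two longest blocks (100 + 80 = 180) are the first to cover at least half of it, so the N50 is 80. Contig N50 uses gap-free contig lengths, whereas scaffold N50 uses scaffold lengths that include gaps. That is why the scaffold N50 (22.8Mb) can be far larger than the contig N50 (22.6Kb).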


Gene annotation

What can I find? Protein-coding and non-coding genes, splice variants, cDNA and protein sequences, non-coding RNAs.

Preliminary gene annotation in shrew has been generated by aligning proteins from two sources: Ensembl human translations from the October 2012 genebuild (Ensembl release 69, GRCh37 assembly) and a single shrew-specific protein obtained from UniProtKB. Of the 20,454 human translations, 16,209 aligned with a hit coverage of > 50% and a percent identity of > 50%. The single shrew protein also aligned successfully. This gave a total of 16,209 gene models based on the human data and a single model from the shrew protein.
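
The selection step described above (keep an alignment only if hit coverage and percent identity both exceed 50%) can be sketched as a simple filter. This is a minimal illustration under stated assumptions and is not the Ensembl genebuild pipeline. The `ProteinAlignment` record, its field names and the helper functions are hypothetical, and coverage and identity are assumed to be given as percentages.

```python
from dataclasses import dataclass


@dataclass
class ProteinAlignment:
    # Hypothetical record; field names are illustrative, not an Ensembl schema.
    protein_id: str
    coverage: float  # percent of the source protein covered by the hit
    identity: float  # percent identity over the aligned region


def passes_thresholds(aln, min_coverage=50.0, min_identity=50.0):
    # Strictly greater than, matching the "> 50%" criteria in the text.
    return aln.coverage > min_coverage and aln.identity > min_identity


def select_models(alignments, min_coverage=50.0, min_identity=50.0):
    # Keep only alignments that clear both thresholds.
    return [a for a in alignments
            if passes_thresholds(a, min_coverage, min_identity)]


# Hypothetical example alignments, not real annotation data.
hits = [
    ProteinAlignment("A", coverage=80.0, identity=90.0),  # kept
    ProteinAlignment("B", coverage=50.0, identity=90.0),  # coverage not > 50
    ProteinAlignment("C", coverage=60.0, identity=40.0),  # identity too low
]
print([a.protein_id for a in select_models(hits)])
```

Because both thresholds are strict, an alignment sitting exactly at 50% coverage or identity is rejected.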

In addition to these models, alignments of sequences from UniProt, UniGene and the ENA vertebrate RNA collection are provided.


This species currently has no variation database. However, you can process your own variants using the Variant Effect Predictor:

Variant Effect Predictor

Genome statistics

Assembly: SorAra2.0, May 2012
Database version: 70
Base Pairs: 2,192,103,426
Golden Path Length: 2,423,158,183
Genscan gene predictions: 51,734
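
The statistics above can be cross-checked against the assembly description. This sketch assumes that "Base Pairs" counts only sequenced bases and that "Golden Path Length" also includes gap positions. Under that assumption, their difference should match the 231Mb of gaps quoted for SorAra2.0. The variable names are illustrative.

```python
# Figures copied from the genome statistics above.
golden_path_length = 2_423_158_183  # assumed to include gap positions
base_pairs = 2_192_103_426          # assumed to count sequenced bases only

# Gap length is the part of the golden path not covered by sequenced bases.
gap_length = golden_path_length - base_pairs
gap_fraction_pct = 100 * gap_length / golden_path_length

print(f"gaps: {gap_length:,} bp ({gap_length / 1e6:.0f} Mb, "
      f"{gap_fraction_pct:.1f}% of the golden path)")
```

The difference is 231,054,757 bp, which agrees with the 231Mb of gaps stated in the assembly description. Gaps account for roughly 9.5% of the golden path.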