Goal and current status
The overall aim of the Human Protein Atlas project is to map the expression of all human proteins in normal tissues, cancers and cell lines through a systems biology approach that integrates various omics technologies, including genomics, transcriptomics and antibody-based proteomics. The strategy to meet this aim involves large-scale, high-throughput generation and validation of antibodies against at least one isoform of each of the roughly 20,000 protein-coding genes in the human genome, and the use of these antibodies in a variety of applications, e.g. immunohistochemical staining of normal tissues and cancers, immunofluorescent staining of cell lines and Western blot analysis. The effort to map the human proteome can be considered a natural progression of the Human Genome Project, and the Human Protein Atlas project is to a large extent based on the available information derived from the human genome.
The Human Protein Atlas project was initiated in 2003 and is funded by the Knut and Alice Wallenberg Foundation. The first online version of the Atlas was made public in 2005 and contained data from a little over 700 antibodies. In the latest version of the Human Protein Atlas (v.19, released September 2019), the expression profiles of more than 17,000 human proteins have been analyzed using over 26,000 antibodies, which corresponds to 87% of the human protein-coding genome.
Antigens and antibodies
As part of the Human Protein Atlas effort, more than 50,000 polyclonal antibodies towards various recombinant human Protein Epitope Signature Tag (PrEST) protein fragments (100-150 amino acids) have been generated. Each antibody has been affinity purified using the recombinant protein fragment as capture ligand, to ensure that all antibodies in the pool bind to epitopes present on the target protein. In addition to the in-house antibodies, the Human Protein Atlas has access to >12,000 antibodies from >100 commercial vendors, kindly provided as part of collaboration agreements.
Generation of data for Tissue Atlas, Pathology Atlas and the other sub-atlases
All antibodies that have been approved for protein expression profiling are used to stain a series of 8 different tissue microarrays (TMAs) that altogether contain samples from 44 normal human tissues (in triplicate) and 20 different forms of cancer (typically 12 different patients per cancer form, in duplicate). This amounts to a total of 576 stained samples for each antibody, which are then scanned as high-resolution digital images. The tissue images are annotated manually by specially trained personnel to determine the staining pattern of the evaluated antibodies.
After the annotation step, the quality and correlation of all available data regarding the gene of interest are taken into consideration to manually set a reliability score for the generated protein expression profile. The following data is evaluated:
Annotated staining patterns of all antibodies targeting the gene of interest.
Consensus normalized mRNA expression levels of each tissue/organ.
Available gene-related scientific literature.
Finally, the protein expression profile, along with all images, antibody annotation data and antibody information, is published in the upcoming version of The Human Protein Atlas database (www.proteinatlas.org).
In the last couple of years, large volumes of transcriptomics data have been generated and imported into the Human Protein Atlas database. Expression levels of mRNA for all protein-coding genes in most human organs and tissues, originating from three projects (HPA, GTEx, FANTOM5), are now available in the Tissue Atlas. The mRNA data from the three projects have also been merged to form Consensus Normalized eXpression (NX) levels. All genes have been categorized based on their NX-based mRNA expression according to the degree of specificity and distribution of the expression in the different organs, tissues and cell types of the human body.
In the Pathology Atlas, mRNA expression data of each gene in 17 different main forms of cancer have been imported from The Cancer Genome Atlas (TCGA). The cancer-related mRNA expression of each gene is categorized according to distribution of the expression pattern across cancer types, as well as its association with cancer patient survival, visualized in Kaplan-Meier plots. Each gene is further categorized as favourable or unfavourable for each cancer type if the survival association is statistically significant (p<0.001). Other genes are labeled non-prognostic.
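The labeling rule described above can be sketched in a few lines of Python. This is only an illustration, not the Atlas's own implementation: the function name and the `high_expression_longer_survival` flag are hypothetical, and the sketch assumes that "favourable" means high expression is associated with longer survival. Only the p<0.001 threshold and the three labels are taken from the text.

```python
def classify_prognostic(p_value: float,
                        high_expression_longer_survival: bool,
                        threshold: float = 0.001) -> str:
    """Label one gene in one cancer type (hypothetical helper).

    p_value: significance of the survival association.
    high_expression_longer_survival: assumed encoding of the direction
        of the association; not specified in the source text.
    """
    if p_value < threshold:
        # Significant association: the direction decides the label.
        if high_expression_longer_survival:
            return "favourable"
        return "unfavourable"
    # Not significant at the chosen threshold.
    return "non-prognostic"


label = classify_prognostic(0.0004, high_expression_longer_survival=False)
```

Note that the comparison is strict, so a p-value of exactly 0.001 is treated as non-prognostic in this sketch.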
In parallel to the immunohistochemical analysis of tissues, immunofluorescence analysis of the protein's subcellular expression pattern is performed on 64 cell lines using confocal microscopy. The subcellular location(s) is manually annotated and the data and images are published in the Cell Atlas.
The Brain Atlas combines mRNA expression data and antibody-based protein localization to map the gene expression in the different regions of the mammalian brain. RNA-seq transcriptomics data for each gene in 10 different regions of the brain in human, pig and mouse is combined with high-resolution spatial antibody-based protein expression data in mouse brain (whole mouse brain serially aligned sections) and human brain (tissue images/data from Tissue Atlas).
Different types of analyses have been combined to explore the gene expression of the cell types of human blood, plasma protein concentrations and the human secretome. The results are presented in the Blood Atlas. Transcriptomics data from three different projects (HPA, DICE, Monaco) of mRNA expression in different types of blood cells have been generated through a combination of cell sorting and RNA-seq. The blood cell type expression data is supplemented with proteomics data in the form of plasma protein concentrations generated through mass spectrometry and/or antibody-based immunoassays. In addition to the analysis of blood cells and plasma protein levels, the distribution of secreted proteins in the human body, The Human Secretome, has been mapped and published in the Blood Atlas. The final location of secreted proteins has been annotated for 2641 candidate proteins through the use of available scientific literature.
Metabolic Atlas is an expansion of the Tissue Atlas, imported from metabolicatlas.org to enable exploration of gene expression and protein function in the context of the human metabolic network. Manually curated maps are available for over 120 different metabolic pathways or subsystems, each depicting the association of proteins with the involved biochemical reactions. For proteins involved in metabolism, a metabolic summary is provided that describes the metabolic subsystems/pathways, subcellular compartments, and number of reactions associated with the protein. Each metabolic pathway map can additionally be explored in its entirety together with a heatmap showing the mRNA expression (NX) of all proteins associated with the pathway across 37 different tissue types.
All data is freely and publicly available on The Human Protein Atlas website (www.proteinatlas.org), a database with >150,000 unique visitors per month which is updated with new data and functionalities on a yearly basis. Data regarding a protein of interest can be accessed through an initial search. Possible search queries include the name of the protein of interest (simple search), but also conditional advanced queries which include or exclude tissue types, cell types, cell lines, protein classes, subcellular location, etc. A query using the search fields (Fig 2A) leads to a search result, summarizing the main findings of each gene matching the query (Fig 2B). Clicking on a certain gene leads to a gene-centered summary page that provides a general and large-scale overview of the expression pattern at both the mRNA and protein level. From the summary page, one is then able to browse deeper into the specific expression patterns of all tissues (Tissue Atlas), cancers (Pathology Atlas) and cells (Cell Atlas), and to navigate among the images that form the basis for the annotation, as if one were looking through a virtual microscope. In addition, transcript and protein level expression data are provided for blood cells and plasma (Blood Atlas) and mammalian brain regions of human, pig and mouse brains (Brain Atlas), as well as metabolic data for genes involved in metabolism (Metabolic Atlas).
In addition to search queries, it is possible to navigate the Human Protein Atlas data by browsing knowledge pages showing comprehensive summaries of e.g. cell type- or organ-specific proteomes (Fig 2C), with clickable charts and examples. The database also provides >20 downloadable datasets for large-scale bioinformatic analyses or programmatic access to all Human Protein Atlas data.
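As a rough illustration of how a downloaded dataset might be used in a bioinformatic workflow, the Python sketch below filters a tab-separated table by gene. It assumes the dataset is a TSV file with a header row. The column names ("Gene name", "Tissue", "Level"), the gene identifiers and the sample rows are made-up placeholders rather than actual Atlas content, so the header of the file actually downloaded should be checked before adapting it.

```python
import csv
import io


def rows_for_gene(tsv_text, gene_name, gene_column="Gene name"):
    """Return all rows of a tab-separated table that match one gene.

    gene_column is a hypothetical column name; adjust it to the
    header of the real file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[gene_column] == gene_name]


# Placeholder table standing in for the contents of a downloaded file.
SAMPLE = (
    "Gene name\tTissue\tLevel\n"
    "GENE_A\tliver\tHigh\n"
    "GENE_B\tliver\tLow\n"
    "GENE_A\tkidney\tMedium\n"
)

matches = rows_for_gene(SAMPLE, "GENE_A")
tissues = [row["Tissue"] for row in matches]
```

For a real file, the text would be read from disk (for example with `open(path, encoding="utf-8").read()`) instead of the inline placeholder string.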