Using Deep Learning to Extrapolate Protein Expression Measurements

Mitra Parissa Barzine; Karlis Freivalds; James C Wright; Mārtiņš Opmanis; Darta Rituma; Fatemeh Zamanzad Ghavidel; Andrew F Jarnuczak; Edgars Celms; Kārlis Čerāns; Inge Jonassen; Lelde Lace; Juan Antonio Vizcaíno; Jyoti Sharma Choudhary; Alvis Brazma; Juris Viksna

doi:10.1002/pmic.202000009

Proteomics. 2020 Nov;20(21-22):e2000009. doi: 10.1002/pmic.202000009. Epub 2020 Oct 16.

Authors

Mitra Parissa Barzine¹, Karlis Freivalds²,³, James C Wright⁴, Mārtiņš Opmanis², Darta Rituma²,³, Fatemeh Zamanzad Ghavidel⁵, Andrew F Jarnuczak¹, Edgars Celms²,³, Kārlis Čerāns²,³, Inge Jonassen⁵, Lelde Lace²,³, Juan Antonio Vizcaíno¹, Jyoti Sharma Choudhary⁴, Alvis Brazma¹, Juris Viksna²,³

Affiliations

¹ European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.
² Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.
³ Faculty of Computing, University of Latvia, Riga, LV1586, Latvia.
⁴ Institute of Cancer Research, London, SW3 6JB, UK.
⁵ Computational Biology Unit, Informatics Department, University of Bergen, Bergen, NO5020, Norway.

Abstract

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay only a subset, up to 60%, of the ≈20 000 human protein-coding genes. Computational methods for imputing the missing values from RNA expression data usually permit imputation only for proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still lacking. Here, a novel deep learning method is proposed that extrapolates the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. The method is tested on four datasets, including human cell lines and human and mouse tissues. It predicts protein expression values with average $R^{2}$ scores between 0.46 and 0.54, significantly better than predictions based on correlations with RNA expression data alone. Moreover, the derived models can be "transferred" across experiments and species: for instance, the model derived from human tissues gave $R^{2} = 0.51$ when applied to mouse tissue data. It is concluded that protein abundances generated in label-free MS experiments can be computationally predicted using functional annotation attributes, and that such predictions can be used to highlight aberrant protein abundance values.
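
The $R^{2}$ scores quoted in the abstract compare predicted with observed protein abundances. The abstract does not say which definition of $R^{2}$ was used, so the following is only a minimal sketch of one common definition, the coefficient of determination $1 - SS_{res}/SS_{tot}$. The function name `r_squared` and the abundance values are hypothetical and are not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is one common definition of R^2; the abstract
    does not specify the one the authors used.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical log-scale abundances for five proteins (made-up numbers).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.0, 4.5, 4.5]
score = r_squared(observed, predicted)
```

Under this definition a perfect prediction scores 1 and always predicting the mean observed abundance scores 0, so values such as 0.46 to 0.54 would mean that roughly half of the variance in the observed abundances is accounted for by the predictions.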

Keywords: Gene Ontology; UniProt keywords; deep learning networks; mass spectrometry; protein abundance prediction.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Deep Learning*
Mass Spectrometry
Mice
Molecular Sequence Annotation
Proteins
Proteomics

Substances

Proteins
