Redefining the structural motifs that determine RNA binding and RNA editing by pentatricopeptide repeat proteins in land plants

Shifeng Cheng; Bernard Gutmann; Xiao Zhong; Yongtao Ye; Mark F Fisher; Fengqi Bai; Ian Castleden; Yue Song; Bo Song; Jiaying Huang; Xin Liu; Xun Xu; Boon L Lim; Charles S Bond; Siu-Ming Yiu; Ian Small

doi:10.1111/tpj.13121

Plant J. 2016 Feb;85(4):532-47. doi: 10.1111/tpj.13121.

Authors

Shifeng Cheng¹,²,³, Bernard Gutmann⁴, Xiao Zhong², Yongtao Ye¹, Mark F Fisher⁵, Fengqi Bai², Ian Castleden⁴, Yue Song², Bo Song², Jiaying Huang², Xin Liu², Xun Xu², Boon L Lim¹,³, Charles S Bond⁵, Siu-Ming Yiu¹, Ian Small⁴

Affiliations

¹ HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, Department of Computer Science, The University of Hong Kong, Hong Kong, China.
² BGI-Shenzhen, Shenzhen, 518083, China.
³ School of Biological Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China.
⁴ Australian Research Council Centre of Excellence in Plant Energy Biology, University of Western Australia, Crawley, 6009, Australia.
⁵ School of Chemistry and Biochemistry, The University of Western Australia, Crawley, Western Australia, Australia.

PMID: 26764122
DOI: 10.1111/tpj.13121

Abstract

The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30-40 amino acid motifs that form an extended binding surface capable of sequence-specific recognition of RNA strands. Almost all of them are post-translationally targeted to plastids and mitochondria, where they play important roles in post-transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide-binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C-terminus of many RNA-editing factors. We show that the super-helical RNA-binding surface of RNA-editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre-requisite for accurate large-scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.
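To illustrate why consistent motif annotation matters for applying a PPR recognition code, here is a minimal sketch of a lookup from per-motif residue pairs to a predicted RNA base. The pairings in the table are a simplified subset of commonly cited combinations, brought in as an assumption for illustration; the abstract does not list them. The function name `predict_target` and the input format are hypothetical and are not taken from the paper or from the web portal.

```python
# Illustrative sketch only. The table is a simplified, assumed subset of
# commonly cited PPR code combinations; it is not taken from the abstract.
# Each key is the pair of amino acids at the two key specificity positions
# of one motif; each value is the base that pair is assumed to prefer.
PPR_CODE = {
    ("T", "N"): "A",
    ("T", "D"): "G",
    ("N", "D"): "U",
    ("N", "S"): "C",
}


def predict_target(motif_pairs):
    """Return one predicted base per motif, or '?' when the pair is not in the table."""
    return "".join(PPR_CODE.get(pair, "?") for pair in motif_pairs)


# Hypothetical protein with five annotated motifs, listed N- to C-terminus.
example = [("T", "D"), ("N", "D"), ("N", "S"), ("T", "N"), ("S", "E")]
print(predict_target(example))  # prints "GUCA?"
```

A motif whose boundaries are mis-annotated yields the wrong residue pair and therefore a wrong or unknown base, which is one way to see why the prediction depends on accurate motif definitions.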

Keywords: RNA binding; RNA editing; genome annotation; pentatricopeptide repeat motifs; pentatricopeptide repeat proteins; structural modelling.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Motifs
Amino Acid Sequence
Embryophyta / genetics*
Embryophyta / metabolism
Mitochondria / metabolism
Models, Molecular
Models, Structural*
Molecular Sequence Annotation
Plant Proteins / chemistry*
Plant Proteins / genetics
Plant Proteins / metabolism
Plastids / metabolism
Protein Transport
RNA Editing / genetics*
RNA Recognition Motif Proteins / chemistry
RNA Recognition Motif Proteins / genetics
RNA Recognition Motif Proteins / metabolism
RNA, Plant / genetics
Sequence Alignment

Substances

Plant Proteins
RNA Recognition Motif Proteins
RNA, Plant