Identification and characterization of an IgG sequence variant with an 11 kDa heavy chain C-terminal extension using a combination of mass spectrometry and high-throughput sequencing analysis

MAbs. 2019 Nov-Dec;11(8):1452-1463. doi: 10.1080/19420862.2019.1667740. Epub 2019 Oct 1.

Abstract

Protein primary structure is a potential critical quality attribute for biotherapeutics. Identifying and characterizing any sequence variants present is essential for product development. A sequence variant ~11 kDa larger than the expected IgG mass was observed by size-exclusion chromatography and two-dimensional liquid chromatography coupled with online mass spectrometry. Further characterization indicated that the ~11 kDa addition was located on the heavy chain (HC) Fc domain. Despite the relatively large mass addition, only one unknown peptide was detected by peptide mapping. To decipher the sequence, the transcriptome of the manufacturing cell line was characterized by Illumina RNA-seq. Transcriptome reconstruction detected an aberrant fusion transcript in which the light chain (LC) constant domain sequence was fused to the 3' end of the HC transcript. Translation of this fusion transcript generated an extended peptide sequence at the HC C-terminus corresponding to the observed ~11 kDa mass addition. Nanopore-based genome sequencing showed that multiple copies of the plasmid had integrated in tandem, with one copy missing the 5' end of the plasmid and therefore lacking the LC variable domain. The fusion transcript resulted from read-through of the HC terminator sequence into the adjacent partial LC gene, combined with an unexpected splicing event between a cryptic splice-donor site at the 3' end of the HC and the splice-acceptor site at the 5' end of the LC constant domain. Our study demonstrates that combining protein physicochemical characterization with genomic and transcriptomic analysis of the manufacturing cell line greatly improves the identification of sequence variants and the understanding of the underlying molecular mechanisms.
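The check that a translated extension accounts for an observed mass addition can be illustrated with a minimal sketch. It is not the authors' analysis pipeline. It assumes average (not monoisotopic) residue masses, ignores post-translational modifications, and uses a hypothetical toy sequence because the abstract does not give the actual extension sequence. The function names and the 50 Da tolerance are likewise assumptions chosen for illustration.

```python
# Average residue masses in Da (amino acid minus one water), standard values.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def extension_mass_shift(extension: str) -> float:
    """Mass added to a chain by appending `extension` at its C-terminus.

    Only residue masses are summed: the terminal water is already counted
    in the mass of the unextended chain, so it is not added again.
    """
    try:
        return sum(AVG_RESIDUE_MASS[aa] for aa in extension.upper())
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None


def matches_observed(extension: str, observed_shift_da: float,
                     tolerance_da: float = 50.0) -> bool:
    """True if the predicted shift is within `tolerance_da` of the observed one.

    The default tolerance is an illustrative assumption, not a value
    taken from the study.
    """
    return abs(extension_mass_shift(extension) - observed_shift_da) <= tolerance_da


if __name__ == "__main__":
    toy_extension = "GSGSG"  # hypothetical placeholder, NOT the real sequence
    print(f"predicted shift: {extension_mass_shift(toy_extension):.2f} Da")
    print("explains an 11 kDa addition:", matches_observed(toy_extension, 11000.0))
```

In practice, the candidate extension would come from translating the reconstructed fusion transcript past the normal HC stop position. The predicted shift would then be compared with the intact or reduced-chain mass measured by LC/MS.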

Keywords: Fc-extension; LC/MS; RT-PCR; aberrant fusion protein; alternative splicing; expression vector; high throughput sequencing; monoclonal antibody; nanopore sequencing; sequence variant; splice variants.

MeSH terms

  • Animals
  • Antibodies, Monoclonal* / chemistry
  • Antibodies, Monoclonal* / genetics
  • Antibodies, Monoclonal* / immunology
  • CHO Cells
  • Chromatography, Liquid
  • Cricetulus
  • High-Throughput Nucleotide Sequencing
  • Immunoglobulin G* / chemistry
  • Immunoglobulin G* / genetics
  • Immunoglobulin G* / immunology
  • Immunoglobulin Heavy Chains* / chemistry
  • Immunoglobulin Heavy Chains* / genetics
  • Immunoglobulin Heavy Chains* / immunology
  • Mice
  • Protein Domains
  • Tandem Mass Spectrometry

Substances

  • Antibodies, Monoclonal
  • Immunoglobulin G
  • Immunoglobulin Heavy Chains