10 June 2021

FAIR MagLab Data Empowers 'Data Users'

Novel unique molecular forms of proteins - aka proteoforms - were discovered in cancer cells when researchers reanalyzed data from the ICR facility's 21T FT-ICR instrument (right) with new software (left; figure adapted from Wenrong Chen and Xiaowen Liu. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. Journal of Proteome Research, 2021, vol. 20, pp. 261-269).

A new type of MagLab user has emerged: the Data User, who accesses MagLab data from public data repositories to advance independent research goals. In this highlight, we feature Data Users working to develop software for improved identification of intact proteins by high-resolution mass spectrometry (aka top-down proteomics). The original dataset was published in 2017 as a benchmark study on the performance of the 21 T FT-ICR system for top-down proteomic analysis of colorectal cancer cells. It has since been cited in a poster and two papers for testing new data analysis algorithms and software packages, demonstrating the enhanced impact realized by FAIR (Findable, Accessible, Interoperable, and Reusable) MagLab data.

What are the developments?

Researchers from multiple institutions independently harnessed data that had previously been collected at the MagLab by other researchers and stored under practices consistent with FAIR principles. The data were used for three purposes: to perform statistical analysis of fragmentation patterns in order to optimize search algorithms for identifying intact proteins from mass spectrometry data, to demonstrate the discovery-mode workflow of MASH Explorer software, and to test the use of TopPG software for discovering novel proteoforms involved in colorectal cancer.

Why is this important?

Reuse of the MagLab's Ion Cyclotron Resonance facility data improved understanding of protein fragmentation and aided the design and release of new algorithms and software tools. When the data were reanalyzed using databases created with TopPG, hundreds of previously unidentified proteoforms were discovered, some of which might have direct clinical relevance to this colorectal cancer case. These Data Users demonstrate that MagLab data that are Findable, Accessible, Interoperable, and Reusable (FAIR) foster knowledge, discovery, and innovation. As FAIR data practices grow, the impact of data generated at the MagLab will be amplified in a self-perpetuating cycle of new discoveries.

Why did they need the MagLab?

High-quality data from intact proteins require ultrahigh mass resolving power, mass accuracy, sensitivity, and spectral acquisition rate. The 21 T FT-ICR mass spectrometer provides all of these capabilities, and this particular colorectal cancer dataset is gaining recognition as a “gold standard” for testing algorithm and software performance.
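For readers less familiar with the terms, two of the figures of merit named above have simple standard definitions: mass accuracy is usually reported as a signed error in parts per million (ppm), and mass resolving power as m/Δm, where Δm is the peak's full width at half maximum. The sketch below only illustrates those definitions. The function names and the numbers are hypothetical and are not taken from the dataset or from the software described in this highlight.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass accuracy as a signed error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz: float, fwhm: float) -> float:
    """Mass resolving power m/dm, with dm the peak full width at half maximum."""
    return mz / fwhm


if __name__ == "__main__":
    # Illustrative values only (assumed, not measured at the MagLab).
    print(f"mass error: {ppm_error(1000.001, 1000.000):.2f} ppm")
    print(f"resolving power: {resolving_power(1000.0, 0.005):.0f}")
```

A smaller absolute ppm error and a larger m/Δm both make it easier to tell apart proteoforms whose masses differ only slightly.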

The tool they used

Facility: Ion Cyclotron Resonance (21 T FT-ICR)

Original Publication: Journal of Proteome Research 2017 16 (2), 1087-1096 DOI: 10.1021/acs.jproteome.6b00696

FAIR Data Set: Mass Spectrometry Interactive Virtual Environment (MassIVE) repository; Accession Number (ID) MSV000079978

Impacted products of research

Ruixiang Sun, Ruimin Wang, Hao Chi, Chao Liu, Simin He, Ying Ge. TP 654: "Statistical Fragmentation Pattern Discovery of Intact Proteins Based on Their Large-scale Top-down MS/MS Spectra". 65th ASMS Conference on Mass Spectrometry and Allied Topics, Indianapolis, Indiana, June 4-8 (2017).

Zhijie Wu, David S. Roberts, Jake A. Melby, Kent Wenger, Molly Wetzel, Yiwen Gu, Sudharshanan Govindaraj Ramanathan, Elizabeth F. Bayne, Xiaowen Liu, Ruixiang Sun, Irene M. Ong, Sean J. McIlwain, and Ying Ge. "MASH Explorer: A Universal Software Environment for Top-Down Proteomics." Journal of Proteome Research. 2020, vol 19, pp 3867-3876. DOI: 10.1021/acs.jproteome.0c00469

Wenrong Chen and Xiaowen Liu. "Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry." Journal of Proteome Research. 2021, vol 20, pp 261-269. DOI: 10.1021/acs.jproteome.0c00369

Details for scientists

Funding

Creation of the original dataset was supported by the MagLab (G.S. Boebinger, NSF DMR-1644779, and the State of Florida) and by National Resource for Translational and Developmental Proteomics based at Northwestern University (N.L. Kelleher, NIH P41GM108569).

Data User research was funded by grants awarded to Ruixiang Sun1,3 (China '973' fund 2013CB911200, NSFC 31670837); Ying Ge2 (NIH R01GM125085, R01HL096971, GM117058, S10OD018475); Xiaowen Liu4 (NIH R01GM118470, R01GM125991, R01AI14625)

1Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 2University of Wisconsin, Madison, 3National Institute of Biological Sciences, Beijing, 4Indiana University–Purdue University Indianapolis; multiple departments



Details

  • Research Area: Biochemistry
  • Facility / Program: ICR
  • Year: 2021
Last modified on 1 July 2021