Institute of Organic Chemistry and Biochemistry of the CAS

Publications

Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS
Nature Biotechnology 2025: Early View
Characterizing biological and environmental samples at a molecular level primarily uses tandem mass spectrometry (MS/MS), yet the interpretation of tandem mass spectra from untargeted metabolomics experiments remains a challenge. Existing computational methods for predictions from mass spectra rely on limited spectral libraries and on hard-coded human expertise. Here we introduce a transformer-based neural network pre-trained in a self-supervised way on millions of unannotated tandem mass spectra from our GNPS Experimental Mass Spectra (GeMS) dataset mined from the MassIVE GNPS repository. We show that pre-training our model to predict masked spectral peaks and chromatographic retention orders leads to the emergence of rich representations of molecular structures, which we named Deep Representations Empowering the Annotation of Mass Spectra (DreaMS). Further fine-tuning the neural network yields state-of-the-art performance across a variety of tasks. We make our new dataset and model…
Nucleophilic aromatic substitutions enable diversity-oriented synthesis of heterocyclic atropisomers via non-atropisomeric intermediates
Nature Communications 16: 4856 (2025)
Multiscale Computational Protocols for Accurate Residue Interactions at the Flexible Insulin–Receptor Interface
Journal of Chemical Information and Modeling 2025: Early View
Conformational landscape of the mycobacterial inosine 5′-monophosphate dehydrogenase octamerization interface
Journal of Structural Biology 217 (2): 108198 (2025)
