
1. Model Overview

  • Model Name: MMPT-FM & its MMP variants
  • Summary: MMPT-FM (Matched Molecular Pair Transformation Foundation Model) and its MMP (Matched Molecular Pair) variants – MMP-M2M (molecule-to-molecule), MMP-M2T (molecule-to-transformation), and MMP-C2V (constant-to-variable) – are generative foundation models designed to support medicinal chemistry analog design. The models learn from matched molecular pair transformations (MMPTs), i.e., context-independent variable-to-variable chemical modifications, or from matched molecular pairs (MMPs), both derived from large-scale matched molecular pair data. This formulation enables scalable, interpretable, and generalizable encoding of medicinal chemistry intuition across diverse chemical series.
  • Model Specification: Encoder–decoder Transformer. 220M parameters for each model.
  • Developed by: Merck & Co., Inc. (Rahway, NJ, USA) and Emory University.
  • License: MIT license.
  • Base Model: ChemT5 (chemistry-domain pretrained T5).
  • Model Type: Transformer
  • Languages: SMILES & SMARTS (chemical structure and substructure representations)
  • Pipeline Tag: text2text-generation for MMP transformation
  • Library: Transformers, PyTorch
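
The MMPT formulation above can be made concrete with a small sketch. The representation below – an "lhs>>rhs" fragment pair with `[*:1]` marking the attachment point – is a common SMIRKS-style convention for matched molecular pair transformations; the exact serialization used to train MMPT-FM is an assumption, and a real pipeline would apply such edits with cheminformatics reaction machinery (e.g., RDKit) rather than string formatting:

```python
from typing import NamedTuple

class MMPT(NamedTuple):
    """A context-independent variable-to-variable edit.

    lhs/rhs are SMILES/SMARTS fragments in which [*:1] marks the
    attachment point to the unchanged (constant) part of the molecule.
    """
    lhs: str  # variable fragment removed
    rhs: str  # variable fragment installed

    def as_smirks(self) -> str:
        """Serialize the transformation in the 'lhs>>rhs' convention."""
        return f"{self.lhs}>>{self.rhs}"

# Classic medicinal-chemistry edit: swap an aryl chloride for a fluoride.
cl_to_f = MMPT(lhs="[*:1]Cl", rhs="[*:1]F")
print(cl_to_f.as_smirks())  # [*:1]Cl>>[*:1]F
```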

2. Intended Use

  • Direct Use:
    • MMPT-FM:
      • Generation of chemically valid matched molecular pair transformations (MMPTs)
      • Analog design at a user-specified edit site
    • MMP-M2M:
      • Generation of chemically valid matched molecular pairs (MMPs)
    • MMP-M2T:
      • Generation of chemically valid matched molecular pair transformations
      • Analog design at a user-specified edit site
    • MMP-C2V:
      • Analog design at a user-specified edit site
  • Downstream Use:
    • All models: integration into analog enumeration and high-throughput virtual screening pipelines
    • MMPT-FM additionally serves as the base model for retrieval-augmented generation (MMPT-RAG)

3. Bias, Risks, and Limitations

  • Known Limitations: The models rely on the availability and coverage of large historical transformation datasets, and their performance may vary in underrepresented chemical domains.
  • Biases: Inherits biases from ChEMBL-derived medicinal chemistry literature.
  • Risk Areas: The models are intended for research use and do not introduce specific ethical concerns.
  • Recommendations: None

4. Training Details

5. Evaluation

  • Metrics:
    • Validity
    • Novelty (Novel/valid, Novel/all)
    • Recall (overall, in-training, out-of-training)
  • Benchmarks:
    • Held-out ChEMBL MMPT test set (in-distribution)
    • Within-patent analog generation (PMV17)
    • Cross-patent analog generation (PMV17 → PMV21)
  • Testing Data: Patent-derived datasets from PMV Pharmaceuticals (2017, 2021)
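
The validity, novelty, and recall metrics listed above are all set-based and can be sketched in a few lines. The definitions below are an illustration, not necessarily the paper's exact formulas; `is_valid` stands in for a real chemical-validity check (e.g., RDKit SMILES parsing) and is stubbed to accept everything by default:

```python
# Hypothetical sketch of the set-based evaluation metrics; the paper's
# exact definitions may differ.
def mmp_metrics(generated, training_set, is_valid=lambda s: True):
    """Validity and novelty rates for a list of generated outputs."""
    valid = [g for g in generated if is_valid(g)]
    novel = [g for g in valid if g not in training_set]
    return {
        "validity": len(valid) / len(generated),
        "novel_over_valid": len(novel) / len(valid) if valid else 0.0,
        "novel_over_all": len(novel) / len(generated),
    }

def recall(generated, reference):
    """Fraction of reference transformations recovered by the model."""
    return len(set(generated) & set(reference)) / len(set(reference))
```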

6. Usage
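
This card does not specify the published repository ids or prompt format. The sketch below shows how the models could be called through the Transformers text2text-generation stack, assuming a hypothetical Hub id "Merck/MMPT-FM" and a `[*:1]` edit-site marker in the input fragment (a constant-to-variable style query, as in MMP-C2V); both assumptions should be checked against the released checkpoints:

```python
# Minimal usage sketch.  Assumptions (not confirmed by this card): the
# checkpoints live on the Hugging Face Hub under ids such as
# "Merck/MMPT-FM", and the edit site is marked with "[*:1]".
def make_prompt(fragment: str) -> str:
    """Validate and return a SMILES fragment with a marked edit site."""
    if "[*:1]" not in fragment:
        raise ValueError("edit site must be marked with [*:1]")
    return fragment

def generate_candidates(prompt: str, model_id: str = "Merck/MMPT-FM",
                        n: int = 5):
    """Decode n candidate fragments with beam search (downloads weights)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=n,
                             num_return_sequences=n, max_new_tokens=128)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# Example call (requires network access and the published weights):
# candidates = generate_candidates(make_prompt("c1ccccc1[*:1]"))
```

Beam search with `num_return_sequences` is one straightforward way to obtain a ranked list of analogs; sampling-based decoding would trade ranking fidelity for diversity.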

7. Citation

BibTeX:

@article{pang2026scalable,
  title={Scalable and Generalizable Analog Design via Learning Medicinal Chemistry Intuition from Matched Molecular Pair Transformations},
  author={Pang, Hao-Wei and Zhang, Peter Zhiping and Pan, Bo and Zhao, Liang and Yu, Xiang and Zhang, Liying},
  journal={ChemRxiv},
  doi={10.26434/chemrxiv.15001722},
  year={2026}
}

@article{pan2026retrieval,
  title={Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition},
  author={Pan, Bo and Zhang, Peter Zhiping and Pang, Hao-Wei and Zhu, Alex and Yu, Xiang and Zhang, Liying and Zhao, Liang},
  journal={arXiv preprint arXiv:2602.16684},
  year={2026}
}

@article{pan2026transformer,
  title={Transformer-Based Approach for Automated Functional Group Replacement in Chemical Compounds},
  author={Pan, Bo and Zhang, Zhiping and Spiekermann, Kevin and Chen, Tianchi and Yu, Xiang and Zhang, Liying and Zhao, Liang},
  journal={arXiv preprint arXiv:2601.07930},
  year={2026}
}