Drug repurposing offers a promising path to accelerate therapeutic discovery by identifying new uses for approved drugs, thereby reducing cost, risk, and development timelines (1, 2). We present a multi-modal, uncertainty-aware deep learning framework that integrates structured biomedical knowledge graphs, unstructured literature, and molecular structure data to predict drug potency and prioritize repurposing candidates. Our system, centered on the Calibrated-StoNet architecture, fuses embeddings from relational graph neural networks (GNNs), BioBERT (9), and MolBERT (8), and models predictive uncertainty using both heteroscedastic loss and Monte Carlo Dropout. An Engression module further calibrates uncertainty by modeling full conditional distributions. Evaluated on benchmark datasets, our model achieves strong regression performance (RMSE: 0.85), robust uncertainty calibration (ECE: 4.3%), and perfect top-10 ranking accuracy for candidate selection. These results demonstrate the value of principled uncertainty estimation and multi-source data fusion for high-confidence, interpretable AI-driven drug repurposing.
Drug repurposing — identifying new therapeutic uses for existing drugs — is an efficient strategy to reduce the time, cost, and risk associated with traditional drug discovery pipelines. In this talk, I’ll present an AI-based framework that combines open-source tools and multi-modal data fusion to predict drug potency and prioritize repurposing candidates with high confidence.
Our approach integrates three diverse data modalities: (1) structured biomedical knowledge graphs (e.g., drug-disease-gene relationships), (2) unstructured literature (scientific publications and clinical studies), and (3) molecular structure data. The model architecture — based on a calibrated, uncertainty-aware deep learning stack — fuses embeddings from Relational Graph Neural Networks (R-GNNs), BioBERT (for literature-based context), and MolBERT (for SMILES-based molecular representations).
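To make the fusion step concrete, here is a minimal PyTorch sketch of a late-fusion head that concatenates per-modality embeddings and maps them to a single potency prediction. This is an illustration only, not the Calibrated-StoNet architecture itself; the class name `FusionHead` and all embedding dimensions are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Illustrative late-fusion head (hypothetical, not the paper's model):
    concatenate R-GNN, BioBERT, and MolBERT embeddings, then regress potency."""

    def __init__(self, d_graph=128, d_text=768, d_mol=512, d_hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_graph + d_text + d_mol, d_hidden),
            nn.ReLU(),
            nn.Dropout(p=0.1),  # dropout here can double as MC Dropout at inference
            nn.Linear(d_hidden, 1),
        )

    def forward(self, z_graph, z_text, z_mol):
        # z_graph: R-GNN node embedding; z_text: BioBERT [CLS]; z_mol: MolBERT embedding
        z = torch.cat([z_graph, z_text, z_mol], dim=-1)
        return self.mlp(z)

# Random stand-in embeddings for a batch of 4 drugs
head = FusionHead()
y_hat = head(torch.randn(4, 128), torch.randn(4, 768), torch.randn(4, 512))
print(y_hat.shape)  # torch.Size([4, 1])
```

In practice each encoder would be pretrained (or fine-tuned) separately, with the fusion head trained on the downstream potency targets.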
A key innovation is the use of uncertainty modeling to improve interpretability and trustworthiness of predictions. We incorporate heteroscedastic loss functions and Monte Carlo Dropout to estimate predictive variance, while an additional “Engression” module calibrates these uncertainty estimates by modeling full conditional distributions. This ensures not only accuracy but also well-calibrated confidence levels — critical for high-stakes domains like drug discovery.
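The two uncertainty mechanisms above can be sketched in a few lines of PyTorch. This is a generic illustration under stated assumptions (the toy model, dimensions, and helper names are hypothetical): a heteroscedastic Gaussian negative log-likelihood lets the network predict a per-sample variance, and MC Dropout keeps dropout active at inference so the spread across stochastic forward passes approximates epistemic uncertainty.

```python
import torch
import torch.nn as nn

def heteroscedastic_nll(mu, log_var, y):
    """Gaussian NLL with a predicted per-sample log-variance: the network can
    down-weight noisy targets by assigning them larger variance."""
    return (0.5 * (log_var + (y - mu) ** 2 / log_var.exp())).mean()

def mc_dropout_predict(model, x, n_samples=50):
    """Average stochastic forward passes with dropout left on; the standard
    deviation across passes is a rough epistemic-uncertainty estimate."""
    model.train()  # keeps Dropout layers active
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

# Toy network emitting both a mean and a log-variance (hypothetical dimensions)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.2), nn.Linear(32, 2))
x, y = torch.randn(8, 16), torch.randn(8, 1)
out = model(x)
mu, log_var = out[:, :1], out[:, 1:]
loss = heteroscedastic_nll(mu, log_var, y)
mean, epistemic_std = mc_dropout_predict(model, x)
```

Predicting `log_var` rather than the variance directly keeps the variance positive without constraints; the Engression-style calibration step would then operate on these raw uncertainty estimates.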
All components are built using open-source scientific computing tools in Python, including PyTorch, DGL, HuggingFace Transformers, and Scikit-learn, allowing for full reproducibility and community-driven development. Our model is evaluated on benchmark datasets, showing promising results: strong regression accuracy (RMSE: 0.85), robust uncertainty calibration (ECE: 4.3%), and perfect top-10 ranking precision in repurposing candidate identification.
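As a pointer to how a calibration metric like ECE can be computed for a regression model, here is a NumPy sketch of one common recipe (an assumption about methodology, not necessarily the exact metric used in this work): for Gaussian predictive distributions, compare the nominal coverage of central credible intervals against the empirical coverage, averaged over a grid of levels.

```python
import numpy as np
from math import erf, sqrt

def regression_ece(mu, sigma, y, n_bins=10):
    """Calibration error for Gaussian predictions: mean absolute gap between
    nominal and empirical coverage of central credible intervals."""
    z = (y - mu) / sigma
    # Probability integral transform via the standard-normal CDF (erf avoids SciPy)
    pit = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    levels = np.linspace(0.05, 0.95, n_bins)
    # A central interval at level L corresponds to |PIT - 0.5| <= L/2
    empirical = np.array([np.mean(np.abs(pit - 0.5) <= lvl / 2) for lvl in levels])
    return float(np.mean(np.abs(empirical - levels)))

rng = np.random.default_rng(0)
mu = rng.normal(size=1000)
y = mu + rng.normal(scale=1.0, size=1000)      # noise matches the claimed sigma
ece_good = regression_ece(mu, np.ones(1000), y)
ece_bad = regression_ece(mu, 0.3 * np.ones(1000), y)  # overconfident predictions
```

A well-specified noise model yields a small ECE, while the overconfident variant (predicted sigma much smaller than the true noise) scores substantially worse, which is exactly the failure mode calibration metrics are meant to expose.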
This work highlights the value of principled uncertainty estimation and data integration across modalities for biomedical AI. The talk will also cover challenges in working with heterogeneous biomedical data, lessons learned in model calibration, and future directions for extending this research in the open-source ecosystem.
Attendees interested in machine learning, graph modeling, or scientific applications of NLP will gain insights into how FOSS tools can be orchestrated to tackle impactful real-world problems in healthcare and drug discovery.