At the UCF Institute of Artificial Intelligence (IAI), researchers have developed MolVision, a new artificial intelligence (AI) vision-language model (VLM) capable of predicting a molecule's properties from images of its structure. The project was launched from a bold idea: to make AI models learn scientific principles the same way students do. Leading the study is Assistant Professor of Materials Science and Engineering Shruti Vyas. The MolVision research team includes Associate Professor of Computer Science and IAI member Yogesh Singh Rawat and Deepan Adak, a researcher from the National Institute of Technology, Kurukshetra.
“AI should learn chemistry the way humans do — by seeing molecular structures, not just reading linear strings,” Vyas says. “While large language models have shown promise for molecular property prediction, their reliance on representations like SMILES or SELFIES [textual representations] limits their ability to capture the rich structural cues chemists rely on.”
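To see why a linear string can obscure structure, consider the SMILES notation Vyas mentions. The sketch below (an illustrative example, not part of the MolVision system) uses the well-known SMILES string for aspirin and a simple regular expression to pull out its heavy atoms; the benzene ring appears only as the flat substring "c1ccccc1", with no visual geometry.

```python
import re

# Aspirin (acetylsalicylic acid, C9H8O4) written as a SMILES string.
# The ring and functional groups are encoded linearly, not spatially.
ASPIRIN = "CC(=O)Oc1ccccc1C(=O)O"

def heavy_atoms(smiles):
    """Return the heavy (non-hydrogen) atom symbols in a SMILES string.

    Handles only the elements common in organic SMILES; two-letter
    symbols like Cl and Br are matched before single letters.
    """
    return re.findall(r"Cl|Br|[BCNOSPFI]|[bcnops]", smiles)

atoms = heavy_atoms(ASPIRIN)
print(atoms)        # 9 carbons (upper- and lowercase) and 4 oxygens
print(len(atoms))   # 13 heavy atoms
```

A chemist glancing at a 2D depiction of aspirin sees the aromatic ring and the two carbonyl groups immediately; a text-only model must infer the same facts from ring-closure digits and parentheses.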
According to Vyas, this work opens a new pathway for chemical prediction and molecular analysis by creating an AI system that operates more intuitively.
A Challenging Vision
According to Vyas, one of the biggest challenges facing the field of artificial intelligence and computer vision is shifting AI models from a textual to a visual understanding of chemical reactions.
“Molecular images represent a very different data domain compared to the natural images or text that vision-language models are typically trained on,” Vyas says. “Molecules contain highly specific structural relationships — bonding patterns, stereochemistry, and functional group arrangements — that are subtle yet crucial for property prediction.”
Many VLMs have limited exposure to visual representations of scientific data, which makes training and adapting them to understand the nuances of molecules and their atomic structure a primary challenge.
Transforming How Scientists and AI See Chemistry
To address these challenges, Vyas and her research team built a multi-modal data set for training MolVision. The data set pairs 2D molecular diagrams with text-based descriptions across a variety of molecules and atomic structures, which proved crucial for teaching the VLM to integrate textual and visual information effectively. Using LoRA (low-rank adaptation), the team can adapt a model with billions of parameters by training only a small number of additional weights, enabling complex tasks such as molecular property prediction and chemical description without the cost of full retraining.
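The core idea behind LoRA can be sketched in a few lines. The example below is a minimal illustration of the general technique, not MolVision's actual implementation: a frozen pretrained weight matrix W is augmented with a trainable low-rank update (alpha/r) * B @ A, where B starts at zero so training begins from the pretrained behavior. All matrix names and sizes here are illustrative.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """A frozen linear layer with a trainable low-rank update.

    Output: y = W @ x + (alpha / r) * B @ (A @ x)
    Only A (r x d_in) and B (d_out x r) are trained; W stays frozen.
    """
    def __init__(self, W, r, alpha=16):
        d_out, d_in = len(W), len(W[0])
        self.W = W                       # frozen pretrained weight
        self.scale = alpha / r
        # A gets small random values, B starts at zero,
        # so the layer initially reproduces the pretrained output.
        self.A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

# Why this is cheap: for a 4096 x 4096 layer, full fine-tuning trains
# ~16.8M weights, while rank-8 LoRA trains only r * (d_in + d_out) = 65,536.
```

Because only the small A and B matrices receive gradients, the same frozen backbone can be adapted to many tasks by swapping lightweight LoRA weights in and out.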
“Recent advances in vision–language models have transformed how AI understands the world, but most of that progress has focused on natural images and everyday language,” says Yogesh Singh Rawat. “With MolVision, we’re bringing those same AI capabilities into chemistry — allowing models to reason about molecules visually, in ways that are much closer to how scientists actually think.”
This work has the potential to transform drug discovery, the personalization of medicine, and even sustainable design and engineering. “Over the next few years we can expect this multimodal approach to reduce experimental screening burdens, support faster identification of promising drug candidates and materials, and offer more interpretable insights into structure-property relationships,” Vyas says.
Vyas and her team at UCF plan to scale up MolVision's data set and capabilities. They aim to integrate the VLM with current AI neural networks and large-scale molecular simulators, creating hybrid systems that combine symbolic, visual and physical reasoning.
Vyas will also participate in the upcoming SPARK STEM Fest at the Orlando Science Center, where she will present an exhibit on AI for chemistry and molecules. Those interested in viewing the exhibit can attend from 7:45 to 11:00 p.m. this Saturday on the 4th floor.