TL;DR: I throw a bunch of molecules at a pile of linear algebra and hope the predicted values line up with known experimental values; then I use the pile of linear algebra on novel molecules.
There's a bit more to it than that: representing molecules in a computer-readable format, generating additional input variables (molecular characteristics), down-selecting input variables and/or reducing dimensionality, choosing the specific ML models (we use feed-forward MLPs and graph convolution nets), and interpreting results as they relate back to combustion.
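To make the first few steps a bit more concrete, here is a toy sketch (not the actual pipeline from the papers). It assumes molecules are written as SMILES strings, uses crude character counts as stand-in "descriptors", and drops any descriptor that is constant across the data set. The function names, the descriptor set, and the variance cutoff are all made up for illustration; real work would normally use a cheminformatics toolkit to compute descriptors, and this counting trick breaks on things like `Cl` or bracketed atoms.

```python
from statistics import pvariance

# SMILES strings for a few well-known fuel-relevant molecules.
MOLECULES = {
    "ethanol": "CCO",
    "n-heptane": "CCCCCCC",
    "isooctane": "CC(C)CC(C)(C)C",
    "toluene": "Cc1ccccc1",
    "2-methylfuran": "Cc1ccco1",
}


def toy_descriptors(smiles):
    """Hypothetical descriptor generator: crude character counts only.

    Lowercase atom symbols in SMILES denote aromatic atoms; ring bonds are
    opened and closed by a matching pair of digits.
    """
    return {
        "n_carbon": sum(ch in "Cc" for ch in smiles),
        "n_oxygen": sum(ch in "Oo" for ch in smiles),
        "n_nitrogen": sum(ch in "Nn" for ch in smiles),
        "n_aromatic": sum(ch in "cno" for ch in smiles),
        "n_branches": smiles.count("("),
        "n_rings": sum(ch.isdigit() for ch in smiles) // 2,
    }


def drop_constant_features(rows, min_variance=0.0):
    """Simple down-selection: keep descriptors whose variance exceeds a cutoff."""
    names = list(rows[0])
    keep = [n for n in names if pvariance([r[n] for r in rows]) > min_variance]
    return keep, [[r[n] for n in keep] for r in rows]


rows = [toy_descriptors(s) for s in MOLECULES.values()]
kept_names, X = drop_constant_features(rows)

for name, features in zip(MOLECULES, X):
    print(name, dict(zip(kept_names, features)))
```

None of these molecules contains nitrogen, so `n_nitrogen` is zero everywhere and gets dropped; the remaining columns of `X` are the kind of input matrix that would then be fed to a model such as an MLP, alongside measured property values as targets.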
From a broad perspective, our work is just a small part of a larger push from the Department of Energy to find economically viable alternative liquid fuels. ML speeds up the process of screening candidate molecules, for example those found in bio-oil produced by pyrolyzing and catalytically upgrading lignocellulosic biomass or other renewable sources. Our colleagues don't have to synthesize large samples of many molecules just to test their properties and determine how they will behave in existing engines (a very costly and time-consuming process). Instead, we predict the properties and behaviors to highlight viable candidates so our colleagues can focus on analyzing those.
These papers (1, 2, 3) best outline the procedures and motivations for this work. PM me if you can't get access and I'll send them to you!
Thanks for sharing! These seem to focus on LLMs/transformers, but since they use MLPs I should be able to find a way to adapt them for my use!