Impact of Large Language Models on the Material Development Landscape

Materials informatics applies data-driven strategies to materials R&D, and it had a long history of success well before generative AI technology reached peak hype. A common approach is to train machine learning models on databases of material structures and properties so that they capture the underlying structure-property relationship. By inverting these models to target optimized properties, new potential materials can be suggested for further study. Large Language Models (LLMs), like the GPT-3.5 and GPT-4 models behind ChatGPT and Microsoft's Copilot, use similar tactics to model language, and in 2024 their power to enhance material development is becoming clear.
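To make the "train a model, then invert it" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any vendor's method: the data points are invented, the forward model is a simple nearest-neighbour interpolator, and "inversion" is approximated by screening a grid of candidate compositions and ranking them by predicted property. The names TRAINING_DATA, predict and suggest are hypothetical.

```python
# Hypothetical toy data: (fraction of alloying element, measured hardness).
# The numbers are invented purely for illustration.
TRAINING_DATA = [
    (0.00, 50.0), (0.10, 62.0), (0.20, 70.0),
    (0.30, 74.0), (0.40, 71.0), (0.50, 63.0),
]


def predict(x, data=TRAINING_DATA, k=2):
    """Forward model: inverse-distance-weighted mean of the k nearest points."""
    nearest = sorted(data, key=lambda point: abs(point[0] - x))[:k]
    # An exact match to a training point returns the measured value directly.
    for xi, yi in nearest:
        if xi == x:
            return yi
    weights = [1.0 / abs(xi - x) for xi, _ in nearest]
    weighted_sum = sum(w * yi for w, (_, yi) in zip(weights, nearest))
    return weighted_sum / sum(weights)


def suggest(candidates, top_n=3):
    """Approximate inversion: rank candidates by predicted property, best first."""
    return sorted(candidates, key=predict, reverse=True)[:top_n]


# Screen compositions from 0% to 50% in 1% steps.
candidates = [i / 100 for i in range(0, 51)]
best = suggest(candidates)
print(best)
```

In practice the forward model would be a far richer machine learning model and the search would cover many composition and processing variables, but the division of labour is the same: predict properties first, then search for inputs that give the desired prediction.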
 
As detailed in IDTechEx's recent report, "Materials Informatics 2024-2034: Markets, Strategies, Players", a significant barrier to profitability in materials informatics software is the level of human involvement required in onboarding new clients to a platform and getting their data into a usable format. This can make the activities of a SaaS firm look more like a consulting outfit, reducing the capacity to scale. LLMs offer a lifeline here for software providers and end-users alike.
 
Enhancing the power of LLMs
 
Potential impacts of LLMs in materials informatics. Source: IDTechEx
 
Using retrieval-augmented generation (RAG), an LLM can be made to act as a subject matter expert by giving it access to a library of text and other data that it can query, without the owner of the LLM being able to see this data. The analogy is turning an exam from closed-book to open-book: the model is not retrained on the new data but consults it when answering. This is the essential tool giving LLMs the power to transform materials informatics, with one key factor being the ability to set out approaches to solving materials informatics problems.
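The open-book analogy can be sketched in a few lines of Python. This is a simplified illustration, assuming a tiny invented library and a bag-of-words similarity score in place of the embedding-based search that production systems typically use. The LIBRARY contents and the function names are hypothetical, and the final call to an LLM is deliberately left out: the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter

# Hypothetical mini-library standing in for a firm's specialist documents.
LIBRARY = [
    "Aluminium alloys with scandium additions show refined grain structure and higher strength.",
    "Polymer electrolytes for batteries require high ionic conductivity at room temperature.",
    "Heat treatment of steel changes hardness through martensite formation.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, library=LIBRARY, top_k=1):
    """Retrieval step: return the passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        library, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Augmentation step: place retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "How does scandium affect the strength of aluminium alloys?"
passages = retrieve(question)
prompt = build_prompt(question, passages)
print(prompt)  # This text, not a retrained model, carries the new knowledge.
```

The model's weights never change in this flow. Only the prompt does, which is why the library can be extended or swapped for a customer's own documents without retraining.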
 
An early commercial example here comes from FEHRMANN MaterialsX, the materials technology division of a longstanding German alloy company. MaterialsX initially supplied around 40,000 pages of books, theses and other specialist material development and alloy-related information to OpenAI's GPT-4 model through RAG, with many more added since. MaterialsX quoted a researcher at a German technical university who posed to the model a complex technical alloys question that had taken the researcher's team around ten days to answer: the model took only 30 seconds. The company says it can help set out an entire research methodology for solving alloy development problems, interfacing with other machine learning models and a range of datasets to suggest new material candidates. Following a similar example, RAG could be used to enhance an LLM's ability to understand any area of materials science, with the potential of customizing the information supplied to the LLM using the customer's own internal data.
 
Flattening learning curves
 
Using LLMs enhanced by RAG, barriers to entry in materials informatics can be reduced: instead of having to train materials scientists to use a new graphical user interface or use code to pose problems to a computer, natural language can become the interface instead. This could help increase the total addressable market for materials informatics firms: earlier-stage organizations and firms with smaller material development departments, for example, could suddenly become viable customers.
 
The role of a materials informatics firm is to connect the expertise of materials scientists and data scientists/engineers to drive material development. The Catalyst feature of Citrine Informatics' platform uses LLMs to ease this connection in many ways. One key facet is Catalyst Model Expert, which allows the use of natural language to inject knowledge of relationships between properties into machine learning models. This makes it easier for materials scientists to fully use their domain knowledge to get the best results out of materials informatics software.
 
Of course, all of these benefits are useless without a dataset on which to train models of material behavior. Pulling together and cleaning data from a variety of sources is frequently a time-consuming element of materials informatics projects, especially given the difficulty of standardizing data in the materials industry. LLMs could help organizations here, too, by being used to build pipelines and extract data from the isolated Excel sheets and various cloud files that many materials firms still use to store data. Although manual verification is still an important step here, LLMs could provide a major tool to ease the data-cleaning dilemma in materials informatics.
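As a rough picture of what such a pipeline has to do once values have been pulled out of a spreadsheet, the Python sketch below standardizes units and routes anything it cannot parse to manual verification. It is an assumption-laden illustration: the spreadsheet export, the column names and the helper functions are all hypothetical, and the extraction step that an LLM might perform is replaced here by plain parsing of a small CSV sample.

```python
import csv
import io

# Hypothetical spreadsheet export with inconsistent units (invented values).
RAW = """material,tensile_strength
Alloy A,310 MPa
Alloy B,45 ksi
Alloy C,unknown
"""

# Conversion factors to megapascals (1 ksi = 6.894757 MPa).
TO_MPA = {"mpa": 1.0, "ksi": 6.894757, "gpa": 1000.0}


def normalise(value):
    """Return the strength in MPa, or None if the cell cannot be parsed."""
    parts = value.strip().split()
    if len(parts) != 2:
        return None
    number, unit = parts
    try:
        magnitude = float(number)
    except ValueError:
        return None
    factor = TO_MPA.get(unit.lower())
    return None if factor is None else round(magnitude * factor, 1)


def clean(raw_csv):
    """Split rows into standardized records and rows needing manual review."""
    cleaned, needs_review = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        mpa = normalise(row["tensile_strength"])
        if mpa is None:
            needs_review.append(row["material"])
        else:
            cleaned.append(
                {"material": row["material"], "tensile_strength_mpa": mpa}
            )
    return cleaned, needs_review


cleaned, needs_review = clean(RAW)
print(cleaned)
print("Needs manual verification:", needs_review)
```

Keeping an explicit review list reflects the point made above: automated extraction can remove much of the routine work, but ambiguous entries still need a person to check them.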
 
Future outlook
 
Data security concerns form the major headwind against the adoption of LLMs in materials informatics for many organizations. One worry is that the providers of the LLMs could access proprietary data used in RAG. While one approach could be to use an open-source LLM running locally, matching the capabilities of proprietary models here would likely be difficult.
 
The challenge for materials informatics SaaS players and LLM providers alike is to reassure their customers of their data security practices. Given the news of accelerating adoption of LLMs in other data-sensitive industries, like the collaboration between PwC, OpenAI, and Harvey to train and deploy foundation models for tax, legal, and HR applications, it seems likely that trust here may grow over time.
 
Overall, it's clear that LLMs will have a significant effect on the materials informatics market, making software easier to use, improving machine learning models' incorporation of materials scientists' knowledge, and easing the process of gathering data. These represent a small selection of the benefits, with the true effects of these tools expected to emerge in the next few years.
 
Further insights
 
IDTechEx's report, "Materials Informatics 2024-2034: Markets, Strategies, Players", is now in its fourth edition since IDTechEx began covering the field in 2020. Informed by first-hand interviews with the industry's major players, the report provides market forecasts, player profiles, investments, roadmaps, and comprehensive company lists, making this essential reading for anyone wanting to get ahead in this field.
 
To find out about this IDTechEx report, including downloadable sample pages, please visit www.IDTechEx.com/MaterialsInformatics.
 
For the full portfolio of advanced materials and critical minerals market research from IDTechEx, please see www.IDTechEx.com/Research/AM.
 
IDTechEx provides trusted independent research on emerging technologies and their markets. Since 1999, we have been helping our clients to understand new technologies, their supply chains, market requirements, opportunities and forecasts. For more information, contact research@IDTechEx.com or visit www.IDTechEx.com.