Session: Artificial Intelligence and Machine Learning Models
Paper Number: 158128
158128 - Credibility Assessment Framework for Machine Learned Models, Revisited
Abstract:
With the rapid advancement of Artificial Intelligence (AI), significant attention has been directed toward developing robust and efficient methods for evaluating the credibility of machine learning (ML) models. This topic has been a key focus of the ASME’s VVUQ-70 subcommittee since its inception. Recently, the subcommittee has shifted its priorities toward creating a Credibility Assessment Framework (CAF) specifically for ML models. However, this endeavor faces several critical challenges.
First, since ML models can be applied across a vast array of domains, how can a single framework remain both broadly applicable and specifically useful for individual models? Second, while data-driven modeling has been employed by engineers for over a century, why does ML modeling introduce unique challenges? Third, given the existence of credibility frameworks for physics-based models, it remains unclear whether ML models require a distinct framework, and, if so, why physics-based and ML models should be treated differently. Finally, most current credibility frameworks rely heavily on prior experience and expert judgment. While this approach leverages historical knowledge, it is inherently "data-driven," potentially limiting its ability to identify novel errors. An ideal framework would adopt a more "physics-based" approach, representing all possible modeling errors independently of whether those errors have been encountered before.
The ML CAF proposed in this work aims to address these challenges. First, the framework was designed specifically for ML models used to predict physical values. This narrows its scope to the models of greatest interest to engineers in scientific computing and acknowledges the characteristics such models share across engineering disciplines. Second, the framework explicitly states its foundational assumptions, which have previously gone unstated because they seemed obvious. Making them explicit establishes a clear relationship between modern ML models and historical data-driven models, highlighting both their similarities and differences. Third, the framework was built on the assumption that any framework capable of determining the credibility of an ML model could also determine the credibility of a physics-based model. That is, the framework treats physics-based models as a special case of data-driven models: a case where more information is known than the current data reveals.
Finally, to avoid a purely "data-driven" development process, the framework was constructed in a manner akin to deriving a mathematical theorem. Starting with a set of axioms, the logical progression from these axioms was rigorously analyzed. The axioms articulate the fundamental truths required of a credible ML model, and the framework assesses each axiom to determine whether it can be reasonably demonstrated as true. In doing so, this proposed ML CAF provides a systematic and rigorous approach to assessing model credibility, addressing critical gaps in the existing literature and offering a foundation for future advancements.
Presenting Author: Joshua Kaizer, U.S. Nuclear Regulatory Commission
Presenting Author Biography: Dr. Joshua Kaizer is a Senior Nuclear Engineer at the U.S. Nuclear Regulatory Commission, where since 2006 he has focused on reviewing the Verification, Validation, and Uncertainty Quantification (VVUQ) analyses that support reactor safety models and simulations. He has performed over 30 such reviews, with a focus on thermal-hydraulics, data-driven modeling, and uncertainty analysis. Dr. Kaizer is Vice-Chair of the ASME VVUQ Standards Committee, chairs the subcommittee on AI/ML, and is a member of both the nuclear power subcommittee and the NAFEMS working group on simulation governance.
Authors:
Joshua Kaizer, U.S. Nuclear Regulatory Commission
Paper Type
Technical Presentation Only