Session: Artificial Intelligence and Machine Learning Models
Paper Number: 151639
151639 - Softmax-Based Deep Neural Network in Regression
Abstract:
Regression is a vital problem in statistics and machine learning in which the true output is a continuous and stochastic function of the input. Regression methods aim to model the output variable(s) from a training dataset composed of numerous input-output pairs collected from real-world scenarios. The inputs can be divided into two broad categories: known inputs and unknown inputs. Typically, regression methods model the unknown inputs as additive noise, which may not always hold in practical applications. Specifically, most current models operate under two critical assumptions: Gaussian residuals and homoscedasticity.
Gaussian residuals assume that errors follow a normal distribution, ensuring that most predictions lie close to the true value with a symmetric spread around the mean. Homoscedasticity assumes that the variance of these errors remains constant across all input values, ensuring consistent model reliability. These assumptions, while foundational, can be restrictive in real-world applications, and traditional machine learning methods may struggle when they are violated. Therefore, this paper aims to resolve these issues by leveraging the flexibility of deep neural networks.
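As a hypothetical illustration (not an example from the paper), the following sketch generates synthetic data whose residuals violate both classical assumptions at once: the noise is exponentially distributed (skewed, non-Gaussian) and its scale grows with the input (heteroscedastic).

```python
import numpy as np

# Hypothetical illustration (not from the paper): synthetic data whose
# residuals violate both classical assumptions at once.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=5000)

# Exponential noise is skewed (non-Gaussian), and its scale grows with x,
# so the error variance is not constant across inputs (heteroscedastic).
scale = 0.5 + 0.3 * x
noise = rng.exponential(scale) - scale   # centered so E[noise] = 0
y = 2.0 * x + 1.0 + noise

# The empirical residual spread differs sharply between input regions.
low_std = noise[x < 5.0].std()
high_std = noise[x >= 5.0].std()
print(low_std, high_std)
```

On such data, a fitted mean function can be correct on average while any constant-variance, symmetric error model badly misstates the uncertainty for large inputs.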
A cornerstone of neural network theory is the Universal Approximation Theorem, which states that a neural network can approximate any continuous function given a sufficient number of hidden units. This theorem underscores the impressive expressive power and adaptability of neural networks in capturing complex relationships within data. Neural-network-based regression methodologies have made significant efforts to leverage these characteristics to capture the complex relationships between input and output variables, yet they have remained committed to using MSE or MAE as the objective function for model training.
While MSE and MAE have long served as guiding principles, they do not break free from the assumptions of Gaussian or uniform residuals and homoscedasticity. Therefore, this study critically examines the challenges these assumptions present in regression tasks, focusing on the choice of activation and objective functions. The inspiration for this study stems from classification rather than regression: just as labels in classification are encoded into numbers, the continuous output in regression is divided into small bins, and the probability of falling into each bin is calculated. Consequently, Softmax is used as the activation function in the final output layer, and cross-entropy is employed as the objective function, as in multi-class classification.
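The binning idea described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's configuration: the bin count, network size, and training settings are our own assumptions.

```python
import numpy as np

# Minimal sketch of softmax-based regression (bin count K, hidden size H,
# and training settings are illustrative assumptions, not the paper's setup).
rng = np.random.default_rng(1)
n, K, H = 2000, 20, 32

x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(np.pi * x[:, 0]) + rng.normal(0.0, 0.1, size=n)

# Discretize the continuous target into K equal-width bins.
edges = np.linspace(y.min(), y.max(), K + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
labels = np.clip(np.digitize(y, edges) - 1, 0, K - 1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, labels):
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

# One-hidden-layer network with a softmax output over the K bins.
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, K)); b2 = np.zeros(K)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, softmax(h @ W2 + b2)

_, p0 = forward(x)
loss_before = cross_entropy(p0, labels)

onehot = np.eye(K)[labels]
lr = 0.2
for _ in range(1000):
    h, p = forward(x)
    g = (p - onehot) / n                 # gradient of mean cross-entropy
    gW2, gb2 = h.T @ g, g.sum(0)
    gh = (g @ W2.T) * (1.0 - h ** 2)     # backprop through tanh
    gW1, gb1 = x.T @ gh, gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, p = forward(x)
loss_after = cross_entropy(p, labels)

# Read the point prediction back as the probability-weighted bin center.
y_hat = p @ centers
```

Because the network outputs a full probability vector over bins for each input, the predictive distribution can be asymmetric and its spread can vary with the input, which is precisely what frees this formulation from the Gaussian and homoscedastic assumptions.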
To verify the performance of the proposed methodology, a comparative analysis is conducted against linear methodologies, Gaussian processes, and conventional MSE-based DNNs. Our findings illustrate the potential for significantly improved handling of real-world regression tasks, demonstrating the advantages of this approach.
Presenting Author: Jeongwon Seo University of Texas at Austin
Presenting Author Biography: Jeongwon Seo is a postdoctoral research assistant at the University of Texas at Austin. He completed his B.S. in Applied Chemistry at the Korean Military Academy in Seoul, South Korea, followed by an M.S. in Nuclear Physics at Moscow State University in Moscow, Russia. He completed his Ph.D. at Purdue University in December 2023. His current research delves into the areas of SA/UQ and ROM with machine learning and artificial intelligence.
Authors:
Jeongwon Seo, University of Texas at Austin; Kevin T. Clarno, University of Texas at Austin
Softmax-Based Deep Neural Network in Regression
Paper Type
Technical Paper Publication