A Comparison of Methods for Automatic Outlier Detection in Ergospirometric Data and Their Effect on the Performance of Predictive Models
Nina Baumgartner, Christina Kranzinger, Stefan Kranzinger, Cory Snyder, Thomas Stöggl, Bernd Resch (2023): A Comparison of Methods for Automatic Outlier Detection in Ergospirometric Data and Their Effect on the Performance of Predictive Models. In: 13th World Congress of Performance Analysis of Sport and 13th International Symposium on Computer Science in Sport.
Features obtained from cardiopulmonary exercise testing provide useful information for predicting target values of high interest in the sports and healthcare industries. However, these sensor-generated data are susceptible to various factors that may cause imprecise measurements. This paper applies three selected outlier detection methods to ergospirometric data of junior ski racers and compares them in terms of the points detected and the effect of outlier removal on descriptive measures. Further, we examine the effect of outlier removal on predictive performance when assessing perceived fatigue, using leave-one-subject-out cross-validation. All inspected methods reliably detect extreme, unrealistic values, so that removing outliers shifts the averages in the same direction and decreases the standard deviation of the values. Differences in the results stem from the type of detection algorithm chosen, as some consider the seasonality and trend components of the time series while others do not. Evaluation through fatigue prediction showed higher predictive performance for cleaned than for raw data; outlier removal should therefore be considered when pre-processing data for model development.
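The two technical ideas in the abstract, flagging implausible points in a sensor time series and evaluating with leave-one-subject-out cross-validation, can be illustrated with a small Python sketch. The abstract does not name the three detection methods, so the rolling median/MAD rule (a Hampel-style filter) used here is an assumed stand-in and not the authors' method. The function names, the window size, the threshold, and the sample values are all hypothetical.

```python
from statistics import mean, median, pstdev


def hampel_mask(values, window=3, n_sigmas=3.0):
    """Flag points that deviate strongly from their local median.

    ASSUMPTION: a Hampel-style rule chosen for illustration only; the
    paper's three detection methods are not named in the abstract.
    """
    flags = []
    for i, x in enumerate(values):
        # Centred neighbourhood, truncated at the ends of the series.
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        neighbourhood = values[lo:hi]
        med = median(neighbourhood)
        # Median absolute deviation, scaled to approximate one standard
        # deviation under a normal distribution (factor 1.4826).
        mad = median([abs(v - med) for v in neighbourhood])
        flags.append(abs(x - med) > n_sigmas * 1.4826 * mad)
    return flags


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Hypothetical oxygen-uptake readings with one unrealistic spike.
readings = [30.0, 31.0, 30.5, 31.5, 32.0, 95.0, 32.5, 33.0, 32.8, 33.5, 34.0]
flags = hampel_mask(readings)
cleaned = [v for v, is_outlier in zip(readings, flags) if not is_outlier]

print("flagged indices:", [i for i, f in enumerate(flags) if f])
print("mean raw/cleaned:", round(mean(readings), 2), round(mean(cleaned), 2))
print("sd raw/cleaned:", round(pstdev(readings), 2), round(pstdev(cleaned), 2))

# Hypothetical subject labels: every row of a subject is held out together.
subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Because this rule compares each point only with its local neighbourhood, it tolerates a slow trend in the signal, whereas a single global threshold would not. Holding out all rows of one subject at a time keeps that subject's data out of training, so each fold tests on a person the model has not seen.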