On the Impact of Pretraining with Simulated Data on Anomaly Detection in CPS – A Case Study
Abstract
Anomaly detection and diagnosis algorithms for Cyber-Physical Systems, especially Cyber-Physical Production Systems, are increasingly data-driven and require sufficient, representative data to be fitted. However, recording such data is costly, especially for rare or unsafe operational states. While simulation offers a scalable solution by generating synthetic data, there is often a gap between simulated and real-world environments. In this paper, we investigate the impact of pretraining with simulation-generated data on anomaly detection algorithms in a case study of a model Cyber-Physical Production System and its simulation. To this end, we first train different anomaly detection algorithms on simulated data and subsequently continue training with real data. We examine (i) whether pretraining with additional synthetic data enhances the performance of anomaly detection algorithms, and (ii) how the proportion of real versus synthetic data affects a model’s effectiveness when operating on a fixed data budget. Our findings show that pretraining on simulation-generated data can increase the performance of anomaly detection algorithms, whereas training solely on simulated data leads to a decrease in performance.
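The two-stage procedure and the fixed data budget described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the helper `split_budget`, the toy `RunningGaussianDetector`, the one-dimensional Gaussian data, and the 0.3 offset that stands in for a simulation-to-reality gap. The abstract does not name the algorithms that were actually evaluated.

```python
import math
import random


def split_budget(budget, real_fraction):
    """Split a fixed sample budget into (n_real, n_sim) counts (hypothetical helper)."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must lie in [0, 1]")
    n_real = round(budget * real_fraction)
    return n_real, budget - n_real


class RunningGaussianDetector:
    """Toy stand-in for an anomaly detector: scores distance from a running mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def fit_more(self, samples):
        # Incremental update, so training can be continued on new data later.
        for x in samples:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return self

    def score(self, x):
        # Anomaly score: absolute deviation from the mean in standard deviations.
        std = math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 1.0
        return abs(x - self.mean) / (std or 1.0)


rng = random.Random(0)
n_real, n_sim = split_budget(budget=1000, real_fraction=0.2)

# Assumed data: the simulation is offset by 0.3 to mimic a sim-to-real gap.
sim_data = [rng.gauss(0.3, 1.0) for _ in range(n_sim)]
real_data = [rng.gauss(0.0, 1.0) for _ in range(n_real)]

detector = RunningGaussianDetector()
detector.fit_more(sim_data)   # stage 1: pretraining on simulated data
detector.fit_more(real_data)  # stage 2: continued training on real data
```

For this toy estimator the two stages are equivalent to fitting on the pooled data; with gradient-trained models the order of the stages matters, which is one reason the proportion of real to synthetic data is worth studying.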
Citation: R. Jaufmann, N. Widulle, J. Ehrhardt, D. Vranjes, O. Niggemann, “On the Impact of Pretraining with Simulated Data on Anomaly Detection in CPS – A Case Study,” ETFA - IEEE Conference on Emerging Technologies and Factory Automation, 2025. doi:http://dx.doi.org/10.1109/ETFA65518.2025.11205786.