Detection of anomalies and Data Drift in a time-series dismissal prediction system

Nataliya Boyko; Roman Kovalchuk

doi:10.52866/ijcsm.2024.05.03.012

Authors

Nataliya Boyko Lviv Polytechnic National University https://orcid.org/0000-0001-9039-125X
Roman Kovalchuk Lviv Polytechnic National University https://orcid.org/0000-0002-6962-9363

DOI:

https://doi.org/10.52866/ijcsm.2024.05.03.012

Keywords:

Data Quality pipeline, multimodal data, logical data, numerical data, machine learning algorithm

Abstract

The purpose of the study is to develop a system that automatically processes both existing and newly
entered data, with the aim of ensuring high data quality by detecting and eliminating anomalies. The methods
used include quantile filtering, Chebyshev’s inequality, and the two-sample Kolmogorov-Smirnov test, among
others. In the course of the research, the theoretical aspects of these methods were considered, and the
principles and approaches for detecting anomalies in different types of data and in different contexts were
analysed. The results of the analysis and the selection of optimal anomaly detection methods for each type of
data are important for the effective functioning of the automatic data processing system. They make it possible
to detect anomalies accurately and reliably and to ensure high quality of the data used in the machine learning
system.
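
To make the three methods named in the abstract concrete, the following is a minimal sketch in standard-library Python. It is not the authors' implementation. The function names, the 1%/99% quantile bounds, the 5% tail probability, and the significance level of 0.05 are all illustrative assumptions, and the Kolmogorov-Smirnov decision uses the usual asymptotic critical value rather than an exact p-value.

```python
import math
import statistics


def quantile_filter(values, lower=0.01, upper=0.99):
    """Return the values lying outside the [lower, upper] empirical quantiles.

    The default bounds are assumptions chosen for illustration.
    """
    ordered = sorted(values)
    n = len(ordered)

    def quantile(p):
        # Linear interpolation between the two closest order statistics.
        pos = p * (n - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    low_bound, high_bound = quantile(lower), quantile(upper)
    return [v for v in values if v < low_bound or v > high_bound]


def chebyshev_outliers(values, max_tail_prob=0.05):
    """Return the values more than k standard deviations from the mean.

    Chebyshev's inequality gives P(|X - mean| >= k * std) <= 1 / k**2 for any
    distribution with finite variance, so k = 1 / sqrt(max_tail_prob) bounds
    the share of flagged points by max_tail_prob (an assumed threshold).
    """
    k = 1 / math.sqrt(max_tail_prob)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > k * std]


def ks_two_sample(reference, current, alpha=0.05):
    """Return (D, drift_detected) for the two-sample Kolmogorov-Smirnov test.

    D is the largest gap between the two empirical distribution functions.
    Drift is reported when D exceeds the asymptotic critical value
    sqrt(-ln(alpha / 2) / 2) * sqrt((n + m) / (n * m)).
    """
    a, b = sorted(reference), sorted(current)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past x, then compare the empirical CDFs at x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return d, d > critical
```

In a sketch like this, the first two functions would screen individual numerical values in a batch, while `ks_two_sample` would compare a newly arrived batch against a reference sample to flag data drift. Chebyshev's bound holds for any distribution with finite variance, which makes it conservative when nothing is known about the shape of the data.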