How to Quickly Identify Outliers in Air Quality Monitoring Data

Ambient air quality monitoring data are the primary source of public information about air quality and are widely used in research, for example to improve air quality forecasting and to analyze haze episodes. However, such monitoring data contain outliers caused by instrument malfunctions, harsh environmental conditions, and the limitations of measuring methods.

In practice, manual inspection is often applied to identify these outliers. However, as the amount of data grows rapidly, this method becomes increasingly cumbersome.

To deal with the problem, Dr. WU Huangjian and Associate Professor TANG Xiao from the Institute of Atmospheric Physics, Chinese Academy of Sciences, propose a fully automatic outlier detection method based on the probability of residuals. The method fits the observations with multiple regression methods and uses the regression residuals to discriminate outliers. The probability of each residual is calculated from the standard deviation of the residuals, and observations whose residuals have small probabilities are tagged as outliers and removed by a computer program. Their findings are published in Advances in Atmospheric Sciences.
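
The general idea of residual-probability screening can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a single straight-line least-squares fit over time in place of the multiple regression methods mentioned above, assumes the residuals are roughly normal with zero mean, and uses a hypothetical probability threshold of 0.01. The function names and the PM2.5 values are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_probabilities(values):
    """Two-sided normal tail probability of each regression residual.

    Assumption: a single linear trend over the sample index stands in
    for the regression step; the real method may differ substantially.
    """
    xs = list(range(len(values)))
    slope, intercept = fit_line(xs, values)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, values)]
    # Standard deviation of the residuals (their mean is zero for this fit).
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    if sigma == 0:
        # A perfect fit gives no evidence of outliers.
        return [1.0] * len(values)
    return [math.erfc(abs(r) / (sigma * math.sqrt(2))) for r in residuals]


def flag_outliers(values, min_probability=0.01):
    """Indices whose residual probability is below the (hypothetical) threshold."""
    probabilities = residual_probabilities(values)
    return [i for i, p in enumerate(probabilities) if p < min_probability]


# Made-up hourly PM2.5 readings with one obvious spike at index 7.
pm25 = [10, 12, 11, 13, 12, 14, 13, 95, 15, 14, 16, 15]
flagged = flag_outliers(pm25)
cleaned = [v for i, v in enumerate(pm25) if i not in flagged]
print(flagged, cleaned)
```

With these made-up readings, only the spike at index 7 falls below the threshold and is removed. A single large outlier also inflates the standard deviation used to judge it, which is one reason a practical scheme would need something more robust than this sketch.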

Read more at Institute of Atmospheric Physics, Chinese Academy of Sciences

Image: The PM2.5 monitoring instruments at State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry (LAPC), Institute of Atmospheric Physics, Chinese Academy of Sciences. (Credit: Image by TANG Xiao)