Comparison of Regression and Deep Learning Approaches in Modeling Time Series to Predict Air Pollutant Concentration in City of Tehran

Document Type: Research Paper


Imam Khomeini International University, College of Engineering, Qazvin, Iran


Rapid urbanization and global population growth have led to climate change, air pollution, and a range of human health problems, so estimating air pollution indices has become important in environmental science. As relevant data become increasingly available, machine learning frameworks have been proposed as a particularly useful way to predict air pollution. Using four years of air pollution data from a Tehran neighborhood, this paper compares three approaches for forecasting daily NO2 and CO concentrations at the Punak air quality monitoring station from 2017 to 2020: Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM) networks, and Multiple Linear Regression (MLR). Across four performance measures, the ARIMA model performs worst of the three on all datasets, with RMSE values of 47.39 and 1.29 and R2 values of 0.012 and 0.01 for NO2 and CO, respectively. The LSTM and MLR models achieve the best forecasting results, with RMSE = 17.6 and 6.41, MAE = 10.59 and 4.33, R2 = 0.458 and 0.46, and RRSE = 1.06 and 1.10 for NO2 forecasting, and RMSE = 0.42 and 0.32, MAE = 0.24 and 0.25, R2 = 0.96 and 0.98, and RRSE = 0.43 and 0.44 for CO forecasting.
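For readers unfamiliar with the error measures named in the abstract, the following is a minimal sketch of how RMSE, MAE, and RRSE are commonly computed for a forecast, together with R2, which is assumed here to be the fourth measure. The function name `forecast_metrics` and the sample values are hypothetical and are not taken from the paper. R2 is computed in its coefficient-of-determination form (1 - SSE/SST); the paper may instead use the squared correlation coefficient, which can give different values.

```python
import math


def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, R2 and RRSE for two equal-length sequences.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    mean_actual = sum(actual) / n

    # Sum of squared and absolute forecast errors.
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))

    # Squared deviation of the observations from their own mean
    # (the error of a naive "always predict the mean" forecast).
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    if sq_dev == 0:
        raise ValueError("actual values are constant; R2 and RRSE are undefined")

    return {
        "rmse": math.sqrt(sq_err / n),
        "mae": abs_err / n,
        "r2": 1 - sq_err / sq_dev,           # assumed definition: 1 - SSE/SST
        "rrse": math.sqrt(sq_err / sq_dev),  # error relative to the mean forecast
    }


# Illustrative daily concentrations (made-up numbers, not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 5.0]
metrics = forecast_metrics(observed, forecast)
print(metrics)
```

Under these definitions an RRSE above 1 means the forecast is less accurate than always predicting the observed mean, which is one way to read the RRSE values reported above.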

