New Optimization Approach for Handling Imbalanced Data in Road Crash Severity

Document Type: Research Paper


1 Ph.D. Candidate, School of Civil Engineering, Shomal University, Mazandaran, Amol, Iran

2 Assistant Professor, School of Civil Engineering, Shomal University, Mazandaran, Amol, Iran


Road accidents are a major problem that claims many lives worldwide each year. Fatalities and severe injuries can leave adverse and irreversible impacts on public health and economic prospects. A review of the variables affecting the severity of crash injuries can help reduce fatal accidents. However, accurately predicting fatal crashes is challenging because they form a much smaller class than the other severity classes. This study uses three robust machine learning techniques: the Bayesian classifier, random forest, and support vector machine. First, three prediction models were developed on the imbalanced data; they could not differentiate fatal crashes from injury crashes. To address this problem, three techniques were used to balance the data: random sampling, k-means clustering, and clustering with meta-heuristic algorithms (genetic algorithm and particle swarm optimization). Of the meta-heuristic algorithms, the genetic algorithm performed better than particle swarm optimization. Ranked by accuracy, the models built on data balanced by the intelligent optimization methods performed best, followed by those using k-means clustering and then those using random methods. The developed models were compared on performance criteria to identify the best one. The support vector machine trained on data balanced by genetic clustering predicted fatal and injury crashes with an accuracy of 0.96, making it the best model. Finally, sensitivity analysis was performed on the best model, indicating that the highway, horizontal curve, and head-on collision variables contributed to fatal accidents.
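The abstract does not spell out the balancing procedure, so the following is only a minimal sketch of one common form of k-means balancing: cluster the majority (injury) class and keep a proportional sample from each cluster until it roughly matches the minority (fatal) class in size. The function names `kmeans` and `cluster_undersample`, the choice of three clusters, and the two synthetic features are assumptions made for illustration, not the study's data or settings.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep about n_keep majority rows, sampled from each cluster
    in proportion to the cluster's size."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        quota = round(n_keep * len(members) / len(majority))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept

# Synthetic stand-in data (assumption): 90 injury rows, 10 fatal rows.
data_rng = random.Random(42)
injury = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
fatal = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]

balanced_injury = cluster_undersample(injury, n_keep=len(fatal))
print(len(balanced_injury), len(fatal))
```

The reduced injury sample and the full fatal sample would then be combined and passed to a classifier such as a support vector machine. Because each cluster's quota is rounded separately, the kept count can differ from the target by one row.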

