Analyzing and Predicting Fatal Road Traffic Crash Severity Using Tree-Based Classification Algorithms

Document Type: Research Paper


1 School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran

2 School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran

3 School of Civil Engineering, Shahrood University of Technology, Shahrood, Iran


A large share of goods and passengers is now transported on suburban highways, mostly by high-speed vehicles, which makes these highways prone to crashes of varying injury severity. Because car crashes cause high rates of death and severe physical or mental injury, analyzing accident-prone areas and identifying the factors that contribute to crashes is crucial. The objective of this study was to compare four decision tree classification algorithms, Chi-square Automatic Interaction Detector (CHAID), Classification and Regression Tree (CART), C4.5, and C5.0, in building classification models of fatality severity from 2,355 fatal crash records collected during 2007-2009 on roadways in eight US states. The models were evaluated using overall accuracy, the kappa coefficient, precision, recall, and F-measure. C5.0 performed best, with an overall accuracy of 94% and a kappa of 92%. In addition, the fatality severity levels classified by each algorithm were proposed as input for generating risk maps of the roads and identifying potential accident risk spots. Decision tree models can also be applied to real-time data to find patterns in the tree that remain invariant over time, which would be beneficial for policymakers.
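The evaluation metrics named in the abstract can all be derived from a confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' implementation; the function name `classification_metrics` and the 2x2 example matrix are hypothetical and are not taken from the study's data.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute standard metrics from a square confusion matrix.

    cm[i][j] is the number of records whose true class is i and whose
    predicted class is j (illustrative convention, assumed here).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]                      # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    # Overall accuracy: share of records on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / total

    # Cohen's kappa: agreement corrected for chance agreement.
    expected = sum(true_counts[i] * pred_counts[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0

    # Per-class precision, recall, and F-measure (harmonic mean of the two).
    precision = [cm[i][i] / pred_counts[i] if pred_counts[i] else 0.0 for i in range(k)]
    recall = [cm[i][i] / true_counts[i] if true_counts[i] else 0.0 for i in range(k)]
    f_measure = [2 * p * r / (p + r) if (p + r) else 0.0
                 for p, r in zip(precision, recall)]

    return {
        "accuracy": observed,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up two-class example, used only to exercise the function.
    toy_matrix = [[8, 2],
                  [1, 9]]
    for name, value in classification_metrics(toy_matrix).items():
        print(name, value)
```

Kappa is reported alongside overall accuracy because accuracy alone can look high when one severity class dominates the data, whereas kappa discounts the agreement expected by chance.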

