International Journal of Transportation Engineering

How Threshold-Moving Technique May Change the Performance of Different Machine Learning Models in Crash Severity Prediction Problems

Document Type: Research Paper

Authors
1 Faculty of Civil, Water, and Environmental Engineering, Shahid Beheshti University, Tehran, Iran
2 PhD, Faculty of Civil and Environmental Engineering, Tarbiat Modares University, Tehran, Iran
3 Professor, Faculty of Civil and Environmental Engineering, Tarbiat Modares University, Tehran, Iran
Abstract
When crash severity is predicted with Machine Learning (ML) models, imbalanced classification is often unavoidable. Threshold-moving can address this problem, yet a review of the literature suggests that the technique is underutilized. Moreover, how different machine learning models compare in crash severity prediction appears to remain an open question. This research therefore compares Random Forest (RF), Logistic Regression (LR), and Naïve Bayes (NB) models by analyzing the trade-off between accuracy and recall for the minority class (both measures change as the threshold moves). The minority class in our problem is fatal and serious-injury crashes. To address this issue, we use a statewide crash database from California containing 143,310 records. The thresholds used in the comparison are determined from Receiver Operating Characteristic (ROC) curves and Precision-Recall curves, and three thresholds are chosen for this study: 0.05, 0.10, and 0.15. Based on the results, the best models are LR with a threshold of 0.10, RF with 250 trees, and Bernoulli Naïve Bayes (BNB) with a threshold of 0.05. Of these three, LR performs best. Once threshold-moving is employed, even a simple model such as LR can outperform a more complex one such as RF, contradicting several previous studies that found RF to be the best model.
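The threshold-moving idea summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the toy labels and probabilities are hypothetical, class 1 stands for fatal/serious-injury crashes, and Youden's J (TPR - FPR) is assumed here only as one common ROC-based criterion for choosing among candidate thresholds. The paper does not state which selection rule it applied.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: 1 = fatal/serious-injury crash (minority class), 0 = other.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predicted probabilities of class 1 from any fitted model (LR, RF, NB).
probs = [0.01, 0.02, 0.03, 0.04, 0.06, 0.08, 0.12, 0.30, 0.07, 0.20]


def predict_with_threshold(scores: Sequence[float], threshold: float) -> List[int]:
    """Assign class 1 when the probability is >= threshold (default classifiers use 0.5)."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Share of all crashes classified correctly."""
    return sum(1 for a, b in zip(labels, preds) if a == b) / len(labels)


def minority_recall(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Share of true class-1 crashes predicted as class 1 (true positive rate)."""
    positives = sum(1 for a in labels if a == 1)
    hits = sum(1 for a, b in zip(labels, preds) if a == 1 and b == 1)
    return hits / positives if positives else 0.0


def false_positive_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Share of true class-0 crashes wrongly predicted as class 1."""
    negatives = sum(1 for a in labels if a == 0)
    false_alarms = sum(1 for a, b in zip(labels, preds) if a == 0 and b == 1)
    return false_alarms / negatives if negatives else 0.0


def sweep(labels, scores, thresholds=(0.05, 0.10, 0.15, 0.5)) -> Dict[float, Tuple[float, float]]:
    """Map each threshold to (accuracy, minority-class recall)."""
    results = {}
    for t in thresholds:
        preds = predict_with_threshold(scores, t)
        results[t] = (accuracy(labels, preds), minority_recall(labels, preds))
    return results


def youden_threshold(labels, scores, candidates: Sequence[float]) -> float:
    """Return the candidate maximizing Youden's J = TPR - FPR (assumed ROC criterion)."""
    best_t, best_j = candidates[0], float("-inf")
    for t in candidates:
        preds = predict_with_threshold(scores, t)
        j = minority_recall(labels, preds) - false_positive_rate(labels, preds)
        if j > best_j:  # strict '>' keeps the first candidate on ties
            best_t, best_j = t, j
    return best_t


if __name__ == "__main__":
    for t, (acc, rec) in sweep(y_true, probs).items():
        print(f"threshold={t:.2f}  accuracy={acc:.2f}  recall={rec:.2f}")
    print("Youden-optimal candidate:", youden_threshold(y_true, probs, [0.05, 0.10, 0.15, 0.5]))
```

On this toy data, lowering the threshold from the default 0.5 to 0.05 raises minority-class recall from 0.0 to 1.0 while accuracy falls from 0.8 to 0.6, which is the kind of trade-off the comparison above analyzes.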