Developing a Framework for Selecting an Appropriate Model based on the Ensemble Learning

Mahpour, Alireza; Shafaati, Mostafa

doi:10.22119/ijte.2024.440299.1660

Document Type: Research Paper

Authors

Alireza Mahpour ¹

Mostafa Shafaati ²

¹ Faculty of Civil, Water and Environmental Engineering, Shahid Beheshti University, Tehran, Iran

² Faculty of Civil and Environmental Engineering, Tarbiat Modares University, Tehran, Iran

Abstract

We present a framework for selecting the most appropriate ensemble learning model, based on 143,310 crash observations in five classes. Five common classifiers serve as the non-ensemble (base) models, and 26 ensemble learning models are derived from them. To choose the right model, we propose two measures, Diff2 and Diff3. Diff2 is the number of observations incorrectly classified as class 1 minus the number incorrectly classified as class 3, 4, or 5. Diff3 is the number of observations incorrectly classified as class 1 or 2 minus the number incorrectly classified as class 4 or 5. The best model is selected with a class-specific criterion: the largest R1 for class 1, the largest Diff2 for class 2, a negative Diff3 for class 3, and the highest F1-score for classes 4 and 5. All 31 models (5 base and 26 ensemble) are ranked under each criterion, which yields five ranking series. Comparing these series shows, for example, whether the third-best model for class 1 is also the best model for class 2. Each model therefore receives five ranks (Rank1 to Rank5), and the relationships between these ranks were then evaluated. Rank1 and Rank2, as well as Rank3 and Rank5, have relatively strong relationships, while Rank2 is negatively and relatively strongly correlated with both Rank3 and Rank5.
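The counts and measures in the abstract can be made concrete with a short Python sketch. Five base models admit exactly C(5,2) + C(5,3) + C(5,4) + C(5,5) = 10 + 10 + 5 + 1 = 26 combinations of two or more members, which is consistent with 26 ensembles and 31 models in total; that each ensemble is a voting combination of one such subset is our assumption, not something the abstract states. The helpers below compute R1 (read here as recall of class 1, an assumption since the abstract does not define it), Diff2, Diff3, and the per-class F1-score from a 5×5 confusion matrix, assuming Diff2 and Diff3 are counted over observations whose true class is 2 and 3, respectively. Spearman's rho is used to compare two rank series, although the abstract does not say which correlation coefficient the authors used. The model names and the toy matrix are hypothetical.

```python
from itertools import combinations

# Hypothetical placeholders; the abstract does not name the five base classifiers.
BASE_MODELS = ["M1", "M2", "M3", "M4", "M5"]


def ensemble_subsets(models):
    """Every combination of two or more base models (assumed: one ensemble each)."""
    return [c for k in range(2, len(models) + 1) for c in combinations(models, k)]


# Confusion matrix convention: cm[true][predicted], with class 1 stored at index 0.
def recall(cm, c):
    """Recall of class index c (R1 is assumed to be recall(cm, 0))."""
    total = sum(cm[c])
    return cm[c][c] / total if total else 0.0


def f1(cm, c):
    """F1-score of class index c."""
    predicted = sum(row[c] for row in cm)
    precision = cm[c][c] / predicted if predicted else 0.0
    rec = recall(cm, c)
    return 2 * precision * rec / (precision + rec) if precision + rec else 0.0


def diff2(cm):
    """Assumed: true-class-2 row; (predicted 1) minus (predicted 3, 4 or 5)."""
    row = cm[1]
    return row[0] - (row[2] + row[3] + row[4])


def diff3(cm):
    """Assumed: true-class-3 row; (predicted 1 or 2) minus (predicted 4 or 5)."""
    row = cm[2]
    return (row[0] + row[1]) - (row[3] + row[4])


def rank_models(scores, higher_is_better=True):
    """Rank 1 = best score; ties are broken by input order (a simplification)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def spearman(ranks_a, ranks_b):
    """Spearman's rho for two untied rank series: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    ensembles = ensemble_subsets(BASE_MODELS)
    print(len(ensembles), len(ensembles) + len(BASE_MODELS))  # 26 31

    # Toy 5x5 confusion matrix, invented purely for illustration.
    cm = [
        [50, 5, 3, 1, 1],
        [8, 30, 4, 2, 1],
        [3, 2, 20, 4, 1],
        [1, 1, 2, 15, 1],
        [0, 1, 1, 2, 10],
    ]
    print(round(recall(cm, 0), 3), diff2(cm), diff3(cm), round(f1(cm, 4), 3))  # 0.833 1 0 0.714
```

Ranking all 31 models with `rank_models` under each class criterion and passing pairs of the resulting rank lists to `spearman` gives the kind of Rank1-to-Rank5 comparison the abstract describes. How the authors order models on the sign-based Diff3 criterion is not stated in the abstract, so ranking by ascending Diff3 (`higher_is_better=False`) is only one plausible choice.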

Keywords

Crash Severity Prediction

Machine Learning Model

Ensemble Voting Classifier

Imbalanced Multi-Class Classification

International Journal of Transportation Engineering

Volume 12, Issue 2 - Serial Number 46
Autumn 2024
Pages 1719-1745