Abstract:
This study develops a robust machine learning framework for predicting student learning mastery in Informatics subjects. It takes a supervised learning approach using assessment-based features derived from student academic records. Because real educational data are often limited by class imbalance and data leakage risks, synthetic data generation and feature engineering were applied to support controlled experimentation. Several classification models were implemented to evaluate the stability and consistency of the proposed framework. The models consistently distinguished students who achieved learning mastery from those who did not. Their comparable performance suggests that predictive capability stems from the methodological design rather than from any specific algorithm. These results indicate that machine learning can provide a reliable and interpretable tool to support data-driven evaluation and early intervention in Informatics education.