Developing an Intelligent Tutoring System for Personalized Skill Development Using Reinforcement Learning
Abstract
The one-size-fits-all model of traditional education is increasingly inadequate for addressing diverse learner needs. Intelligent Tutoring Systems (ITS) offer a solution but are often limited by static, hand-crafted pedagogical rules that cannot optimize long-term learning trajectories. This paper presents the design, implementation, and empirical validation of RL-Tutor, a novel ITS that uses deep reinforcement learning (RL) to provide dynamic, personalized instruction. RL-Tutor integrates a deep knowledge tracing model based on a Dynamic Key-Value Memory Network (DKVMN) to maintain a rich, continuous representation of the student's knowledge state. This state is the input to a Proximal Policy Optimization (PPO) agent, which serves as the pedagogical module and selects actions from a hierarchical action space that includes problem selection, hint provision, and instructional review. A key contribution is a multi-faceted reward function that balances immediate performance, learning efficiency, and long-term knowledge retention. Because RL is sample-inefficient, the agent was first trained in a high-fidelity simulated environment populated by 10,000 synthetic students. The trained system was then compared with a rule-based tutor and a static tutor in a between-subjects human study (N = 90) on introductory Python programming. RL-Tutor produced significantly higher normalized learning gains (0.72, versus 0.58 for the rule-based tutor and 0.45 for the static tutor; p < 0.01) and better retention on a one-week delayed post-test. Analysis of the learned policy revealed emergent, pedagogically sound strategies such as adaptive hinting and implicit spaced repetition. This work shows that RL can autonomously discover complex, effective teaching policies that are tailored to individual learners and outperform traditional ITS architectures.
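The knowledge-state component rests on a DKVMN. As an illustration of what "maintaining a continuous representation of the knowledge state" involves, the sketch below implements the three memory operations of a DKVMN as described in the original DKVMN design: attention over a static key memory of latent concepts, a weighted read from the value memory, and an erase-then-add write. It is not RL-Tutor's implementation. The key memory, the exercise query vector, and the erase and add vectors are passed in as plain lists, whereas in a trained model they come from learned embeddings and projections (typically a sigmoid for the erase vector and a tanh for the add vector). All function names and toy values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_weights(key_memory: Matrix, query: Vector) -> Vector:
    # w_i = softmax_i(k_t . M^k(i)): how strongly the exercise relates to each concept slot.
    return softmax([sum(q * k for q, k in zip(query, slot)) for slot in key_memory])


def read_memory(value_memory: Matrix, weights: Vector) -> Vector:
    # r_t = sum_i w_i * M^v(i): weighted summary of mastery over the relevant concepts.
    dim = len(value_memory[0])
    return [sum(w * slot[j] for w, slot in zip(weights, value_memory)) for j in range(dim)]


def write_memory(value_memory: Matrix, weights: Vector, erase: Vector, add: Vector) -> Matrix:
    # Erase then add, per slot i: M^v(i) <- M^v(i) * (1 - w_i * e) + w_i * a.
    # erase entries are expected in [0, 1] (sigmoid output), add entries in [-1, 1] (tanh output).
    return [
        [v * (1.0 - w * e) + w * a for v, e, a in zip(slot, erase, add)]
        for w, slot in zip(weights, value_memory)
    ]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]    # two hypothetical concept slots
    values = [[0.2, 0.2], [0.8, 0.8]]  # toy mastery vectors, one per slot
    w = correlation_weights(keys, [1.0, 0.0])  # exercise aligned with concept 0
    print([round(x, 3) for x in w])  # [0.731, 0.269]
    print(read_memory(values, w))
    # After an interaction, the slot most related to the exercise changes the most.
    print(write_memory(values, w, erase=[1.0, 1.0], add=[1.0, 1.0]))
```

Keeping the key memory fixed per concept while only the value memory is rewritten is what lets this kind of model report a per-concept mastery estimate, which is the sort of state a downstream policy can consume.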
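The pedagogical module is described as a PPO agent. As a sketch of what such an agent optimizes, the function below computes PPO's clipped surrogate objective over a batch of tutoring decisions, given probability ratios r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t) and advantage estimates A_t. The clip range epsilon = 0.2 is a common default and an assumption here; the abstract does not report RL-Tutor's hyperparameters, its advantage estimator, or how the hierarchical action space is factored into the policy. Function names are hypothetical.

```python
import math
from typing import Sequence


def probability_ratio(new_log_prob: float, old_log_prob: float) -> float:
    # r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t), computed from log-probabilities.
    return math.exp(new_log_prob - old_log_prob)


def ppo_clip_objective(
    ratios: Sequence[float], advantages: Sequence[float], epsilon: float = 0.2
) -> float:
    # L^CLIP = mean_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)], to be maximized.
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and the same length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum makes the objective pessimistic: the policy gains nothing
        # from pushing r_t outside [1 - eps, 1 + eps] in the direction A_t favors.
        total += min(r * a, clipped * a)
    return total / len(ratios)


if __name__ == "__main__":
    # Toy batch: one action (e.g. giving a hint) with positive advantage, one with negative.
    ratios = [probability_ratio(-0.4, -0.8), probability_ratio(-1.5, -0.8)]
    print(ppo_clip_objective(ratios, [1.0, -1.0]))
```

The clipping is the reason PPO is a reasonable fit for a tutor trained on simulated students: each update can move the teaching policy only a bounded distance from the policy that generated the data.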
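The headline results are reported as normalized learning gains. The abstract does not give the formula; assuming the standard Hake definition, a student's gain is the fraction of the available improvement that was actually achieved, g = (post - pre) / (max - pre). A minimal sketch with hypothetical function names and illustrative scores (not study data):

```python
from typing import List, Tuple

ScorePair = Tuple[float, float]  # (pre-test score, post-test score)


def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    # Hake's g: the share of the available headroom (max_score - pre) actually gained.
    if not 0.0 <= pre < max_score:
        raise ValueError("pre must lie in [0, max_score); g is undefined at a perfect pre-test")
    return (post - pre) / (max_score - pre)


def mean_individual_gain(pairs: List[ScorePair], max_score: float = 100.0) -> float:
    # Average of per-student gains.
    gains = [normalized_gain(pre, post, max_score) for pre, post in pairs]
    return sum(gains) / len(gains)


def gain_of_means(pairs: List[ScorePair], max_score: float = 100.0) -> float:
    # Gain computed from the group's mean pre- and post-test scores (Hake's original <g>).
    pre_mean = sum(pre for pre, _ in pairs) / len(pairs)
    post_mean = sum(post for _, post in pairs) / len(pairs)
    return normalized_gain(pre_mean, post_mean, max_score)


if __name__ == "__main__":
    cohort = [(20.0, 60.0), (80.0, 85.0)]  # illustrative scores only
    print(mean_individual_gain(cohort), gain_of_means(cohort))
```

Under this definition, a gain of 0.72 means that, on average, learners closed 72% of the gap between their pre-test score and the maximum score. The two group summaries can disagree (0.375 versus 0.45 for the illustrative cohort above), and the abstract does not say which one the study used.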
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.