An Experimental Assessment of AI-Based Legal Decision-Making Systems in Contract Analysis and Risk Detection
Abstract
This experimental study evaluates the performance, reliability, and practical applicability of AI-based legal decision-making systems in contract analysis and risk detection. Using a corpus of 5,247 contracts with expert-validated annotations from 12 legal professionals, we benchmark four classes of AI systems across multiple dimensions critical to legal practice: rule-based systems, supervised machine learning (XGBoost), fine-tuned transformer models (Legal-BERT), and large language models (GPT-4, Claude 3). The key findings are:
a) Performance variability: Fine-tuned Legal-BERT achieved the highest overall clause-classification F1-score (0.923, 95% CI [0.917, 0.929]) but degraded significantly in cross-jurisdictional use (a 28.4% performance drop from US to UK contracts).
b) Risk detection gaps: For all systems, recall decreased as risk severity increased. GPT-4 missed 18.2% of high-severity risks (severity ≥ 4), while Legal-BERT missed 12.3% of total risk severity weight (FNRP metric).
c) Decision inconsistency: LLMs were substantially inconsistent; GPT-4 achieved an intra-model Jaccard similarity of only 0.81 across identical inputs and showed 14.7% decision variation on identical clause phrasings.
d) Domain-specific performance: Rule-based systems performed adequately on standardized agreements (NDA: F1 = 0.812) but failed catastrophically on complex contracts (M&A: F1 = 0.432).
e) Cost-effectiveness: Local fine-tuned models delivered 92.3% of GPT-4's performance at 3.5% of its cost ($0.0087 vs. $0.2478 per document).
We introduce two novel legal-specific metrics, False-Negative Risk Penalty (FNRP) and Severity-Weighted F1 (SwF1), that better capture the asymmetric cost structure of legal errors. Based on our empirical findings, we propose a three-tier human-in-the-loop deployment framework that reduces attorney review time by 64% while maintaining 99.7% risk coverage.
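The abstract does not give formal definitions of FNRP, SwF1, or the consistency measure. The following is a minimal sketch under stated assumptions: FNRP is taken to be the share of total gold-standard severity weight carried by missed risks (which matches how the Legal-BERT figure is described), SwF1 weights true positives and false negatives by gold severity and false positives by the system's predicted severity, and intra-model consistency is the mean pairwise Jaccard similarity of the risk sets returned across repeated runs on one input. The `Risk` type, the function names, and the 1-5 severity scale are hypothetical, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Risk:
    """A gold-standard risk annotation (hypothetical structure)."""
    risk_id: str
    severity: int  # assumed ordinal scale 1-5, so "high severity" means >= 4


def fnrp(gold: list[Risk], predicted: dict[str, int]) -> float:
    """Assumed FNRP: fraction of total gold severity weight that was missed.

    `predicted` maps each flagged risk id to the system's severity estimate;
    only its keys matter here.
    """
    total = sum(r.severity for r in gold)
    if total == 0:
        return 0.0
    missed = sum(r.severity for r in gold if r.risk_id not in predicted)
    return missed / total


def severity_weighted_f1(gold: list[Risk], predicted: dict[str, int]) -> float:
    """Assumed SwF1: F1 computed from severity-weighted TP, FP and FN totals."""
    gold_ids = {r.risk_id for r in gold}
    w_tp = sum(r.severity for r in gold if r.risk_id in predicted)
    w_fn = sum(r.severity for r in gold if r.risk_id not in predicted)
    # False positives have no gold severity, so weight them by the predicted one.
    w_fp = sum(sev for rid, sev in predicted.items() if rid not in gold_ids)
    denom = 2 * w_tp + w_fp + w_fn  # 2TP/(2TP+FP+FN) equals the harmonic mean of P and R
    return 0.0 if denom == 0 else 2 * w_tp / denom


def mean_pairwise_jaccard(runs: list[set[str]]) -> float:
    """Mean Jaccard similarity over all pairs of runs on the same input."""
    def jaccard(a: set[str], b: set[str]) -> float:
        union = a | b
        return 1.0 if not union else len(a & b) / len(union)

    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0  # a single run is trivially consistent with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


gold = [Risk("r1", 5), Risk("r2", 3), Risk("r3", 2)]
predicted = {"r1": 5, "r3": 2, "x9": 1}  # misses r2, adds one false positive
print(fnrp(gold, predicted))                                                   # 0.3
print(round(severity_weighted_f1(gold, predicted), 3))                         # 0.778
print(round(mean_pairwise_jaccard([{"r1", "r3"}, {"r1", "r3"}, {"r1"}]), 3))  # 0.667
```

Weighting misses by gold severity is what makes an overlooked severity-5 clause cost more than an overlooked severity-1 clause, which is the asymmetry these metrics are meant to capture; an unweighted F1 would treat both errors identically.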
The study establishes evidence-based performance thresholds for safe deployment, recommending against autonomous use of any system with FNRP > 0.15 or cross-jurisdiction performance degradation > 25%. Our findings challenge optimistic claims of AI autonomy in legal decision-making and provide a rigorous, reproducible framework for evaluating legal AI systems in practice.
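The two deployment thresholds can be expressed as a simple gate. This sketch assumes cross-jurisdiction degradation is the relative drop in a score such as F1 from the source jurisdiction to the target one (the abstract does not state whether its 28.4% figure is relative or absolute); the constant and function names are hypothetical. Clearing the gate is treated only as a necessary condition for autonomous use, not a sufficient one.

```python
FNRP_MAX = 0.15         # autonomous use is not recommended above this FNRP
DEGRADATION_MAX = 0.25  # ...or above this cross-jurisdiction degradation


def relative_degradation(source_score: float, target_score: float) -> float:
    """Relative performance drop, e.g. F1 on US contracts vs. UK contracts."""
    if source_score <= 0:
        raise ValueError("source_score must be positive")
    return (source_score - target_score) / source_score


def passes_autonomy_thresholds(fnrp_value: float, degradation: float) -> bool:
    """True only if the system clears both thresholds (necessary, not sufficient)."""
    return fnrp_value <= FNRP_MAX and degradation <= DEGRADATION_MAX


# Legal-BERT figures from the abstract: FNRP 0.123 passes, but a 28.4% drop fails.
print(passes_autonomy_thresholds(0.123, 0.284))  # False
```

Keeping the thresholds as module-level constants makes it straightforward to tighten them for higher-stakes contract types without changing the gate logic.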

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.