🧑‍⚖️ Building a Saudi Labor Law AI Assistant: Bilingual, Semantic, and Context-Aware



This content originally appeared on DEV Community and was authored by Zainalabdeen

The Saudi Labor Law is a complex and evolving legal framework. For HR teams, employers, and employees, understanding its details (from leave entitlements to termination rules) often means scrolling through dozens of pages, interpreting legal text, and trying to connect articles to real-world cases.

I wanted to change that.
So I built the Saudi Labor Law AI Assistant: an intelligent, bilingual chatbot that answers legal questions instantly, explains the relevant articles, and analyzes employee-specific scenarios, all powered by vector search, LLMs, and semantic retrieval.

📌 Why This Project Matters

The challenges were clear:

⚠ The official English translation of the law is outdated; the Arabic version is the authoritative reference.

📚 Searching manually across legal PDFs is slow and error-prone.

🧠 HR teams need contextual interpretations, not just raw text.

The solution? Combine document parsing, embeddings, vector databases, translation, and LLM reasoning into one end-to-end system that delivers article-backed, trustworthy answers in Arabic or English.

🧠 What the AI Assistant Can Do

Here's what the system offers today:

💬 Ask legal questions in Arabic or English; answers come in the same language.

🧾 Analyze real employee cases, such as leave eligibility, overtime pay, or termination compensation.

🔍 Retrieve the exact legal articles that support every answer.

🧑‍💼 Integrate employee data (age, salary, service years) into the reasoning process for personalized results.

🌐 Handle bilingual queries with automatic translation and context matching.

🔧 How It Works

The assistant is built on a robust NLP and retrieval pipeline:

📄 PDF Parsing: The official Arabic labor law is parsed with PyMuPDF, preserving RTL text and diacritics.

🔎 Structured Splitting: The document is split into parts, chapters, and articles with metadata.

🌐 Translation: Each article is translated to English using Helsinki-NLP/opus-mt-ar-en for bilingual support.

📊 Vectorization: Both Arabic and English texts are embedded using intfloat/multilingual-e5-base and stored in a Qdrant vector database.

🤖 Retrieval + Reasoning: A VectorIndexRetriever fetches the most relevant articles, which are then passed to GPT-4o-mini for grounded, human-readable answers.

📈 Hybrid Search Evaluation: Semantic and hybrid retrieval were compared on 1,245 queries; hybrid search performed better and is used by default.
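The structured splitting step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes headings start with "الباب" (Part), "الفصل" (Chapter), and "المادة" (Article), and the names `ArticleChunk`, `split_articles`, and `to_passage` are hypothetical. The sample text is placeholder wording, not the text of the law. The `passage: ` prefix follows the usage described on the multilingual-e5 model card.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed heading markers: Part, Chapter, Article (real PDFs may need stricter rules).
PART, CHAPTER, ARTICLE = "الباب", "الفصل", "المادة"


@dataclass
class ArticleChunk:
    part: str
    chapter: str
    article: str
    text: str


def split_articles(raw_text: str) -> List[ArticleChunk]:
    """Walk the text line by line, tracking the current part and chapter."""
    chunks: List[ArticleChunk] = []
    part = chapter = ""
    current: Optional[ArticleChunk] = None
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(PART):
            part, chapter, current = line, "", None
        elif line.startswith(CHAPTER):
            chapter, current = line, None
        elif line.startswith(ARTICLE):
            # Simplification: any line starting with the marker opens a new article.
            current = ArticleChunk(part, chapter, line.rstrip(":"), "")
            chunks.append(current)
        elif current is not None:
            current.text = f"{current.text} {line}".strip()
    return chunks


def to_passage(chunk: ArticleChunk) -> str:
    """E5 models expect indexed text to start with 'passage: '."""
    return f"passage: {chunk.text}"


# Placeholder text only ("sample text one/two"), not actual legal wording.
sample = "\n".join([
    "الباب الأول",
    "الفصل الأول",
    "المادة الأولى:",
    "نص تجريبي أول.",
    "المادة الثانية:",
    "نص تجريبي ثانٍ.",
])

for chunk in split_articles(sample):
    print(chunk.article, "|", chunk.part, "|", chunk.chapter)
```

Keeping the part and chapter on every chunk is what later allows each answer to be traced back to its place in the law. Queries sent to the same embedding model would use the `query: ` prefix instead.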

🧑‍💼 Context-Aware Legal Reasoning

One of the most powerful features is employee-specific reasoning.
For example:

"Is this employee eligible for 30 days of annual leave if he has worked for 6 years?"

The chatbot uses employee metadata (service years, salary, leave days, etc.) to reason about the law in context, delivering precise, actionable answers that always cite the original legal article.
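One way to picture this step is a prompt that places the employee's data next to the retrieved articles before the question goes to the LLM. The sketch below is an assumption about how that could look, not the project's actual prompt: the `Employee` fields, the `build_prompt` helper, and the instruction wording are all hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Employee:
    # Hypothetical fields; the project's real schema may differ.
    service_years: float
    monthly_salary_sar: float
    leave_days_taken: int


def build_prompt(question: str, employee: Employee, articles: List[str]) -> str:
    """Combine the question, employee facts, and retrieved articles into one prompt."""
    facts = "\n".join(f"- {key}: {value}" for key, value in asdict(employee).items())
    sources = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(articles, start=1))
    return (
        "Answer using only the articles below and cite the article you rely on.\n\n"
        f"Employee data:\n{facts}\n\n"
        f"Articles:\n{sources}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "Is this employee eligible for 30 days of annual leave?",
    Employee(service_years=6, monthly_salary_sar=8000, leave_days_taken=10),
    ["<retrieved article text goes here>"],  # placeholder for retriever output
)
print(prompt)
```

Numbering the articles in the prompt gives the model a simple handle for citing the source it relied on.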

🖥 Streamlit Interface

The frontend is built with Streamlit to make the experience intuitive and user-friendly:

🌍 Auto-detect Arabic or English queries.

📄 Optional employee data input.

🔍 Expandable references with similarity scores.

📚 Source tracing from Part → Chapter → Article.
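Language auto-detection can be as simple as checking which script dominates the query. The function below is a minimal sketch under that assumption; `detect_language` is a hypothetical helper, and the app itself may rely on a dedicated library instead.

```python
def detect_language(text: str) -> str:
    """Return 'ar' if most letters fall in the Arabic Unicode block, else 'en'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"  # default when the query has no letters at all
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return "ar" if arabic / len(letters) > 0.5 else "en"


print(detect_language("ما هي مدة الإجازة السنوية؟"))
print(detect_language("What are the sick leave entitlements?"))
```

A majority rule like this tolerates mixed queries, for example an Arabic question that mentions an English product name.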

🚀 Example in Action

Arabic Example:

👤: ما هي مدة الإجازة السنوية بعد خمس سنوات من الخدمة؟ (What is the annual leave entitlement after five years of service?)
🤖: يستحق العامل ثلاثين يوماً من الإجازة السنوية… (The worker is entitled to thirty days of annual leave…)
📖: استنادًا إلى المادة التاسعة بعد المائة (Based on Article 109)

English Example:

👤: What are the sick leave entitlements for an employee?
🤖: The employee is entitled to paid sick leave for a specific duration…
📖: Based on Article 117, Chapter Four

🧭 What's Next

The project is just getting started. Planned enhancements include:

📑 PDF export of Q&A with references

🧮 HR calculators (end-of-service, overtime, vacation accrual)

🔊 Arabic voice interaction

📊 HR analytics dashboard

🧰 Tech Stack

| Component | Technology |
| --- | --- |
| Frontend | Streamlit |
| LLM | GPT-4o-mini |
| Embeddings | intfloat/multilingual-e5-base |
| Vector DB | Qdrant |
| Retrieval | LlamaIndex |
| Translation | Helsinki-NLP/opus-mt-ar-en |
| Parsing | PyMuPDF (fitz) |

💡 Saudi Labor Law AI Assistant is open-source and licensed under MIT. It's built to make labor law understandable, accessible, and actionable for HR teams, companies, and employees across Saudi Arabia.

🔗 Explore the Project

👉 GitHub Repository
I built this project as my final project for the LLM Zoomcamp course.

