When Chatbots Run City Hall: How Governments Can Use AI Without Losing Accountability

Introduction
An AI avatar named Diella is now the virtual “minister” for public procurement in Albania. The government claims she will evaluate bids and award contracts, and in the words of Prime Minister Edi Rama, make tenders “100% free of corruption.” Whether you cheered or shuddered at that announcement, one thing is clear: AI chatbots are moving from website helpdesks to the front lines of public service. They are answering taxpayer questions, guiding health inquiries, and in some places, being trusted with decisions that affect people’s rights, wallets, and well-being. The Guardian report and other coverage show how quickly this future is arriving.
This article breaks down a complex and fast-moving debate for anyone curious about the future of government services. We will walk through what governments are actually deploying, why they are tempted to go further, what can go wrong, and how to design AI chatbots that improve public services without eroding accountability. Along the way, we verify claims from recent headlines and point you to credible resources and rules now shaping this space.
What Just Happened in Albania, and Why It Matters
- The Announcement: In September 2025, Albania appointed an AI assistant, Diella, to serve as a virtual cabinet member for public procurement. The government states Diella will gradually take over awarding tenders to reduce bribery and political influence. Coverage from The Guardian and Euronews confirm the announcement.
- The Goal: To achieve objectivity, speed, and transparency. The logic is that with one AI “official” applying stable criteria, procurement can be more consistent and less corruptible.
- The Concerns: Critics warn the system could simply hide human influence behind a technical facade or fail in opaque ways that are hard to challenge. Even supporters stress that human oversight, appeal rights, and audit trails are non-negotiable. This debate is captured in commentary like The Washington Post op-ed urging observers to take Diella seriously. Read the argument here.
Why Governments Are Tempted to Use AI Chatbots
- Backlogs and Bottlenecks: In England and Wales, the Crown Court backlog hit a record 76,957 outstanding cases by March 2025, with wait times stretching for years. This pressure creates a strong push to automate routine tasks and initial screenings. Coverage from The Standard and subsequent reporting document the rise.
- Demand Outpacing Capacity: NHS mental health services are under sustained strain. Data from NHS Providers shows a record 2.12 million people in contact with services in April 2025. The Royal College of Psychiatrists has repeatedly warned about long waits and funding pressure. See NHS Providers’ tracker and RCPsych statements for context.
- 24/7 Service Expectations: Agencies field millions of simple queries that can often be resolved with scripted guidance. For example, the U.S. Internal Revenue Service reports its voice and chatbots have helped more than 13 million taxpayers since 2022, deflecting calls and speeding up routine interactions. IRS details here.
What Can Go Wrong When Bots Move from Help to Decisions
1) The Accountability Gap
When a chatbot gets it wrong, who is responsible? A February 2024 case in Canada offers a cautionary tale. Air Canada’s website chatbot misled a customer about bereavement fares; the airline argued the bot was a “separate legal entity.” A tribunal called that defense “remarkable” and held the company liable, ordering compensation. The lesson for the public sector is clear: if a government bot misleads someone, the agency is still accountable. Case summaries here and news coverage.
2) Opaque Systems and Weak Guardrails
Audit offices and watchdogs are starting to probe government AI deployments. In September 2025, the Queensland Audit Office reviewed an internal government chatbot (QChat) and image-recognition enforcement cameras. It found uneven oversight, low training uptake, and risks around privacy and misinformation, and it urged government-wide monitoring and ethics reviews. Read the audit summary.
3) Substitution Risks in Sensitive Domains
Chatbots are increasingly used for mental health support. Reports in the UK suggest millions of adults have tried AI tools for mental health advice, often citing long waits for care. Experts warn that AI can appear empathetic while offering harmful or inaccurate guidance, or fostering unhealthy dependence. The NHS has cautioned against using general-purpose chatbots as therapy substitutes, especially for young people. See recent coverage and NHS warnings.
4) Misleading Narratives About “AI Judges”
Estonia is often cited as piloting AI judges. In fact, the country’s Ministry of Justice clarified in 2022 that there is no project to replace judges with AI. Estonia is automating procedural steps and exploring limited automation in specific areas, but humans make judicial decisions. Official clarification. In contrast, Germany’s courts are testing AI to assist with mass, look-alike cases like flight delay claims, while keeping judges firmly in charge. See North Rhine-Westphalia’s MAKI project announcement.
Where Chatbots Work Well Today
- Answering high-volume routine questions: The IRS’s bots free up human agents for complex cases, and the system offers a clear hand-off to a human representative. IRS overview.
- Wayfinding and form guidance: USCIS’s long-running virtual assistant “Emma” helps visitors navigate the website and find official answers, with an escalation path to live agents. USCIS page.
- Local services with clear boundaries: Councils in the UK publish transparency records, define when a user should be routed to a human, and label bots clearly as non-human. Example record: Leeds City Council Money Information Centre Chatbot.
What the Rules Say (and Why They Matter)
- European Union: The EU AI Act classifies AI tools used to assist judges or influence elections as “high-risk,” requiring strict controls, documentation, and human decision-making. If your public chatbot affects rights or eligibility, expect compliance duties. Official Journal text.
- United Kingdom: Central government departments must publish Algorithmic Transparency Recording Standard (ATRS) entries for in-scope tools and keep them updated. This is quickly becoming standard practice across public services. ATRS hub and template.
- United States (federal): In March 2024, the Office of Management and Budget (OMB) required agencies to implement safeguards for AI that impacts rights and safety, designate Chief AI Officers, and maintain AI inventories. In 2025, the White House updated its AI policies to speed up adoption while maintaining strict risk management and governance. Agencies are still required to establish risk management practices and appoint Chief AI Officers (CAIOs). See the 2024 OMB fact sheet and Reuters coverage of 2025 changes.
- Risk frameworks to operationalize: NIST’s AI Risk Management Framework and its Generative AI profile offer practical guidance for governance, measurement, testing, and human oversight. NIST AI RMF.
Design Principles for Public-Sector Chatbots
If you are building or buying a government chatbot, especially one that could influence outcomes for residents or businesses, treat it like critical infrastructure. Start with these principles.
1) Be Crystal-Clear About Scope and Authority
- Label the bot as non-human, every time. Avoid avatars that imply a real person unless there is a person behind them.
- Publish the bot’s remit in plain language: what it can and cannot do, what data it sees, and how its output will be used.
- If the bot can influence or recommend decisions (eligibility, prioritization, fines), classify it as a high-risk tool and apply your strictest controls. EU guidance is explicit that tools assisting judicial authorities are high-risk and must not replace final human decisions. EU AI Act recital 61.
2) Keep a Human in the Loop—and on the Hook
- People must be able to reach a human quickly—by phone or live chat—and appeal any decision influenced by a bot.
- Assign a named senior owner (or board) inside the agency. They are accountable for the bot’s performance, impact, and continuous improvement.
- Publish an Algorithmic Transparency Record (or equivalent) and update it when things change. The UK’s ATRS is a practical model. ATRS guidance.
3) Design for Audits, FOI, and Discovery from Day One
- Log prompts, retrieved knowledge, and outputs—with privacy protections—so staff can explain how an answer was generated and respond to Freedom of Information requests and litigation.
- Implement role-based access, redaction, and retention controls so logs can be shared without exposing sensitive data.
- Adopt open reporting on model updates and known issues.
4) Test the Bot and Measure Real-World Impact
- Test against domain-specific harms: misinformation, denial of service, procedural unfairness, data leakage, and prompt injection.
- Track user outcomes: Did the bot actually reduce wait times? Did it mis-route vulnerable users? Adjust or shut down features that don’t meet your service goals.
- Use a recognized framework, such as the NIST AI RMF and its Generative AI profile, to structure testing, documentation, and risk acceptance. NIST profile.
5) Never Replace Clinical or Legal Judgment with a General-Purpose Chatbot
- Health, mental health, and legal contexts require domain-specific guardrails, licensed professionals, and clear escalation paths. Independent reporting in the UK has highlighted the risks of over-reliance, with the NHS warning against treating chatbots as therapists. See reporting.
6) Procure for Accountability, Not Just Features
- Require vendors to provide safety documentation, facilitate audits, support transparency reporting, and indemnify the agency for misrepresentations.
- Specify that training or fine-tuning data and prompt libraries used for your government deployment meet your legal requirements and can be reviewed under appropriate confidentiality.
A Practical Build-or-Buy Checklist
Before Launch
- Governance
- Name a senior responsible owner and establish an AI governance board.
- Classify the use case: informational, triage, or decision-support. Anything impacting rights gets the highest scrutiny.
- Privacy and Security
- Analyze the chatbot for security threats. Implement data minimization, data loss prevention (DLP), and secure credential management.
- Provide a clear consent and privacy notice; avoid collecting sensitive data unless essential.
- Model and Content
- Use retrieval-augmented generation (RAG) with a vetted knowledge base. Freeze and version your content. Avoid unconstrained web searches in production.
- Configure strong system prompts, guardrails, and explicit refusal patterns for high-risk topics.
- Testing
- Test with internal experts and external testers; include adversarial prompts common in your domain.
- Run A/B or canary launches and measure deflection, accuracy, time-to-resolution, and escalation rates.
- Transparency
- Publish a transparency record (purpose, data, governance, risk mitigations, contact email).
- Label the bot as a bot; disclose its limits and escalation paths.
After Launch
- Continuous Monitoring
- Track KPI and harm metrics weekly; investigate anomalies quickly.
- Sample and review transcripts for quality and fairness.
- Oversight and Feedback
- Hold monthly governance reviews; publish change logs.
- Provide easy user feedback tools and act on the feedback.
- Training and Culture
- Train staff not to paste protected or personal data into prompts unless the deployment is specifically architected for that purpose.
- Reward escalation; do not penalize staff for handing difficult cases to humans.
Case Studies and Lessons Learned
- Albania’s Diella: Lesson: If an AI tool makes procurement decisions, the bar for transparency and the ability to contest decisions must be extremely high. Publish the award criteria, the model’s evaluation steps, the human oversight structure, and clear appeal routes for bidders. It is also vital to disclose who can change prompts, weightings, or data sources. Background reporting.
- Germany’s MAKI Assistant: Lesson: Focusing AI on administrative heavy-lifting (extracting metadata, comparing text, generating drafts) can free up judges to weigh complex questions while preserving human accountability for outcomes. Project note.
- Queensland’s Audit of QChat: Lesson: Even internal bots need clear acceptable-use rules, training, and ongoing oversight. The audit also shows the importance of maintaining inventories to know where AI is used and what risks it poses. Audit summary.
- IRS Customer Service Bots: Lesson: Always provide a visible “talk to a person” option. The goal is to resolve simple questions faster, not trap people in automation. IRS overview.
Reality Check on Demand for Mental Health Chatbots
Reports suggest that more than 10 million UK adults have tried AI chatbots for mental health support. That figure comes from a commissioned survey, not official NHS data, so treat it as an indicator of public interest, not clinical effectiveness. What is not in doubt is that demand for NHS mental health services is at record levels, and waiting lists remain long. This is precisely why clear boundaries, supervised use, and fast handoffs to human care are essential. Survey-based media reporting and NHS Providers data.
The Bottom Line: Replace Tasks, Not Responsibility
AI chatbots can help governments be more responsive, precise, and fair—but only when they are scoped carefully, governed transparently, and backed by human accountability. They can deflect routine questions, triage cases, and draft standard letters. They should not decide who gets benefits or contracts without visible human oversight and meaningful rights to appeal.
As the EU AI Act, UK transparency standards, and U.S. federal policy all make clear, public trust depends on clarity about how AI is used and who answers for its mistakes. The goal is not to automate away public servants but to equip them with safer, smarter tools and to keep them answerable to the public they serve. EU AI Act and OMB policy background.
FAQs
Q1) Is it legal for governments to use chatbots for public decisions?
Yes, but the legal standard depends on the jurisdiction and the decision’s impact. The EU AI Act treats tools that assist judges as high-risk. In the U.S., OMB guidance requires risk controls, human oversight, and transparency for AI that impacts rights and safety. If a bot could affect eligibility or penalties, you must design for maximum transparency and provide human appeal routes. EU AI Act, OMB policy.
Q2) Do chatbots really save agencies time and money?
Often, yes—for routine queries. The IRS reports millions of resolved interactions and reduced call volumes. However, savings disappear if bots misroute users or create rework, so it is crucial to watch real-world metrics, not just lab tests. IRS metrics.
Q3) What about bias or hallucinations?
Use retrieval from vetted sources, limit free-form generation, and test with diverse, real user prompts. Keep a human in the loop for any advice that impacts rights. Adopt a framework like the NIST AI RMF to structure governance and post-deployment monitoring. NIST AI RMF.
Q4) Can a chatbot be a “minister” or a “judge”?
Politically, some leaders may try it; legally and practically, it raises major accountability problems. Estonia debunked headlines about “AI judges,” and Germany is piloting AI as an assistant tool, not a replacement. If an AI is given formal authority, the system still needs a named human accountable for its outputs and clear, fast ways for people to contest those outputs. Estonia clarification, Germany’s MAKI pilot.
Q5) How should agencies be transparent about their bots?
Follow or emulate the UK’s Algorithmic Transparency Recording Standard (ATRS): publish a plain-language record explaining how the tool works, what data it uses, and how it is governed. Update the record whenever something significant changes. ATRS hub.
Conclusion
AI chatbots are no longer novelties in government—they are becoming part of the service fabric. This is good news if we treat them as assistants that boost capacity and consistency, but it is bad news if we treat them as moral agents that can carry the weight of public judgment. The safest path is also the most effective: replace tasks, not responsibility. Keep humans in charge, publish what the bot does, measure outcomes, and make it easy for the public to get to a person—and to justice—when it matters.
Thank You for Reading this Blog and See You Soon! 🙏 👋
Let's connect 🚀
Latest Blogs
Read My Latest Blogs about AI

AI’s Uneven Takeoff: Insights from Anthropic’s 2025 Economic Index on Work Transformation
Discover insights from Anthropic's 2025 Economic Index, highlighting faster but uneven AI adoption trends. Understand where usage is concentrated and how enterprises are adapting.
Read more