We Don't Do Safety Scores.

Safety scores are liability theater. We document where your chatbot's epistemic integrity holds and where it collapses.

Request Assessment →

When Chatbots Harm

Companion

Character.AI and Google settled wrongful death lawsuits. Their companion chatbots contributed to teen suicides. Kentucky's AG sued, alleging they were "preying on children."

Healthcare

NEDA's eating disorder chatbot gave weight loss advice to users seeking help for anorexia. It was shut down within days of the reports.

Contractual

Air Canada was held liable when its chatbot invented a bereavement fare policy. The airline couldn't disclaim responsibility for its bot's false statements.

20 documented deaths linked to LLMs since 2024 | Zero major AI deployments have published harm incident rates
Liability Exposure

Your Chatbot Carries Legal Risk

Every consumer-facing chatbot carries real liability. We map exactly where.

PRODUCTS LIABILITY  ·  NEGLIGENCE  ·  FRAUD  ·  CONSTITUTIONAL LIABILITY  ·  CONTRACTUAL LIABILITY  ·  UNFAIR TRADE PRACTICES  ·  MEDICAL MALPRACTICE  ·  SYSTEMIC DEFICIENCIES  ·  DISCRIMINATORY PRACTICES
Our Approach

Beyond Accuracy

Standard AI evaluation asks: Is this response correct?

That's the wrong question. We ask: Should this system be answering at all?

The chatbots that hurt people aren't giving wrong answers. They're giving answers they had no business giving, with unearned certainty and credibility, to people who can't tell the difference.

Diagnostic, Not Decisional

We show you where things broke. This is not a rubber-stamped certification, but an X-ray. You own the decisions that follow.

Longitudinal, Not Static

Your chatbot doesn't fail on question one. It doesn't fail on question ten. It fails somewhere around turn 50. We test extended conversations of 30 to 100 turns. That's where the danger lives.
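To make "longitudinal" concrete, here is a minimal sketch of the shape of that testing, in Python. The `respond` and `flag_turn` callables are hypothetical stand-ins for a deployed chatbot and a harm-threshold check; the point is that every probe runs against the accumulated history, never a fresh session.

```python
# Minimal sketch of longitudinal stress testing (hypothetical harness,
# not our production tooling). Failures are a function of conversation
# depth, so each probe is sent with the full history attached.

from typing import Callable

def run_longitudinal_test(
    respond: Callable[[list[dict]], str],   # your chatbot: history -> reply
    probes: list[str],                      # 30-100 scripted user turns
    flag_turn: Callable[[str], bool],       # harm-threshold check for one reply
) -> list[int]:
    """Return the turn numbers where a reply crossed a harm threshold."""
    history: list[dict] = []
    flagged: list[int] = []
    for turn, probe in enumerate(probes, start=1):
        history.append({"role": "user", "content": probe})
        reply = respond(history)
        history.append({"role": "assistant", "content": reply})
        if flag_turn(reply):
            flagged.append(turn)  # in practice, these cluster late, not at turn one
    return flagged
```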

Suspension as Success

Most evaluators ding a system for saying "I don't know." We don't. Knowing your limits is the safest thing a chatbot can do. We prove it.

What Pilot Clients Receive

You cannot certify a probabilistic system. You can document one. We give you four documents.

Four Diagnostic Artifacts

Drift Maps

Where conversations went wrong

Turn-by-turn analysis showing where a conversation lost its footing, mapped to specific harm thresholds.
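Concretely, one drift map entry might look like the following sketch (a hypothetical record shape for illustration, not our internal schema):

```python
# Hypothetical drift-map entry: one record per turn where the
# conversation crossed a harm threshold. Illustrative shape only.

from dataclasses import dataclass

@dataclass
class DriftEntry:
    turn: int             # where the conversation lost its footing
    excerpt: str          # the reply at that turn, in context
    drift_type: str       # e.g. "sycophantic agreement", "invented policy"
    harm_threshold: str   # the specific threshold it crossed
    severity: int         # 1 (recoverable) to 5 (documented-harm class)

example = DriftEntry(
    turn=52,
    excerpt="You're right, skipping the dose sounds reasonable...",
    drift_type="sycophantic agreement",
    harm_threshold="medical guidance to a vulnerable user",
    severity=5,
)
```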

Coherence Reports

Where logic broke down

Maps where your system traded truth for fluency. Identifies what it doesn't know it doesn't know.

Calibration Audits

Where confidence exceeded warrant

Documents every instance where your system spoke in certainties it hadn't earned, ranked by how badly that could hurt someone.
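One way to make "confidence exceeded warrant" concrete: bucket claims by the certainty the system expressed, then compare each bucket against how often it was actually right. A toy sketch, assuming each claim has already been scored for stated confidence and correctness:

```python
# Toy calibration gap check (illustrative only). samples holds
# (stated_confidence in [0, 1], was_correct) pairs, one per claim.

def calibration_gaps(samples: list[tuple[float, bool]], bins: int = 5):
    gaps = []
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        bucket = [(c, ok) for c, ok in samples
                  if lo <= c < hi or (b == bins - 1 and c == 1.0)]
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            # positive gap: the system spoke with more certainty than it earned
            gaps.append((f"{lo:.1f}-{hi:.1f}",
                         round(avg_conf - accuracy, 3), len(bucket)))
    return gaps
```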

Suspension Logs

Where your system appropriately refused

Evidence your system knew its limits and chose honesty over the performance of helpfulness.
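As a toy illustration of what gets logged (real review combines human and model-assisted judgment; these marker strings are invented for the sketch):

```python
# Toy suspension detector: flags replies where the system declined
# rather than guessed. Illustrative only; not our review method.

REFUSAL_MARKERS = (
    "i don't know", "i can't advise", "outside my scope",
    "please consult", "i'm not able to help with",
)

def is_suspension(reply: str) -> bool:
    """True when a reply chose honesty over the performance of helpfulness."""
    lowered = reply.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)
```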

Evaluation Process

1. Baseline Assessment

Weeks 1–2: Map baseline behavior across our full test range

2. Longitudinal Stress Testing

Weeks 3–4: Deep testing across 30 to 100 turns, where the real behavior surfaces

3. Domain Expert Review

Weeks 5–6: Domain experts review every finding

4. Documentation & Guidance

Weeks 7–8: Full documentation of what we found, where, and what it means

Total: 6–10 weeks

Where Consumer Chatbots Operate
And Where People Are Hurt

This is where the harm concentrates:

Risk Levels

Critical Risk: High frequency + Life-threatening or major financial harm
High Risk: Moderate frequency + Significant harm potential
Medium Risk: Variable deployment + Moderate harm

Critical Risk (1 context)

Companion & Emotional Support Bots

Character.AI-style companions with parasocial relationships and vulnerable users

Vulnerable users · Documented deaths

High Risk (2 contexts)

Healthcare Chatbots

Mental health support, symptom checkers, and crisis lines where misguided advice can be fatal

Life-or-death stakes · Malpractice exposure

Legal & Financial Advisory

Chatbots providing guidance on contracts, taxes, investments, or legal rights

Material harm · Fiduciary exposure

Medium Risk (2 contexts)

Customer Service Bots

High-volume support bots that can make binding commitments or provide false information

Contractual liability · High volume

Educational Bots

Tutoring and learning assistants serving children and students

Minor users · Trust relationships

How We Work

Two Paths to Risk Visibility

Track A

Risk Assessment Consulting

Know what you're deploying before you deploy it.

Targeted analysis of where your chatbot creates harm. You learn what we find. All of it.

  • Targeted liability surface analysis
  • Attorney-client privilege structure available
  • You will know exactly what we found
  • What you do with it is on you
  • Clear go/no-go recommendation

If you don't know what you have, start here.

Track B

Defensive Evaluation

Concerned about safety or preparing for litigation?

Full evaluation with methodology built to hold up in court.

  • Forensic conversation analysis
  • Defensible methodology documentation
  • Expert witness availability
  • Court-ready diagnostic artifacts

For legal teams, and for companies that need to know before someone is harmed.

Honest findings require a different structure.

Worker-owned and democratically governed. No venture capital, no divided loyalties.

We cannot evaluate AI systems for accountability while being structured to protect our own revenue from the truth of what we find. The cooperative is the only structure that makes this work honest.

Core Team

Domain expertise, accountability research, and AI safety

Zacharia Rupp, MCS, MFA

Founding Member | President

Former Head of Data Delivery, Pareto AI • Master of Computer Science from University of Illinois Urbana-Champaign • Master of Fine Arts from University of Central Oklahoma

AI evaluation methodology, deep learning methods for healthcare, systematic literature review, research design, technical assessment of clinical decision support systems, statistical validation.

Alexandra Ah Loy, JD

Founding Member | Vice President | Chief Compliance Officer

Partner, Hall Booth Smith, specializing in healthcare law and mental health litigation • Bachelor's degree in Psychology • Former Chief Legal Officer, Turn Key Health • National defense counsel for multiple healthcare organizations

Legal frameworks for mental health care, liability analysis, regulatory compliance, medical malpractice defense, civil rights litigation.

Jesse Ewing

Founding Member | Research & Development Steward

Data science and quality assurance across multiple AI development contexts. Expert-level annotation and review experience.

Statistical analysis, inter-rater reliability assessment, evaluation metrics design, data quality frameworks, model behavior analysis.

Kalani Ah Loy

Founding Member | Business Development & Data Steward

Lead Clinical Engineer at OU Health • Former Head of Business Development and Cloud Infrastructure Architect at startups • Navy veteran with an electronics technical background

Healthcare technology systems, clinical engineering, medical device integration, data infrastructure, healthcare business development.

Expert Advisory Network: Built with domain experts across healthcare, law, finance, and consumer protection. People who know what harm looks like in practice.

Start With a Risk Assessment

Don't wait until your model causes real harm. Book 45 minutes; we review your deployment and tell you where the harm currently lives.

We're in pilot: early clients receive the full evaluation at significantly reduced rates, plus documentation you can use with regulators, legal counsel, and stakeholders.

Prefer email? Reach us directly at contact@lonocollective.ai