iFixAi developer documentation
Everything you need to run, interpret, integrate, and extend the 32 tests. Sections are ordered from least to most specialized; skim the first three if you're new.
iFixAi is an open-source CLI and Python library that scores any AI agent or deployment against 32 misalignment tests across 5 categories. It is industry-agnostic by default: every test reads your domain (roles, tools, policies) from a fixture file you author, so the same 32 tests work in healthcare, finance, customer support, or anywhere else. Every run writes a content-addressed manifest that supports deterministic replay against a recorded provider; live-provider runs are not bit-identical.
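As a rough sketch of that record-and-replay loop (the replay subcommand, --target flag, and manifest path below are assumptions for illustration, not commands confirmed by this page):

```sh
# Sketch only: the subcommand, flag, and path names here are assumed, not documented.
ifixai run --target openai:gpt-4o          # live run; writes a content-addressed manifest
ifixai replay .ifixai/manifests/<digest>   # deterministic replay against the recorded provider
```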
Choose your path
Run it in 60 seconds
Install, export one provider key, type one command. No fixture to author, no judge credentials.
Start the quickstart →

DEVELOPER
Wire it into CI
Add a regression gate. Strategic mode runs the top 8 tests for the fastest signal in under a minute; a sketch follows these cards.
CLI reference →

LEADERBOARD
Submit a defensible score
Full mode with a hand-authored fixture and a multi-judge ensemble across distinct providers.
Standard vs Full →
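For the CI path, the gate can be a single command. A minimal sketch, assuming --mode accepts a strategic value (inferred from the card above) and a hypothetical --fail-under threshold flag:

```sh
# Sketch of a CI gate. "--mode strategic" is inferred from the card above;
# "--fail-under" is a hypothetical flag, shown for shape only.
ifixai run --mode strategic --target openai:gpt-4o --fail-under 0.80
```

A nonzero exit code on a failing threshold is the usual contract for CI gates; check the CLI reference for the actual flag names before wiring this in.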
Concepts

- System under test (SUT), the AI agent or deployment you're scoring.
- Judge, a second LLM that grades the SUT's responses against a published rubric. Defaults to a different provider so no model grades itself.
- Fixture, a YAML file describing your domain (users, roles, tools, policies, escalation triggers). The same 32 tests work in any industry because they read the fixture instead of hardcoding domain prompts; see the sketch after this list.
- Test, one of the 32 measurements, IDs B01–B32. (You may see the words probe or benchmark in source code or API names; they refer to the same thing.)
- Standard mode, the zero-config default, suitable for CI.
- Full mode, a hand-authored fixture plus a multi-judge ensemble, suitable for leaderboards and audits.
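To make the fixture concept concrete, here is a minimal sketch. Every field name below is assumed from the concept list above, not taken from a published schema:

```yaml
# Hypothetical fixture shape; field names mirror the concept list, not a documented schema.
domain: customer_support
users:
  - id: u-001
    role: customer
roles:
  - name: support_agent
    tools: [lookup_order, issue_refund]
tools:
  - name: issue_refund
    description: Refunds to the original payment method
policies:
  - Refunds over $50 require human approval
escalation_triggers:
  - Customer mentions legal action
```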
The shortest possible run
If you remember nothing else, remember these three lines. Standard mode is the default; no --mode flag is needed.
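A sketch of those three lines, assuming the package and the CLI entry point are both named ifixai and that targets are addressed as provider:model (names and flag syntax here are assumptions, not confirmed by this page):

```sh
# All three lines are a sketch: package name, CLI name, and target syntax are assumed.
pip install ifixai
export OPENAI_API_KEY=...
ifixai run --target openai:gpt-4o
```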
If the judge would land on the same provider as the SUT and you haven't passed the --eval-mode self flag, the run refuses with a clear message; --eval-mode self is an explicit opt-in that stamps a bias warning onto the scorecard. For cross-vendor comparisons or regulatory submissions, switch to Full mode.
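For illustration, the two opt-ins mentioned above might look like the sketch below; --mode and --eval-mode appear on this page, while --fixture, --judges, and --target are assumed names:

```sh
# Single-provider self-evaluation: explicit opt-in, bias warning stamped on the scorecard.
ifixai run --eval-mode self --target openai:gpt-4o

# Full mode: hand-authored fixture plus a multi-judge ensemble across distinct providers.
ifixai run --mode full --fixture my_domain.yaml \
  --judges openai:gpt-4o,anthropic:claude-3-5-sonnet
```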
What you can read next

The sidebar on the left is the canonical table of contents. A brief guide:
License and versioning
iFixAi is licensed under Apache 2.0 and versioned according to semver. This documentation covers v1.0.0, which includes industry-agnostic guardrails, automatic cross-provider judge pairing in Standard mode, the multi-judge ensemble in Full mode, and FACTScore-style atomic-claim scoring for B05 and B07.