Standard vs Full

Two run modes trade setup cost against defensibility. Standard is zero-config and fast. Full is hand-authored, multi-judge, and leaderboard-grade. Pick the right one up front, and never publish Standard scores as a cross-vendor comparison.

At a glance

|  | Standard | Full |
| --- | --- | --- |
| Setup cost | None, zero flags | Hand-built fixture + ≥2 judge providers |
| Fixture | Built-in default, auto-selected | Your real roles, tools, permissions, policies |
| Judge | Auto-paired cross-provider (SUT = A, judge = B); refuses single-credential runs unless --eval-mode self | Multi-judge ensemble (majority vote, conservative tie-break) |
| Credentials | ≥2 provider keys (or --eval-mode self with one) | Target + ≥2 distinct judge provider keys |
| Cost | One model call per test × N tests, plus a judge call per rubric test | Multiple judge calls per rubric test |
| Use case | CI gates · regression detection · sanity checks | Published scores · vendor comparison · regulatory filings |
| Defensibility | Sufficient for CI; --eval-mode self stamps a bias warning onto the scorecard | Defensible: no model judges itself |

Standard mode: the default

Standard mode is designed for a zero-config user experience: a grade in under five minutes. The first thing to understand is why iFixAi cares which provider grades the run.

The self-judge bias

Several of the 32 tests grade free-form responses with a second LLM (the judge). A model grading its own output is statistically more lenient than a neutral third party grading the same output. Standard mode therefore prefers a different provider as judge, and refuses to silently fall back to a same-model judge.

What that means in practice

When ≥2 distinct provider credentials are available, the run auto-pairs cross-provider (system under test = A, judge = B). With only one credential, the run refuses with a clear message unless you pass --eval-mode self, an explicit opt-in that stamps a bias warning onto the scorecard. The opt-in is acceptable for CI, where the signal is directional and comparisons stay within a single model version, but not for leaderboard submissions.

bash
# Cross-provider auto-pairing (≥2 credentials available)
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
ifixai run --provider openai
# resolves to:
#   --mode standard --eval-mode auto (judge auto-selected from ANTHROPIC)
#   --provider openai --fixture <auto-discovered default>

# Single-credential opt-in (stamps a bias warning onto the scorecard)
ifixai run --provider openai --eval-mode self
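
For CI specifically, treat a failed run as a blocking signal. A minimal nightly-gate sketch, assuming ifixai reports failure through a non-zero exit status (this page does not document the exit-code contract, so verify against the CLI reference):

bash
# Nightly regression gate (sketch). CI_OPENAI_KEY is a hypothetical
# CI secret name; the non-zero-exit-on-failure behavior is an assumption.
set -euo pipefail
export OPENAI_API_KEY="$CI_OPENAI_KEY"
ifixai run --provider openai --eval-mode self   # bias warning is acceptable here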

Full mode: leaderboard-grade

Full mode requires a hand-authored fixture and ≥2 distinct judge providers. The judges run as a simple-majority ensemble, break ties conservatively (fail > partial > pass), continue with the surviving judges when one errors, and record per-judge attribution in the scorecard JSON so an auditor can inspect every vote.

bash
ifixai run --mode full \
  --provider openai \
  --model gpt-4o \
  --fixture ./my-fixture.yaml \
  --judge-provider anthropic --judge-api-key $ANTHROPIC_KEY \
  --judge-provider gemini    --judge-api-key $GEMINI_KEY

Ensemble aggregation

  • Majority vote: the verdict shared by the most judges wins.
  • Conservative tie-break: fail > partial > pass. When judges split evenly, the safer verdict prevails: a 1–1 pass/fail split resolves to fail.
  • Error tolerance: if a judge errors, the remaining judges proceed. Only a zero-verdict outcome (every judge errored) is reported as inconclusive.
  • Per-judge attribution: every vote, including losing votes and errors, is recorded under verdict.per_judge[] in the scorecard.
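
Because every vote is persisted, an auditor can replay the ensemble decision offline. A minimal sketch, assuming the scorecard was written to scorecard.json (the output filename and the exact nesting of verdict are not specified on this page; the per_judge[] field is documented above):

bash
# List every judge's vote for inspection, including losing votes and errors.
# scorecard.json and the top-level .verdict nesting are assumptions.
jq '.verdict.per_judge[]' scorecard.json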
⚠ Self + Full is rejected
Passing --mode full --eval-mode self produces a clear CLI error. The two are incompatible by design: Full mode's premise is that no model judges itself.
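
For completeness, this is the shape of the rejected invocation; all flags are documented above, and the exact error message is left to the CLI:

bash
# Rejected by design: Full mode never lets a model judge itself.
ifixai run --mode full --eval-mode self \
  --provider openai --fixture ./my-fixture.yaml
# exits with a clear CLI error instead of running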

Choosing a mode

Use Standard for…
  • Nightly CI regression gates
  • A/B testing prompts within a single model
  • Quick sanity checks after a release
  • The first 60 seconds of adoption
Use Full for…
  • Cross-vendor leaderboard submissions
  • Regulatory filings & third-party audits
  • Procurement evaluations
  • Anywhere a score needs to survive scrutiny

Next steps

Ready to set up a domain fixture? See the fixtures guide. Ready to wire Full mode into CI? See the CLI reference.
