Standard vs Full
Two run modes trade setup cost against defensibility. Standard is zero-config and fast. Full is hand-authored, multi-judge, and leaderboard-grade. Pick the right one up front, and never publish Standard scores as a cross-vendor comparison.
At a glance

| | Standard | Full |
| --- | --- | --- |
| Setup | Zero-config | Hand-authored fixture |
| Judging | One cross-provider judge; same-model only by explicit opt-in | ≥2 distinct providers, majority ensemble |
| Defensible for | CI gates, within-model comparisons | Cross-vendor leaderboards, audits, procurement |
Standard mode: the default
Standard mode is designed for the low-config user experience: a grade in under five minutes. The first thing to understand is why iFixAi cares which provider grades the run.
The self-judge bias
Several of the 32 tests grade free-form responses with a second LLM (the judge). A model grading its own output is statistically more lenient than a neutral third party grading the same output. Standard mode therefore prefers a different provider as judge, and refuses to silently fall back to a same-model judge.
What that means in practice
When ≥2 distinct provider credentials are available, the run auto-pairs cross-provider (system under test = A, judge = B). With only one credential, the run refuses with a clear message unless you pass `--eval-mode self`, which is an explicit opt-in that stamps a bias warning onto the scorecard. The opt-in is acceptable for CI (signal is directional and comparison is within the same model-version) but not for leaderboard submissions.
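To make the pairing rule concrete, here is a minimal Python sketch of the policy described above. All names (`Credential`, `pick_judge`, `SingleProviderError`) are illustrative, not iFixAi's actual internals:

```python
from dataclasses import dataclass

# Hypothetical credential record; the real config schema may differ.
@dataclass
class Credential:
    provider: str   # e.g. "anthropic", "openai"
    api_key: str

class SingleProviderError(RuntimeError):
    pass

def pick_judge(creds: list[Credential], system_under_test: str,
               eval_mode: str = "cross") -> tuple[Credential, bool]:
    """Return (judge credential, bias_warning_flag).

    Prefers any provider other than the system under test. With only
    one provider available, refuses unless the caller explicitly opted
    into self-judging, in which case the scorecard gets a bias warning.
    """
    others = [c for c in creds if c.provider != system_under_test]
    if others:
        return others[0], False   # cross-provider pair: no warning
    if eval_mode == "self":
        return creds[0], True     # explicit opt-in: stamp bias warning
    raise SingleProviderError(
        "Only one provider credential found. Add a second provider, "
        "or pass --eval-mode self to accept a same-model judge."
    )
```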
Full mode: leaderboard-grade
Full mode requires a hand-authored fixture and ≥2 distinct judge providers. The judges run as a simple-majority ensemble, break ties conservatively (fail > partial > pass), continue on surviving judges when one errors, and record per-judge attribution in the scorecard JSON so an auditor can inspect every vote; a code sketch of the aggregation follows the rules below.
Ensemble aggregation
- Majority vote: the verdict shared by the plurality of judges wins.
- Conservative tie-break: fail > partial > pass. When judges split evenly, the safer verdict prevails.
- Error tolerance: if a judge errors, the remaining judges proceed. Only a zero-verdict outcome is reported as inconclusive.
- Per-judge attribution: every vote (including losers and errors) is recorded under `verdict.per_judge[]` in the scorecard.
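Under the same caveat (illustrative names, not iFixAi's shipped code), the aggregation rules above could be sketched as:

```python
from collections import Counter

# Conservative ordering: lower rank = safer verdict, wins ties.
SEVERITY = {"fail": 0, "partial": 1, "pass": 2}

def aggregate(votes: list[dict]) -> dict:
    """votes: [{"judge": ..., "verdict": "pass"|"partial"|"fail"|None}, ...]

    Returns a verdict record with per-judge attribution, mirroring the
    verdict.per_judge[] shape described above (field names assumed).
    """
    surviving = [v["verdict"] for v in votes if v.get("verdict")]
    if not surviving:  # zero-verdict outcome: every judge errored
        return {"verdict": "inconclusive", "per_judge": votes}
    counts = Counter(surviving)
    top = max(counts.values())
    tied = [v for v, n in counts.items() if n == top]
    # Plurality wins; even splits resolve to the safest verdict.
    verdict = min(tied, key=SEVERITY.__getitem__)
    return {"verdict": verdict, "per_judge": votes}
```

For example, a two-judge split of `pass` vs `fail` resolves to `fail` under the conservative tie-break, while a lone surviving judge's verdict stands on its own.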
Combining `--mode full` with `--eval-mode self` gets a clear CLI error. The two are incompatible by design: Full mode's premise is that no model judges itself.
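A hypothetical sketch of that guard, mirroring the documented CLI behavior rather than real source:

```python
def validate_flags(mode: str, eval_mode: str) -> None:
    # Illustrative only: Full mode never accepts a self-judge.
    if mode == "full" and eval_mode == "self":
        raise SystemExit(
            "--mode full is incompatible with --eval-mode self: "
            "Full mode's premise is that no model judges itself."
        )
```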
Choosing a mode
- Standard → Nightly CI regression gates
- Standard → A/B testing prompts within a single model
- Standard → Quick sanity check after a release
- Standard → The first 60 seconds of adoption
- Full → Cross-vendor leaderboard submissions
- Full → Regulatory filings & third-party audits
- Full → Procurement evaluations
- Full → Anywhere a score needs to survive scrutiny
Next steps
Ready to set up a domain fixture? See the fixtures guide. Ready to wire Full mode into CI? See the CLI reference.