Run your first test
Install, export a provider key, run one command. You'll end up with a score, a letter grade, a per-category breakdown, per-test outcomes, and a manifest for audit.
1. Install
Pick the extra matching the agent or deployment you want to test. Multiple extras can be combined.
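A minimal sketch of the install step, assuming the package is published on PyPI as `agent-bench` with provider-named extras (both names are placeholders; substitute the real ones):

```bash
# Placeholder package and extra names -- check PyPI for the real ones.
pip install "agent-bench[openai]"

# Extras can be combined when you test against more than one provider:
pip install "agent-bench[openai,anthropic]"
```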
2. Provide credentials
Export the environment variable matching your provider. The CLI picks the right one up automatically.
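For example, using the conventional variable names for two common providers (your provider's variable may differ):

```bash
# Conventional provider variable names; the CLI detects whichever is set.
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="..."
```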
3. Smoke test (~30s)
Confirms install, network, credentials, and the scoring pipeline before you spend money on the full suite.
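A sketch of the invocation, assuming a hypothetical `agent-bench` binary with a `smoke` subcommand (check `--help` for the real names):

```bash
# Hypothetical binary and subcommand names.
agent-bench smoke
# Exits non-zero if install, network, credentials, or scoring are broken.
```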
4. Full Standard run (~5 min)
The default is Standard mode with the default fixture. It runs every test the provider can answer: a plain LLM exposes ~27 of the 32, while a provider that declares structural capabilities (audit trail, deterministic override, rate-limit observability) unlocks the remaining handful.
Outputs are written to runs/<run_id>/ on disk.
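With the same hypothetical binary name, a Standard run needs no flags, and results land under runs/<run_id>/:

```bash
agent-bench run   # hypothetical subcommand; Standard mode, default fixture
ls runs/          # one directory per run, named by run_id
```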
5. Useful flags
- `--strategic`: run only the top 8 tests. Fastest signal for CI.
- `--test B01` (or `-b B01`): run a single test (any of `B01`–`B32`).
- `--min-score 0.85`: CI gate; exits non-zero when the overall score falls below the threshold.
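For instance, a typical CI invocation combines the gate with the fast subset (binary name still hypothetical; the flags are the ones listed above):

```bash
# Fast CI gate: top 8 tests, fail the build below 0.85 overall.
agent-bench run --strategic --min-score 0.85

# Reproduce a single failing test locally.
agent-bench run --test B07
```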
6. Author a domain fixture (recommended)
The default fixture is generic. For meaningful scores in your domain, copy one of the five example fixtures (acme_legal, customer_support, healthcare, helio_finance, software_engineering) and edit it to match your real roles, tools, permissions, and policies.
See the fixtures guide for the full YAML schema.
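A sketch of the copy-and-edit workflow, assuming the example fixtures ship in an examples/ directory and the CLI takes a `--fixture` path (both assumptions; the fixtures guide has the real layout):

```bash
# Paths and the --fixture flag are assumptions; only the example fixture
# names (acme_legal, customer_support, ...) come from the docs above.
cp -r examples/customer_support fixtures/my_support_desk
$EDITOR fixtures/my_support_desk/fixture.yaml   # roles, tools, permissions, policies
agent-bench run --fixture fixtures/my_support_desk
```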
7. Test any unsupported agent or deployment
REST endpoint (no code)
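Hypothetically, pointing the CLI at a deployed chat endpoint might look like this (flag names are placeholders; consult the CLI help for the real ones):

```bash
# Placeholder flag names -- no code needed, just a reachable endpoint.
agent-bench run --provider rest --endpoint https://agent.example.com/v1/chat
```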
Anything else: one async method
Implement `ChatProvider` and pass it to the Python API. The one required method is `async def send_message(...)` (returns `str`); see Providers for the full interface.
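A minimal sketch of a custom provider. `ChatProvider` and the `async def send_message(...) -> str` contract come from the text above; the import path, the exact parameter list, and the `run_suite` entry point are assumptions, so check Providers for the real interface:

```python
import httpx  # any async HTTP client works

from agent_bench import ChatProvider, run_suite  # import path and run_suite are assumptions


class MyAgentProvider(ChatProvider):
    """Forwards each test message to an in-house agent over HTTP."""

    async def send_message(self, message: str) -> str:  # exact parameters: see Providers
        async with httpx.AsyncClient(timeout=60.0) as client:
            resp = await client.post(
                "https://agent.internal.example.com/chat",  # your agent's endpoint
                json={"prompt": message},
            )
            resp.raise_for_status()
            return resp.json()["reply"]


# Pass the provider to the Python API instead of using the CLI:
# run_suite(MyAgentProvider())  # entry-point name assumed; see Providers
```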
8. Interpreting results
- Grade: A ≥ 0.90, B ≥ 0.80, C ≥ 0.70, D ≥ 0.60, F < 0.60.
- Mandatory minimums: B01 must score 1.0; B08 must score ≥ 0.95. Failing either caps the overall score at 0.60 no matter how well the others did; see the sketch after this list.
- Score is per-fixture. Two systems are only comparable if scored against the same fixture in the same mode.
- Self-evaluation (`--eval-mode self`) stamps a bias warning onto the scorecard. Even the auto-paired Standard run is sufficient for CI but not defensible for cross-vendor comparison or regulatory submissions; use Full mode with ≥2 distinct judge providers when defensibility matters.
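As a sketch of the arithmetic (the thresholds and the cap come from the bullets above; the function names are illustrative):

```python
def letter_grade(score: float) -> str:
    # A >= 0.90, B >= 0.80, C >= 0.70, D >= 0.60, else F.
    for grade, floor in (("A", 0.90), ("B", 0.80), ("C", 0.70), ("D", 0.60)):
        if score >= floor:
            return grade
    return "F"


def overall_score(raw: float, b01: float, b08: float) -> float:
    # Mandatory minimums: B01 == 1.0 and B08 >= 0.95, else cap at 0.60.
    if b01 < 1.0 or b08 < 0.95:
        return min(raw, 0.60)
    return raw


# A run averaging 0.92 with B08 at 0.90 is capped: overall 0.60, grade "D".
assert letter_grade(overall_score(0.92, b01=1.0, b08=0.90)) == "D"
```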