

Optimize a Slow ClickHouse Query with Your AI Agent

Use this guide when a production ClickHouse query in your MooseStack app is slower than it should be.

You will set up your AI coding agent with the 514 harness to benchmark the slow query against one or more optimized candidates. By the end you will have a reviewable PR with the winning schema changes and benchmark evidence.

The experiment:

  1. Profiles the slow query and proposes optimization candidates
  2. Creates isolated preview databases on 514 hosting for the baseline and each candidate
  3. Benchmarks every preview with the same test to pick the best performer
  4. Opens a PR for the winner for your review

This guide assumes:

  • MooseStack version 0.6.493 or later (run npx moose-cli --version or moose --version to check)
  • Your MooseStack app is already deployed on Fiveonefour
  • Your coding agent is open on this repository
  • The agent can reuse an existing query path or define a query model when profiling finds a pattern without one

Setup

Don't have a MooseStack project yet?

Start by bootstrapping a MooseStack project from your existing ClickHouse database, then return here once you have a project the agent can run against.

Install the 514 CLI, sign in, link your local MooseStack project to its Fiveonefour counterpart, and run 514 agent init so your coding agent has the harness and skills.

Optimization workflow

This section helps you understand what the agent is doing at each stage of the optimization workflow, what it may ask you to confirm, and how to spot issues early enough to unblock the run.

You can use this workflow whether you already know which query needs work or need the agent to help identify a good optimization target first. The run verifies your setup, executes entirely against preview branch databases, and leaves production untouched. The end result is a PR with the evidence you need to decide whether the change is ready to move forward.

PR review

Once the experiment finishes, the PR becomes the review surface. The agent can open a PR, but you still decide what ships.

You should confirm two things here:

  • the benchmarks and EXPLAIN support the performance claim
  • any rebuild, backfill, or destructive step is explicit and acceptable before production

Benchmark JSON quick reference

Use this when you need the current field map while reviewing the benchmark step above. Read test-results.json first for the verdict, then benchmark-*.json for the detailed evidence.

test-results.json

{
  "success": false,
  "numPassedTests": 1,
  "numFailedTests": 1,
  "numPendingTests": 1,
  "testResults": [
    {
      "assertionResults": [
        { "title": "baseline p95 under threshold", "status": "passed" },
        { "title": "scenarios do not regress", "status": "skipped" },
        {
          "title": "EXPLAIN shows index usage in at least one scenario",
          "status": "failed",
          "failureMessages": [
            "AssertionError: No scenario uses index pruning..."
          ]
        }
      ]
    }
  ]
}

benchmark-*.json

{
  "timestamp": "2026-03-26T23:53:29.709Z",
  "target": {
    "host": "yey99hpcxa.clickhouse.boreal.cloud",
    "database": "514-demos-planes-perf-base-f9f0d"
  },
  "tests": {
    "baseline": {
      "sql": "SELECT ...",
      "p50": 22,
      "p95": 218,
      "profiles": [
        {
          "queryId": "standalone...",
          "readRows": 702027,
          "readBytes": 18254197,
          "memoryUsage": 49048628,
          "durationMs": 218
        }
      ]
    },
    "explain": {
      "baseline": {
        "sql": "SELECT ...",
        "explain": {
          "indexCondition": "true",
          "selectedGranules": 92,
          "totalGranules": 92,
          "granuleSkipPct": 0
        }
      }
    }
  }
}

test-results.json tells you whether the benchmark passed. benchmark-*.json tells you why.

| What to compare | Where in the report | What good looks like |
| --- | --- | --- |
| Did the run pass? | test-results.json -> success | true, or a failure you can explain |
| Which assertion failed? | test-results.json -> testResults[].assertionResults[] | Clear pass/fail/skip for each test |
| Overall p95? | benchmark-*.json -> tests.baseline.p95 | Lower in winner than baseline |
| Scenario regressions? | benchmark-*.json -> tests.scenarioRegression | Present only if scenario test ran; no ratio above threshold |
| Selective scenario pruning? | benchmark-*.json -> tests.explain.<scenario>.explain.indexCondition | At least one selective scenario is not "true" |
| Granules skipped? | benchmark-*.json -> tests.explain.<scenario>.explain.granuleSkipPct | Higher in winner for the selective scenario |
| Rows read? | benchmark-*.json -> tests.baseline.profiles[0].readRows | Lower in winner when the contract is unchanged |
| Memory usage? | benchmark-*.json -> tests.baseline.profiles[0].memoryUsage | Lower in winner when the contract is unchanged |
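If you want to double-check the verdict file while reviewing, it is simple to read programmatically. The following is a hedged sketch, not part of the 514 harness: the shapes mirror the example test-results.json in this guide, and failingAssertions is a hypothetical helper name.

```typescript
// Hypothetical reviewer-side helper (not part of the 514 harness).
// Shapes mirror the example test-results.json shown above.
interface AssertionResult {
  title: string;
  status: "passed" | "failed" | "skipped";
  failureMessages?: string[];
}

interface TestResults {
  success: boolean;
  testResults: { assertionResults: AssertionResult[] }[];
}

// Collect the titles of every failed assertion, so you know which
// claims to verify in benchmark-*.json next.
function failingAssertions(results: TestResults): string[] {
  return results.testResults
    .flatMap((r) => r.assertionResults)
    .filter((a) => a.status === "failed")
    .map((a) => a.title);
}

// Example data copied from the report above.
const verdict: TestResults = {
  success: false,
  testResults: [
    {
      assertionResults: [
        { title: "baseline p95 under threshold", status: "passed" },
        { title: "scenarios do not regress", status: "skipped" },
        {
          title: "EXPLAIN shows index usage in at least one scenario",
          status: "failed",
        },
      ],
    },
  ],
};

console.log(failingAssertions(verdict));
// → [ 'EXPLAIN shows index usage in at least one scenario' ]
```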

Related docs

  • Hosting
  • 514 auth
  • 514 project link
  • Fiveonefour hosting to local setup
  • Setup failures
  • Query Layer
  • Semantic Layer


What you need

  • A local clone of the MooseStack app repository
  • Hosting docs if install fails

What you get

  • 514 installed and 514 --version working

Workflow

  1. Install the 514 CLI

    From Hosting, install the 514 CLI:

    bash -i <(curl -fsSL https://fiveonefour.com/install.sh) 514
  2. Verify installation

    514 --version


What you need

  • Setup complete: CLI installed, project linked, and 514 agent init run
  • Your coding agent open on this repo
  • Either a slow query to point at or a repo where the agent can help find one

What you get

  • A clear prompt to start the workflow from the right entry point
  • The context your agent needs before it proposes the performance optimization skill

Workflow

  1. Ask your agent to improve ClickHouse performance

    Choose the prompt that matches your starting point. Use the following prompt when you want the agent to identify the slow queries before proposing improvements:

    Help me improve my ClickHouse performance. I do not know which queries are slow yet, so help me identify opportunities to optimize, then suggest and benchmark improvements.

    Your agent may pause to ask you to confirm the performance optimization skill before the long run starts.

  2. Confirm the performance optimization skill

    When the agent proposes the perf optimization skill, confirm. It should drive profiling, schema work, benchmarks, and the PR. If the bundle name differs, accept only if it is the same harness you got from 514 agent init.

  3. Confirm the target deployment

    The agent should start by confirming the linked project, the deployment it will inspect, and the query or endpoint in scope. Before it moves on, you should know exactly which slow path is being optimized and which deployment the diagnostics are reading from.


What you need

  • The PR the agent opened
  • The committed query-benchmarks/ package
  • The benchmark artifacts for both the frozen baseline branch and the winning candidate branch:
    • reports/test-results.json
    • reports/benchmark-*.json

What you get

  • Confidence that the winning candidate branch actually beats the baseline for the same query path, without hiding a scenario-level regression

Workflow

  1. Start with the PR summary, then open the JSON files

    The agent should summarize the JSON results in the PR description, often as a small comparison table. Use that summary to orient yourself, but do not treat it as the evidence.

    The evidence lives in the JSON files:

    • test-results.json tells you which benchmark assertions passed, failed, or were skipped
    • benchmark-*.json contains the detailed benchmark evidence for a specific target database

    You should always be able to trace a PR claim back to one of those two files.

  2. Identify the winner report and the baseline report

    Pair two artifact sets: winner (the candidate branch the agent selected) and baseline (the frozen control branch with no schema change). The timestamped benchmark report usually does not include a branch field. In practice, the database name contains the branch slug, so that is the field to inspect first.

    {
      "target": {
        "host": "cluster.clickhouse.boreal.cloud",
        "database": "514-demos-planes-perf-opti-1c360"
      }
    }

    Both runs must use the same benchmark package and query-path contract; only the target databases should differ. Without a clean baseline there is no real before/after.
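Since the branch name usually only survives in the database field, a small helper can classify a report. This is a sketch under an assumption taken from the sample reports in this guide, namely that baseline databases contain a "-base-" segment and candidates an "-opti-" segment; verify against your own preview database names before relying on it.

```typescript
// Hypothetical classifier based on the sample database names in this guide
// ("...-perf-base-f9f0d" vs "...-perf-opti-1c360"). The "-base-" marker is
// an assumption, not a documented contract.
function reportRole(database: string): "baseline" | "candidate" {
  return database.includes("-base-") ? "baseline" : "candidate";
}

console.log(reportRole("514-demos-planes-perf-base-f9f0d")); // baseline
console.log(reportRole("514-demos-planes-perf-opti-1c360")); // candidate
```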

  3. Read the scaffolded benchmark package

    Open the committed benchmark package, usually query-benchmarks/. Start with query-benchmarks/benchmark.config.ts, then read query-benchmarks/tests/query-benchmarks.test.ts.

    What to confirm before you trust any report:

    • benchmark.config.ts points at the real query target the PR claims to optimize
    • baseQuery, scenarios, thresholds, and sampling are the same between baseline and winner
    • the generated test runner is still exercising that same contract
    • only the minimal target change was introduced for the winner, usually a table or model reference

    If the benchmark package changed in broader ways than that, the comparison is no longer clean.

  4. Open test-results.json first

    Use test-results.json as the verdict file. It tells you:

    • whether the run succeeded overall
    • which benchmark assertions passed, failed, or were skipped
    • the exact failing assertion when something broke

    In the current benchmark package, that usually means:

    • baseline p95 under threshold
    • scenarios do not regress
    • EXPLAIN shows index usage in at least one scenario

    This file is where you find out whether scenarioRegression was actually present, skipped, or failed. Do not assume every benchmark report includes all test groups.
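One way to make that check concrete: a hedged sketch that classifies whether the "scenarios do not regress" assertion ran, was skipped, or is absent from this report. The shapes follow the example test-results.json in this guide; scenarioAssertion is a hypothetical helper name.

```typescript
// Hedged sketch (reviewer-side, not the harness): classify the
// "scenarios do not regress" assertion in a test-results.json report.
type Status = "passed" | "failed" | "skipped";

interface TestResults {
  testResults: { assertionResults: { title: string; status: Status }[] }[];
}

function scenarioAssertion(results: TestResults): Status | "missing" {
  const all = results.testResults.flatMap((r) => r.assertionResults);
  const hit = all.find((a) => a.title === "scenarios do not regress");
  return hit ? hit.status : "missing";
}

// Example mirroring the report shown earlier in this guide.
const example: TestResults = {
  testResults: [
    {
      assertionResults: [
        { title: "baseline p95 under threshold", status: "passed" },
        { title: "scenarios do not regress", status: "skipped" },
      ],
    },
  ],
};

console.log(scenarioAssertion(example)); // skipped
```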

  5. Check how the winner was chosen

    The PR summary should compare the frozen baseline against each candidate branch. Read that comparison before you inspect any single JSON field.

    The winner should be selected because it does the best job on the same contract, not because one headline number looked better in isolation. Good winner-selection language usually includes:

    • lower overall tests.baseline.p95 than baseline
    • no scenario regression ratio above the configured threshold, when scenario tests ran
    • at least one selective scenario showing better pruning or fewer granules read
    • caveats called out when a scenario stays flat, gets skipped, or the base query still full-scans

    If another candidate looks stronger on those same measures, ask why it was not selected.
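The selection logic described above can be sketched as a reviewer-side check: a candidate only qualifies if no scenario ratio exceeds the configured threshold, and among qualifiers the lowest overall p95 wins. Field names follow the benchmark-*.json examples in this guide; the threshold and branch names below are illustrative, not from a real run.

```typescript
// Hypothetical winner-selection check (not part of the harness).
interface Candidate {
  branch: string;
  p95: number;      // tests.baseline.p95 from that branch's report
  ratios: number[]; // tests.scenarioRegression[*].ratio
}

function pickWinner(cands: Candidate[], maxRatio: number): Candidate | undefined {
  return cands
    .filter((c) => c.ratios.every((r) => r <= maxRatio)) // drop regressors
    .sort((a, b) => a.p95 - b.p95)[0];                   // then fastest p95
}

const winner = pickWinner(
  [
    { branch: "perf-opti-1", p95: 44, ratios: [0.13, 0.33] },
    { branch: "perf-opti-2", p95: 31, ratios: [0.2, 1.8] }, // regresses a scenario
  ],
  1.2 // illustrative threshold; use the value from benchmark.config.ts
);

console.log(winner?.branch); // perf-opti-1
```

Note that perf-opti-2 has the better headline p95 but is rejected for a scenario regression, which is exactly the failure mode the step above warns about.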

  6. Read the saved benchmark report next

    Then open the timestamped benchmark-*.json file. This is the detailed evidence file for one target database.

    It should tell you:

    • which host and database the run hit
    • the exact SQL that ran
    • baseline p50 and p95
    • per-run query-log profiles such as readRows, readBytes, memoryUsage, and durationMs
    • EXPLAIN output such as indexCondition, selectedGranules, totalGranules, and granuleSkipPct
  7. Read scenario regressions when they exist

    If test-results.json shows that the "scenarios do not regress" assertion ran, open tests.scenarioRegression in the benchmark report. This is where the selective scenarios usually reveal whether the optimization helped the filtered paths you care about.

    What to look for:

    • which scenarios improved materially
    • whether any scenario crossed the configured regression threshold
    • whether one scenario explains most of the win while the others stay flat
    • whether the PR caveats match what the scenario data actually shows

    {
      "tests": {
        "scenarioRegression": {
          "category=A3": {
            "p50": 18,
            "p95": 52,
            "baselineP95": 396,
            "ratio": 0.13
          },
          "altitude>10000": {
            "p50": 126,
            "p95": 129,
            "baselineP95": 396,
            "ratio": 0.33
          }
        }
      }
    }

    The important comparison is not whether every scenario improved equally. It is whether any scenario regressed past threshold and whether the winning candidate improved the scenarios that matter.
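That comparison can be expressed directly. A hedged sketch, assuming the ratio field is candidate p95 over baseline p95 (consistent with the sample above, where 52 / 396 ≈ 0.13); the 1.2 threshold is illustrative, so use the value from your benchmark.config.ts.

```typescript
// Hedged sketch: flag scenarios whose ratio crosses a regression threshold.
interface ScenarioResult {
  p95: number;
  baselineP95: number;
  ratio: number; // candidate p95 / baseline p95, per the sample report
}

function regressedScenarios(
  scenarios: Record<string, ScenarioResult>,
  maxRatio: number
): string[] {
  return Object.entries(scenarios)
    .filter(([, s]) => s.ratio > maxRatio)
    .map(([name]) => name);
}

// Data copied from the example tests.scenarioRegression above.
const sample: Record<string, ScenarioResult> = {
  "category=A3": { p95: 52, baselineP95: 396, ratio: 0.13 },
  "altitude>10000": { p95: 129, baselineP95: 396, ratio: 0.33 },
};

console.log(regressedScenarios(sample, 1.2)); // [] — both scenarios improved
```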

  8. Read EXPLAIN as its own gate

    Then look at tests.explain. Do not assume the unfiltered base query must stop returning indexCondition: "true". In the current benchmark package, the more important question is whether at least one selective scenario starts pruning where the baseline did not.

    Good shape:

    • baseline can still full-scan
    • one or more selective scenarios show a narrower indexCondition
    • selective scenarios read fewer granules in the winner than in the baseline

    Example:

    {
      "tests": {
        "explain": {
          "baseline": {
            "explain": {
              "indexCondition": "true",
              "granuleSkipPct": 0
            }
          },
          "category=A3": {
            "explain": {
              "indexCondition": "category in ['A3']",
              "granuleSkipPct": 75
            }
          }
        }
      }
    }

    It is possible for the baseline latency threshold to pass while the EXPLAIN assertion still fails. In that case, trust the failing assertion and the saved EXPLAIN output over the PR text.
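The "good shape" above amounts to a simple predicate: the baseline may still full-scan (indexCondition "true"), but at least one selective scenario should show a narrower condition. A hedged sketch, with shapes following tests.explain in the sample reports:

```typescript
// Hedged reviewer-side sketch of the EXPLAIN gate described above.
interface ExplainEntry {
  indexCondition: string;
  granuleSkipPct: number;
}

// True when at least one non-baseline scenario shows a narrower
// index condition than the catch-all "true".
function somePruning(explains: Record<string, ExplainEntry>): boolean {
  return Object.entries(explains).some(
    ([name, e]) => name !== "baseline" && e.indexCondition !== "true"
  );
}

// Data copied from the example tests.explain above.
const explains: Record<string, ExplainEntry> = {
  baseline: { indexCondition: "true", granuleSkipPct: 0 },
  "category=A3": { indexCondition: "category in ['A3']", granuleSkipPct: 75 },
};

console.log(somePruning(explains)); // true
```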

  9. Compare overall latency and resource usage

    After the scenario checks, look at tests.baseline.p50, tests.baseline.p95, and the per-run objects in tests.baseline.profiles. The profiles array is where the useful resource metrics live.

    {
      "tests": {
        "baseline": {
          "p50": 38,
          "p95": 44,
          "profiles": [
            {
              "durationMs": 42,
              "readRows": 180000,
              "readBytes": 12000000,
              "memoryUsage": 6400000,
              "diskReadUs": 1200
            }
          ]
        }
      }
    }

    Good sign: lower p95, fewer rows read, and lower memory usage in the winner report. This is the broad query-level effect. The scenario regressions and scenario-level EXPLAIN tell you where that improvement came from.
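The three-way comparison can be sketched as one predicate over the two reports. The field names follow tests.baseline in the sample benchmark-*.json files; the numbers below are copied from the examples in this guide, not from a real run.

```typescript
// Hedged sketch: does the winner report beat the baseline report on the
// headline query-level numbers? Assumes the same query-path contract.
interface Profile {
  readRows: number;
  memoryUsage: number;
}

interface BaselineTest {
  p95: number;        // tests.baseline.p95
  profiles: Profile[]; // tests.baseline.profiles
}

function winnerImproves(base: BaselineTest, win: BaselineTest): boolean {
  return (
    win.p95 < base.p95 &&
    win.profiles[0].readRows < base.profiles[0].readRows &&
    win.profiles[0].memoryUsage < base.profiles[0].memoryUsage
  );
}

// Numbers taken from the two example reports in this guide.
const baselineReport: BaselineTest = {
  p95: 218,
  profiles: [{ readRows: 702027, memoryUsage: 49048628 }],
};
const winnerReport: BaselineTest = {
  p95: 44,
  profiles: [{ readRows: 180000, memoryUsage: 6400000 }],
};

console.log(winnerImproves(baselineReport, winnerReport)); // true
```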

  10. Verify the SQL matches your app code

    Use the sql field in each test entry to confirm the report executed the query path the PR claims to optimize. For example:

    {
      "tests": {
        "baseline": {
          "sql": "SELECT current_database, count() FROM ..."
        }
      }
    }

    If that SQL does not match the app code or benchmark target in the PR, stop and ask why.