

Optimize a Slow ClickHouse Query with Your AI Agent

Use this guide when a production ClickHouse query in your MooseStack app is slower than it should be.

You will set up your AI coding agent with the 514 harness to benchmark the slow query against one or more optimized candidates. By the end you will have a reviewable PR with the winning schema changes and benchmark evidence.

The experiment:

  1. Profiles the slow query and proposes optimization candidates
  2. Creates isolated preview databases on 514 hosting for the baseline and each candidate
  3. Benchmarks every preview with the same test to pick the best performer
  4. Opens a PR for the winner for your review

This guide assumes:

  • MooseStack version 0.6.493 or later (run npx moose-cli --version or moose --version to check)
  • Your MooseStack app is already deployed on Fiveonefour
  • Your coding agent is open on this repository
  • The agent can reuse an existing query path or define a query model when profiling finds a pattern without one

Setup

Don't have a MooseStack project yet?

Start by bootstrapping a MooseStack project from your existing ClickHouse database, then return here once you have a project the agent can run against.

Install the 514 CLI, sign in, link your local MooseStack project to its Fiveonefour counterpart, and run 514 agent init so your coding agent has the harness and skills.

Optimization workflow

This section helps you understand what the agent is doing at each stage of the optimization workflow, what it may ask you to confirm, and how to spot issues early enough to unblock the run.

You can use this workflow whether you already know which query needs work or need the agent to help identify a good optimization target first. The run verifies your setup, executes entirely against preview branch databases, and leaves production untouched. The end result is a PR with the evidence you need to decide whether the change is ready to move forward.

PR review

Once the experiment finishes, the PR becomes the review surface. The agent can open a PR, but you still decide what ships.

You should confirm two things here:

  • the benchmarks and EXPLAIN support the performance claim
  • any rebuild, backfill, or destructive step is explicit and acceptable before production

Benchmark JSON quick reference

Use this when you need the current field map while reviewing the benchmark step above. Read test-results.json first for the verdict, then benchmark-*.json for the detailed evidence.

test-results.json

{
  "success": false,
  "numPassedTests": 1,
  "numFailedTests": 1,
  "numPendingTests": 1,
  "testResults": [
    {
      "assertionResults": [
        { "title": "baseline p95 under threshold", "status": "passed" },
        { "title": "scenarios do not regress", "status": "skipped" },
        {
          "title": "EXPLAIN shows index usage in at least one scenario",
          "status": "failed",
          "failureMessages": [
            "AssertionError: No scenario uses index pruning..."
          ]
        }
      ]
    }
  ]
}

benchmark-*.json

{
  "timestamp": "2026-03-26T23:53:29.709Z",
  "target": {
    "host": "yey99hpcxa.clickhouse.boreal.cloud",
    "database": "514-demos-planes-perf-base-f9f0d"
  },
  "tests": {
    "baseline": {
      "sql": "SELECT ...",
      "p50": 22,
      "p95": 218,
      "profiles": [
        {
          "queryId": "standalone...",
          "readRows": 702027,
          "readBytes": 18254197,
          "memoryUsage": 49048628,
          "durationMs": 218
        }
      ]
    },
    "explain": {
      "baseline": {
        "sql": "SELECT ...",
        "explain": {
          "indexCondition": "true",
          "selectedGranules": 92,
          "totalGranules": 92,
          "granuleSkipPct": 0
        }
      }
    }
  }
}

test-results.json tells you whether the benchmark passed. benchmark-*.json tells you why.

| What to compare | Where in the report | What good looks like |
| --- | --- | --- |
| Did the run pass? | test-results.json -> success | true, or a failure you can explain |
| Which assertion failed? | test-results.json -> testResults[].assertionResults[] | Clear pass/fail/skip for each test |
| Overall p95? | benchmark-*.json -> tests.baseline.p95 | Lower in winner than baseline |
| Scenario regressions? | benchmark-*.json -> tests.scenarioRegression | Present only if scenario test ran; no ratio above threshold |
| Selective scenario pruning? | benchmark-*.json -> tests.explain.<scenario>.explain.indexCondition | At least one selective scenario is not "true" |
| Granules skipped? | benchmark-*.json -> tests.explain.<scenario>.explain.granuleSkipPct | Higher in winner for the selective scenario |
| Rows read? | benchmark-*.json -> tests.baseline.profiles[0].readRows | Lower in winner when the contract is unchanged |
| Memory usage? | benchmark-*.json -> tests.baseline.profiles[0].memoryUsage | Lower in winner when the contract is unchanged |
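If you want to double-check the verdict file while reviewing, it is simple to read programmatically. The following is a hedged sketch, not part of the 514 harness: the shapes mirror the example test-results.json in this guide, and failingAssertions is a hypothetical helper name.

```typescript
// Hypothetical reviewer-side helper (not part of the 514 harness).
// Shapes mirror the example test-results.json shown above.
interface AssertionResult {
  title: string;
  status: "passed" | "failed" | "skipped";
  failureMessages?: string[];
}

interface TestResults {
  success: boolean;
  testResults: { assertionResults: AssertionResult[] }[];
}

// Collect the titles of every failed assertion, so you know which
// claims to verify in benchmark-*.json next.
function failingAssertions(results: TestResults): string[] {
  return results.testResults
    .flatMap((r) => r.assertionResults)
    .filter((a) => a.status === "failed")
    .map((a) => a.title);
}

// Example data copied from the report above.
const verdict: TestResults = {
  success: false,
  testResults: [
    {
      assertionResults: [
        { title: "baseline p95 under threshold", status: "passed" },
        { title: "scenarios do not regress", status: "skipped" },
        {
          title: "EXPLAIN shows index usage in at least one scenario",
          status: "failed",
        },
      ],
    },
  ],
};

console.log(failingAssertions(verdict));
// → [ 'EXPLAIN shows index usage in at least one scenario' ]
```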

Related docs

  • Hosting
  • 514 auth
  • 514 project link
  • Fiveonefour hosting to local setup
  • Setup failures
  • Query Layer
  • Semantic Layer


What you need

  • A local clone of the MooseStack app repository
  • Hosting docs if install fails

What you get

  • 514 installed and 514 --version working

Workflow

  1. Install the 514 CLI

    From Hosting, install the 514 CLI:

    bash -i <(curl -fsSL https://fiveonefour.com/install.sh) 514
  2. Verify installation

    514 --version


What you need

  • Setup complete: CLI installed, project linked, and 514 agent init run
  • Your coding agent open on this repo
  • Either a slow query to point at or a repo where the agent can help find one

What you get

  • A clear prompt to start the workflow from the right entry point
  • The context your agent needs before it proposes the performance optimization skill

Workflow

  1. Ask your agent to improve ClickHouse performance

    Choose the prompt that matches your starting point. Use the following prompt when you want the agent to identify the slow queries before proposing improvements:

    Help me improve my ClickHouse performance. I do not know which queries are slow yet, so help me identify opportunities to optimize, then suggest and benchmark improvements.

    Your agent may pause to ask you to confirm the performance optimization skill before the long run starts.

  2. Confirm the performance optimization skill

    When the agent proposes the perf optimization skill, confirm. It should drive profiling, schema work, benchmarks, and the PR. If the bundle name differs, accept only if it is the same harness you got from 514 agent init.

  3. Confirm the target deployment

    The agent should start by confirming the linked project, the deployment it will inspect, and the query or endpoint in scope. Before it moves on, you should know exactly which slow path is being optimized and which deployment the diagnostics are reading from.


What you need

  • The PR the agent opened
  • The committed query-benchmarks/ package
  • The benchmark artifacts for both the frozen baseline branch and the winning candidate branch:
    • reports/test-results.json
    • reports/benchmark-*.json

What you get

  • Confidence that the winning candidate branch actually beats the baseline for the same query path, without hiding a scenario-level regression

Workflow

  1. Start with the PR summary, then open the JSON files

    The agent should summarize the JSON results in the PR description, often as a small comparison table. Use that summary to orient yourself, but do not treat it as the evidence.

    The evidence lives in the JSON files:

    • test-results.json tells you which benchmark assertions passed, failed, or were skipped
    • benchmark-*.json contains the detailed benchmark evidence for a specific target database

    You should always be able to trace a PR claim back to one of those two files.

  2. Identify the winner report and the baseline report

    Pair two artifact sets: winner (the candidate branch the agent selected) and baseline (the frozen control branch with no schema change). The timestamped benchmark report usually does not include a branch field. In practice, the database name contains the branch slug, so that is the field to inspect first.

    {
      "target": {
        "host": "cluster.clickhouse.boreal.cloud",
        "database": "514-demos-planes-perf-opti-1c360"
      }
    }

    Both runs must use the same benchmark package and query-path contract; only the target databases should differ. Without a clean baseline there is no real before/after.
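Since the branch name usually only survives in the database field, a small helper can classify a report. This is a sketch under an assumption taken from the sample reports in this guide, namely that baseline databases contain a "-base-" segment and candidates an "-opti-" segment; verify against your own preview database names before relying on it.

```typescript
// Hypothetical classifier based on the sample database names in this guide
// ("...-perf-base-f9f0d" vs "...-perf-opti-1c360"). The "-base-" marker is
// an assumption, not a documented contract.
function reportRole(database: string): "baseline" | "candidate" {
  return database.includes("-base-") ? "baseline" : "candidate";
}

console.log(reportRole("514-demos-planes-perf-base-f9f0d")); // baseline
console.log(reportRole("514-demos-planes-perf-opti-1c360")); // candidate
```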

  3. Read the scaffolded benchmark package

    Open the committed benchmark package, usually query-benchmarks/. Start with query-benchmarks/benchmark.config.ts, then read query-benchmarks/tests/query-benchmarks.test.ts.

    What to confirm before you trust any report:

    • benchmark.config.ts points at the real query target the PR claims to optimize
    • baseQuery, scenarios, thresholds, and sampling are the same between baseline and winner
    • the generated test runner is still exercising that same contract
    • only the minimal target change was introduced for the winner, usually a table or model reference

    If the benchmark package changed in broader ways than that, the comparison is no longer clean.

  4. Open test-results.json first

    Use test-results.json as the verdict file. It tells you:

    • whether the run succeeded overall
    • which benchmark assertions passed, failed, or were skipped
    • the exact failing assertion when something broke

    In the current benchmark package, that usually means:

    • baseline p95 under threshold
    • scenarios do not regress
    • EXPLAIN shows index usage in at least one scenario

    This file is where you find out whether scenarioRegression was actually present, skipped, or failed. Do not assume every benchmark report includes all test groups.
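One way to make that check concrete: a hedged sketch that classifies whether the "scenarios do not regress" assertion ran, was skipped, or is absent from this report. The shapes follow the example test-results.json in this guide; scenarioAssertion is a hypothetical helper name.

```typescript
// Hedged sketch (reviewer-side, not the harness): classify the
// "scenarios do not regress" assertion in a test-results.json report.
type Status = "passed" | "failed" | "skipped";

interface TestResults {
  testResults: { assertionResults: { title: string; status: Status }[] }[];
}

function scenarioAssertion(results: TestResults): Status | "missing" {
  const all = results.testResults.flatMap((r) => r.assertionResults);
  const hit = all.find((a) => a.title === "scenarios do not regress");
  return hit ? hit.status : "missing";
}

// Example mirroring the report shown earlier in this guide.
const example: TestResults = {
  testResults: [
    {
      assertionResults: [
        { title: "baseline p95 under threshold", status: "passed" },
        { title: "scenarios do not regress", status: "skipped" },
      ],
    },
  ],
};

console.log(scenarioAssertion(example)); // skipped
```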

  5. Check how the winner was chosen

    The PR summary should compare the frozen baseline against each candidate branch. Read that comparison before you inspect any single JSON field.

    The winner should be selected because it does the best job on the same contract, not because one headline number looked better in isolation. Good winner-selection language usually includes:

    • lower overall tests.baseline.p95 than baseline
    • no scenario regression ratio above the configured threshold, when scenario tests ran
    • at least one selective scenario showing better pruning or fewer granules read
    • caveats called out when a scenario stays flat, gets skipped, or the base query still full-scans

    If another candidate looks stronger on those same measures, ask why it was not selected.
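The selection logic described above can be sketched as a reviewer-side check: a candidate only qualifies if no scenario ratio exceeds the configured threshold, and among qualifiers the lowest overall p95 wins. Field names follow the benchmark-*.json examples in this guide; the threshold and branch names below are illustrative, not from a real run.

```typescript
// Hypothetical winner-selection check (not part of the harness).
interface Candidate {
  branch: string;
  p95: number;      // tests.baseline.p95 from that branch's report
  ratios: number[]; // tests.scenarioRegression[*].ratio
}

function pickWinner(cands: Candidate[], maxRatio: number): Candidate | undefined {
  return cands
    .filter((c) => c.ratios.every((r) => r <= maxRatio)) // drop regressors
    .sort((a, b) => a.p95 - b.p95)[0];                   // then fastest p95
}

const winner = pickWinner(
  [
    { branch: "perf-opti-1", p95: 44, ratios: [0.13, 0.33] },
    { branch: "perf-opti-2", p95: 31, ratios: [0.2, 1.8] }, // regresses a scenario
  ],
  1.2 // illustrative threshold; use the value from benchmark.config.ts
);

console.log(winner?.branch); // perf-opti-1
```

Note that perf-opti-2 has the better headline p95 but is rejected for a scenario regression, which is exactly the failure mode the step above warns about.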

  6. Read the saved benchmark report next

    Then open the timestamped benchmark-*.json file. This is the detailed evidence file for one target database.

    It should tell you:

    • which host and database the run hit
    • the exact SQL that ran
    • baseline p50 and p95
    • per-run query-log profiles such as readRows, readBytes, memoryUsage, and durationMs
    • EXPLAIN output such as indexCondition, selectedGranules, totalGranules, and granuleSkipPct
  7. Read scenario regressions when they exist

    If test-results.json shows that the "scenarios do not regress" assertion ran, open tests.scenarioRegression in the benchmark report. This is where the selective scenarios usually reveal whether the optimization helped the filtered paths you care about.

    What to look for:

    • which scenarios improved materially
    • whether any scenario crossed the configured regression threshold
    • whether one scenario explains most of the win while the others stay flat
    • whether the PR caveats match what the scenario data actually shows

    {
      "tests": {
        "scenarioRegression": {
          "category=A3": {
            "p50": 18,
            "p95": 52,
            "baselineP95": 396,
            "ratio": 0.13
          },
          "altitude>10000": {
            "p50": 126,
            "p95": 129,
            "baselineP95": 396,
            "ratio": 0.33
          }
        }
      }
    }

    The important comparison is not whether every scenario improved equally. It is whether any scenario regressed past threshold and whether the winning candidate improved the scenarios that matter.
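That comparison can be expressed directly. A hedged sketch, assuming the ratio field is candidate p95 over baseline p95 (consistent with the sample above, where 52 / 396 ≈ 0.13); the 1.2 threshold is illustrative, so use the value from your benchmark.config.ts.

```typescript
// Hedged sketch: flag scenarios whose ratio crosses a regression threshold.
interface ScenarioResult {
  p95: number;
  baselineP95: number;
  ratio: number; // candidate p95 / baseline p95, per the sample report
}

function regressedScenarios(
  scenarios: Record<string, ScenarioResult>,
  maxRatio: number
): string[] {
  return Object.entries(scenarios)
    .filter(([, s]) => s.ratio > maxRatio)
    .map(([name]) => name);
}

// Data copied from the example tests.scenarioRegression above.
const sample: Record<string, ScenarioResult> = {
  "category=A3": { p95: 52, baselineP95: 396, ratio: 0.13 },
  "altitude>10000": { p95: 129, baselineP95: 396, ratio: 0.33 },
};

console.log(regressedScenarios(sample, 1.2)); // [] — both scenarios improved
```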

  8. Read EXPLAIN as its own gate

    Then look at tests.explain. Do not assume the unfiltered base query must stop returning indexCondition: "true". In the current benchmark package, the more important question is whether at least one selective scenario starts pruning where the baseline did not.

    Good shape:

    • baseline can still full-scan
    • one or more selective scenarios show a narrower indexCondition
    • selective scenarios read fewer granules in the winner than in the baseline

    Example:

    {
      "tests": {
        "explain": {
          "baseline": {
            "explain": {
              "indexCondition": "true",
              "granuleSkipPct": 0
            }
          },
          "category=A3": {
            "explain": {
              "indexCondition": "category in ['A3']",
              "granuleSkipPct": 75
            }
          }
        }
      }
    }

    It is possible for the baseline latency threshold to pass while the EXPLAIN assertion still fails. In that case, trust the failing assertion and the saved EXPLAIN output over the PR text.
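The "good shape" above amounts to a simple predicate: the baseline may still full-scan (indexCondition "true"), but at least one selective scenario should show a narrower condition. A hedged sketch, with shapes following tests.explain in the sample reports:

```typescript
// Hedged reviewer-side sketch of the EXPLAIN gate described above.
interface ExplainEntry {
  indexCondition: string;
  granuleSkipPct: number;
}

// True when at least one non-baseline scenario shows a narrower
// index condition than the catch-all "true".
function somePruning(explains: Record<string, ExplainEntry>): boolean {
  return Object.entries(explains).some(
    ([name, e]) => name !== "baseline" && e.indexCondition !== "true"
  );
}

// Data copied from the example tests.explain above.
const explains: Record<string, ExplainEntry> = {
  baseline: { indexCondition: "true", granuleSkipPct: 0 },
  "category=A3": { indexCondition: "category in ['A3']", granuleSkipPct: 75 },
};

console.log(somePruning(explains)); // true
```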

  9. Compare overall latency and resource usage

    After the scenario checks, look at tests.baseline.p50, tests.baseline.p95, and the per-run objects in tests.baseline.profiles. The profiles array is where the useful resource metrics live.

    {
      "tests": {
        "baseline": {
          "p50": 38,
          "p95": 44,
          "profiles": [
            {
              "durationMs": 42,
              "readRows": 180000,
              "readBytes": 12000000,
              "memoryUsage": 6400000,
              "diskReadUs": 1200
            }
          ]
        }
      }
    }

    Good sign: lower p95, fewer rows read, and lower memory usage in the winner report. This is the broad query-level effect. The scenario regressions and scenario-level EXPLAIN tell you where that improvement came from.
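The three-way comparison can be sketched as one predicate over the two reports. The field names follow tests.baseline in the sample benchmark-*.json files; the numbers below are copied from the examples in this guide, not from a real run.

```typescript
// Hedged sketch: does the winner report beat the baseline report on the
// headline query-level numbers? Assumes the same query-path contract.
interface Profile {
  readRows: number;
  memoryUsage: number;
}

interface BaselineTest {
  p95: number;        // tests.baseline.p95
  profiles: Profile[]; // tests.baseline.profiles
}

function winnerImproves(base: BaselineTest, win: BaselineTest): boolean {
  return (
    win.p95 < base.p95 &&
    win.profiles[0].readRows < base.profiles[0].readRows &&
    win.profiles[0].memoryUsage < base.profiles[0].memoryUsage
  );
}

// Numbers taken from the two example reports in this guide.
const baselineReport: BaselineTest = {
  p95: 218,
  profiles: [{ readRows: 702027, memoryUsage: 49048628 }],
};
const winnerReport: BaselineTest = {
  p95: 44,
  profiles: [{ readRows: 180000, memoryUsage: 6400000 }],
};

console.log(winnerImproves(baselineReport, winnerReport)); // true
```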

  10. Verify the SQL matches your app code

    Use the sql field in each test entry to confirm the report executed the query path the PR claims to optimize. For example:

    {
      "tests": {
        "baseline": {
          "sql": "SELECT current_database, count() FROM ..."
        }
      }
    }

    If that SQL does not match the app code or benchmark target in the PR, stop and ask why.