Building on CDC-Enabled ClickHouse
This guide walks you through setting up a Moose project on top of an existing ClickHouse database that uses CDC (Change Data Capture) pipelines—ClickPipes, PeerDB, Debezium, or similar tools—to replicate data from external sources.
What You'll Learn
By the end of this guide, you'll understand how to:
- Bootstrap a Moose project from an existing ClickHouse database
- Work with externally managed tables that CDC services control
- Develop locally with realistic sample data from production
- Keep your models in sync as upstream schemas evolve
Prerequisites
- A ClickHouse database with CDC-replicated tables
- Your ClickHouse connection URL (e.g., `https://user:pass@host:8443/database`)
- Node.js 20+ (TypeScript) or Python 3.10+ (Python)
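The connection URL carries the host, port, database, and credentials in one string. As a rough sketch of how those parts break down, the standard `URL` class can split the example URL shown in the prerequisites. The `parseClickHouseUrl` helper and the `ClickHouseUrlParts` type are hypothetical names used only for illustration; they are not part of Moose, which does its own parsing.

```typescript
// Hypothetical helper (not part of Moose): splits a ClickHouse
// HTTP(S) connection URL into its parts using the standard URL class.
interface ClickHouseUrlParts {
  host: string;
  port: number;
  database: string;
  username: string;
  password: string;
  useSsl: boolean;
}

function parseClickHouseUrl(raw: string): ClickHouseUrlParts {
  const url = new URL(raw);
  const useSsl = url.protocol === "https:";
  return {
    host: url.hostname,
    // URL.port is "" when the port is omitted or equals the scheme default
    port: url.port ? Number(url.port) : useSsl ? 443 : 80,
    // pathname is "/database"; strip the leading slash
    database: decodeURIComponent(url.pathname.replace(/^\//, "")),
    // credentials stay percent-encoded inside URL, so decode them
    username: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
    useSsl,
  };
}

const parts = parseClickHouseUrl("https://user:pass@host:8443/database");
console.log(parts.host, parts.port, parts.database);
```

Special characters in the username or password (such as `@` or `/`) need to be percent-encoded for the URL to parse as intended.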
Step 1: Initialize Your Project
The fastest way to get started is with moose init --from-remote. This single command:
- Creates a new Moose project
- Connects to your ClickHouse and introspects all tables
- Generates type-safe models for every table
- Automatically detects PeerDB-managed tables (via `_peerdb_*` columns) and marks them as `EXTERNALLY_MANAGED`. Tables managed by other CDC tools (ClickPipes, Debezium, etc.) can be manually marked with `lifeCycle: LifeCycle.EXTERNALLY_MANAGED`
- Saves your connection config to `moose.config.toml`
- Stores credentials securely in your OS keychain
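The PeerDB detection in the list above keys off column names. A minimal sketch of that kind of check follows, assuming only what the guide states (a `_peerdb_` column prefix). The `looksPeerDbManaged` function is a hypothetical illustration, not Moose's actual detection code.

```typescript
// Illustrative only: a guess at the kind of check described above,
// not Moose's actual implementation.
const PEERDB_PREFIX = "_peerdb_";

function looksPeerDbManaged(columnNames: string[]): boolean {
  // A single column with the PeerDB prefix is enough to flag the table
  return columnNames.some((name) => name.startsWith(PEERDB_PREFIX));
}

// Column lists for a PeerDB-replicated table and a plain table
const usersColumns = ["id", "email", "_peerdb_synced_at", "_peerdb_is_deleted", "_peerdb_version"];
const eventsColumns = ["id", "event_type", "timestamp"];

console.log(looksPeerDbManaged(usersColumns));  // true
console.log(looksPeerDbManaged(eventsColumns)); // false
```

Tables replicated by other CDC tools carry no such prefix, which is why they need the manual `lifeCycle` marking mentioned above.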
```bash
moose init my-analytics-app --from-remote "https://user:pass@host:8443/database" --language typescript
```

You'll see output like:
```text
Created my-analytics-app from typescript-empty template
Success Project created at my-analytics-app
Connecting to remote ClickHouse...
Introspecting tables in 'database'...
Config Wrote [dev.remote_clickhouse] to moose.config.toml (host: host, database: database)
Keychain Stored credentials securely for project 'my-analytics-app'
```

What About Interactive Setup?
If you don't want to put credentials in the command, run without the URL:
```bash
moose init my-analytics-app --from-remote --language typescript
```

Moose will prompt you for host, username, password, and database interactively.
After initialization, navigate into your project and install dependencies:
```bash
cd my-analytics-app
npm install
```

Step 2: Understanding Externally Managed Tables
Open your project and look at the generated files. You'll find your tables split into two categories:
Regular Tables (in your main file)
Tables that Moose fully manages—you control the schema, and Moose creates/migrates them:
```typescript
import { OlapTable } from "@514labs/moose-lib";

export interface analytics_events {
  id: string;
  event_type: string;
  timestamp: string;
}

export const AnalyticsEventsTable = new OlapTable<analytics_events>("analytics_events", {
  orderByFields: ["timestamp", "id"],
});
```

Externally Managed Tables (in a separate file)
Tables where an external process (your CDC pipeline) owns the schema. Moose generates these in a dedicated file:
```typescript
// AUTO-GENERATED FILE. DO NOT EDIT.
// This file will be replaced when you run `moose db pull`.

import typia from "typia";
import { OlapTable, LifeCycle, ClickHouseEngines } from "@514labs/moose-lib";

export interface users {
  id: string & typia.tags.Format<"uuid">;
  email: string;
  name: string;
  created_at: string & typia.tags.Format<"date-time">;
  // PeerDB metadata columns (added by CDC)
  _peerdb_synced_at: string & typia.tags.Format<"date-time">;
  _peerdb_is_deleted: number;
  _peerdb_version: number;
}

export const UsersTable = new OlapTable<users>("users", {
  orderByFields: ["id"],
  engine: ClickHouseEngines.ReplacingMergeTree,
  ver: "_peerdb_version",
  lifeCycle: LifeCycle.EXTERNALLY_MANAGED, // <-- Key difference
});
```

What Does EXTERNALLY_MANAGED Mean?
The `lifeCycle: LifeCycle.EXTERNALLY_MANAGED` setting changes how Moose treats the table:
| Moose Behavior | Regular Tables | Externally Managed |
|---|---|---|
| Creates table in production | Yes | No |
| Runs migrations | Yes | No |
| Generates type-safe models | Yes | Yes |
| Allows building views/APIs on top | Yes | Yes |
| You edit the schema | Yes | No (regenerated by db pull) |
In short: Moose gives you all the developer experience benefits (types, autocomplete, views, APIs) without touching the tables your CDC pipeline owns.
Don't Edit External Models Directly
The externalModels.ts / external_models.py file is regenerated every time you run moose db pull. Any manual changes will be overwritten. If you need to customize how a table is modeled, move it to your main file and remove EXTERNALLY_MANAGED.
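The generated `users` model uses `ReplacingMergeTree` with `ver: "_peerdb_version"`, so a table can briefly hold several versions of the same row until ClickHouse merges them. The sketch below mimics, in plain TypeScript, the "highest version wins" view that code built on such a table usually wants. The `latestLiveRows` helper and the trimmed `UserRow` type are hypothetical, and treating `_peerdb_is_deleted = 1` as a soft delete is an assumption based on the column name, not something this guide states.

```typescript
// Trimmed, hypothetical version of the generated `users` row type
interface UserRow {
  id: string;
  email: string;
  _peerdb_is_deleted: number;
  _peerdb_version: number;
}

// Keeps the highest-version row per id, then drops rows flagged as deleted.
// Assumption: _peerdb_is_deleted = 1 marks a soft-deleted row.
function latestLiveRows(rows: UserRow[]): UserRow[] {
  const latest = new Map<string, UserRow>();
  for (const row of rows) {
    const seen = latest.get(row.id);
    if (!seen || row._peerdb_version > seen._peerdb_version) {
      latest.set(row.id, row);
    }
  }
  return Array.from(latest.values()).filter((row) => row._peerdb_is_deleted === 0);
}

const replicated: UserRow[] = [
  { id: "a", email: "old@example.com", _peerdb_is_deleted: 0, _peerdb_version: 1 },
  { id: "a", email: "new@example.com", _peerdb_is_deleted: 0, _peerdb_version: 2 },
  { id: "b", email: "gone@example.com", _peerdb_is_deleted: 0, _peerdb_version: 1 },
  { id: "b", email: "gone@example.com", _peerdb_is_deleted: 1, _peerdb_version: 2 },
];

console.log(latestLiveRows(replicated).map((row) => row.email)); // only "new@example.com" remains
```

Views and APIs built on CDC tables typically need to express the same rule in SQL; the in-memory version here is only meant to make the semantics concrete.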
Step 3: Local Development Setup
Now let's configure how Moose handles these external tables during local development. Open your moose.config.toml:
```toml
# This was auto-generated by moose init --from-remote
[dev.remote_clickhouse]
host = "your-clickhouse-host.example.com"
port = 8443
database = "production_db"
use_ssl = true
protocol = "http"
```

Creating Local Mirror Tables
When you run moose dev, Moose creates the schema for externally managed tables in your local ClickHouse, but they start out empty. To populate them with sample data from your remote ClickHouse for local development, enable local mirrors:
```toml
[dev.externally_managed.tables]
# Create local copies of external tables
create_local_mirrors = true

# Seed with sample data from remote (0 = empty tables)
sample_size = 1000

# Re-seed on every moose dev start (false = only if missing)
refresh_on_startup = false
```

Configuration Options Explained
| Option | Default | Description |
|---|---|---|
| `create_local_mirrors` | `false` | When `true`, Moose creates local tables matching your external table schemas |
| `sample_size` | `0` | Number of rows to copy from remote for each table. Set to `0` for schema-only (empty tables) |
| `refresh_on_startup` | `false` | When `true`, drops and recreates mirrors on every `moose dev` start. When `false`, only creates if missing |
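The defaults in the table can be written down as a small typed object. This is a sketch for illustration only: the `LocalMirrorOptions` type and `resolveMirrorOptions` helper are hypothetical names, Moose reads these values from `moose.config.toml` itself, and the integer check on `sample_size` is an assumption rather than documented validation.

```typescript
// Option names and defaults are taken from the table above;
// the type and helper are hypothetical.
interface LocalMirrorOptions {
  create_local_mirrors: boolean;
  sample_size: number;
  refresh_on_startup: boolean;
}

const MIRROR_DEFAULTS: LocalMirrorOptions = {
  create_local_mirrors: false,
  sample_size: 0,
  refresh_on_startup: false,
};

function resolveMirrorOptions(overrides: Partial<LocalMirrorOptions> = {}): LocalMirrorOptions {
  const resolved = { ...MIRROR_DEFAULTS, ...overrides };
  // Assumed sanity check: a row count should be a non-negative integer
  if (!Number.isInteger(resolved.sample_size) || resolved.sample_size < 0) {
    throw new RangeError("sample_size must be a non-negative integer");
  }
  return resolved;
}

// Only two options are set; refresh_on_startup falls back to its default
const mirrorOptions = resolveMirrorOptions({ create_local_mirrors: true, sample_size: 1000 });
console.log(mirrorOptions.refresh_on_startup); // false
```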
When Does Seeding Happen?
Understanding when data is pulled is important for your workflow:
| Scenario | What Happens |
|---|---|
| First `moose dev` run | Mirror tables created, `sample_size` rows seeded from remote |
| Subsequent runs (`refresh_on_startup = false`) | Nothing; existing local data is preserved |
| Subsequent runs (`refresh_on_startup = true`) | Tables dropped, recreated, and reseeded |
| Remote unreachable | Tables created from local model definitions, no data seeded |
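The scenarios in the table can be read as a small decision function. The sketch below assumes `create_local_mirrors = true` and `sample_size > 0`. The `seedingOutcome` function and its types are hypothetical, and two combinations the table does not spell out (remote unreachable on a later run) are filled in as labeled assumptions.

```typescript
type SeedingOutcome =
  | "create-and-seed"       // first run, remote reachable
  | "keep-existing"         // later run, refresh_on_startup = false
  | "drop-recreate-reseed"  // later run, refresh_on_startup = true
  | "create-empty";         // remote unreachable: schema only, no data

interface StartupState {
  mirrorsExist: boolean;     // false on the first `moose dev` run
  refreshOnStartup: boolean; // refresh_on_startup in moose.config.toml
  remoteReachable: boolean;
}

function seedingOutcome(state: StartupState): SeedingOutcome {
  if (state.mirrorsExist && !state.refreshOnStartup) {
    // Assumption: nothing is touched, so reachability does not matter here
    return "keep-existing";
  }
  if (!state.remoteReachable) {
    // Assumption: this also applies when a refresh was requested
    return "create-empty";
  }
  return state.mirrorsExist ? "drop-recreate-reseed" : "create-and-seed";
}

console.log(seedingOutcome({ mirrorsExist: false, refreshOnStartup: false, remoteReachable: true }));
// "create-and-seed"
```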
Materialized Views and Sample Data
Here's something important to understand: Materialized Views only process new data as it arrives.
In production, your CDC pipeline continuously inserts data, which triggers your materialized views. But locally, you have static seeded data—the MV won't retroactively process it.
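To make the insert-trigger behavior concrete, here is a toy in-memory model; no ClickHouse is involved. The `ToyTable` class is purely illustrative, and the ordering (rows present before the view is attached) is an assumption chosen to mirror static seed data.

```typescript
// Toy model of a table whose materialized views fire on insert only.
type Row = { event_type: string };
type InsertTrigger = (batch: Row[]) => void;

class ToyTable {
  private rows: Row[] = [];
  private triggers: InsertTrigger[] = [];

  insert(batch: Row[]): void {
    this.rows.push(...batch);
    // A view sees only the batch being inserted, never the existing rows
    for (const trigger of this.triggers) trigger(batch);
  }

  attachMaterializedView(trigger: InsertTrigger): void {
    this.triggers.push(trigger);
  }

  count(): number {
    return this.rows.length;
  }
}

const events = new ToyTable();
const countsByType = new Map<string, number>();

// 1. Rows that exist before the view does (like static seed data)
events.insert([{ event_type: "click" }, { event_type: "click" }]);

// 2. The view is attached afterwards
events.attachMaterializedView((batch) => {
  for (const row of batch) {
    countsByType.set(row.event_type, (countsByType.get(row.event_type) ?? 0) + 1);
  }
});

// 3. Only this insert reaches the view
events.insert([{ event_type: "click" }]);

console.log(events.count());            // 3 rows in the base table
console.log(countsByType.get("click")); // 1: the two earlier rows were never processed
```

The base table holds three rows, but the view's aggregate reflects only the one inserted after it was attached.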
Solutions for local MV development:
- Seed enough data - Set `sample_size` high enough to have meaningful test data in your base tables
- Use `refresh_on_startup = true` - This re-inserts data on each startup, triggering MVs (but slower startup)
- Manually trigger with `moose seed` - Insert test data while `moose dev` is running (requires the local dev server to be up): `moose seed clickhouse --limit 100`
- Test MVs with direct inserts - During development, insert test rows manually to trigger MV logic
Production Behavior
In production, this isn't an issue—CDC continuously streams data, and your MVs process it in real-time.
Step 4: Running Local Development
With your config set up, start the dev server:
```bash
moose dev
```

First Run with Credentials
If credentials aren't in your keychain (e.g., you manually edited the config), Moose prompts you:
```text
Credentials Remote ClickHouse credentials required:
  Host: your-clickhouse-host.example.com
  Database: production_db
Enter username (default: default)> your_username
Enter password> ********
Keychain Stored credentials securely for project 'my-analytics-app'
```

Credentials are stored in your OS keychain and reused automatically on subsequent runs, so no additional configuration is needed.
What Happens on Startup
- Local infrastructure starts: Docker containers for ClickHouse, Redpanda, etc.
- External tables detected: credentials resolved from OS keychain (you'll be prompted if missing)
- Remote schema compared: local mirrors created if `create_local_mirrors = true`
- Data seeded (if `sample_size > 0` and remote is reachable)
- Dev server starts at `http://localhost:4000`
Developing Locally
Now you can build on top of your CDC data:
- Create views that aggregate or transform external table data
- Build APIs that query across managed and external tables
- Test queries against realistic sample data
- Iterate quickly with hot-reload—no production impact
Step 5: Syncing Schema Changes
CDC pipelines evolve. Your DBA adds columns, the CDC service updates metadata fields, or new tables appear. When this happens, your local models need to sync.
Manual Sync
Run moose db pull to refresh your external models:
```bash
moose db pull
```

This:
- Connects to your remote ClickHouse (using saved credentials)
- Introspects current schemas for all externally managed tables
- Regenerates `externalModels.ts` / `external_models.py`
- Adds any new tables that appeared in the remote database
Example: A New Column Appears
Say your CDC pipeline starts syncing a new phone column to the users table. Running moose db pull prints:

```text
Connecting to remote ClickHouse...
Introspecting remote tables...
External models refreshed (3 table(s))
```

Your external models file now includes the new column:
```typescript
export interface users {
  id: string;
  email: string;
  name: string;
  phone: string; // <-- New column!
  created_at: string;
  // ...
}
```

TypeScript immediately catches any code that now needs updating.
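As a sketch of what "catches" looks like in practice, consider code that builds a `users` object by hand, such as a test fixture. The fixture and the `contactLine` function below are hypothetical, and the interface is a trimmed copy of the regenerated one.

```typescript
// Trimmed copy of the regenerated interface
interface users {
  id: string;
  email: string;
  name: string;
  phone: string; // new column
  created_at: string;
}

// Hypothetical fixture. Before the pull it had no `phone`; with the
// regenerated interface, leaving the field out fails type-checking with a
// missing-property error, so it is added here.
const sampleUser: users = {
  id: "u_1",
  email: "ada@example.com",
  name: "Ada",
  phone: "+1-555-0100",
  created_at: "2024-01-01T00:00:00Z",
};

// Hypothetical consumer that can now rely on the typed field
function contactLine(user: users): string {
  return `${user.name} <${user.email}> ${user.phone}`;
}

console.log(contactLine(sampleUser)); // Ada <ada@example.com> +1-555-0100
```

Code that only reads existing fields keeps compiling unchanged; only code that constructs full `users` values has to be touched.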
Automatic Sync on Dev Start
For active development where schemas change frequently, auto-sync on startup:
```toml
[http_server_config]
on_first_start_script = "moose db pull"

[watcher_config]
# Prevent reload loop from generated file changes
ignore_patterns = ["app/externalModels.ts"]
```

Now every moose dev starts by pulling the latest schemas.
Complete Configuration Reference
Here's a full moose.config.toml for a CDC-based project:
```toml
language = "typescript"

# Remote ClickHouse connection (auto-generated by moose init --from-remote)
[dev.remote_clickhouse]
host = "abc123.us-east-1.aws.clickhouse.cloud"
port = 8443
database = "production"
use_ssl = true
protocol = "http"

# Local development with external tables
[dev.externally_managed.tables]
create_local_mirrors = true
sample_size = 500
refresh_on_startup = false

# Auto-sync schemas on startup
[http_server_config]
on_first_start_script = "moose db pull"

# Don't trigger reloads on generated files
[watcher_config]
ignore_patterns = ["app/externalModels.ts"]
```

Quick Reference
| Task | Command |
|---|---|
| Start new project from existing ClickHouse | `moose init my-app --from-remote <URL> --language <typescript\|python>` |
| Sync external models after schema change | moose db pull |
| Start local development | moose dev |
| Seed more data locally | moose seed clickhouse --limit 1000 |
Next Steps
- External Tables Reference - Deep dive into `EXTERNALLY_MANAGED` lifecycle
- Materialized Views - Build real-time aggregations on CDC data
- APIs & Web Apps - Expose your data through type-safe endpoints