# Moose / OLAP Documentation – TypeScript

## Included Files

1. moose/olap/apply-migrations.mdx
2. moose/olap/db-pull.mdx
3. moose/olap/external-tables.mdx
4. moose/olap/indexes.mdx
5. moose/olap/insert-data.mdx
6. moose/olap/model-materialized-view.mdx
7. moose/olap/model-table.mdx
8. moose/olap/model-view.mdx
9. moose/olap/planned-migrations.mdx
10. moose/olap/read-data.mdx
11. moose/olap/schema-change.mdx
12. moose/olap/schema-optimization.mdx
13. moose/olap/schema-versioning.mdx
14. moose/olap/supported-types.mdx
15. moose/olap/ttl.mdx

## Applying Migrations

Source: moose/olap/apply-migrations.mdx

How to apply migrations to your database

# Applying Migrations

This page covers OLAP migrations. For migrations across the MooseStack, see the Migrate docs.

## Overview

Migrations are designed for two complementary goals:

- Move fast locally by inferring changes from your code and applying them immediately to your local database.
- Be deliberate in production by executing a reviewed, versioned plan that matches your intent and protects data.

How to think about it:

- Development mode: You edit code, MooseStack infers the SQL and immediately applies it to local ClickHouse. Great for rapid iteration; not guaranteed to infer intent (e.g., renames).
- Production (planned) mode: You generate a plan from the target environment vs your code, review and commit the plan, and MooseStack executes it deterministically during deploy with drift checks.

What you need to do:

- In dev: just code. MooseStack handles local diffs automatically.
- In prod (OLAP):
  - Generate and save a plan:
    ```bash
    moose generate migration --url https:// --token --save
    ```
  - Review and edit the plan (`plan.yaml`) as needed
  - Commit the plan to source control
  - Deploy to production. MooseStack validates snapshots (current DB vs `remote_state.json`, desired code vs `local_infra_map.json`) and executes `plan.yaml` in order. If drift is detected, the deploy aborts; regenerate the plan and retry.

## Development Workflow

### Starting the Runtime

Use `moose dev` to start the MooseStack runtime with automatic migration detection:

```bash
moose dev

⡏ Starting local infrastructure
  Successfully started containers
  Validated clickhousedb-1 docker container
  Validated redpanda-1 docker container
  Successfully validated red panda cluster
  Validated temporal docker container
  Successfully ran local infrastructure
```

### Hot-Reloaded Migrations

MooseStack continuously monitors your code changes and applies migrations automatically. All changes are applied to your **local database only**.

```ts filename="app/tables/events.ts" copy
interface Event {
  id: Key<string>;
  name: string;
  createdAt: Date;
  status: string; // New field - will trigger migration
}
```

Use planned migrations to generate, review, and apply OLAP DDL plans deterministically. See [Planned Migrations (OLAP)](/moose/olap/planned-migrations).

### Generating Migration Plans

When using planned migrations for OLAP, you need to generate a migration plan from the remote environment. This is done by running the following command:

```bash
moose generate migration --url https:// --token --save
```

This generates a few files in the `migrations` directory:

- `plan.yaml`: The migration plan containing an ordered list of operations to apply to the remote database to bring it into alignment with your local code.
- `remote_state.json`: A snapshot of the remote database state at the time the plan was generated - `local_infra_map.json`: A snapshot of the local database state at the time the plan was generated The remote and local state are used to validate that the plan is still valid at the time of deployment. If there have been schema changes made to your live remote database since the plan was generated, the deployment will abort and you will need to regenerate the plan. This is to prevent you from dropping data unintentionally. ### Reviewing and Editing the Plan You can review and edit the plan as needed. The plan is a YAML file that contains an ordered list of operations to apply to the remote database to bring it into alignment with your local code. ```yaml filename="migrations/plan.yaml" copy ``` ### Applying the Plan The plan is applied during deployment. MooseStack will validate that the remote database state matches the snapshot of the database state at the time the plan was generated, and applies `plan.yaml` in order; it aborts if snapshots don’t match current state. ## Migration Types ### Adding New Tables or Materialized Views ```typescript filename="index.ts" {4-7} copy // Export the new table and materialized view to apply changes export { newTable, // New table newMaterializedView // New materialized view } ``` The dev mode will automatically detect the new table or materialized view and apply the changes to your local database. You will see a log like this in the terminal: ```bash filename="Terminal" copy $ moose dev ⠋ Processing Infrastructure changes from file watcher + Table: new_table Version None - id: String, a_column: String, some_other_column: Float64 - - deduplicate: false + Table: target_table Version None - id: String, a_column: String, some_other_column: Float64 - id - deduplicate: false + SQL Resource: mv_to_target ``` The generated plan for this operation will look like this: ```yaml filename="migrations/plan.yaml" copy - CreateTable: table: name: new_table columns: - name: id data_type: String required: true unique: false primary_key: true default: null annotations: [] - name: a_column data_type: String required: true unique: false primary_key: false default: null annotations: [] - name: some_other_column data_type: Float64 required: true unique: false primary_key: false default: null annotations: [] order_by: - id deduplicate: false engine: MergeTree version: null metadata: description: null life_cycle: FULLY_MANAGED - CreateTable: table: name: target_table columns: - name: id data_type: String required: true unique: false primary_key: true default: null annotations: [] - name: a_column data_type: String required: true unique: false primary_key: false default: null annotations: [] - name: some_other_column data_type: Float64 required: true unique: false primary_key: false default: null annotations: [] order_by: - id deduplicate: false engine: MergeTree version: null metadata: description: null life_cycle: FULLY_MANAGED - RawSQL: sql: "CREATE MATERIALIZED VIEW mv_to_target TO target_table AS SELECT * FROM source_table" description: Running setup SQL for resource mv_to_target ``` ### Column Additions Adding new fields to your data models: ```ts filename="Before" copy interface AddedColumn { id: Key; another_column: string; some_column: string; } ``` ```ts filename="After" copy interface AddedColumn { id: Key; another_column: string; some_column: string; new_column: number; // New field - migration applied } ``` In dev mode, you will see a log like this: ```bash 
filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
  ~ Table events:
      Column changes:
        + new_column: Int64
```

The generated plan for this operation will look like this:

```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
    table: "events"
    column:
      name: "new_column"
      data_type: "Int64"
```

### Column Removals

Removing fields from your data models:

```ts filename="Before" copy
interface RemovedColumn {
  id: Key<string>;
  another_column: string;
  some_column: string;
  old_column: number;
}

const selectStatement = sql`
  SELECT
    toStartOfDay(${sourceTable.columns.a_date}) as day,
    uniq(${sourceTable.columns.id}) as count,
    SUM(${sourceTable.columns.a_number}) as sum
  FROM ${sourceTable}
  GROUP BY day
`;

const mv = new MaterializedView({
  selectStatement,
  selectTables: [sourceTable],
  targetTable: {
    name: "target_table",
    engine: ClickHouseEngines.MergeTree,
    orderByFields: ["day"],
  },
  materializedViewName: "mv_table_to_target",
});
```

```ts filename="After.ts" copy
interface TargetSchema {
  day: Date;
  count: number;
  sum: number;
  avg: number; // New column - migration applied
}

const selectStatement = sql`
  SELECT
    toStartOfDay(${sourceTable.columns.a_date}) as day,
    uniq(${sourceTable.columns.id}) as count,
    sum(${sourceTable.columns.a_number}) as sum,
    avg(${sourceTable.columns.a_number}) as avg
  FROM ${sourceTable}
  GROUP BY day
`;
```

---

## Syncing External Tables

Source: moose/olap/db-pull.mdx

Refresh your external table models from an existing ClickHouse database

# Syncing External Tables

## What this is

Use `moose db pull` to refresh the definitions of tables you marked as `EXTERNALLY_MANAGED` from a live ClickHouse instance. It reads your code to find external tables, fetches their remote schemas, regenerates one external models file, and creates a small git commit if anything changed. If new external tables were added remotely (e.g., new CDC streams), they are added to the external models file as part of the same run.

## When to use it

- **External tables changed remotely**: a DBA, CDC, or ETL pipeline updated schema.
- **Keep types in sync**: update generated models without touching fully-managed tables.
- **Safe by design**: does not modify the database or your managed models.

This is a read-only sync for your code models. For concepts and modeling guidance, see [External Tables](/moose/olap/external-tables). To bootstrap a project from an existing DB, see [Initialize from ClickHouse](/moose/getting-started/from-clickhouse).

## Requirements

- Tables are defined with `lifeCycle: EXTERNALLY_MANAGED`
- A ClickHouse connection string (native or HTTP/S)

## Connection strings

`db pull` accepts both native and HTTP(S) URLs. Native strings are automatically converted to HTTP(S) with the appropriate ports. Examples:

```bash filename="Terminal" copy
# Native (auto-converted to HTTPS + 8443)
moose db pull --connection-string "clickhouse://explorer@play.clickhouse.com:9440/default"

# HTTPS (explicit database via query param)
moose db pull --connection-string "https://play.clickhouse.com/?user=explorer&database=default"

# Local HTTP
moose db pull --connection-string "http://localhost:8123/?user=default&database=default"
```

## What gets written

`app/externalModels.ts`

`db pull` treats this file as the single source of truth for `EXTERNALLY_MANAGED` tables. It introspects the remote schema, updates existing external tables, and adds any newly detected external tables here. It does not modify models elsewhere in your codebase.
Keep all external tables in this file and import it once from your root (`app/index.ts`). Important: - The file is overwritten on every run (or at the path passed via `--file-path`). - If you customize the path, ensure your root file imports it so Moose loads your external models. ## How it works When you run `db pull` the CLI does the following: - Loads your project’s infrastructure map and identifies tables marked as `EXTERNALLY_MANAGED`. - Connects to the remote ClickHouse specified by `--connection-string` and introspects the live schemas for those tables. - Regenerates a single external models file that mirrors the remote schema. - Adds any newly detected external tables from the remote database to the generated file so your code stays in sync as sources evolve. - Does not change any fully managed tables, your `app/index.ts`, or the database itself. - Creates a small git commit if the generated file changed, so you can review and share the update. ### Example output ```ts filename="app/externalModels.ts" // AUTO-GENERATED FILE. DO NOT EDIT. // This file will be replaced when you run `moose db pull`. // ...typed table definitions matching remote EXTERNALLY_MANAGED tables... ``` ## Command ```bash filename="Terminal" copy moose db pull --connection-string [--file-path ] ``` - **--connection-string**: Required. ClickHouse URL (native or HTTP/S) - **--file-path**: Optional. Override the default output file. The file at this path will be regenerated (overwritten) on each run. ## Typical Use Cases ### Remote schema changed; update local types Your DBA, CDC pipeline (e.g., ClickPipes), or ETL job updated a table’s schema. To keep your code accurate and type-safe, refresh your external models so queries, APIs, and materialized views reference the correct columns and types. ```bash filename="Terminal" copy moose db pull --connection-string ``` This updates only `EXTERNALLY_MANAGED` models and leaves managed code untouched. ### Automatically run on dev startup (keep local fresh) In active development, schemas can drift faster than you commit updates. Running `db pull` on dev startup helps ensure your local code matches the live schema you depend on. ```bash filename="Terminal" copy export REMOTE_CLICKHOUSE_URL="clickhouse://:@:/" ``` Add to `moose.config.toml`: ```toml filename="moose.config.toml" copy [http_server_config] on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL" ``` This runs once when the dev server first starts. To run after code reloads, use `on_reload_complete_script`. If you run this frequently, prefer HTTP(S) URLs and cache credentials via env/secrets to avoid friction. ### New project from an existing DB If you’re starting with an existing ClickHouse database, bootstrap code with `init --from-remote`, then use `db pull` over time to keep external models fresh: ```bash filename="Terminal" copy moose init my-project --from-remote $REMOTE_CLICKHOUSE_URL --language ``` Review the full getting started guide to learn more about how to bootstrap a new Moose OLAP project from an existing ClickHouse DB. ### A new CDC/external table appeared; add it to code Your CDC pipeline created a new table (or exposed a new stream). Pull to add the new table to your external models file automatically. ```bash filename="Terminal" copy moose db pull --connection-string ``` The regenerated external models file will now include the newly discovered external table. 
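Whichever of these workflows you use, the refreshed models only take effect if your root file loads the generated module. A minimal sketch, assuming the default output path `app/externalModels.ts`:

```ts filename="app/index.ts" copy
// Load the generated external models so Moose registers them alongside your managed tables
import "./externalModels";
```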
## Troubleshooting - **No changes written**: Ensure tables are actually marked as `EXTERNALLY_MANAGED` and names match remote. - **Unsupported types**: The CLI will list tables with unsupported types; they’re skipped in the generated file. - **Auth/TLS errors**: Verify scheme/ports (8123 or 8443) and credentials; try HTTPS if native URL fails. - **Git commit issues**: The command attempts a lightweight commit; commit manually if your working tree is dirty. ## Related - **External Tables**: concepts and configuration - **Initialize from ClickHouse**: bootstrap projects from an existing DB - **Supported Types**: mapping and constraints --- ## External Tables Source: moose/olap/external-tables.mdx Connect to externally managed database tables and CDC services # External Tables ## Overview External tables allow you to connect Moose to database tables that are managed outside of your application. This is essential when working with: - **CDC (Change Data Capture) services** like ClickPipes, Debezium, or AWS DMS - **Legacy database tables** managed by other teams - **Third-party data sources** with controlled schema evolution ## When to Use External Tables ## Configuration Set `lifeCycle: LifeCycle.EXTERNALLY_MANAGED` to tell Moose not to modify the table schema: ```ts filename="ExternalTableExample.ts" copy interface CdcUserData { id: string; name: string; email: string; updated_at: Date; } // Connect to CDC-managed table const cdcUserTable = new OlapTable("cdc_users", { lifeCycle: LifeCycle.EXTERNALLY_MANAGED }); ``` ## Getting Models for External Tables ### New project: initialize from your existing ClickHouse If you don’t yet have a Moose project, use init-from-remote to bootstrap models from your existing ClickHouse: ```bash filename="Terminal" copy moose init my-project --from-remote --language ``` What happens: - Moose introspects your database and generates table models in your project. - If Moose detects ClickPipes (or other CDC-managed) tables, it marks those as `EXTERNALLY_MANAGED` and writes them into a dedicated external models file: - TypeScript: `app/externalModels.ts` - Python: `app/external_models.py` - This is a best-effort detection to separate CDC-managed tables from those you may want Moose to manage in code. How detection works (ClickPipes/PeerDB example): - Moose looks for PeerDB-specific fields that indicate CDC ownership and versions, such as `_peerdb_synced_at`, `_peerdb_is_deleted`, `_peerdb_version`, and related metadata columns. - When these are present, the table will be marked `EXTERNALLY_MANAGED` and emitted into the external models file automatically. 
```ts filename="externalModels.ts" copy
export interface foo {
  id: string & typia.tags.Format<"uuid">;
  name: string;
  description: string | undefined;
  status: string;
  priority: number & ClickHouseInt<"int32">;
  is_active: boolean;
  metadata: string | undefined;
  tags: string[];
  score: (string & ClickHouseDecimal<10, 2>) | undefined;
  large_text: string | undefined;
  created_at: string & typia.tags.Format<"date-time"> & ClickHousePrecision<6>;
  updated_at: string & typia.tags.Format<"date-time"> & ClickHousePrecision<6>;
  _peerdb_synced_at: string & typia.tags.Format<"date-time"> & ClickHousePrecision<9> & ClickHouseDefault<"now64()">;
  _peerdb_is_deleted: number & ClickHouseInt<"int8">;
  _peerdb_version: number & ClickHouseInt<"int64">;
}
```

### Existing project: mark additional external tables

If there are other tables in your DB that are not CDC-managed but you want Moose to treat as external (not managed by code):

1) Mark them as external in code

```ts copy
// In a file you control (not the external file yet)
const table = new OlapTable("my_table", {
  lifeCycle: LifeCycle.EXTERNALLY_MANAGED
});
```

2) Move them into the external models file

- Move the model definitions to your external file (`app/externalModels.ts` or `app/external_models.py`).
- Ensure your root file still loads only the external models via a single import:
  - Add `import "./externalModels";` in your `app/index.ts` file.

This keeps truly external tables out of your managed code path, while still making them available locally (and in tooling) without generating production DDL.

## Important Considerations

`EXTERNALLY_MANAGED` tables reflect schemas owned by your CDC/DBA/ETL processes. Do not change their field shapes in code. If you accidentally edited an external model, revert to the source of truth by running **DB Pull**: [/moose/olap/db-pull](/moose/olap/db-pull).

Locally, externally managed tables are created/kept in sync in your development ClickHouse so you can develop against them and **seed data**. See **Seed (ClickHouse)** in the CLI: [/moose/moose-cli#seed-clickhouse](/moose/moose-cli#seed-clickhouse).

Moose will **not** apply schema changes to `EXTERNALLY_MANAGED` tables in production. If you edit these table models in code, those edits will not produce DDL operations in the migration plan (they will not appear in `plan.yaml`). For more on how migration plans are generated and what shows up in `plan.yaml`, see [/moose/olap/planned-migrations](/moose/olap/planned-migrations).

## Staying in sync with remote schema

For `EXTERNALLY_MANAGED` tables, keep your code in sync with the live database by running DB Pull. You can do it manually or automate it in dev.

```bash filename="Terminal" copy
moose db pull --connection-string 
```

Use DB Pull to regenerate your external models file from the remote schema. To run it automatically during development, see the script hooks in [the local development guide](/moose/local-dev#script-execution-hooks).

---

## Secondary Indexes

Source: moose/olap/indexes.mdx

Specifying indexes with Moose OLAP

## Indexes for ClickHouse tables

Moose lets you declare secondary/data-skipping indexes directly in your table definitions. Moose generates the ClickHouse `INDEX` clauses on create and plans `ALTER TABLE ADD/DROP INDEX` operations when you change them later.

### When to use indexes

- Use indexes to optimize selective predicates on large tables, especially string and high-cardinality columns.
- Common types: `minmax`, `Set(max_rows)`, `ngrambf_v1(...)`, `bloom_filter`.
### TypeScript

```ts
interface Events {
  id: string;
  user: string;
  message: string;
}

// Index definitions mirror the fields of the Python OlapConfig.TableIndex shown below
// (name, expression, type, arguments, granularity).
const eventsTable = new OlapTable<Events>("Events", {
  orderByFields: ["id"],
  indexes: [
    { name: "idx_user", expression: "user", type: "minmax", granularity: 1 },
    {
      name: "idx_message_ngrams",
      expression: "message",
      type: "ngrambf_v1",
      arguments: ["3", "256", "1", "123"],
      granularity: 1,
    },
  ],
});
```

### Python

```python
from moose_lib.dmv2.olap_table import OlapTable, OlapConfig, MergeTreeEngine
from pydantic import BaseModel

class Events(BaseModel):
    id: str
    user: str
    message: str

events_table = OlapTable[Events](
    "Events",
    OlapConfig(
        engine=MergeTreeEngine(),
        order_by_fields=["id"],
        indexes=[
            OlapConfig.TableIndex(name="idx_user", expression="user", type="minmax", granularity=1),
            OlapConfig.TableIndex(name="idx_message_ngrams", expression="message", type="ngrambf_v1", arguments=["3","256","1","123"], granularity=1),
        ],
    ),
)
```

### How Moose applies changes

- On create, Moose emits `INDEX ...` entries inside `CREATE TABLE`.
- On change, Moose plans `ALTER TABLE DROP INDEX` then `ADD INDEX ...` if the definition changed; pure adds/drops are applied as single operations.

---

## Inserting Data

Source: moose/olap/insert-data.mdx

Insert data into OLAP tables using various methods

# Inserting Data

Inserting data into your database is a common task. MooseStack provides a few different ways to insert data into your database.

If a table column is modeled as optional in your app type but has a ClickHouse default, Moose treats incoming records as optional at the API/stream boundary, but the ClickHouse table stores the column as required with a DEFAULT clause. If you omit the field in the payload, ClickHouse fills it with the default at insert time. For example: `field?: number & ClickHouseDefault<"18">` or `WithDefault`.

## From a Stream (Streaming Ingest)

When you need to stream data into your ClickHouse tables, you can set the `Stream.destination` as a reference to the `OlapTable` you want to insert into. This will automatically provision a synchronization process that batches and inserts data into the table.

```ts filename="StreamInsert.ts" copy
interface Event {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

const eventsTable = new OlapTable<Event>("Events");

const stream = new Stream<Event>("Events", {
  destination: eventsTable // automatically syncs the stream to the table in ClickHouse-optimized batches
});
```

[ClickHouse inserts need to be batched for optimal performance](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance). Moose automatically batches your data into ClickHouse-optimized batches of up to 100,000 records, with automatic flushing every second. It also handles at-least-once delivery and retries on connection errors to ensure your data is never lost.

## From a Workflow (Batch Insert)

If you have a data source better suited for batch patterns, use a workflow and the direct `insert()` method to land data into your tables:

```ts filename="WorkflowInsert.ts" copy
interface Event {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

const eventsTable = new OlapTable<Event>("user_events");

const etlTask = new Task({
  name: "ETL",
  run: async () => {
    const result = await eventsTable.insert([
      { id: "evt_1", userId: "user_123", timestamp: new Date(), eventType: "click" },
      { id: "evt_2", userId: "user_456", timestamp: new Date(), eventType: "view" }
      // ... more records of type Event
    ]);
  }
});
```

## From a Client App

### Via REST API

In your Moose code, you can leverage the built-in [MooseAPI module](/moose/apis) to place a `POST` REST API endpoint in front of your streams and tables to allow you to insert data from external applications.
```ts filename="IngestApi.ts" copy
const ingestApi = new IngestApi("user_events", {
  destination: events_stream
});
```

Alternatively, use `IngestPipeline` instead of standalone `IngestApi`, `Stream`, and `OlapTable` components:

```ts filename="IngestPipeline.ts" copy
const eventsPipeline = new IngestPipeline("user_events", {
  ingestApi: true,
  stream: true,
  table: {
    orderByFields: ["id", "timestamp"],
    engine: ClickHouseEngines.ReplacingMergeTree,
  }
});
```

With these APIs you can leverage the built-in OpenAPI client integration to generate API clients in your own language to connect to your pipelines from external applications.

### Coming Soon: MooseClient

We're working on a new client library that you can use to interact with your Moose pipelines from external applications. Join the community Slack to stay updated and let us know if you're interested in helping us build it.

## Direct Data Insertion

The `OlapTable` provides an `insert()` method that allows you to directly insert data into ClickHouse tables with validation and error handling.

### Inserting Arrays of Records

```ts filename="DirectInsert.ts" copy
interface UserEvent {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

const eventsTable = new OlapTable<UserEvent>("user_events");

// Insert single record or array of records
const result = await eventsTable.insert([
  { id: "evt_1", userId: "user_123", timestamp: new Date(), eventType: "click" },
  { id: "evt_2", userId: "user_456", timestamp: new Date(), eventType: "view" }
]);

console.log(`Successfully inserted: ${result.successful} records`);
console.log(`Failed: ${result.failed} records`);
```

ClickHouse strongly recommends batching inserts. You should avoid inserting single records into tables, and consider using Moose Streams and Ingest Pipelines if your data source sends events as individual records.
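If your source hands you events one at a time, a common pattern is to buffer them in the client and flush periodically so each `insert()` call carries a batch. A minimal sketch, reusing `eventsTable` and `UserEvent` from the example above (the batch size and flush interval are illustrative, not Moose defaults):

```ts filename="BatchedInsert.ts" copy
const buffer: UserEvent[] = [];

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);
  // One batched insert instead of one insert per record
  await eventsTable.insert(batch);
}

function enqueue(event: UserEvent) {
  buffer.push(event);
  if (buffer.length >= 1000) {
    void flush();
  }
}

// Flush on an interval as well, so low-volume periods still land in ClickHouse
setInterval(() => void flush(), 1000);
```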
### Handling Large Batch Inserts For large datasets, use Node.js streams for memory-efficient processing: ```ts filename="StreamInsert.ts" copy const dataStream = new Readable({ objectMode: true, read() { // Stream implementation } }); const result = await eventsTable.insert(dataStream, { strategy: 'fail-fast' // Note: 'isolate' not supported with streams }); ``` ### Validation Methods Before inserting data, you can validate it using the following methods: ```ts filename="ValidationMethods.ts" copy // Type guard with compile-time type narrowing if (eventsTable.isValidRecord(unknownData)) { // TypeScript now knows unknownData is UserEvent console.log(unknownData.userId); // Type-safe access } // Detailed validation with error reporting const validationResult = eventsTable.validateRecord(unknownData); if (validationResult.success) { console.log("Valid data:", validationResult.data); } else { console.log("Validation errors:", validationResult.errors); } // Assert validation (throws on failure) try { const validData = eventsTable.assertValidRecord(unknownData); // Use validData with full type safety } catch (error) { console.log("Validation failed:", error.message); } ``` ### Error Handling Strategies Choose from three error handling strategies based on your reliability requirements: #### Fail-Fast Strategy (Default) ```ts filename="FailFast.ts" copy // Stops immediately on any error const result = await eventsTable.insert(data, { strategy: 'fail-fast' }); ``` #### Discard Strategy ```ts filename="Discard.ts" copy // Discards invalid records, continues with valid ones const result = await eventsTable.insert(data, { strategy: 'discard', allowErrors: 10, // Allow up to 10 failed records allowErrorsRatio: 0.05 // Allow up to 5% failure rate }); ``` #### Isolate Strategy ```ts filename="Isolate.ts" copy // Retries individual records to isolate failures const result = await eventsTable.insert(data, { strategy: 'isolate', allowErrorsRatio: 0.1 }); // Access detailed failure information if (result.failedRecords) { result.failedRecords.forEach(failed => { console.log(`Record ${failed.index} failed: ${failed.error}`); }); } ``` ### Performance Optimization The insert API includes several performance optimizations: - **Memoized connections**: ClickHouse clients are reused across insert calls - **Batch processing**: Optimized batch sizes for large datasets - **Async inserts**: Automatic async insert mode for datasets > 1000 records - **Connection management**: Use `close_client()` when completely done ```ts filename="Performance.ts" copy // For high-throughput scenarios const result = await eventsTable.insert(largeDataset, { validate: false, // Skip validation for performance strategy: 'discard' }); // Clean up when completely done (optional) await eventsTable.closeClient(); ``` ## Best Practices --- ## Creating Materialized Views Source: moose/olap/model-materialized-view.mdx Create and configure materialized views for data transformations # Modeling Materialized Views ## Overview Materialized views are write-time transformations in ClickHouse. A static `SELECT` populates a destination table from one or more sources. You query the destination like any other table. The `MaterializedView` class wraps [ClickHouse `MATERIALIZED VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view) and keeps the `SELECT` explicit. 
When you edit the destination schema in code and update the `SELECT` accordingly, Moose applies the corresponding DDL, orders dependent updates, and backfills as needed, so the pipeline stays consistent as you iterate. In local dev, Moose Migrate generates and applies DDL to your local database.

Today, destination schemas are declared in code and kept in sync manually with your `SELECT`. Moose Migrate coordinates DDL and dependencies when you make those changes. A future enhancement will infer the destination schema from the `SELECT` and update it automatically.

This dependency awareness is critical for [cascading materialized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view-with-dependencies). Moose Migrate [orders DDL across views and tables](https://www.fiveonefour.com/blog/Moose-SQL-Getting-DDL-Dependencies-in-Order) to avoid failed migrations and partial states.

### Basic Usage

```ts filename="BasicUsage.ts" copy
// Define the schema of the transformed rows -- this is static and it must match the results of your SELECT.
// It also represents the schema of your entire destination table.
interface TargetSchema {
  id: string;
  average_rating: number;
  num_reviews: number;
}

// `reviews` is an example source OlapTable with `product_id` and `rating` columns.
new MaterializedView<TargetSchema>({
  selectStatement: sql`
    SELECT
      ${reviews.columns.product_id} AS id,
      avg(${reviews.columns.rating}) AS average_rating,
      count(*) AS num_reviews
    FROM ${reviews}
    GROUP BY ${reviews.columns.product_id}
  `,
  selectTables: [reviews],
  targetTable: { name: "target_table", orderByFields: ["id"] },
  materializedViewName: "mv_to_target",
});
```

The ClickHouse `MATERIALIZED VIEW` object acts like a trigger: on new inserts into the source table(s), it runs the SELECT and writes the transformed rows to the destination.

### Quick Reference

```typescript filename="ViewOptions.ts"
interface MaterializedViewConfig<T> {
  // Static SELECT that computes the destination rows
  selectStatement: string | Sql;

  // Tables/views the query reads from
  selectTables: (OlapTable | View)[];

  // Name of the ClickHouse MATERIALIZED VIEW object
  materializedViewName: string;

  // Destination table where materialized rows are stored
  targetTable?:
    | OlapTable<T>
    | {
        name: string;
        engine?: ClickHouseEngines;
        orderByFields?: (keyof T & string)[];
      };

  /** @deprecated prefer targetTable */
  tableName?: string;
  /** @deprecated prefer targetTable */
  engine?: ClickHouseEngines;
  /** @deprecated prefer targetTable */
  orderByFields?: (keyof T & string)[];
}
```

## Modeling the Target Table

The destination table is where the transformed rows are written by the materialized view. You can model it in two ways:

### Option 1 — Define target table inside the MaterializedView (most cases)

- Simple, co-located lifecycle: the destination table is created/updated/dropped with the MV.
- Best for: projection/denormalization, filtered serving tables, enrichment joins, and most rollups.

```ts filename="InlineTarget.ts" copy
interface Dest { id: string; value: number }

new MaterializedView<Dest>({
  selectStatement: sql`SELECT id, toInt32(value) AS value FROM ${sourceTable}`,
  selectTables: [sourceTable],
  targetTable: { name: "serving_table", orderByFields: ["id"] }, // MergeTree by default
  materializedViewName: "mv_to_serving_table",
});
```

### Option 2 — Decoupled: reference a standalone `OlapTable`

Certain use cases may benefit from a separate lifecycle for the target table that is managed independently from the MV.
```ts filename="DecoupledTarget.ts" copy
interface Dest { id: string; value: number }

const targetTable = new OlapTable<Dest>("target_table", {
  engine: ClickHouseEngines.MergeTree,
  orderByFields: ["id"],
});

new MaterializedView<Dest>({
  selectStatement: sql`SELECT id, toInt32(value) AS value FROM ${sourceTable}`,
  selectTables: [sourceTable],
  targetTable: targetTable,
  materializedViewName: "mv_to_target_table",
});
```

### Basic Transformation, Cleaning, Filtering, Denormalization

Create a narrower, query-optimized table from a wide source. Apply light transforms (cast, rename, parse) at write time.

```ts filename="Denormalization.ts" copy
interface Dest { id: string; value: number; created_at: string }

new MaterializedView<Dest>({
  selectStatement: sql`
    SELECT id, toInt32(value) AS value, created_at
    FROM ${sourceTable}
    WHERE active = 1
  `,
  selectTables: [sourceTable],
  targetTable: { name: "proj_table" },
  materializedViewName: "mv_to_proj_table",
});
```

### Aggregations

#### Simple Additive Rollups

When you want to maintain running sums (counts, totals) that are additive per key, use the `SummingMergeTree` engine:

```ts filename="Summing.ts" copy
interface DailyCounts {
  day: string;
  user_id: string;
  events: number;
}

const stmt = sql`
  SELECT
    toDate(${events.columns.timestamp}) AS day,
    ${events.columns.user_id} AS user_id,
    count(*) AS events
  FROM ${events}
  GROUP BY day, user_id
`;

new MaterializedView<DailyCounts>({
  selectStatement: stmt,
  selectTables: [events],
  targetTable: {
    name: "daily_counts",
    engine: ClickHouseEngines.SummingMergeTree,
    orderByFields: ["day", "user_id"],
  },
  materializedViewName: "mv_to_daily_counts",
});
```

#### Complex Aggregations

When you want to compute complex aggregation metrics that are not just simple additive operations (sum, count, avg, etc.), but instead use more complex analytical functions (topK, percentile, etc.), create a target table with the `AggregatingMergeTree` engine.

```ts filename="AggTransform.ts" copy
interface MetricsById {
  id: string;
  avg_rating: number & Aggregated<"avg", [number]>;
  daily_uniques: number & ClickHouseInt<"uint64"> & Aggregated<"uniqExact", [string]>;
}

// All Aggregate Functions in this query have a [functionName][State]() suffix.
const stmt = sql`
  SELECT
    ${events.columns.id} AS id,
    avgState(${events.columns.rating}) AS avg_rating,
    uniqExactState(${events.columns.user_id}) AS daily_uniques
  FROM ${events}
  GROUP BY ${events.columns.id}
`;

new MaterializedView<MetricsById>({
  selectStatement: stmt,
  selectTables: [events],
  targetTable: {
    name: "metrics_by_id",
    engine: ClickHouseEngines.AggregatingMergeTree,
    orderByFields: ["id"],
  },
  materializedViewName: "mv_metrics_by_id",
});
```

Jump to the [Advanced: AggregatingMergeTree transformations](#advanced-aggregatingmergetree-transformations) section for more details.
### Fan-in Patterns When you have multiple sources that you want to merge into a single destination table, its best to create an OlapTable and reference it in each MV that needs to write to it: ```ts filename="FanIn.ts" copy interface DailyCounts { day: string; user_id: string; events: number } // Create the destination table explicitly const daily = new OlapTable("daily_counts", { engine: ClickHouseEngines.SummingMergeTree, orderByFields: ["day", "user_id"], }); // MV 1 - write to the daily_counts table const webStmt = sql`SELECT toDate(ts) AS day, user_id, 1 AS events FROM ${webEvents}`; const mv1 = new MaterializedView({ selectStatement: webStmt, selectTables: [webEvents], targetTable: daily, materializedViewName: "mv_web_to_daily_counts", }); // MV 2 - write to the daily_counts table const mobileStmt = sql`SELECT toDate(ts) AS day, user_id, 1 AS events FROM ${mobileEvents}`; const mv2 = new MaterializedView({ selectStatement: mobileStmt, selectTables: [mobileEvents], targetTable: daily, materializedViewName: "mv_mobile_to_daily_counts", }); ``` ### Blue/green schema migrations Create a new table for a breaking schema change and use an MV to copy data from the old table; when complete, switch reads to the new table and drop just the MV and old table. For more information on how to use materialized views to perform blue/green schema migrations, see the [Schema Versioning](./schema-versioning) guide. ## Defining the transformation The `selectStatement` is a static SQL query that Moose runs to transform data from your source table(s) into rows for the destination table. Transformations are defined as ClickHouse SQL queries. We strongly recommend using the ClickHouse SQL reference and functions overview to help you develop your transformations. - Use the Moose `sql` template to interpolate tables and columns safely. This gives type-checked column references and prevents runtime parameters. - Reference tables and columns via objects in your project (e.g., `${sourceTable}`, `${sourceTable.columns.id}`) rather than string literals. ```ts filename="Transformation.ts" copy interface Dest { id: string; name: string; day: string } const transformation = sql` SELECT ${users.columns.id} AS id, ${users.columns.name} AS name, toDate(${events.columns.ts}) AS day FROM ${events} JOIN ${users} ON ${events.columns.user_id} = ${users.columns.id} WHERE ${events.columns.active} = 1 `; new MaterializedView({ selectStatement: transformation, selectTables: [events, users], targetTable: { name: "user_activity_by_day" }, materializedViewName: "mv_user_activity_by_day", }); ``` The columns returned by your `SELECT` must exactly match the destination table schema. - Use column aliases (`AS target_column_name`) to align names. - All destination columns must be present in the `SELECT`, or the materialized view won't be created. Adjust your transformation or table schema so they match. Go to the [Advanced: Writing SELECT statements to Aggregated tables](#writing-select-statements-to-aggregated-tables) section for more details. ## Backfill Destination Tables When the MaterializedView is created, Moose backfills the destination once by running your `SELECT` (so you start with a fully populated table). Materialized views that source from S3Queue tables are **not backfilled** automatically. S3Queue tables only process new files added to S3 after the table is created - there is no historical data to backfill from. The MV will start populating as new files arrive in S3. 
You can see the SQL that Moose will run to backfill the destination table when you generate the [Migration Plan](./migration-plan). During dev mode, as soon as you save the MaterializedView, Moose will run the backfill and you can see the results in the destination table by querying it in your local ClickHouse instance. ## Query Destination Tables You can query the destination table like any other table. For inline or decoupled target tables, you can reference target table columns and tables directly in your queries: ```ts filename="Query.ts" copy // Inline-defined destination table from earlier examples const q = sql` SELECT ${mv.targetTable.columns.id}, ${mv.targetTable.columns.value} FROM ${mv.targetTable} ORDER BY ${mv.targetTable.columns.id} LIMIT 10`; ``` If you define your target table outside of the MaterializedView, you can also just reference the table by its variable name in your queries: ```ts filename="QueryDecoupled.ts" copy const targetTable = new OlapTable<{ id: string; average_rating: number }>("target_table") // Assuming `targetTable` is the OlapTable you created explicitly const q = sql` SELECT ${targetTable.columns.id}, ${targetTable.columns.average_rating} FROM ${targetTable} WHERE ${targetTable.columns.id} = 'abc' `; ``` Go to the [Querying Aggregated tables](#querying-aggregated-tables) section for more details on how to query Aggregated tables. ## Advanced: Aggregations + Materialized Views This section dives deeper into advanced patterns and tradeoffs when building aggregated materialized views. ### Target Tables with `AggregatingMergeTree` When using an `AggregatingMergeTree` target table, you must use the `AggregateFunction` type to model the result of the aggregation functions: ```ts filename="AggTransform.ts" copy interface MetricsById { id: string; /** * Result of avgState(events.rating) * - avgState(number) returns number, so model the type as number * - Aggregated arg type is [number] because the column (events.rating) is a number * - Aggregated function name is "avg" */ avg_rating: number & Aggregated<"avg", [number]>; /** * Result of uniqExactState(events.user_id) * - uniqExact returns an integer; use number & ClickHouseInt<"uint64"> for precision * - Aggregated arg type is [string] because the column (events.user_id) is a string * - Aggregated function name is "uniqExact" */ daily_uniques: number & ClickHouseInt<"uint64"> & Aggregated<"uniqExact", [string]>; } // All Aggregate Functions in this query have a [functionName][State]() suffix. const stmt = sql` SELECT ${events.columns.id} AS id, avgState(${events.columns.rating}) AS avg_rating, uniqExactState(${events.columns.user_id}) AS daily_uniques FROM ${events} GROUP BY ${events.columns.id} `; new MaterializedView({ selectStatement: stmt, selectTables: [events], targetTable: { name: "metrics_by_id", engine: ClickHouseEngines.AggregatingMergeTree, orderByFields: ["id"], }, materializedViewName: "mv_metrics_by_id", }); ``` - Using `avg()`/`uniqExact()` in the SELECT instead of `avgState()`/`uniqExactState()` - Forgetting to annotate the schema with `Aggregated<...>` so the target table can be created correctly - Mismatch between `GROUP BY` keys in your `SELECT` and the `orderByFields` of your target table ### Modeling columns with `AggregateFunction` - Pattern: `U & Aggregated<"agg_func_name", [Types]>` - `U` is the read-time type (e.g., `number`, `string`) - `agg_func_name` is the aggregation name (e.g., `avg`, `uniqExact`) - `Types` are the argument types. 
These are the types of the columns that are being aggregated. ```ts filename="FunctionToTypeMapping.ts" copy number & Aggregated<"avg", [number]> // avgState(col: number) number & ClickHouseInt<"uint64"> & Aggregated<"uniqExact", [string]> // uniqExactState(col: string) number & ClickHouseInt<"uint64"> & Aggregated<"count", []> // countState(col: any) string & Aggregated<"argMax", [string, Date]> // argMaxState(col: string, value: Date) string & Aggregated<"argMin", [string, Date]> // argMinState(col: string, value: Date) number & Aggregated<"corr", [number, number]> // corrState(col1: number, col2: number) ``` ### Writing SELECT statements to Aggregated tables When you write to an `AggregatingMergeTree` table, you must add a `State` suffix to the aggregation functions in your `SELECT` statement. ```ts filename="AggTransform.ts" copy interface MetricsById { id: string; avg_rating: number & Aggregated<"avg", [number]>; total_reviews: number & Aggregated<"sum", [number]>; } const aggStmt = sql` SELECT ${reviews.columns.id} AS id, avgState(${reviews.columns.rating}) AS avg_rating, countState(${reviews.columns.id}) AS total_reviews FROM ${reviews} GROUP BY ${reviews.columns.id} `; const mv = new MaterializedView({ selectStatement: aggStmt, selectTables: [reviews], targetTable: { name: "metrics_by_id", engine: ClickHouseEngines.AggregatingMergeTree, orderByFields: ["id"], }, materializedViewName: "mv_metrics_by_id", }); ``` Why states? Finalized values (e.g., `avg()`) are not incrementally mergeable. Storing states lets ClickHouse maintain results efficiently as new data arrives. Docs: https://clickhouse.com/docs/en/sql-reference/aggregate-functions/index and https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators#-state ### Querying Aggregated Tables When you query a table with an `AggregatingMergeTree` engine, you must use aggregate functions with the `Merge` suffix (e.g., `avgMerge`) `or rely on Moose’s `Aggregated` typing plus `sql` to auto-finalize at query time. ```ts filename="QueryAgg.ts" copy // Auto-finalized via Aggregated + sql const cols = mv.targetTable.columns; // mv from earlier Agg example const autoFinalized = sql` SELECT ${cols.avg_rating}, ${cols.total_reviews} FROM ${mv.targetTable} WHERE ${cols.id} = '123' `; // Manual finalization (explicit ...Merge) const manual = sql` SELECT avgMerge(avg_rating) AS avg_rating, countMerge(total_reviews) AS total_reviews FROM metrics_by_id WHERE id = '123' `; ``` ## Choosing the right engine - Use `MergeTree` for copies/filters/enrichment without aggregation semantics. - Use `SummingMergeTree` when all measures are additive, and you want compact, eventually-consistent sums. - Use `AggregatingMergeTree` for non-additive metrics and advanced functions; store states and finalize on read. - Use `ReplacingMergeTree` for dedup/upserts or as an idempotent staging layer before rollups. --- ## Modeling Tables Source: moose/olap/model-table.mdx Model your database schema in code using native TypeScript/Python typing # Modeling Tables ## Overview Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing. You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more. 
```ts filename="FirstTable.ts" copy
interface MyFirstTable {
  id: Key<string>;
  name: string;
  age: number;
}

// Create a table named "first_table"
const firstTable = new OlapTable<MyFirstTable>("first_table");

// For deduplication, use the ReplacingMergeTree engine
const dedupTable = new OlapTable<MyFirstTable>("first_table", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],
});

// Now you can:
// - Write to this table from streams
// - Query it directly
// - Use it as a source for materialized views
```

### Creating Tables in Ingestion Pipelines

For end-to-end data flows, create tables as part of an ingestion pipeline:

```ts filename="PipelineTable.ts"
// Define your schema
interface UserEvent {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

// Create a complete ingestion pipeline with a table
const eventsPipeline = new IngestPipeline<UserEvent>("user_events", {
  ingestApi: true, // Creates a REST API endpoint at POST localhost:4000/ingest/user_events
  stream: true,    // Creates Kafka/Redpanda topic
  table: {         // Creates and configures the table named "user_events"
    orderByFields: ["id", "timestamp"]
  }
});

// Access the table component when needed
const eventsTable = eventsPipeline.table;
```

## Data Modeling

### Special ClickHouse Types (LowCardinality, Nullable, etc.)

```ts filename="ClickHouseTypes.ts" copy
export interface ClickHouseOptimizedExample {
  id: Key<string>;
  stringField: string;
  numberField: number;
  decimalField: Decimal<10, 2>; // Precise decimal storage
  // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works
  lowCardinalityField: string & LowCardinality; // Faster queries for enum-like data
  nestedObject: {
    innerString: string;
    innerNumber: number;
  };
  namedTupleField: {
    name: string;
    value: number;
  } & ClickHouseNamedTuple; // Optimized nested storage
  numberArray: number[];
  mapField: Record<string, string>;
  literalField: "optionA" | "optionB";
  optionalField?: string; // Nullable field
  dateField: Date;
}
```

### Default values

Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.

```ts filename="Defaults.ts" copy
interface Event {
  id: Key<string>;
  // Static defaults (ClickHouse expression as a string literal)
  status: string & ClickHouseDefault<"'pending'">; // DEFAULT 'pending'
  retries: number & ClickHouseDefault<"0">;        // DEFAULT 0
  // Server-side timestamps
  createdAt: Date & ClickHouseDefault<"now()">;    // DEFAULT now()
  // Decimal with default
  amount: Decimal<10, 2> & ClickHouseDefault<"0">;
  // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax
}
```

The value passed into the `ClickHouseDefault<"">` tag can either be a string literal or a stringified ClickHouse SQL expression. If you run into typing issues specifically on `Date` fields with `ClickHouseDefault`, use `WithDefault` as a fallback workaround.

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

- Optional without default (e.g., `field?: number`) → ClickHouse Nullable type.
- Optional with default (e.g., `field?: number & ClickHouseDefault<"18">` or `WithDefault`) → non-nullable column with default `18`.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

### Database Selection

By default, tables are created in the database specified in your `moose.config.toml` ClickHouse configuration.
You can override this on a per-table basis using the `database` field: ```ts filename="DatabaseOverride.ts" copy interface UserData { id: Key; name: string; email: string; } // Table in default database (from moose.config.toml) const defaultTable = new OlapTable("users"); // Table in specific database (e.g., "analytics") const analyticsTable = new OlapTable("users", { database: "analytics", orderByFields: ["id"] }); ``` To use custom databases, configure them in your `moose.config.toml`: ```toml [clickhouse_config] db_name = "local" additional_databases = ["analytics", "staging"] ``` The databases in `additional_databases` will be created automatically when you start your Moose application. ### Primary Keys and Sorting You must configure table indexing using one of these approaches: 1. Define at least one `Key` in your table schema 2. Specify `orderByFields` in the table config 3. Use both (all `Key` fields must come first in the `orderByFields` array) ```ts filename="PrimaryKeyConfig.ts" copy // Approach 1: Using primary key only interface Record1 { id: Key; // Primary key field field1: string; field2: number; } const table1 = new OlapTable("table1"); // id is the primary key ``` ### Order By Fields Only ```ts filename="OrderByFieldsOnly.ts" copy // Approach 2: Using orderByFields only interface SchemaWithoutPrimaryKey { field1: string; field2: number; field3: Date; } const tableWithOrderByFieldsOnly = new OlapTable("table2", { orderByFields: ["field1", "field2"] // Specify ordering without primary key }); ``` ### Order By Expression Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely. ```ts filename="OrderByExpression.ts" copy // Use a ClickHouse expression for ORDER BY interface Events { userId: string; createdAt: Date; eventType: string; } const tableWithOrderByExpression = new OlapTable("events", { // Equivalent to orderByFields: ["userId", "createdAt", "eventType"] orderByExpression: "(userId, createdAt, eventType)", }); // Advanced: functions inside expression const tableWithMonthBucketing = new OlapTable("events_by_month", { orderByExpression: "(userId, toYYYYMM(createdAt))", }); // No sorting (ClickHouse tuple() means empty ORDER BY) const unsortedTable = new OlapTable("events_unsorted", { orderByExpression: "tuple()", }); ``` ### Using Both Primary Key and Order By Fields ```ts filename="ComboKeyAndOrderByFields.ts" copy // Approach 3: Using both (primary key must be first) interface SchemaWithKey { id: Key; // Primary key field field1: string; field2: number; } const tableWithKeyAndOrderByFields = new OlapTable("table3", { orderByFields: ["id", "field1"] // Primary key must be first }); ``` ### Using Multiple Primary Keys ```ts filename="MultiKeyTable.ts" copy interface MultiKeyRecord { key1: Key; key2: Key; field1: string; } const multiKeyTable = new OlapTable("multi_key_table", { orderByFields: ["key1", "key2", "field1"] // Multiple keys must come first }); ``` ### Table engines By default, Moose will create tables with the `MergeTree` engine. You can use different engines by setting the `engine` in the table configuration. 
```ts filename="TableEngine.ts" copy // Default MergeTree engine const table = new OlapTable("table", { orderByFields: ["id"] }); // Use engine configuration for other engines const dedupTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "version", // Optional: keeps row with highest version isDeleted: "deleted" // Optional: soft delete when deleted=1 }); ``` #### Deduplication (`ReplacingMergeTree`) Use the `ReplacingMergeTree` engine to keep only the latest record for your designated sort key: ```ts filename="DeduplicatedTable.ts" copy // Basic deduplication const table = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"] }); // With version column (keeps record with highest version) const versionedTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at" // Column that determines which version to keep }); // With soft deletes (requires ver parameter) const softDeleteTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at", isDeleted: "deleted" // UInt8 column: 1 marks row for deletion }); ``` ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately. **Version Column (`ver`)**: When specified, ClickHouse keeps the row with the maximum version value for each unique sort key. **Soft Deletes (`is_deleted`)**: When specified along with `ver`, rows where this column equals 1 are deleted during merges. This column must be UInt8 type. For more details, see the [ClickHouse documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree). #### Streaming from S3 (`S3Queue`) Use the `S3Queue` engine to automatically ingest data from S3 buckets as files are added: S3Queue tables only process **new files** added to S3 after table creation. When used as a source for materialized views, **no backfill occurs** - the MV will only start populating as new files arrive. See the [Materialized Views documentation](./model-materialized-view#backfill-destination-tables) for more details. ```ts filename="S3StreamingTable.ts" copy // Use direct configuration (S3Queue does not support orderByFields) ); ``` S3Queue is a streaming engine and does not support `orderByFields` or ORDER BY clauses. Configure only engine-specific parameters like `s3Path`, `format`, and `settings`. **Security Risk**: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities. **Solution**: Use `mooseRuntimeEnv` for runtime credential resolution: ```ts filename="SecureS3Streaming.ts" copy // ✅ RECOMMENDED: Runtime environment variable resolution ); ``` **Then set environment variables:** ```bash filename="Terminal" copy export AWS_ACCESS_KEY_ID="AKIA..." export AWS_SECRET_ACCESS_KEY="your-secret-key" moose prod up ``` **Benefits:** - Credentials never embedded in Docker images - Supports credential rotation (changing passwords triggers table recreation) - Different credentials per environment (dev/staging/prod) - Clear error messages if environment variables are missing S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas. 
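A minimal sketch of an S3Queue table, built only from the engine-specific parameters named in this section (`s3Path`, `format`, `settings`); verify the exact config keys against your moose-lib version, and resolve credentials via `mooseRuntimeEnv` as described above rather than hardcoding them:

```ts filename="S3QueueSketch.ts" copy
interface S3Event {
  id: string;
  payload: string;
}

// The bucket path, format, and settings below are placeholders for illustration only.
const s3Events = new OlapTable<S3Event>("s3_events", {
  engine: ClickHouseEngines.S3Queue,
  s3Path: "https://my-bucket.s3.amazonaws.com/events/*.json",
  format: "JSONEachRow",
  settings: { mode: "unordered" },
  // No orderByFields: S3Queue does not support ORDER BY
});
```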
#### Direct S3 Access (`S3`) Use the `S3` engine for direct read/write access to S3 storage without streaming semantics: ```ts filename="S3Table.ts" copy // S3 table with credentials (recommended with mooseRuntimeEnv) ); // Public S3 bucket (no authentication) ); ``` - **S3**: Direct read/write access to S3 files. Use for batch processing or querying static data. - **S3Queue**: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion. Both engines support the same credential management and format options. #### In-Memory Buffer (`Buffer`) The `Buffer` engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds: ```ts filename="BufferTable.ts" copy // First create the destination table ); // Then create buffer that writes to it ); ``` - Data in buffer is **lost if server crashes** before flush - Not suitable for critical data that must be durable - Best for high-throughput scenarios where minor data loss is acceptable - Buffer and destination table must have identical schemas - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on buffer tables For more details, see the [ClickHouse Buffer documentation](https://clickhouse.com/docs/en/engines/table-engines/special/buffer). #### Distributed Tables (`Distributed`) The `Distributed` engine creates a distributed table across a ClickHouse cluster for horizontal scaling: ```ts filename="DistributedTable.ts" copy // Distributed table across cluster ); ``` - Requires a configured ClickHouse cluster with remote_servers configuration - The local table must exist on all cluster nodes - Distributed tables are virtual - data is stored in local tables - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on distributed tables - The `cluster` name must match a cluster defined in your ClickHouse configuration For more details, see the [ClickHouse Distributed documentation](https://clickhouse.com/docs/en/engines/table-engines/special/distributed). #### Replicated Engines Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants: - `ReplicatedMergeTree` - Replicated version of MergeTree - `ReplicatedReplacingMergeTree` - Replicated with deduplication - `ReplicatedAggregatingMergeTree` - Replicated with aggregation - `ReplicatedSummingMergeTree` - Replicated with summation ```ts filename="ReplicatedEngines.ts" copy // Basic replicated table with explicit paths const replicatedTable = new OlapTable("records", { engine: ClickHouseEngines.ReplicatedMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/records", replicaName: "{replica}", orderByFields: ["id"] }); // Replicated with deduplication const replicatedDedup = new OlapTable("dedup_records", { engine: ClickHouseEngines.ReplicatedReplacingMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records", replicaName: "{replica}", ver: "updated_at", isDeleted: "deleted", orderByFields: ["id"] }); // For ClickHouse Cloud or Boreal (no parameters needed) const cloudReplicated = new OlapTable("cloud_records", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"] }); ``` The `keeper_path` and `replica_name` parameters are **optional** for replicated engines: - **Omit both parameters** (recommended): Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments. 
The default path pattern `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}` works automatically with Atomic databases (the default in modern ClickHouse).
- **Provide custom paths**: You can still specify both parameters explicitly if you need custom replication paths for your self-managed cluster.

**Note**: Both parameters must be provided together, or both omitted. The `{uuid}`, `{shard}`, and `{replica}` macros are automatically substituted by ClickHouse at runtime.

For more details, see the [ClickHouse documentation on data replication](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication).

## Externally Managed Tables

If you have a table that is managed by an external system (e.g., a Change Data Capture service like ClickPipes), you can still use Moose to query it. Set the table's lifecycle to `EXTERNALLY_MANAGED` in its config.

```ts filename="ExternallyManagedTable.ts" copy
// Table managed by external system
const externalTable = new OlapTable("external_users", {
  orderByFields: ["id", "timestamp"],
  lifeCycle: LifeCycle.EXTERNALLY_MANAGED // Moose won't create or modify this table
});
```

Learn more about the different lifecycle options and how to use them in the [LifeCycle Management](/stack/olap/lifecycle) documentation.

## Invalid Configurations

```ts filename="InvalidConfig.ts" copy
// Error: No primary key or orderByFields
interface BadRecord1 {
  field1: string;
  field2: number;
}
const badTable1 = new OlapTable<BadRecord1>("bad_table1");

// Error: Primary key not first in orderByFields
interface BadRecord2 {
  id: Key<string>;
  field1: string;
}
const badTable2 = new OlapTable<BadRecord2>("bad_table2", {
  orderByFields: ["field1", "id"] // Wrong order - primary key must be first
});

// Error: Nullable field in orderByFields
interface BadRecord3 {
  id: Key<string>;
  field1: string;
  field2?: number;
}
const badTable3 = new OlapTable<BadRecord3>("bad_table3", {
  orderByFields: ["id", "field2"] // Can't have nullable field in orderByFields
});
```

## Development Workflow

### Local Development with Hot Reloading

One of the powerful features of Moose is its integration with the local development server:

1. Start your local development server with `moose dev`
2. When you define or modify an `OlapTable` in your code and save the file:
   - The changes are automatically detected
   - The TypeScript compiler plugin processes your schema definitions
   - The infrastructure is updated in real-time to match your code changes
   - Your tables are immediately available for testing

For example, if you add a new field to your schema:

```ts filename="HotReloading.ts" copy
// Before
interface BasicSchema {
  id: Key;
  name: string;
}

// After adding a field
interface BasicSchema {
  id: Key;
  name: string;
  createdAt: Date; // New field
}
```

The Moose framework will:
1. Detect the change when you save the file
2. Update the table schema in the local ClickHouse instance
3. Make the new field immediately available for use

### Verifying Your Tables

You can verify your tables were created correctly using:

```bash filename="Terminal" copy
# List all tables in your local environment
moose ls
```

#### Connecting to your local ClickHouse instance

You can connect to your local ClickHouse instance with your favorite database client.
Your credentials are located in your `moose.config.toml` file:

```toml filename="moose.config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```

---

## Modeling Views

Source: moose/olap/model-view.mdx
Define standard ClickHouse Views for read-time projections

# Modeling Views

## Overview

Views are read-time projections in ClickHouse. A static `SELECT` defines the view over one or more base tables or other views. Moose wraps [ClickHouse `VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view) with a simple `View` class in TypeScript. You provide the view name, the `SELECT`, and the list of source tables/views so Moose can order DDL correctly during migrations.

Use `View` when you want a virtual read-time projection and don't need write-time transformation or a separate storage table. For write-time pipelines and backfills, use a Materialized View instead.

## Basic Usage

```ts filename="BasicUsage.ts" copy
// A read-time projection over an existing table.
// (The config property names below mirror the description above — the view
// name, the SELECT statement, and the source tables — and are illustrative.)
const activeUsersView = new View("active_users", {
  selectStatement: sql`
    SELECT id, name, email
    FROM ${UserTable}
    WHERE status = 'active'
  `,
  baseTables: [UserTable],
});
```

---

## Planned Migrations (OLAP)

Source: moose/olap/planned-migrations.mdx
Generate, review, and safely execute ClickHouse DDL plans

# Planned Migrations

Migration planning gives you fine-grained control over how database schema changes are applied to your database when you deploy your code to production.

## Why planned migrations?

Most database migrations are designed under the assumption that your code is the sole owner of the database schema. In OLAP databases, we have to be more careful and assume that schema changes can happen at any time:

- The database schema is shared with other services (e.g. Change Data Capture services like ClickPipes)
- Other users (e.g. analysts) of the database may have credentials that let them change the schema

This is why the plan is generated from the remote environment and validated against the live state of the database at the time of deployment. If drift is detected, the deployment aborts and you must regenerate the plan, so that you don't drop data unintentionally.

Planned migrations apply only to OLAP (ClickHouse) schema changes. Streaming, APIs, and processes are unaffected by this flow.

## What this does

- Generates an ordered set of ClickHouse operations and writes them to `./migrations/plan.yaml`
- Saves two validation snapshots for drift detection:
  - `./migrations/remote_state.json` (state when plan was created)
  - `./migrations/local_infra_map.json` (desired state from your local code)
- When enabled, validates state and executes the exact reviewed operations

## Prerequisites

```toml file="moose.config.toml"
[features]
olap = true
ddl_plan = true
```

## Generating a Plan

Once you're done editing your code in your feature branch, you can generate a plan that diffs your local code against your live remote database:

**For Moose server deployments:**

```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```

**For serverless deployments:**

```bash filename="Terminal" copy
moose generate migration --clickhouse-url clickhouse://user:pass@host:port/db --save
```

Outputs:

```text
./migrations/plan.yaml
./migrations/remote_state.json
./migrations/local_infra_map.json
```

What each file contains:

- `remote_state.json`: The state of the remote database when the plan was generated.
- `local_infra_map.json`: The state of the local code when the plan was generated.
- `plan.yaml`: The plan to apply to the remote database, based on the diff between the two states.

You commit the entire `migrations/` directory to version control, and Moose automatically applies the plan when you deploy the code to production.

## Review and edit the plan

Moose makes some assumptions about your schema changes — for example, a renamed column is treated as a drop followed by an add. You can modify the plan to override these assumptions.

Open `plan.yaml` in your PR. Operations are ordered (teardown first, then setup) to avoid dependency issues. Review it like regular code. You can also edit the plan to override the default assumptions Moose makes.

```yaml filename="migrations/plan.yaml" copy
# Drop a deprecated column
- DropTableColumn:
    table: "events"
    column_name: "deprecated_field"

# Rename a column to match code
- RenameTableColumn:
    table: "events"
    before_column_name: "createdAt"
    after_column_name: "created_at"

# Add a new nullable column after created_at
- AddTableColumn:
    table: "events"
    column:
      name: "status"
      data_type: "String"
      required: false
      unique: false
      primary_key: false
      default: null
      annotations: []
      comment: null
    after_column: "created_at"

# Change a column type to Nullable(Float64)
- ModifyTableColumn:
    table: "events"
    before_column:
      name: "value"
      data_type: "Float64"
      required: false
      unique: false
      primary_key: false
      default: null
      annotations: []
      comment: null
    after_column:
      name: "value"
      data_type:
        Nullable:
          nullable: "Float64"
      required: false
      unique: false
      primary_key: false
      default: null
      annotations: []
      comment: null

# Create a simple view via raw SQL
- RawSql:
    sql:
      - "CREATE VIEW IF NOT EXISTS `events_by_user` AS SELECT user_id, count() AS c FROM events GROUP BY user_id"
    description: "Creating view events_by_user"
```

### When to edit the plan

There are two main reasons to edit the plan:

1. To override the default assumptions Moose makes when it cannot infer the intent of your schema changes, such as renaming a column instead of dropping and adding it.
2. To add new operations that are not covered by the default assumptions, such as adding a backfill operation for a new column.

#### Rename a column instead of drop/add

When you rename a column, Moose will default to dropping and adding the column. You can override this in the plan with the `RenameTableColumn` operation. Given this change:

```ts
interface SourceSchema {
  created_at: Date;
  color: string;
}

interface TargetSchema {
  createdAt: Date;
  color: string;
}

const sourceTable = new OlapTable<SourceSchema>("source_table");
const targetTable = new OlapTable<TargetSchema>("target_table");
```

Moose generates a drop-and-add by default:

```yaml filename="migrations/plan.yaml" copy
- DropTableColumn:
    table: source_table
    column_name: created_at
- AddTableColumn:
    table: source_table
    column:
      name: createdAt
      data_type: DateTime
      required: true
      unique: false
      primary_key: false
      default: null
      annotations: []
    after_column: color
```

In the plan, you can replace those operations with a `RenameTableColumn` operation:

```yaml filename="migrations/plan.yaml" copy
- RenameTableColumn:
    table: source_table
    before_column_name: created_at
    after_column_name: createdAt
```

#### Add a backfill operation to a new column

When you add a new column, Moose will default to backfilling the column based on the value in the `default` field.
If your field is a `DateTime`, you can edit the plan to set the default value to the current timestamp:

```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
    table: "source_table"
    column:
      name: "created_at"
      data_type: "DateTime"
      required: false
      unique: false
      default: NOW # Set the default value to the current timestamp
```

You can also override the default behavior by using the `RawSql` operation to define your own custom backfill logic:

```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
    table: "source_table"
    column:
      name: "created_at"
      data_type: "DateTime"
      required: false
      unique: false
      default: null
- RawSql:
    sql:
      - "ALTER TABLE events UPDATE created_at = toDateTime(created_at_ms / 1000) WHERE created_at IS NULL"
    description: "Backfill created_at from created_at_ms"
```

## Deployment Flows

### Moose Server Deployments

For Moose server deployments (with `moose prod` running), migrations are applied automatically on startup. Generate plans using:

```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```

When you deploy, Moose validates the plan and executes it automatically.

### Serverless Deployments

For serverless deployments (no Moose server), you manage migrations manually using the ClickHouse connection directly:

```toml file="moose.config.toml"
[state_config]
storage = "clickhouse"

[features]
olap = true
data_model_v2 = true
```

**Workflow:**

1. **Generate the plan** from your ClickHouse database:

```bash filename="Terminal" copy
moose generate migration --clickhouse-url --save
```

2. **Review** the generated `./migrations/` files in your PR

3. **Execute the plan** against your ClickHouse with CI/CD or manually:

```bash filename="Terminal" copy
moose migrate --clickhouse-url
```

Before applying the plan, Moose first validates that the snapshot taken when you generated the plan still matches the current database state. If it does not, Moose aborts the deployment. If it does, Moose executes the plan in `plan.yaml` against your production database.

Execution rules:
- If current tables in your live production database differ from `remote_state.json`, Moose aborts (remote drift since planning).
- If desired tables in your local code differ from `local_infra_map.json`, Moose aborts (code changed since planning).
- If both match, `plan.yaml` operations are executed in order against ClickHouse.

## Troubleshooting

- Can't connect to the remote database? Make sure you have [your admin API key set up correctly](./apis/auth#admin-endpoints)
- Plan rejected due to drift: Re-generate a plan against the current remote, review, and retry.
- No execution in Moose server deployments: Ensure `ddl_plan = true` and `./migrations/plan.yaml` exists.
- OLAP disabled: Ensure `[features].olap = true`.

---

## Querying Data

Source: moose/olap/read-data.mdx
Query OLAP tables using SQL with type safety

# Querying Data

Moose provides type-safe SQL querying for your `OlapTable` and `MaterializedView` instances. Use cases include:

- Building APIs to expose your data to client/frontend applications
- Building transformation pipelines inside your database with materialized views

## Querying with MooseClient

Use `MooseClient` to query data from existing tables and materialized views.
### Basic Querying ```ts filename="BasicQuerying.ts" const client = new MooseClient(); // Query existing table const query = sql` SELECT id, name, email FROM ${UserTable} WHERE status = 'active' LIMIT 10 `; const result = await client.query.execute(query); const data = await result.json(); ``` ### Querying Materialized Views ```ts filename="QueryMaterializedView.ts" const client = new MooseClient(); // Query existing materialized view const query = sql` SELECT user_id, total_orders, average_order_value FROM user_stats_view WHERE total_orders > 10 ORDER BY average_order_value DESC `; const result = await client.query.execute(query); ``` ## Select With Column and Table References ```ts filename="TypedReferences.ts" // Reference table columns with type safety const cols = UserTable.columns; const query = sql` SELECT ${cols.id}, ${cols.name}, ${cols.email} FROM ${UserTable} WHERE ${cols.status} = 'active' `; // Multiple table references const joinQuery = sql` SELECT ${UserTable.columns.id}, ${UserTable.columns.name}, ${OrderTable.columns.order_value} FROM ${UserTable} JOIN ${OrderTable} ON ${UserTable.columns.id} = ${OrderTable.columns.user_id} `; ``` When you query a materialized view, you reference the `MaterializedView.targetTable` to get the columns of the target table. ```ts filename="TypedReferences.ts" const query = sql` SELECT ${ExampleMaterializedView.targetTable.columns.id}, ${ExampleMaterializedView.targetTable.columns.name}, ${ExampleMaterializedView.targetTable.columns.email} FROM ${ExampleMaterializedView.targetTable} `; ``` In ClickHouse, when you query a Materialized View that has columns of type `AggregateFunction` in the result set, ordinarily you would need to run: ```sql SELECT sumMerge(amount) FROM {ExampleMaterializedView} ``` When querying this with Moose, you can just reference the column name in the `sql` template literal. 
The interpolation will be replaced with the correct ClickHouse function: ```ts filename="TypedReferences.ts" const query = sql` SELECT ${ExampleMaterializedView.targetTable.columns.amount} FROM ${ExampleMaterializedView.targetTable} `; // This will be replaced with: // SELECT sumMerge(amount) FROM {ExampleMaterializedView} ``` ## Filtering with WHERE Clauses ```ts filename="WhereClauses.ts" // Multiple WHERE conditions const filterQuery = sql` SELECT ${UserTable.columns.id}, ${UserTable.columns.name} FROM ${UserTable} WHERE ${UserTable.columns.status} = 'active' AND ${UserTable.columns.created_at} > '2024-01-01' AND ${UserTable.columns.email} ILIKE ${'%' + searchTerm + '%'} `; // Using IN clauses const inQuery = sql` SELECT * FROM ${UserTable} WHERE ${UserTable.columns.id} IN (${userIds}) `; // Using BETWEEN const rangeQuery = sql` SELECT * FROM ${UserTable} WHERE ${UserTable.columns.age} BETWEEN ${minAge} AND ${maxAge} `; ``` ## Dynamic Query Building Use the `sql` template literal to build safe queries: ```ts filename="SqlTemplateLiterals.ts" // Safe interpolation with sql template literal const status = 'active'; const limit = 10; const query = sql` SELECT id, name, email FROM ${UserTable} WHERE ${UserTable.columns.status} = ${status} LIMIT ${limit} `; // Conditional WHERE clauses interface FilterParams { minAge?: number; status?: "active" | "inactive"; searchText?: string; } const buildConditionalQuery = (filters: FilterParams) => { let conditions = []; if (filters.minAge !== undefined) { conditions.push(sql`age >= ${filters.minAge}`); } if (filters.status) { conditions.push(sql`status = ${filters.status}`); } if (filters.searchText) { conditions.push(sql`(name ILIKE ${'%' + filters.searchText + '%'} OR email ILIKE ${'%' + filters.searchText + '%'})`); } let query = sql`SELECT * FROM ${UserTable}`; if (conditions.length > 0) { query = sql`${query} WHERE ${conditions.join(' AND ')}`; } return sql`${query} ORDER BY created_at DESC`; }; ``` ## Building APIs To build REST APIs that expose your data, see the [Bring Your Own API Framework documentation](/moose/app-api-frameworks) for comprehensive examples and patterns using Express, Koa, Fastify, or FastAPI. ## Common Pitfalls - **Column name typos**: Use `UserTable.columns.columnName` for autocomplete - **Type mismatches**: Ensure your schema types match ClickHouse types - **Missing imports**: Import your table definitions before using them - **Template literal syntax**: Use backticks `sql` not regular strings - **Forgetting await**: Always await `client.query.execute()` ## Performance Optimization If your query is slower than expected, there are a few things you can check: - If using filters, try to filter on a column that is defined in the `orderByFields` of the table - For common queries, consider [creating a materialized view](/stack/olap/create-materialized-view) to pre-compute the result set ## Further Reading --- ## schema-change Source: moose/olap/schema-change.mdx # Handling Failed Migrations One of the main benefits of the Moose local development environment is that you can detect breaking schema changes before they happen in production. This can be specifically useful for identifying incompatible data type changes when you change a column's data type and the generated migration cannot cast the existing data to the new type. This page describes how to recover from a failed migration in dev and gives a playbook for safely achieving the desired type change. 
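For example, a column type change like the following looks harmless in code, but it cannot be applied in place if the existing `value` column contains non-numeric strings. This mirrors the `String -> Float64` failure shown in the terminal logs below; the surrounding `Event` fields are illustrative.

```ts filename="app/tables/events.ts" copy
// Before: value stored as a string
interface Event {
  id: Key<string>;
  name: string;
  value: string;
  createdAt: Date;
}

// After: value changed to a number (Float64) — dev mode generates an in-place
// ALTER TABLE ... MODIFY COLUMN, which fails on rows such as value = 'abc'
interface Event {
  id: Key<string>;
  name: string;
  value: number;
  createdAt: Date;
}
```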
## What happened

You changed a column's data type on a table that already has data. The dev migration tried to run an in-place ALTER and ClickHouse created a mutation that failed (incompatible cast, nullability, defaults, etc.).

Symptoms:
- Failed migration in dev
- A stuck mutation on the table
- Reverting your code type alone doesn't help until the mutation is cleared

## Quick recovery (dev)

Follow these steps to get unblocked quickly.

### View the terminal logs to see the failing mutation

In your terminal, you should see a message like this:

```txt
⢹ Processing Infrastructure changes from file watcher
  ~ Table events:
      Column changes:
        ~ value: String -> Float64
Applying: ALTER TABLE events MODIFY COLUMN value Float64
ClickHouse mutation created: mutation_id='00000001-0000-4000-8000-000000000123'
Error: Code: 368. Conversion failed: cannot parse 'abc' as Float64 (column: value)
Status: mutation failed; table may be partially transformed
```

Copy the mutation ID from the terminal logs and run the following command to kill the mutation.

### Kill the mutation

- If you have the `mutation_id`:

```sql
KILL MUTATION WHERE mutation_id = '<mutation_id>';
```

- If you didn't capture the ID, find it and kill by table:

```sql
SELECT mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE database = currentDatabase() AND table = '<table>'
ORDER BY create_time DESC;

KILL MUTATION WHERE database = currentDatabase() AND table = '<table>';
```

ClickHouse ALTERs are implemented as asynchronous mutations, not transactions. If a mutation fails mid-way, some parts may have been rewritten while others were not, leaving the table partially transformed. The failed mutation also remains queued until you kill it. Clear the mutation first, then proceed.

Soon, Moose will automatically generate a local DDL plan that kills the mutation and "rolls back" the changes that were applied before the failure occurred.

### Revert your code to match the current DB schema

- Change the column type in code back to the previous (working) type
- Save your changes; let `moose dev` resync. You should be able to query the table again

If the table only has disposable dev data, you can also `TRUNCATE TABLE <database>.<table>` or drop and recreate the table and let `moose dev` rebuild it. Only do this in dev.

## Safely achieving the desired type change

Instead of editing the column type in place, you can add a new column with the target type and backfill the data. This is the recommended approach.

### Add a new column + backfill

```ts filename="app/tables/events.ts" copy
enum StatusEnum {
  "active" = "active",
  "inactive" = "inactive",
}

interface Event {
  id: Key;
  name: string;
  createdAt: Date;
  status: string;
  status_v2: StatusEnum; // New field - will trigger migration
}
```

Then, generate a plan to add the new column and backfill the data.

```bash
moose generate migration --url --save --token
```

Open the generated `/migrations/plan.yaml` file. You'll see the `AddTableColumn` operation to add the new column. Right after it, you can add a `RawSql` operation to backfill the data.
Here you can write an `ALTER TABLE` statement to update the new column with the data from the old column:

```yaml filename="migrations/plan.yaml"
- AddTableColumn:
    table: "events"
    column:
      name: "status_v2"
      data_type:
        Nullable:
          nullable: "StatusEnum"
      default: null
- RawSql:
    sql:
      - "ALTER TABLE events UPDATE status_v2 = toStatusEnumOrNull(status) WHERE status_v2 IS NULL"
    description: "Backfill status_v2 from status"
```

Then, when writing to the table, double write to both columns. This allows all surrounding processes and applications that rely on the old column to keep working, and you can deprecate the old column and rename the new column when you are ready.

### Later, deprecate the old column and rename the new column

Once the column backfill is complete and you are ready to deprecate the old column, you can rename the new column to the old column name and apply this in a new, subsequent PR.

In your code, rename the column and deprecate the old one:

```ts filename="app/tables/events.ts" copy
interface Event {
  id: Key;
  name: string;
  createdAt: Date;
  status_old: string; // rename status to status_old
  status: StatusEnum; // rename status_v2 to status
}
```

Initially you'll see two `DeleteTableColumn` operations, followed by two `AddTableColumn` operations.

**IMPORTANT**: Replace all four generated `DeleteTableColumn` and `AddTableColumn` operations with the following:

```yaml filename="migrations/plan.yaml"
- RenameTableColumn:
    table: "events"
    before_column_name: "status"
    after_column_name: "status_old"
- RenameTableColumn:
    table: "events"
    before_column_name: "status_v2"
    after_column_name: "status"
```

Once the old column is no longer needed, you can drop it in a third PR.

```yaml filename="migrations/plan.yaml"
- DropTableColumn:
    table: "events"
    column_name: "status_old"
```

## Common breaking cases

- String -> Int/Float: can fail on non-numeric rows; prefer `toInt64OrNull(...)`/`toFloat64OrNull(...)` + backfill
- Nullable(T) -> T (NOT NULL): fails if any NULLs exist and no default is provided; backfill then drop nullability
- Narrowing types (e.g., Int64 -> Int32): fails if values overflow; validate and transform first

Read about migration planning and how to use it to safely manage schema changes in production.

---

## schema-optimization

Source: moose/olap/schema-optimization.mdx

# Schema Optimization

Choosing the right data types and column ordering for your tables is crucial for ClickHouse performance and storage efficiency. Poor schema design can lead to 10-100x slower queries and 2-5x larger storage requirements.

## Data Types

Keep the following best practices in mind when defining your column types:

### Avoid Nullable Columns

Nullable columns in ClickHouse have significant performance overhead. Instead of using `?` or `| undefined`, add `& ClickHouseDefault<"...">` to your column type.

```ts filename="AvoidNullable.ts"
// ❌ Bad: Using nullable columns
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  description?: string; // Nullable
  createdAt: Date;
}

// ✅ Good: Use default values instead
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  description: string & ClickHouseDefault<"''">; // DEFAULT ''
  createdAt: Date;
}

const userEventsTable = new OlapTable("user_events", {
  orderByFields: ["id", "createdAt"]
});
```

### Use `LowCardinality` where possible

`LowCardinality` is ClickHouse's most efficient string type for columns with limited unique values.
```ts filename="LowCardinality.ts" interface UserEvent { id: string; userId: string; eventType: string & LowCardinality; // ✅ Good for limited values status: "active" | "inactive" | "pending"; // ✅ Literals become LowCardinality automatically country: string & LowCardinality; // ✅ Good for country codes userAgent: string; // ❌ Keep as String for high cardinality createdAt: Date; } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ### Pick the right Integer types Choose the smallest integer type that fits your data range to save storage and improve performance. ```ts filename="IntegerTypes.ts" interface UserEvent { id: string; userId: string; age: UInt8; // ✅ 0-255 (1 byte) score: Int16; // ✅ -32,768 to 32,767 (2 bytes) viewCount: UInt32; // ✅ 0 to ~4 billion (4 bytes) timestamp: UInt64; // ✅ Unix timestamp (8 bytes) eventType: string; createdAt: Date; } // Integer type ranges: // UInt8: 0 to 255 // UInt16: 0 to 65,535 // UInt32: 0 to 4,294,967,295 // UInt64: 0 to 18,446,744,073,709,551,615 // Int8: -128 to 127 // Int16: -32,768 to 32,767 // Int32: -2,147,483,648 to 2,147,483,647 // Int64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 ``` ### Use the right precision for `DateTime` Choose appropriate DateTime precision based on your use case to balance storage and precision. ```ts filename="DateTimePrecision.ts" interface UserEvent { id: string; userId: string; eventType: string; createdAt: DateTime; // ✅ Second precision (default) updatedAt: DateTime64<3>; // ✅ Millisecond precision processedAt: DateTime64<6>; // ✅ Microsecond precision loggedAt: DateTime64<9>; // ✅ Nanosecond precision } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ### Use Decimal over Float Use `Decimal` for financial and precise calculations to avoid floating-point precision issues. ```ts filename="DecimalOverFloat.ts" interface Order { id: string; userId: string; amount: Decimal<10, 2>; // ✅ 10 total digits, 2 decimal places tax: Decimal<8, 2>; // ✅ 8 total digits, 2 decimal places discount: Decimal<5, 2>; // ✅ 5 total digits, 2 decimal places total: Decimal<12, 2>; // ✅ 12 total digits, 2 decimal places createdAt: Date; } // ❌ Bad: Using float for financial data interface BadOrder { id: string; amount: number; // Float - can cause precision issues tax: number; // Float - can cause precision issues } const ordersTable = new OlapTable("orders", { orderByFields: ["id", "createdAt"] }); ``` ### Use `NamedTuple` over `Nested` `NamedTuple` is more efficient than `Nested` for structured data in ClickHouse. ```ts filename="NamedTupleOverNested.ts" interface UserEvent { id: string; userId: string; eventType: string; location: { latitude: number; longitude: number; city: string; country: string; } & ClickHouseNamedTuple; metadata: { version: string; source: string; priority: number; } & ClickHouseNamedTuple; createdAt: Date; } // ❌ Bad: Using Nested (less efficient) interface BadUserEvent { id: string; location: { latitude: number; longitude: number; }; } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ## Ordering ### Choose columns that you will use in WHERE and GROUP BY clauses Optimize your `orderByFields` (or `orderByExpression`) for your most common query patterns. 
```ts filename="OrderByOptimization.ts" interface UserEvent { id: string; userId: string; eventType: string; status: string; createdAt: Date; country: string; } // ✅ Good: Optimized for common query patterns const userEventsTable = new OlapTable("user_events", { orderByFields: ["userId", "createdAt", "eventType"] // Most common filters first }); // Common queries this optimizes for: // - WHERE userId = ? AND createdAt > ? // - WHERE userId = ? AND eventType = ? // - GROUP BY userId, eventType ``` ### `ORDER BY` should prioritize LowCardinality columns first Place `LowCardinality` columns first in your `orderByFields` (or reflect this priority in your `orderByExpression`) for better compression and query performance. ```ts filename="LowCardinalityOrdering.ts" interface UserEvent { id: string; userId: string; eventType: LowCardinality; // ✅ Low cardinality status: LowCardinality; // ✅ Low cardinality country: LowCardinality; // ✅ Low cardinality createdAt: Date; // High cardinality sessionId: string; // High cardinality } // ✅ Good: LowCardinality columns first const userEventsTable = new OlapTable("user_events", { orderByFields: ["eventType", "status", "country", "createdAt", "sessionId"] }); // ❌ Bad: High cardinality columns first const badUserEventsTable = new OlapTable("user_events", { orderByFields: ["createdAt", "sessionId", "eventType", "status"] // Less efficient }); ``` --- ## Schema Versioning with Materialized Views Source: moose/olap/schema-versioning.mdx Use table versions and materialized views to migrate breaking schema changes safely # Table Versioning & Blue/Green Migrations ## Overview Changing a table's storage layout (engine or sorting key) in ClickHouse requires a full table rewrite. Doing it in-place can block or slow concurrent reads and writes due to heavy merges and metadata changes, creating real risk for production workloads. Blue/Green avoids this by creating a new versioned table and migrating data live via a materialized view, so traffic continues uninterrupted. **When to use it**: - Change the **table engine** (e.g., MergeTree → ReplacingMergeTree) - Update **ORDER BY fields** (sorting keys) to better match query patterns - Reshape **primary keys** or perform type changes that require a rewrite **How Moose does it**: 1. Define a new table with the same logical name and a bumped `version`, setting the new `orderByFields` and/or `engine` ([Table modeling](/moose/olap/model-table)). 2. Create a [Materialized view](/moose/olap/model-materialized-view) that selects from the old table and writes to the new one; Moose backfills once and keeps the view live for new inserts. 3. Later on, cut over readers/writers to the new export and clean up old resources ([Applying migrations](/moose/olap/apply-migrations)). Setting `config.version` on an `OlapTable` changes only the underlying table name (suffixes dots with underscores). Your code still refers to the logical table you exported. ## High-level workflow ## Example: change sorting key (ORDER BY) Assume the original `events` table orders by `id` only. We want to update the sorting key to optimize reads by ordering on `id, createdAt`. ### Original table (version 0.0) ```ts filename="app/tables/events.ts" copy interface EventV0 { id: string; name: string; createdAt: Date; } ); ``` ### New table (bump to version 0.1) Create a new table with the same logical name, but set `version: "0.1"` and update the ordering to `id, createdAt`. Moose will create `events_0_1` in ClickHouse. 
```ts filename="app/tables/events_v01.ts" copy interface EventV1 { id: Key; name: string; createdAt: Date; } ); ``` ### Create the materialized view to migrate data Create a materialized view that: - SELECTs from the old table (`events_v0`) - copies fields 1:1 to the new table - writes into the versioned target table (`events_v1`) Pass the versioned `OlapTable` instance as `targetTable`. If you only pass a `tableName`, Moose will create an unversioned target. ```ts filename="app/views/migrate_events_to_v01.ts" copy ); ``` What happens when you export this view: - Moose creates the versioned table if needed - Moose creates the MATERIALIZED VIEW and immediately runs a one-time backfill (`INSERT INTO ... SELECT ...`) - ClickHouse keeps the view active: any new inserts into `events` automatically flow into `events_0_1` ## Cutover and cleanup - Update readers to query the new table export (`eventsV1`). - Update writers/streams to produce to the new table if applicable. - After verifying parity and retention windows, drop the old table and the migration view. ## Notes and tips - Use semantic versions like `0.1`, `1.0`, `1.1`. Moose will render `events_1_1` as the physical name. - Keep the migration view simple and deterministic. If you need complex transforms, prefer explicit SQL in the `selectStatement`. - Very large backfills can take time. Consider deploying during low-traffic windows. --- ## Supported Column Types Source: moose/olap/supported-types.mdx Complete guide to defining columns for ClickHouse tables in Moose # Supported Column Types Moose supports a comprehensive set of ClickHouse column types across both TypeScript and Python libraries. This guide covers all supported types, their syntax, and best practices for defining table schemas. ## Basic Types ### String Types ```typescript interface User { string: string; // String lowCardinality: string & LowCardinality; // LowCardinality(String) uuid: string & tags.Format<"uuid">; // UUID (with typia tags) } ``` | ClickHouse Type | TypeScript | Description | |------|------------|--------| | `String` | `string` | Variable-length string | | `LowCardinality(String)` | `string & LowCardinality` | Optimized for repeated values | | `UUID` | `string & tags.Format<"uuid">` | UUID format strings | ### Numeric Types ### Integer Types ```typescript interface Metrics { user_id: Int32; // Int32 count: UInt64; // UInt64 small_value: Int8; // Int8 } // Alternative: You can still use the verbose syntax if preferred interface MetricsVerbose { user_id: number & ClickHouseInt<"int32">; count: number & ClickHouseInt<"uint64">; small_value: number & ClickHouseInt<"int8">; } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |------|------------|------------|--------| | `Int8` | `Int8` | `number & ClickHouseInt<"int8">` | -128 to 127 | | `Int16` | `Int16` | `number & ClickHouseInt<"int16">` | -32,768 to 32,767 | | `Int32` | `Int32` | `number & ClickHouseInt<"int32">` | -2,147,483,648 to 2,147,483,647 | | `Int64` | `Int64` | `number & ClickHouseInt<"int64">` | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | | `UInt8` | `UInt8` | `number & ClickHouseInt<"uint8">` | 0 to 255 | | `UInt16` | `UInt16` | `number & ClickHouseInt<"uint16">` | 0 to 65,535 | | `UInt32` | `UInt32` | `number & ClickHouseInt<"uint32">` | 0 to 4,294,967,295 | | `UInt64` | `UInt64` | `number & ClickHouseInt<"uint64">` | 0 to 18,446,744,073,709,551,615 | ### Floating Point Types ```typescript interface SensorData { temperature: Float32; // Float32 
humidity: Float64; // Float64 pressure: number; // Default Float64 } // Alternative: You can still use the verbose syntax if preferred interface SensorDataVerbose { temperature: number & tags.Type<"float">; // Float32 humidity: number; // Float64 } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |-----------------|------------|------------|---------------------| | `Float32` | `Float32` | `number & tags.Type<"float">` | 32-bit floating point | | `Float64` | `Float64` or `number` | `number` | 64-bit floating point (default) | ### Decimal Types ```typescript interface FinancialData { amount: Decimal<10, 2>; // Decimal(10,2) rate: Decimal<5, 4>; // Decimal(5,4) fee: Decimal<8, 3>; // Decimal(8,3) } // Alternative: You can still use the verbose syntax if preferred interface FinancialDataVerbose { amount: string & ClickHouseDecimal<10, 2>; // Decimal(10,2) rate: string & ClickHouseDecimal<5, 4>; // Decimal(5,4) } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |------|------------|------------|---------------------| | `Decimal(P,S)` | `Decimal` | `string & ClickHouseDecimal` | Fixed-point decimal with P total digits, S decimal places | ### Boolean Type ```typescript interface User { is_active: boolean; verified: boolean; } ``` | ClickHouse Type | TypeScript | Description | |------|------------|--------| | `Boolean` | `boolean` | `boolean` | ### Date and Time Types ```typescript interface Event { created_at: DateTime; // DateTime updated_at: DateTime64<3>; // DateTime(3) logged_at: DateTime64<6>; // DateTime(6) - microsecond precision birth_date: Date; // Date } // Alternative: You can still use the verbose syntax if preferred interface EventVerbose { created_at: Date; // DateTime updated_at: Date & ClickHousePrecision<3>; // DateTime(3) birth_date: Date; // Date } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |-----------------|------------|------------|-------------| | `Date` | `Date` | `Date` | Date only | | `Date16` | `Date` | `Date` | Compact date format | | `DateTime` | `DateTime` | `Date` | Date and time | | `DateTime(P)` | `DateTime64` | `Date & ClickHousePrecision` | DateTime with precision (P=0-9) | ### Network Types ```typescript interface NetworkEvent { source_ip: string & tags.Format<"ipv4">; dest_ip: string & tags.Format<"ipv6">; } ``` | ClickHouse Type | TypeScript | Description | |------|------------|--------| | `IPv4` | `string & tags.Format<"ipv4">` | IPv4 addresses | | `IPv6` | `string & tags.Format<"ipv6">` | IPv6 addresses | ## Complex Types ### Geometry Types Moose supports ClickHouse geometry types. Use the helpers in each language to get type-safe models and correct ClickHouse mappings. 
```typescript
import {
  ClickHousePoint,
  ClickHouseRing,
  ClickHouseLineString,
  ClickHouseMultiLineString,
  ClickHousePolygon,
  ClickHouseMultiPolygon,
} from "@514labs/moose-lib";

interface GeoTypes {
  point: ClickHousePoint;                     // Point → [number, number]
  ring: ClickHouseRing;                       // Ring → Array<[number, number]>
  lineString: ClickHouseLineString;           // LineString → Array<[number, number]>
  multiLineString: ClickHouseMultiLineString; // MultiLineString → Array<Array<[number, number]>>
  polygon: ClickHousePolygon;                 // Polygon → Array<Array<[number, number]>>
  multiPolygon: ClickHouseMultiPolygon;       // MultiPolygon → Array<Array<Array<[number, number]>>>
}
```

| ClickHouse Type | TypeScript |
|------|------------|
| `Point` | `ClickHousePoint` |
| `Ring` | `ClickHouseRing` |
| `LineString` | `ClickHouseLineString` |
| `MultiLineString` | `ClickHouseMultiLineString` |
| `Polygon` | `ClickHousePolygon` |
| `MultiPolygon` | `ClickHouseMultiPolygon` |

Geometry coordinates are represented as numeric pairs `[x, y]` (TypeScript) or `tuple[float, float]` (Python).

### Array Types

Arrays are supported for all basic types and some complex types.

```typescript
interface User {
  tags: string[];                  // Array(String)
  scores: number[];                // Array(Float64)
  metadata: Record<string, any>[]; // Array(Json)
  tuple: ({ name: string; age: number } & ClickHouseNamedTuple)[]; // Array(Tuple(String, Int32))
}
```

### Map Types

Maps store key-value pairs with specified key and value types.

```typescript
interface User {
  preferences: Record<string, string>; // Map(String, String)
  metrics: Record<string, number>;     // Map(String, Float64)
}
```

### Nested Types

Nested types allow embedding complex objects within tables.

```typescript
interface Address {
  street: string;
  city: string;
  zip: string;
}

interface User {
  name: string;
  address: Address; // Nested type
}
```

### Named Tuple Types

Named tuples provide structured data with named fields.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Shape {
  center: Point & ClickHouseNamedTuple; // Named tuple
  radius: number;
}
```

### Enum Types

Enums map to ClickHouse enums with string or integer values.

```typescript
enum UserRole {
  ADMIN = "admin",
  USER = "user",
  GUEST = "guest"
}

interface User {
  role: UserRole; // Enum with string values
}
```

## Special Types

### JSON Type

The `Json` type stores arbitrary JSON data with optional schema configuration for performance and type safety.

#### Basic JSON (Unstructured)

For completely dynamic JSON data without any schema:

```typescript
interface Event {
  metadata: Record<string, any>; // Basic JSON - accepts any structure
  config: any;                   // Basic JSON - fully dynamic
}
```

#### Rich JSON with Type Configuration

For better performance and validation, you can define typed fields within your JSON using `ClickHouseJson`. This creates a ClickHouse `JSON` column with explicit type hints for specific paths.

```typescript
// Define the structure for your JSON payload
interface PayloadStructure {
  name: string;
  count: number;
  timestamp?: Date;
}

interface Event {
  id: string;

  // JSON with typed paths - better performance, allows extra fields
  payload: PayloadStructure & ClickHouseJson;

  // JSON with performance tuning options
  metadata: PayloadStructure & ClickHouseJson<
    256,          // max_dynamic_paths: limit tracked paths (default: no limit)
    16,           // max_dynamic_types: limit type variations (default: no limit)
    ["skip.me"],  // skip_paths: exclude specific paths
    ["^tmp\\."]   // skip_regexps: exclude paths matching regex
  >;
}
```

#### Configuration Options

| Option | Type | Description |
|--------|------|-------------|
| `max_dynamic_paths` | `number` | Maximum number of unique JSON paths to track. Helps control memory usage for highly variable JSON structures. |
| `max_dynamic_types` | `number` | Maximum number of type variations allowed per path. Useful when paths may contain different types. |
| `skip_paths` | `string[]` | Array of exact JSON paths to ignore during ingestion (e.g., `["temp", "debug.info"]`). |
| `skip_regexps` | `string[]` | Array of regex patterns for paths to exclude (e.g., `["^tmp\\.", ".*_internal$"]`). |

#### Benefits of Typed JSON

1. **Better Performance**: ClickHouse can optimize storage and queries for known paths
2. **Type Safety**: Validates that specified paths match expected types
3. **Flexible Schema**: Allows additional fields beyond typed paths
4. **Memory Control**: Configure limits to prevent unbounded resource usage

- **Basic JSON** (`any`, `Dict[str, Any]`): Use when JSON structure is completely unknown or rarely queried
- **Rich JSON** (`ClickHouseJson`): Use when you have known fields that need indexing/querying, but want to allow additional dynamic fields

#### Example: Product Event Tracking

```typescript
interface ProductProperties {
  category: string;
  price: number;
  inStock: boolean;
}

interface ProductEvent {
  eventId: Key;
  timestamp: DateTime;

  // Typed paths for common fields, but allows custom properties
  properties: ProductProperties & ClickHouseJson<
    128,           // Track up to 128 unique paths
    8,             // Allow up to 8 type variations per path
    ["_internal"], // Ignore internal fields
    ["^debug_"]    // Ignore debug fields
  >;
}
```

With this schema, you can send events like:

```json
{
  "eventId": "evt_123",
  "timestamp": "2025-10-22T12:00:00Z",
  "properties": {
    "category": "electronics",    // Typed field ✓
    "price": 99.99,               // Typed field ✓
    "inStock": true,              // Typed field ✓
    "customTag": "holiday-sale",  // Extra field - accepted ✓
    "brandId": 42,                // Extra field - accepted ✓
    "_internal": "ignored"        // Skipped by skip_paths ✓
  }
}
```

### Nullable Types

All types support nullable variants using optional types.

```typescript
interface User {
  name: string;   // Required
  email?: string; // Nullable
  age?: number;   // Nullable
}
```

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

- Optional without default (e.g., `field?: number`) → ClickHouse Nullable type.
- Optional with default (e.g., `field?: number & ClickHouseDefault<"18">` or `WithDefault`) → non-nullable column with default `18`.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

### SimpleAggregateFunction

`SimpleAggregateFunction` is designed for use with `AggregatingMergeTree` tables. It stores pre-aggregated values that are automatically merged when ClickHouse combines rows with the same primary key.

```typescript
interface DailyStats {
  date: DateTime;
  userId: string;
  totalViews: number & SimpleAggregated<"sum", number>;
  maxScore: number & SimpleAggregated<"max", number>;
  lastSeen: DateTime & SimpleAggregated<"anyLast", DateTime>;
}

const statsTable = new OlapTable("daily_stats", {
  engine: ClickHouseEngines.AggregatingMergeTree,
  orderByFields: ["date", "userId"],
});
```

See [ClickHouse docs](https://clickhouse.com/docs/en/sql-reference/data-types/simpleaggregatefunction) for the complete list of functions.
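When reading from such a table, apply the same aggregate function at query time so that rows ClickHouse has not yet merged in the background are combined correctly. A minimal sketch using the `statsTable` defined above, following the querying pattern from the Querying Data page:

```ts filename="QueryDailyStats.ts"
const client = new MooseClient();

// sum()/max() re-aggregate any partially merged rows, so totals are correct
// even before background merges complete.
const query = sql`
  SELECT
    userId,
    sum(totalViews) AS total_views,
    max(maxScore)   AS max_score
  FROM ${statsTable}
  GROUP BY userId
`;

const result = await client.query.execute(query);
const rows = await result.json();
```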
## Table Engines

Moose supports all common ClickHouse table engines:

| Engine | TypeScript | Description |
|--------|------------|-------------|
| `MergeTree` | `ClickHouseEngines.MergeTree` | Default engine |
| `ReplacingMergeTree` | `ClickHouseEngines.ReplacingMergeTree` | Deduplication |
| `SummingMergeTree` | `ClickHouseEngines.SummingMergeTree` | Aggregates numeric columns |
| `AggregatingMergeTree` | `ClickHouseEngines.AggregatingMergeTree` | Advanced aggregation |
| `ReplicatedMergeTree` | `ClickHouseEngines.ReplicatedMergeTree` | Replicated version of MergeTree |
| `ReplicatedReplacingMergeTree` | `ClickHouseEngines.ReplicatedReplacingMergeTree` | Replicated with deduplication |
| `ReplicatedSummingMergeTree` | `ClickHouseEngines.ReplicatedSummingMergeTree` | Replicated with aggregation |
| `ReplicatedAggregatingMergeTree` | `ClickHouseEngines.ReplicatedAggregatingMergeTree` | Replicated with advanced aggregation |

```typescript
const userTable = new OlapTable("users", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id", "updated_at"]
});
```

## Best Practices

### Type Selection

- **Use specific integer types** when you know the value ranges to save storage
- **Prefer `Float64`** for most floating-point calculations unless storage is critical
- **Use `LowCardinality`** for string columns with repeated values
- **Choose appropriate DateTime precision** based on your accuracy needs

### Performance Considerations

- **Order columns by cardinality** (low to high) for better compression
- **Use `ReplacingMergeTree`** for tables with frequent updates
- **Specify `orderByFields` or `orderByExpression`** for optimal query performance
- **Consider `LowCardinality`** for string columns with < 10,000 unique values

---

## ttl

Source: moose/olap/ttl.mdx

## TTL (Time-to-Live) for ClickHouse Tables

Moose lets you declare ClickHouse TTL directly in your data model:

- Table-level TTL via the `ttl` option on `OlapTable` config
- Column-level TTL via `ClickHouseTTL` on individual fields

### When to use TTL

- Automatically expire old rows to control storage cost
- Mask or drop sensitive columns earlier than the full row expiry

### TypeScript

```ts
interface Event {
  id: Key<string>;
  timestamp: DateTime;
  email: string & ClickHouseTTL<"timestamp + INTERVAL 30 DAY">; // column TTL
}

const events = new OlapTable<Event>("Events", {
  orderByFields: ["id", "timestamp"],
  ttl: "timestamp + INTERVAL 90 DAY DELETE", // table-level TTL
});
```

### Python

```python
from typing import Annotated
from moose_lib import OlapTable, OlapConfig, Key, ClickHouseTTL
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    timestamp: datetime
    email: Annotated[str, ClickHouseTTL("timestamp + INTERVAL 30 DAY")]

events = OlapTable[Event](
    "Events",
    OlapConfig(
        order_by_fields=["id", "timestamp"],
        ttl="timestamp + INTERVAL 90 DAY DELETE",
    ),
)
```

### Notes

- Expressions must be valid ClickHouse TTL expressions, but do not include the leading `TTL` keyword.
- Column TTLs are independent from the table TTL and can be used together.
- Moose will apply TTL changes via migrations using `ALTER TABLE ... MODIFY TTL` and `MODIFY COLUMN ... TTL`.

### Related

- See `Modeling Tables` for defining your schema
- See `Applying Migrations` to roll out TTL changes