# Moose / Olap / Model Table Documentation – TypeScript ## Included Files 1. moose/olap/model-table/model-table.mdx ## Modeling Tables Source: moose/olap/model-table/model-table.mdx Model your database schema in code using native TypeScript/Python typing # Modeling Tables ## Overview Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing. You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more. ```ts filename="FirstTable.ts" copy interface MyFirstTable { id: Key; name: string; age: number; } // Create a table named "first_table" ); // For deduplication, use the ReplacingMergeTree factory ); // Now you can: // - Write to this table from streams // - Query it directly // - Use it as a source for materialized views ``` ### Creating Tables in Ingestion Pipelines For end-to-end data flows, create tables as part of an ingestion pipeline: ```ts filename="PipelineTable.ts" // Define your schema interface UserEvent { id: Key; userId: string; timestamp: Date; eventType: string; } // Create a complete ingestion pipeline with a table const eventsPipeline = new IngestPipeline("user_events", { ingestApi: true, // Creates a REST API endpoint at POST localhost:4000/ingest/user_events stream: true, // Creates Kafka/Redpanda topic table: { // Creates and configures the table named "user_events" orderByFields: ["id", "timestamp"] } }); // Access the table component when needed const eventsTable = eventsPipeline.table; ``` ## Data Modeling ### Special ClickHouse Types (LowCardinality, Nullable, etc) ```ts filename="ClickHouseTypes.ts" copy export interface ClickHouseOptimizedExample { id: Key; stringField: string; numberField: number; decimalField: Decimal<10, 2>; // Precise decimal storage // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works lowCardinalityField: string & LowCardinality; // Faster queries for enum-like data nestedObject: { innerString: string; innerNumber: number; }; namedTupleField: { name: string; value: number; } & ClickHouseNamedTuple; // Optimized nested storage numberArray: number[]; mapField: Record; literalField: "optionA" | "optionB"; optionalField?: string; // Nullable field dateField: Date; } ``` ### Default values Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL. ```ts filename="Defaults.ts" copy interface Event { id: Key; // Static defaults (ClickHouse expression as a string literal) status: string & ClickHouseDefault<"'pending'">; // DEFAULT 'pending' retries: number & ClickHouseDefault<"0">; // DEFAULT 0 // Server-side timestamps createdAt: Date & ClickHouseDefault<"now()">; // DEFAULT now() // Decimal with default amount: Decimal<10, 2> & ClickHouseDefault<"0">; // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax } ); ``` The value passed into the `ClickHouseDefault<"">` tag can either be a string literal or a stringified ClickHouse SQL expression. If you run into typing issues specifically on `Date` fields with `ClickHouseDefault`, use `WithDefault` as a fallback workaround. If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause. - Optional without default (e.g., `field?: number`) → ClickHouse Nullable type. - Optional with default (e.g., `field?: number & ClickHouseDefault<"18">` or `WithDefault`) → non-nullable column with default `18`. This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists. ### Database Selection By default, tables are created in the database specified in your `moose.config.toml` ClickHouse configuration. You can override this on a per-table basis using the `database` field: ```ts filename="DatabaseOverride.ts" copy interface UserData { id: Key; name: string; email: string; } // Table in default database (from moose.config.toml) const defaultTable = new OlapTable("users"); // Table in specific database (e.g., "analytics") const analyticsTable = new OlapTable("users", { database: "analytics", orderByFields: ["id"] }); ``` To use custom databases, configure them in your `moose.config.toml`: ```toml [clickhouse_config] db_name = "local" additional_databases = ["analytics", "staging"] ``` The databases in `additional_databases` will be created automatically when you start your Moose application. ### Primary Keys and Sorting You must configure table indexing using one of these approaches: 1. Define at least one `Key` in your table schema 2. Specify `orderByFields` in the table config 3. Use both (all `Key` fields must come first in the `orderByFields` array) ```ts filename="PrimaryKeyConfig.ts" copy // Approach 1: Using primary key only interface Record1 { id: Key; // Primary key field field1: string; field2: number; } const table1 = new OlapTable("table1"); // id is the primary key ``` ### Order By Fields Only ```ts filename="OrderByFieldsOnly.ts" copy // Approach 2: Using orderByFields only interface SchemaWithoutPrimaryKey { field1: string; field2: number; field3: Date; } const tableWithOrderByFieldsOnly = new OlapTable("table2", { orderByFields: ["field1", "field2"] // Specify ordering without primary key }); ``` ### Order By Expression Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely. ```ts filename="OrderByExpression.ts" copy // Use a ClickHouse expression for ORDER BY interface Events { userId: string; createdAt: Date; eventType: string; } const tableWithOrderByExpression = new OlapTable("events", { // Equivalent to orderByFields: ["userId", "createdAt", "eventType"] orderByExpression: "(userId, createdAt, eventType)", }); // Advanced: functions inside expression const tableWithMonthBucketing = new OlapTable("events_by_month", { orderByExpression: "(userId, toYYYYMM(createdAt))", }); // No sorting (ClickHouse tuple() means empty ORDER BY) const unsortedTable = new OlapTable("events_unsorted", { orderByExpression: "tuple()", }); ``` ### Using Both Primary Key and Order By Fields ```ts filename="ComboKeyAndOrderByFields.ts" copy // Approach 3: Using both (primary key must be first) interface SchemaWithKey { id: Key; // Primary key field field1: string; field2: number; } const tableWithKeyAndOrderByFields = new OlapTable("table3", { orderByFields: ["id", "field1"] // Primary key must be first }); ``` ### Using Multiple Primary Keys ```ts filename="MultiKeyTable.ts" copy interface MultiKeyRecord { key1: Key; key2: Key; field1: string; } const multiKeyTable = new OlapTable("multi_key_table", { orderByFields: ["key1", "key2", "field1"] // Multiple keys must come first }); ``` ### Table engines By default, Moose will create tables with the `MergeTree` engine. You can use different engines by setting the `engine` in the table configuration. ```ts filename="TableEngine.ts" copy // Default MergeTree engine const table = new OlapTable("table", { orderByFields: ["id"] }); // Use engine configuration for other engines const dedupTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "version", // Optional: keeps row with highest version isDeleted: "deleted" // Optional: soft delete when deleted=1 }); ``` #### Deduplication (`ReplacingMergeTree`) Use the `ReplacingMergeTree` engine to keep only the latest record for your designated sort key: ```ts filename="DeduplicatedTable.ts" copy // Basic deduplication const table = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"] }); // With version column (keeps record with highest version) const versionedTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at" // Column that determines which version to keep }); // With soft deletes (requires ver parameter) const softDeleteTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at", isDeleted: "deleted" // UInt8 column: 1 marks row for deletion }); ``` ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately. **Version Column (`ver`)**: When specified, ClickHouse keeps the row with the maximum version value for each unique sort key. **Soft Deletes (`is_deleted`)**: When specified along with `ver`, rows where this column equals 1 are deleted during merges. This column must be UInt8 type. For more details, see the [ClickHouse documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree). #### Collapsing State Changes (`CollapsingMergeTree`) Use the `CollapsingMergeTree` engine to efficiently track object state changes by collapsing insert/cancel row pairs: ```ts filename="CollapsingTable.ts" copy interface UserActivity { userId: Key; pageViews: number; duration: number; sign: Int8; // Required: 1 = state row, -1 = cancel row } // Track user activity with state collapsing const activityTable = new OlapTable("user_activity", { engine: ClickHouseEngines.CollapsingMergeTree, sign: "sign", orderByFields: ["userId"] }); ``` To update a record, insert two rows: 1. A cancel row (`sign = -1`) with the old values 2. A state row (`sign = 1`) with the new values During background merges, ClickHouse collapses matching pairs, keeping only the latest state. **Important**: Use aggregation in queries to get accurate results: ```sql SELECT userId, sum(pageViews * sign) as pageViews FROM user_activity GROUP BY userId HAVING sum(sign) > 0 ``` #### Versioned Collapsing (`VersionedCollapsingMergeTree`) Use `VersionedCollapsingMergeTree` when you need collapsing with out-of-order inserts. The version column ensures correct collapse ordering regardless of insertion order: ```ts filename="VersionedCollapsingTable.ts" copy interface UserState { userId: Key; pageViews: number; duration: number; sign: Int8; // Required: 1 = state row, -1 = cancel row version: number; // Required: version for ordering state changes } // Track user state with versioned collapsing const stateTable = new OlapTable("user_state", { engine: ClickHouseEngines.VersionedCollapsingMergeTree, sign: "sign", ver: "version", orderByFields: ["userId"] }); ``` **CollapsingMergeTree**: Requires strictly consecutive insertion order. Use when you control insertion order (e.g., single writer). **VersionedCollapsingMergeTree**: Uses a version column to handle out-of-order inserts. Use when multiple threads/sources insert data or order isn't guaranteed. Both engines require the `sign` column to be `Int8` type. For more details, see the ClickHouse documentation on [CollapsingMergeTree](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/collapsingmergetree) and [VersionedCollapsingMergeTree](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/versionedcollapsingmergetree). #### Streaming from S3 (`S3Queue`) Use the `S3Queue` engine to automatically ingest data from S3 buckets as files are added: S3Queue tables only process **new files** added to S3 after table creation. When used as a source for materialized views, **no backfill occurs** - the MV will only start populating as new files arrive. See the [Materialized Views documentation](./model-materialized-view#backfill-destination-tables) for more details. ```ts filename="S3StreamingTable.ts" copy // Use direct configuration (S3Queue does not support orderByFields) ); ``` S3Queue is a streaming engine and does not support `orderByFields` or ORDER BY clauses. Configure only engine-specific parameters like `s3Path`, `format`, and `settings`. **Security Risk**: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities. **Solution**: Use `mooseRuntimeEnv` for runtime credential resolution: ```ts filename="SecureS3Streaming.ts" copy // ✅ RECOMMENDED: Runtime environment variable resolution ); ``` **Then set environment variables:** ```bash filename="Terminal" copy export AWS_ACCESS_KEY_ID="AKIA..." export AWS_SECRET_ACCESS_KEY="your-secret-key" moose prod up ``` **Benefits:** - Credentials never embedded in Docker images - Supports credential rotation (changing passwords triggers table recreation) - Different credentials per environment (dev/staging/prod) - Clear error messages if environment variables are missing S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas. #### Direct S3 Access (`S3`) Use the `S3` engine for direct read/write access to S3 storage without streaming semantics: ```ts filename="S3Table.ts" copy // S3 table with credentials (recommended with mooseRuntimeEnv) ); // Public S3 bucket (no authentication) ); ``` - **S3**: Direct read/write access to S3 files. Use for batch processing or querying static data. - **S3Queue**: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion. Both engines support the same credential management and format options. #### Iceberg The `IcebergS3` engine provides read-only access to Iceberg tables stored in S3: ```ts filename="IcebergTable.ts" copy // Iceberg table with AWS credentials (recommended with mooseRuntimeEnv) ); ``` - IcebergS3 tables are **read-only** and provide access to the latest state of your Iceberg table - `orderByFields`, `orderByExpression`, `partitionBy`, and `sampleByExpression` are not supported - The table automatically reflects the current state of the Iceberg table in S3 - Supported formats: **Parquet** and **ORC** only #### In-Memory Buffer (`Buffer`) The `Buffer` engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds: ```ts filename="BufferTable.ts" copy // First create the destination table ); // Then create buffer that writes to it ); ``` - Data in buffer is **lost if server crashes** before flush - Not suitable for critical data that must be durable - Best for high-throughput scenarios where minor data loss is acceptable - Buffer and destination table must have identical schemas - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on buffer tables For more details, see the [ClickHouse Buffer documentation](https://clickhouse.com/docs/en/engines/table-engines/special/buffer). #### Distributed Tables (`Distributed`) The `Distributed` engine creates a distributed table across a ClickHouse cluster for horizontal scaling: ```ts filename="DistributedTable.ts" copy // Distributed table across cluster ); ``` - Requires a configured ClickHouse cluster with remote_servers configuration - The local table must exist on all cluster nodes - Distributed tables are virtual - data is stored in local tables - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on distributed tables - The `cluster` name must match a cluster defined in your ClickHouse configuration For more details, see the [ClickHouse Distributed documentation](https://clickhouse.com/docs/en/engines/table-engines/special/distributed). #### Replicated Engines Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants: - `ReplicatedMergeTree` - Replicated version of MergeTree - `ReplicatedReplacingMergeTree` - Replicated with deduplication - `ReplicatedAggregatingMergeTree` - Replicated with aggregation - `ReplicatedSummingMergeTree` - Replicated with summation - `ReplicatedCollapsingMergeTree` - Replicated with state collapsing - `ReplicatedVersionedCollapsingMergeTree` - Replicated with versioned collapsing ```ts filename="ReplicatedEngines.ts" copy // Basic replicated table with explicit paths const replicatedTable = new OlapTable("records", { engine: ClickHouseEngines.ReplicatedMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/records", replicaName: "{replica}", orderByFields: ["id"] }); // Replicated with deduplication const replicatedDedup = new OlapTable("dedup_records", { engine: ClickHouseEngines.ReplicatedReplacingMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records", replicaName: "{replica}", ver: "updated_at", isDeleted: "deleted", orderByFields: ["id"] }); // For ClickHouse Cloud or Boreal (no parameters needed) const cloudReplicated = new OlapTable("cloud_records", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"] }); ``` ##### Configuring Replication Replicated engines support three configuration approaches. Choose the one that fits your deployment: ###### Default Omit all replication parameters. Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments: ```ts filename="DefaultReplication.ts" copy const table = new OlapTable("my_table", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"] // No keeper_path, replica_name, or cluster needed }); ``` Moose auto-injects: `/clickhouse/tables/{database}/{shard}/{table_name}` and `{replica}` in local development. ClickHouse Cloud uses its own patterns automatically. ###### Cluster For multi-node deployments, specify a cluster name to use `ON CLUSTER` DDL operations: ```ts filename="ClusterReplication.ts" copy const table = new OlapTable("my_table", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"], cluster: "default" // References cluster from moose.config.toml }); ``` **Configuration in `moose.config.toml`:** ```toml [[clickhouse_config.clusters]] name = "default" ``` **Use when:** - Running multi-node self-managed ClickHouse with cluster configuration - Need `ON CLUSTER` DDL for distributed operations ###### Replication Paths For custom replication topology, specify both `keeper_path` and `replica_name`: ```ts filename="ExplicitReplication.ts" copy const table = new OlapTable("my_table", { engine: ClickHouseEngines.ReplicatedMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/my_table", replicaName: "{replica}", orderByFields: ["id"] }); ``` **Use when:** - Need custom replication paths for advanced configurations - Both parameters must be provided together **Cannot mix approaches:** Specifying both `cluster` and explicit `keeper_path`/`replica_name` will cause an error. Choose one approach. **Cluster is a deployment directive:** Changing `cluster` won't recreate your table -— it only affects future DDL operations. For more details, see the [ClickHouse documentation on data replication](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication). #### Kafka Use the `Kafka` engine to consume data directly from Kafka compatible topics. For more details on the Kafka engine, see the [ClickHouse documentation on Kafka integration](https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka). Kafka tables are streaming interfaces that don't persist data. Use a [MaterializedView](/moose/olap/model-materialized-view) to continuously move data to a table. ```ts filename="KafkaTable.ts" copy interface KafkaEvent { eventId: string; userId: string; timestamp: number; // Unix seconds for JSONEachRow } // Kafka table - reads from topic, doesn't persist ); // MaterializedView moves data to MergeTree for persistence const cols = kafkaSource.columns; ); ``` - **No ORDER BY**: Kafka is a streaming engine and doesn't support `orderByFields` - **No ALTER TABLE**: Column or settings changes require DROP and CREATE (Moose handles this automatically) - **No direct SELECT**: Query data from the MaterializedView destination table, not the Kafka table ## Externally Managed Tables If you have a table that is managed by an external system (e.g Change Data Capture like ClickPipes), you can still use Moose to query it. You can set the config in the table config to set the lifecycle to `EXTERNALLY_MANAGED`. ```ts filename="ExternallyManagedTable.ts" copy // Table managed by external system const externalTable = new OlapTable("external_users", { orderByFields: ["id", "timestamp"], lifeCycle: LifeCycle.EXTERNALLY_MANAGED // Moose won't create or modify this table }); ``` Learn more about the different lifecycle options and how to use them in the [LifeCycle Management](/stack/olap/lifecycle) documentation. ## Invalid Configurations ```ts filename="InvalidConfig.ts" copy // Error: No primary key or orderByFields interface BadRecord1 { field1: string; field2: number; } const badTable1 = new OlapTable("bad_table1"); // Error: Primary key not first in orderByFields interface BadRecord2 { id: Key; field1: string; } const badTable2 = new OlapTable("bad_table2", { orderByFields: ["field1", "id"] // Wrong order - primary key must be first }); // Error: Nullable field in orderByFields interface BadRecord3 { id: Key; field1: string; field2?: number; } const badTable3 = new OlapTable("bad_table3", { orderByFields: ["id", "field2"] // Can't have nullable field in orderByFields }); ``` ## Development Workflow ### Local Development with Hot Reloading One of the powerful features of Moose is its integration with the local development server: 1. Start your local development server with `moose dev` 2. When you define or modify an `OlapTable` in your code and save the file: - The changes are automatically detected - The TypeScript compiler plugin processes your schema definitions - The infrastructure is updated in real-time to match your code changes - Your tables are immediately available for testing For example, if you add a new field to your schema: ```ts filename="HotReloading.ts" copy // Before interface BasicSchema { id: Key; name: string; } // After adding a field interface BasicSchema { id: Key; name: string; createdAt: Date; // New field } ``` The Moose framework will: 1. Detect the change when you save the file 2. Update the table schema in the local ClickHouse instance 3. Make the new field immediately available for use ### Verifying Your Tables You can verify your tables were created correctly using: ```bash filename="Terminal" copy # List all tables in your local environment moose ls ``` #### Connecting to your local ClickHouse instance You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your `moose.config.toml` file: ```toml filename="moose.config.toml" copy [clickhouse_config] db_name = "local" user = "panda" password = "pandapass" use_ssl = false host = "localhost" host_port = 18123 native_port = 9000 ```