# Moose / Olap / Model Table Documentation – TypeScript ## Included Files 1. moose/olap/model-table/model-table.mdx ## Modeling Tables Source: moose/olap/model-table/model-table.mdx Model your database schema in code using native TypeScript/Python typing # Modeling Tables ## Overview Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing. You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more. ```ts filename="FirstTable.ts" copy interface MyFirstTable { id: Key; name: string; age: number; } // Create a table named "first_table" ); // For deduplication, use the ReplacingMergeTree factory ); // Now you can: // - Write to this table from streams // - Query it directly // - Use it as a source for materialized views ``` ### Creating Tables in Ingestion Pipelines For end-to-end data flows, create tables as part of an ingestion pipeline: ```ts filename="PipelineTable.ts" // Define your schema interface UserEvent { id: Key; userId: string; timestamp: Date; eventType: string; } // Create a complete ingestion pipeline with a table const eventsPipeline = new IngestPipeline("user_events", { ingestApi: true, // Creates a REST API endpoint at POST localhost:4000/ingest/user_events stream: true, // Creates Kafka/Redpanda topic table: { // Creates and configures the table named "user_events" orderByFields: ["id", "timestamp"] } }); // Access the table component when needed const eventsTable = eventsPipeline.table; ``` ## Data Modeling ### Special ClickHouse Types (LowCardinality, Nullable, etc) ```ts filename="ClickHouseTypes.ts" copy export interface ClickHouseOptimizedExample { id: Key; stringField: string; numberField: number; decimalField: Decimal<10, 2>; // Precise decimal storage // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works lowCardinalityField: string & LowCardinality; // Faster queries for enum-like data nestedObject: { innerString: string; innerNumber: number; }; namedTupleField: { name: string; value: number; } & ClickHouseNamedTuple; // Optimized nested storage numberArray: number[]; mapField: Record; literalField: "optionA" | "optionB"; optionalField?: string; // Nullable field dateField: Date; } ``` ### Default values Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL. ```ts filename="Defaults.ts" copy interface Event { id: Key; // Static defaults (ClickHouse expression as a string literal) status: string & ClickHouseDefault<"'pending'">; // DEFAULT 'pending' retries: number & ClickHouseDefault<"0">; // DEFAULT 0 // Server-side timestamps createdAt: Date & ClickHouseDefault<"now()">; // DEFAULT now() // Decimal with default amount: Decimal<10, 2> & ClickHouseDefault<"0">; // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax } ); ``` The value passed into the `ClickHouseDefault<"">` tag can either be a string literal or a stringified ClickHouse SQL expression. If you run into typing issues specifically on `Date` fields with `ClickHouseDefault`, use `WithDefault` as a fallback workaround. If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause. - Optional without default (e.g., `field?: number`) → ClickHouse Nullable type. - Optional with default (e.g., `field?: number & ClickHouseDefault<"18">` or `WithDefault`) → non-nullable column with default `18`. This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists. ### Database Selection By default, tables are created in the database specified in your `moose.config.toml` ClickHouse configuration. You can override this on a per-table basis using the `database` field: ```ts filename="DatabaseOverride.ts" copy interface UserData { id: Key; name: string; email: string; } // Table in default database (from moose.config.toml) const defaultTable = new OlapTable("users"); // Table in specific database (e.g., "analytics") const analyticsTable = new OlapTable("users", { database: "analytics", orderByFields: ["id"] }); ``` To use custom databases, configure them in your `moose.config.toml`: ```toml [clickhouse_config] db_name = "local" additional_databases = ["analytics", "staging"] ``` The databases in `additional_databases` will be created automatically when you start your Moose application. ### Primary Keys and Sorting You must configure table indexing using one of these approaches: 1. Define at least one `Key` in your table schema 2. Specify `orderByFields` in the table config 3. Use both (all `Key` fields must come first in the `orderByFields` array) ```ts filename="PrimaryKeyConfig.ts" copy // Approach 1: Using primary key only interface Record1 { id: Key; // Primary key field field1: string; field2: number; } const table1 = new OlapTable("table1"); // id is the primary key ``` ### Order By Fields Only ```ts filename="OrderByFieldsOnly.ts" copy // Approach 2: Using orderByFields only interface SchemaWithoutPrimaryKey { field1: string; field2: number; field3: Date; } const tableWithOrderByFieldsOnly = new OlapTable("table2", { orderByFields: ["field1", "field2"] // Specify ordering without primary key }); ``` ### Order By Expression Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely. ```ts filename="OrderByExpression.ts" copy // Use a ClickHouse expression for ORDER BY interface Events { userId: string; createdAt: Date; eventType: string; } const tableWithOrderByExpression = new OlapTable("events", { // Equivalent to orderByFields: ["userId", "createdAt", "eventType"] orderByExpression: "(userId, createdAt, eventType)", }); // Advanced: functions inside expression const tableWithMonthBucketing = new OlapTable("events_by_month", { orderByExpression: "(userId, toYYYYMM(createdAt))", }); // No sorting (ClickHouse tuple() means empty ORDER BY) const unsortedTable = new OlapTable("events_unsorted", { orderByExpression: "tuple()", }); ``` ### Using Both Primary Key and Order By Fields ```ts filename="ComboKeyAndOrderByFields.ts" copy // Approach 3: Using both (primary key must be first) interface SchemaWithKey { id: Key; // Primary key field field1: string; field2: number; } const tableWithKeyAndOrderByFields = new OlapTable("table3", { orderByFields: ["id", "field1"] // Primary key must be first }); ``` ### Using Multiple Primary Keys ```ts filename="MultiKeyTable.ts" copy interface MultiKeyRecord { key1: Key; key2: Key; field1: string; } const multiKeyTable = new OlapTable("multi_key_table", { orderByFields: ["key1", "key2", "field1"] // Multiple keys must come first }); ``` ### Table engines By default, Moose will create tables with the `MergeTree` engine. You can use different engines by setting the `engine` in the table configuration. ```ts filename="TableEngine.ts" copy // Default MergeTree engine const table = new OlapTable("table", { orderByFields: ["id"] }); // Use engine configuration for other engines const dedupTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "version", // Optional: keeps row with highest version isDeleted: "deleted" // Optional: soft delete when deleted=1 }); ``` #### Deduplication (`ReplacingMergeTree`) Use the `ReplacingMergeTree` engine to keep only the latest record for your designated sort key: ```ts filename="DeduplicatedTable.ts" copy // Basic deduplication const table = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"] }); // With version column (keeps record with highest version) const versionedTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at" // Column that determines which version to keep }); // With soft deletes (requires ver parameter) const softDeleteTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at", isDeleted: "deleted" // UInt8 column: 1 marks row for deletion }); ``` ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately. **Version Column (`ver`)**: When specified, ClickHouse keeps the row with the maximum version value for each unique sort key. **Soft Deletes (`is_deleted`)**: When specified along with `ver`, rows where this column equals 1 are deleted during merges. This column must be UInt8 type. For more details, see the [ClickHouse documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree). #### Streaming from S3 (`S3Queue`) Use the `S3Queue` engine to automatically ingest data from S3 buckets as files are added: S3Queue tables only process **new files** added to S3 after table creation. When used as a source for materialized views, **no backfill occurs** - the MV will only start populating as new files arrive. See the [Materialized Views documentation](./model-materialized-view#backfill-destination-tables) for more details. ```ts filename="S3StreamingTable.ts" copy // Use direct configuration (S3Queue does not support orderByFields) ); ``` S3Queue is a streaming engine and does not support `orderByFields` or ORDER BY clauses. Configure only engine-specific parameters like `s3Path`, `format`, and `settings`. **Security Risk**: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities. **Solution**: Use `mooseRuntimeEnv` for runtime credential resolution: ```ts filename="SecureS3Streaming.ts" copy // ✅ RECOMMENDED: Runtime environment variable resolution ); ``` **Then set environment variables:** ```bash filename="Terminal" copy export AWS_ACCESS_KEY_ID="AKIA..." export AWS_SECRET_ACCESS_KEY="your-secret-key" moose prod up ``` **Benefits:** - Credentials never embedded in Docker images - Supports credential rotation (changing passwords triggers table recreation) - Different credentials per environment (dev/staging/prod) - Clear error messages if environment variables are missing S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas. #### Direct S3 Access (`S3`) Use the `S3` engine for direct read/write access to S3 storage without streaming semantics: ```ts filename="S3Table.ts" copy // S3 table with credentials (recommended with mooseRuntimeEnv) ); // Public S3 bucket (no authentication) ); ``` - **S3**: Direct read/write access to S3 files. Use for batch processing or querying static data. - **S3Queue**: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion. Both engines support the same credential management and format options. #### In-Memory Buffer (`Buffer`) The `Buffer` engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds: ```ts filename="BufferTable.ts" copy // First create the destination table ); // Then create buffer that writes to it ); ``` - Data in buffer is **lost if server crashes** before flush - Not suitable for critical data that must be durable - Best for high-throughput scenarios where minor data loss is acceptable - Buffer and destination table must have identical schemas - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on buffer tables For more details, see the [ClickHouse Buffer documentation](https://clickhouse.com/docs/en/engines/table-engines/special/buffer). #### Distributed Tables (`Distributed`) The `Distributed` engine creates a distributed table across a ClickHouse cluster for horizontal scaling: ```ts filename="DistributedTable.ts" copy // Distributed table across cluster ); ``` - Requires a configured ClickHouse cluster with remote_servers configuration - The local table must exist on all cluster nodes - Distributed tables are virtual - data is stored in local tables - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on distributed tables - The `cluster` name must match a cluster defined in your ClickHouse configuration For more details, see the [ClickHouse Distributed documentation](https://clickhouse.com/docs/en/engines/table-engines/special/distributed). #### Replicated Engines Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants: - `ReplicatedMergeTree` - Replicated version of MergeTree - `ReplicatedReplacingMergeTree` - Replicated with deduplication - `ReplicatedAggregatingMergeTree` - Replicated with aggregation - `ReplicatedSummingMergeTree` - Replicated with summation ```ts filename="ReplicatedEngines.ts" copy // Basic replicated table with explicit paths const replicatedTable = new OlapTable("records", { engine: ClickHouseEngines.ReplicatedMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/records", replicaName: "{replica}", orderByFields: ["id"] }); // Replicated with deduplication const replicatedDedup = new OlapTable("dedup_records", { engine: ClickHouseEngines.ReplicatedReplacingMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records", replicaName: "{replica}", ver: "updated_at", isDeleted: "deleted", orderByFields: ["id"] }); // For ClickHouse Cloud or Boreal (no parameters needed) const cloudReplicated = new OlapTable("cloud_records", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"] }); ``` The `keeper_path` and `replica_name` parameters are **optional** for replicated engines: - **Omit both parameters** (recommended): Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments. The default path pattern `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}` works automatically with Atomic databases (default in modern ClickHouse). - **Provide custom paths**: You can still specify both parameters explicitly if you need custom replication paths for your self-managed cluster. **Note**: Both parameters must be provided together, or both omitted. The `{uuid}`, `{shard}`, and `{replica}` macros are automatically substituted by ClickHouse at runtime. For more details, see the [ClickHouse documentation on data replication](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication). ## Externally Managed Tables If you have a table that is managed by an external system (e.g Change Data Capture like ClickPipes), you can still use Moose to query it. You can set the config in the table config to set the lifecycle to `EXTERNALLY_MANAGED`. ```ts filename="ExternallyManagedTable.ts" copy // Table managed by external system const externalTable = new OlapTable("external_users", { orderByFields: ["id", "timestamp"], lifeCycle: LifeCycle.EXTERNALLY_MANAGED // Moose won't create or modify this table }); ``` Learn more about the different lifecycle options and how to use them in the [LifeCycle Management](/stack/olap/lifecycle) documentation. ## Invalid Configurations ```ts filename="InvalidConfig.ts" copy // Error: No primary key or orderByFields interface BadRecord1 { field1: string; field2: number; } const badTable1 = new OlapTable("bad_table1"); // Error: Primary key not first in orderByFields interface BadRecord2 { id: Key; field1: string; } const badTable2 = new OlapTable("bad_table2", { orderByFields: ["field1", "id"] // Wrong order - primary key must be first }); // Error: Nullable field in orderByFields interface BadRecord3 { id: Key; field1: string; field2?: number; } const badTable3 = new OlapTable("bad_table3", { orderByFields: ["id", "field2"] // Can't have nullable field in orderByFields }); ``` ## Development Workflow ### Local Development with Hot Reloading One of the powerful features of Moose is its integration with the local development server: 1. Start your local development server with `moose dev` 2. When you define or modify an `OlapTable` in your code and save the file: - The changes are automatically detected - The TypeScript compiler plugin processes your schema definitions - The infrastructure is updated in real-time to match your code changes - Your tables are immediately available for testing For example, if you add a new field to your schema: ```ts filename="HotReloading.ts" copy // Before interface BasicSchema { id: Key; name: string; } // After adding a field interface BasicSchema { id: Key; name: string; createdAt: Date; // New field } ``` The Moose framework will: 1. Detect the change when you save the file 2. Update the table schema in the local ClickHouse instance 3. Make the new field immediately available for use ### Verifying Your Tables You can verify your tables were created correctly using: ```bash filename="Terminal" copy # List all tables in your local environment moose ls ``` #### Connecting to your local ClickHouse instance You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your `moose.config.toml` file: ```toml filename="moose.config.toml" copy [clickhouse_config] db_name = "local" user = "panda" password = "pandapass" use_ssl = false host = "localhost" host_port = 18123 native_port = 9000 ```