# Moose / OLAP Documentation – TypeScript

## Included Files

1. moose/olap/apply-migrations.mdx
2. moose/olap/db-pull.mdx
3. moose/olap/external-tables.mdx
4. moose/olap/indexes.mdx
5. moose/olap/insert-data.mdx
6. moose/olap/model-materialized-view.mdx
7. moose/olap/model-table.mdx
8. moose/olap/model-view.mdx
9. moose/olap/planned-migrations.mdx
10. moose/olap/read-data.mdx
11. moose/olap/schema-change.mdx
12. moose/olap/schema-optimization.mdx
13. moose/olap/schema-versioning.mdx
14. moose/olap/supported-types.mdx
15. moose/olap/ttl.mdx

## Applying Migrations

Source: moose/olap/apply-migrations.mdx

How to apply migrations to your database

# Applying Migrations

This page covers OLAP migrations. For migrations across the MooseStack, see the Migrate docs.

## Overview

Migrations are designed for two complementary goals:

- Move fast locally by inferring changes from your code and applying them immediately to your local database.
- Be deliberate in production by executing a reviewed, versioned plan that matches your intent and protects data.

How to think about it:

- Development mode: You edit code, MooseStack infers the SQL and immediately applies it to local ClickHouse. Great for rapid iteration; not guaranteed to infer intent (e.g., renames).
- Production (planned) mode: You generate a plan from the target environment vs your code, review and commit the plan, and MooseStack executes it deterministically during deploy with drift checks.

What you need to do:

- In dev: just code. MooseStack handles local diffs automatically.
- In prod (OLAP):
  - Generate and save a plan:
    ```bash
    moose generate migration --url https:// --token --save
    ```
  - Review and edit the plan (`plan.yaml`) as needed
  - Commit the plan to source control
  - Deploy to production. MooseStack validates snapshots (current DB vs `remote_state.json`, desired code vs `local_infra_map.json`) and executes `plan.yaml` in order. If drift is detected, the deploy aborts; regenerate the plan and retry.

## Development Workflow

### Starting the Runtime

Use `moose dev` to start the MooseStack runtime with automatic migration detection:

```bash
moose dev

⡏ Starting local infrastructure
  Successfully started containers
  Validated clickhousedb-1 docker container
  Validated redpanda-1 docker container
  Successfully validated red panda cluster
  Validated temporal docker container
  Successfully ran local infrastructure
```

### Hot-Reloaded Migrations

MooseStack continuously monitors your code changes and applies migrations automatically. All changes are applied to your **local database only**.

```ts filename="app/tables/events.ts" copy
interface Event {
  id: Key<string>;
  name: string;
  createdAt: Date;
  status: string; // New field - will trigger migration
}
```

Use planned migrations to generate, review, and apply OLAP DDL plans deterministically. See [Planned Migrations (OLAP)](/moose/olap/planned-migrations).

### Generating Migration Plans

When using planned migrations for OLAP, you need to generate a migration plan from the remote environment. This is done by running the following command:

```bash
moose generate migration --url https:// --token --save
```

This generates a few files in the `migrations` directory:

- `plan.yaml`: The migration plan containing an ordered list of operations to apply to the remote database to bring it into alignment with your local code.
- `remote_state.json`: A snapshot of the remote database state at the time the plan was generated - `local_infra_map.json`: A snapshot of the local database state at the time the plan was generated The remote and local state are used to validate that the plan is still valid at the time of deployment. If there have been schema changes made to your live remote database since the plan was generated, the deployment will abort and you will need to regenerate the plan. This is to prevent you from dropping data unintentionally. ### Reviewing and Editing the Plan You can review and edit the plan as needed. The plan is a YAML file that contains an ordered list of operations to apply to the remote database to bring it into alignment with your local code. ```yaml filename="migrations/plan.yaml" copy ``` ### Applying the Plan The plan is applied during deployment. MooseStack will validate that the remote database state matches the snapshot of the database state at the time the plan was generated, and applies `plan.yaml` in order; it aborts if snapshots don’t match current state. ## Migration Types ### Adding New Tables or Materialized Views ```typescript filename="index.ts" {4-7} copy // Export the new table and materialized view to apply changes export { newTable, // New table newMaterializedView // New materialized view } ``` The dev mode will automatically detect the new table or materialized view and apply the changes to your local database. You will see a log like this in the terminal: ```bash filename="Terminal" copy $ moose dev ⠋ Processing Infrastructure changes from file watcher + Table: new_table Version None - id: String, a_column: String, some_other_column: Float64 - - deduplicate: false + Table: target_table Version None - id: String, a_column: String, some_other_column: Float64 - id - deduplicate: false + SQL Resource: mv_to_target ``` The generated plan for this operation will look like this: ```yaml filename="migrations/plan.yaml" copy - CreateTable: table: name: new_table columns: - name: id data_type: String required: true unique: false primary_key: true default: null annotations: [] - name: a_column data_type: String required: true unique: false primary_key: false default: null annotations: [] - name: some_other_column data_type: Float64 required: true unique: false primary_key: false default: null annotations: [] order_by: - id deduplicate: false engine: MergeTree version: null metadata: description: null life_cycle: FULLY_MANAGED - CreateTable: table: name: target_table columns: - name: id data_type: String required: true unique: false primary_key: true default: null annotations: [] - name: a_column data_type: String required: true unique: false primary_key: false default: null annotations: [] - name: some_other_column data_type: Float64 required: true unique: false primary_key: false default: null annotations: [] order_by: - id deduplicate: false engine: MergeTree version: null metadata: description: null life_cycle: FULLY_MANAGED - RawSQL: sql: "CREATE MATERIALIZED VIEW mv_to_target TO target_table AS SELECT * FROM source_table" description: Running setup SQL for resource mv_to_target ``` ### Column Additions Adding new fields to your data models: ```ts filename="Before" copy interface AddedColumn { id: Key; another_column: string; some_column: string; } ``` ```ts filename="After" copy interface AddedColumn { id: Key; another_column: string; some_column: string; new_column: number; // New field - migration applied } ``` In dev mode, you will see a log like this: ```bash 
filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
  ~ Table events:
      Column changes:
        + new_column: Int64
```

The generated plan for this operation will look like this:

```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
    table: "events"
    column:
      name: "new_column"
      data_type: "Int64"
```

### Column Removals

Removing fields from your data models:

```ts filename="Before" copy
interface RemovedColumn {
  id: Key<string>;
  another_column: string;
  some_column: string;
  old_column: number;
}

const selectStatement = sql`
  SELECT
    toStartOfDay(${sourceTable.columns.a_date}) as day,
    uniq(${sourceTable.columns.id}) as count,
    SUM(${sourceTable.columns.a_number}) as sum
  FROM ${sourceTable}
  GROUP BY day
`;

const mv = new MaterializedView({
  selectStatement,
  selectTables: [sourceTable],
  targetTable: {
    name: "target_table",
    engine: ClickHouseEngines.MergeTree,
    orderByFields: ["day"],
  },
  materializedViewName: "mv_table_to_target",
});
```

```ts filename="After.ts" copy
interface TargetSchema {
  day: Date;
  count: number;
  sum: number;
  avg: number; // New column - migration applied
}

const selectStatement = sql`
  SELECT
    toStartOfDay(${sourceTable.columns.a_date}) as day,
    uniq(${sourceTable.columns.id}) as count,
    sum(${sourceTable.columns.a_number}) as sum,
    avg(${sourceTable.columns.a_number}) as avg
  FROM ${sourceTable}
  GROUP BY day
`;
```

---

## Syncing External Tables

Source: moose/olap/db-pull.mdx

Refresh your external table models from an existing ClickHouse database

# Syncing External Tables

## What this is

Use `moose db pull` to refresh the definitions of tables you marked as `EXTERNALLY_MANAGED` from a live ClickHouse instance. It reads your code to find external tables, fetches their remote schemas, regenerates one external models file, and creates a small git commit if anything changed. If new external tables were added remotely (e.g., new CDC streams), they are added to the external models file as part of the same run.

## When to use it

- **External tables changed remotely**: a DBA, CDC, or ETL pipeline updated schema.
- **Keep types in sync**: update generated models without touching fully-managed tables.
- **Safe by design**: does not modify the database or your managed models.

This is a read-only sync for your code models. For concepts and modeling guidance, see [External Tables](/moose/olap/external-tables). To bootstrap a project from an existing DB, see [Initialize from ClickHouse](/moose/getting-started/from-clickhouse).

## Requirements

- Tables are defined with `lifeCycle: EXTERNALLY_MANAGED`
- A ClickHouse connection string (native or HTTP/S)

## Connection strings

`db pull` accepts both native and HTTP(S) URLs. Native strings are automatically converted to HTTP(S) with the appropriate ports. Examples:

```bash filename="Terminal" copy
# Native (auto-converted to HTTPS + 8443)
moose db pull --connection-string "clickhouse://explorer@play.clickhouse.com:9440/default"

# HTTPS (explicit database via query param)
moose db pull --connection-string "https://play.clickhouse.com/?user=explorer&database=default"

# Local HTTP
moose db pull --connection-string "http://localhost:8123/?user=default&database=default"
```

## What gets written

`app/externalModels.ts`

`db pull` treats this file as the single source of truth for `EXTERNALLY_MANAGED` tables. It introspects the remote schema, updates existing external tables, and adds any newly detected external tables here. It does not modify models elsewhere in your codebase.
Keep all external tables in this file and import it once from your root (`app/index.ts`). Important: - The file is overwritten on every run (or at the path passed via `--file-path`). - If you customize the path, ensure your root file imports it so Moose loads your external models. ## How it works When you run `db pull` the CLI does the following: - Loads your project’s infrastructure map and identifies tables marked as `EXTERNALLY_MANAGED`. - Connects to the remote ClickHouse specified by `--connection-string` and introspects the live schemas for those tables. - Regenerates a single external models file that mirrors the remote schema. - Adds any newly detected external tables from the remote database to the generated file so your code stays in sync as sources evolve. - Does not change any fully managed tables, your `app/index.ts`, or the database itself. - Creates a small git commit if the generated file changed, so you can review and share the update. ### Example output ```ts filename="app/externalModels.ts" // AUTO-GENERATED FILE. DO NOT EDIT. // This file will be replaced when you run `moose db pull`. // ...typed table definitions matching remote EXTERNALLY_MANAGED tables... ``` ## Command ```bash filename="Terminal" copy moose db pull --connection-string [--file-path ] ``` - **--connection-string**: Required. ClickHouse URL (native or HTTP/S) - **--file-path**: Optional. Override the default output file. The file at this path will be regenerated (overwritten) on each run. ## Typical Use Cases ### Remote schema changed; update local types Your DBA, CDC pipeline (e.g., ClickPipes), or ETL job updated a table’s schema. To keep your code accurate and type-safe, refresh your external models so queries, APIs, and materialized views reference the correct columns and types. ```bash filename="Terminal" copy moose db pull --connection-string ``` This updates only `EXTERNALLY_MANAGED` models and leaves managed code untouched. ### Automatically run on dev startup (keep local fresh) In active development, schemas can drift faster than you commit updates. Running `db pull` on dev startup helps ensure your local code matches the live schema you depend on. ```bash filename="Terminal" copy export REMOTE_CLICKHOUSE_URL="clickhouse://:@:/" ``` Add to `moose.config.toml`: ```toml filename="moose.config.toml" copy [http_server_config] on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL" ``` This runs once when the dev server first starts. To run after code reloads, use `on_reload_complete_script`. If you run this frequently, prefer HTTP(S) URLs and cache credentials via env/secrets to avoid friction. ### New project from an existing DB If you’re starting with an existing ClickHouse database, bootstrap code with `init --from-remote`, then use `db pull` over time to keep external models fresh: ```bash filename="Terminal" copy moose init my-project --from-remote $REMOTE_CLICKHOUSE_URL --language ``` Review the full getting started guide to learn more about how to bootstrap a new Moose OLAP project from an existing ClickHouse DB. ### A new CDC/external table appeared; add it to code Your CDC pipeline created a new table (or exposed a new stream). Pull to add the new table to your external models file automatically. ```bash filename="Terminal" copy moose db pull --connection-string ``` The regenerated external models file will now include the newly discovered external table. 
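Whichever of these workflows you use, the refreshed models only take effect if your root file loads the generated module. A minimal sketch, assuming the default output path `app/externalModels.ts`:

```ts filename="app/index.ts" copy
// Load the generated external models so Moose registers them alongside your managed tables
import "./externalModels";
```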
## Troubleshooting - **No changes written**: Ensure tables are actually marked as `EXTERNALLY_MANAGED` and names match remote. - **Unsupported types**: The CLI will list tables with unsupported types; they’re skipped in the generated file. - **Auth/TLS errors**: Verify scheme/ports (8123 or 8443) and credentials; try HTTPS if native URL fails. - **Git commit issues**: The command attempts a lightweight commit; commit manually if your working tree is dirty. ## Related - **External Tables**: concepts and configuration - **Initialize from ClickHouse**: bootstrap projects from an existing DB - **Supported Types**: mapping and constraints --- ## External Tables Source: moose/olap/external-tables.mdx Connect to externally managed database tables and CDC services # External Tables ## Overview External tables allow you to connect Moose to database tables that are managed outside of your application. This is essential when working with: - **CDC (Change Data Capture) services** like ClickPipes, Debezium, or AWS DMS - **Legacy database tables** managed by other teams - **Third-party data sources** with controlled schema evolution ## When to Use External Tables ## Configuration Set `lifeCycle: LifeCycle.EXTERNALLY_MANAGED` to tell Moose not to modify the table schema: ```ts filename="ExternalTableExample.ts" copy interface CdcUserData { id: string; name: string; email: string; updated_at: Date; } // Connect to CDC-managed table const cdcUserTable = new OlapTable("cdc_users", { lifeCycle: LifeCycle.EXTERNALLY_MANAGED }); ``` ## Getting Models for External Tables ### New project: initialize from your existing ClickHouse If you don’t yet have a Moose project, use init-from-remote to bootstrap models from your existing ClickHouse: ```bash filename="Terminal" copy moose init my-project --from-remote --language ``` What happens: - Moose introspects your database and generates table models in your project. - If Moose detects ClickPipes (or other CDC-managed) tables, it marks those as `EXTERNALLY_MANAGED` and writes them into a dedicated external models file: - TypeScript: `app/externalModels.ts` - Python: `app/external_models.py` - This is a best-effort detection to separate CDC-managed tables from those you may want Moose to manage in code. How detection works (ClickPipes/PeerDB example): - Moose looks for PeerDB-specific fields that indicate CDC ownership and versions, such as `_peerdb_synced_at`, `_peerdb_is_deleted`, `_peerdb_version`, and related metadata columns. - When these are present, the table will be marked `EXTERNALLY_MANAGED` and emitted into the external models file automatically. 
```ts filename="externalModels.ts" copy
export interface foo {
  id: string & typia.tags.Format<"uuid">;
  name: string;
  description: string | undefined;
  status: string;
  priority: number & ClickHouseInt<"int32">;
  is_active: boolean;
  metadata: string | undefined;
  tags: string[];
  score: (string & ClickHouseDecimal<10, 2>) | undefined;
  large_text: string | undefined;
  created_at: string & typia.tags.Format<"date-time"> & ClickHousePrecision<6>;
  updated_at: string & typia.tags.Format<"date-time"> & ClickHousePrecision<6>;
  _peerdb_synced_at: string & typia.tags.Format<"date-time"> & ClickHousePrecision<9> & ClickHouseDefault<"now64()">;
  _peerdb_is_deleted: number & ClickHouseInt<"int8">;
  _peerdb_version: number & ClickHouseInt<"int64">;
}
```

### Existing project: mark additional external tables

If there are other tables in your DB that are not CDC-managed but you want Moose to treat as external (not managed by code):

1) Mark them as external in code

```ts copy
// In a file you control (not the external file yet)
const table = new OlapTable("my_table", {
  lifeCycle: LifeCycle.EXTERNALLY_MANAGED
});
```

2) Move them into the external models file

- Move the model definitions to your external file (`app/externalModels.ts` or `app/external_models.py`).
- Ensure your root file still loads only the external models via a single import:
  - Add `import "./externalModels";` in your `app/index.ts` file.

This keeps truly external tables out of your managed code path, while still making them available locally (and in tooling) without generating production DDL.

## Important Considerations

`EXTERNALLY_MANAGED` tables reflect schemas owned by your CDC/DBA/ETL processes. Do not change their field shapes in code. If you accidentally edited an external model, revert to the source of truth by running **DB Pull**: [/moose/olap/db-pull](/moose/olap/db-pull).

Locally, externally managed tables are created/kept in sync in your development ClickHouse so you can develop against them and **seed data**. See **Seed (ClickHouse)** in the CLI: [/moose/moose-cli#seed-clickhouse](/moose/moose-cli#seed-clickhouse).

Moose will **not** apply schema changes to `EXTERNALLY_MANAGED` tables in production. If you edit these table models in code, those edits will not produce DDL operations in the migration plan (they will not appear in `plan.yaml`). For more on how migration plans are generated and what shows up in `plan.yaml`, see [/moose/olap/planned-migrations](/moose/olap/planned-migrations).

## Staying in sync with remote schema

For `EXTERNALLY_MANAGED` tables, keep your code in sync with the live database by running DB Pull. You can do it manually or automate it in dev.

```bash filename="Terminal" copy
moose db pull --connection-string 
```

Use DB Pull to regenerate your external models file from the remote schema. To run it automatically during development, see the script hooks in [the local development guide](/moose/local-dev#script-execution-hooks).

---

## Secondary Indexes

Source: moose/olap/indexes.mdx

Specifying indexes with Moose OLAP

## Indexes for ClickHouse tables

Moose lets you declare secondary/data-skipping indexes directly in your table definitions. Moose generates the ClickHouse `INDEX` clauses on create and plans `ALTER TABLE ADD/DROP INDEX` operations when you change them later.

### When to use indexes

- Use indexes to optimize selective predicates on large tables, especially string and high-cardinality columns.
- Common types: `minmax`, `Set(max_rows)`, `ngrambf_v1(...)`, `bloom_filter`.
### TypeScript

```ts
interface Events {
  id: string;
  user: string;
  message: string;
}

// Index definitions mirror the fields of the Python OlapConfig.TableIndex shown below
// (name, expression, type, arguments, granularity).
const eventsTable = new OlapTable<Events>("Events", {
  orderByFields: ["id"],
  indexes: [
    { name: "idx_user", expression: "user", type: "minmax", granularity: 1 },
    {
      name: "idx_message_ngrams",
      expression: "message",
      type: "ngrambf_v1",
      arguments: ["3", "256", "1", "123"],
      granularity: 1,
    },
  ],
});
```

### Python

```python
from moose_lib.dmv2.olap_table import OlapTable, OlapConfig, MergeTreeEngine
from pydantic import BaseModel

class Events(BaseModel):
    id: str
    user: str
    message: str

events_table = OlapTable[Events](
    "Events",
    OlapConfig(
        engine=MergeTreeEngine(),
        order_by_fields=["id"],
        indexes=[
            OlapConfig.TableIndex(name="idx_user", expression="user", type="minmax", granularity=1),
            OlapConfig.TableIndex(name="idx_message_ngrams", expression="message", type="ngrambf_v1", arguments=["3","256","1","123"], granularity=1),
        ],
    ),
)
```

### How Moose applies changes

- On create, Moose emits `INDEX ...` entries inside `CREATE TABLE`.
- On change, Moose plans `ALTER TABLE DROP INDEX` then `ADD INDEX ...` if the definition changed; pure adds/drops are applied as single operations.

---

## Inserting Data

Source: moose/olap/insert-data.mdx

Insert data into OLAP tables using various methods

# Inserting Data

Inserting data into your database is a common task. MooseStack provides a few different ways to insert data into your database.

If a table column is modeled as optional in your app type but has a ClickHouse default, Moose treats incoming records as optional at the API/stream boundary, but the ClickHouse table stores the column as required with a DEFAULT clause. If you omit the field in the payload, ClickHouse fills it with the default at insert time. For example: `field?: number & ClickHouseDefault<"18">` or `WithDefault`.

## From a Stream (Streaming Ingest)

When you need to stream data into your ClickHouse tables, you can set the `Stream.destination` as a reference to the `OlapTable` you want to insert into. This will automatically provision a synchronization process that batches and inserts data into the table.

```ts filename="StreamInsert.ts" copy
interface Event {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

const eventsTable = new OlapTable<Event>("Events");

const stream = new Stream<Event>("Events", {
  destination: eventsTable // automatically syncs the stream to the table in ClickHouse-optimized batches
});
```

[ClickHouse inserts need to be batched for optimal performance](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance). Moose automatically batches your data into ClickHouse-optimized batches of up to 100,000 records, with automatic flushing every second. It also handles at-least-once delivery and retries on connection errors to ensure your data is never lost.

## From a Workflow (Batch Insert)

If you have a data source better suited for batch patterns, use a workflow and the direct `insert()` method to land data into your tables:

```ts filename="WorkflowInsert.ts" copy
interface Event {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

const eventsTable = new OlapTable<Event>("user_events");

const etlTask = new Task({
  name: "ETL",
  run: async () => {
    const result = await eventsTable.insert([
      { id: "evt_1", userId: "user_123", timestamp: new Date(), eventType: "click" },
      { id: "evt_2", userId: "user_456", timestamp: new Date(), eventType: "view" }
      // ... more records of type Event
    ]);
  }
});
```

## From a Client App

### Via REST API

In your Moose code, you can leverage the built-in [MooseAPI module](/moose/apis) to place a `POST` REST API endpoint in front of your streams and tables to allow you to insert data from external applications.
```ts filename="IngestApi.ts" copy
const ingestApi = new IngestApi("user_events", {
  destination: events_stream
});
```

Alternatively, use `IngestPipeline` instead of standalone `IngestApi`, `Stream`, and `OlapTable` components:

```ts filename="IngestPipeline.ts" copy
const eventsPipeline = new IngestPipeline("user_events", {
  ingestApi: true,
  stream: true,
  table: {
    orderByFields: ["id", "timestamp"],
    engine: ClickHouseEngines.ReplacingMergeTree,
  }
});
```

With these APIs you can leverage the built-in OpenAPI client integration to generate API clients in your own language to connect to your pipelines from external applications.

### Coming Soon: MooseClient

We're working on a new client library that you can use to interact with your Moose pipelines from external applications. Join the community Slack to stay updated and let us know if you're interested in helping us build it.

## Direct Data Insertion

The `OlapTable` provides an `insert()` method that allows you to directly insert data into ClickHouse tables with validation and error handling.

### Inserting Arrays of Records

```ts filename="DirectInsert.ts" copy
interface UserEvent {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

const eventsTable = new OlapTable<UserEvent>("user_events");

// Insert single record or array of records
const result = await eventsTable.insert([
  { id: "evt_1", userId: "user_123", timestamp: new Date(), eventType: "click" },
  { id: "evt_2", userId: "user_456", timestamp: new Date(), eventType: "view" }
]);

console.log(`Successfully inserted: ${result.successful} records`);
console.log(`Failed: ${result.failed} records`);
```

ClickHouse strongly recommends batching inserts. You should avoid inserting single records into tables, and consider using Moose Streams and Ingest Pipelines if your data source sends events as individual records.
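If your source hands you events one at a time, a common pattern is to buffer them in the client and flush periodically so each `insert()` call carries a batch. A minimal sketch, reusing `eventsTable` and `UserEvent` from the example above (the batch size and flush interval are illustrative, not Moose defaults):

```ts filename="BatchedInsert.ts" copy
const buffer: UserEvent[] = [];

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);
  // One batched insert instead of one insert per record
  await eventsTable.insert(batch);
}

function enqueue(event: UserEvent) {
  buffer.push(event);
  if (buffer.length >= 1000) {
    void flush();
  }
}

// Flush on an interval as well, so low-volume periods still land in ClickHouse
setInterval(() => void flush(), 1000);
```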
### Handling Large Batch Inserts For large datasets, use Node.js streams for memory-efficient processing: ```ts filename="StreamInsert.ts" copy const dataStream = new Readable({ objectMode: true, read() { // Stream implementation } }); const result = await eventsTable.insert(dataStream, { strategy: 'fail-fast' // Note: 'isolate' not supported with streams }); ``` ### Validation Methods Before inserting data, you can validate it using the following methods: ```ts filename="ValidationMethods.ts" copy // Type guard with compile-time type narrowing if (eventsTable.isValidRecord(unknownData)) { // TypeScript now knows unknownData is UserEvent console.log(unknownData.userId); // Type-safe access } // Detailed validation with error reporting const validationResult = eventsTable.validateRecord(unknownData); if (validationResult.success) { console.log("Valid data:", validationResult.data); } else { console.log("Validation errors:", validationResult.errors); } // Assert validation (throws on failure) try { const validData = eventsTable.assertValidRecord(unknownData); // Use validData with full type safety } catch (error) { console.log("Validation failed:", error.message); } ``` ### Error Handling Strategies Choose from three error handling strategies based on your reliability requirements: #### Fail-Fast Strategy (Default) ```ts filename="FailFast.ts" copy // Stops immediately on any error const result = await eventsTable.insert(data, { strategy: 'fail-fast' }); ``` #### Discard Strategy ```ts filename="Discard.ts" copy // Discards invalid records, continues with valid ones const result = await eventsTable.insert(data, { strategy: 'discard', allowErrors: 10, // Allow up to 10 failed records allowErrorsRatio: 0.05 // Allow up to 5% failure rate }); ``` #### Isolate Strategy ```ts filename="Isolate.ts" copy // Retries individual records to isolate failures const result = await eventsTable.insert(data, { strategy: 'isolate', allowErrorsRatio: 0.1 }); // Access detailed failure information if (result.failedRecords) { result.failedRecords.forEach(failed => { console.log(`Record ${failed.index} failed: ${failed.error}`); }); } ``` ### Performance Optimization The insert API includes several performance optimizations: - **Memoized connections**: ClickHouse clients are reused across insert calls - **Batch processing**: Optimized batch sizes for large datasets - **Async inserts**: Automatic async insert mode for datasets > 1000 records - **Connection management**: Use `close_client()` when completely done ```ts filename="Performance.ts" copy // For high-throughput scenarios const result = await eventsTable.insert(largeDataset, { validate: false, // Skip validation for performance strategy: 'discard' }); // Clean up when completely done (optional) await eventsTable.closeClient(); ``` ## Best Practices --- ## Creating Materialized Views Source: moose/olap/model-materialized-view.mdx Create and configure materialized views for data transformations # Modeling Materialized Views ## Overview Materialized views are write-time transformations in ClickHouse. A static `SELECT` populates a destination table from one or more sources. You query the destination like any other table. The `MaterializedView` class wraps [ClickHouse `MATERIALIZED VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view) and keeps the `SELECT` explicit. 
When you edit the destination schema in code and update the `SELECT` accordingly, Moose applies the corresponding DDL, orders dependent updates, and backfills as needed, so the pipeline stays consistent as you iterate. In local dev, Moose Migrate generates and applies DDL to your local database.

Today, destination schemas are declared in code and kept in sync manually with your `SELECT`. Moose Migrate coordinates DDL and dependencies when you make those changes. A future enhancement will infer the destination schema from the `SELECT` and update it automatically.

This dependency awareness is critical for [cascading materialized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view-with-dependencies). Moose Migrate [orders DDL across views and tables](https://www.fiveonefour.com/blog/Moose-SQL-Getting-DDL-Dependencies-in-Order) to avoid failed migrations and partial states.

### Basic Usage

```ts filename="BasicUsage.ts" copy
// Define the schema of the transformed rows -- this is static and it must match the results of your SELECT.
// It also represents the schema of your entire destination table.
interface TargetSchema {
  id: string;
  average_rating: number;
  num_reviews: number;
}

// `reviews` is an example source OlapTable with `product_id` and `rating` columns.
new MaterializedView<TargetSchema>({
  selectStatement: sql`
    SELECT
      ${reviews.columns.product_id} AS id,
      avg(${reviews.columns.rating}) AS average_rating,
      count(*) AS num_reviews
    FROM ${reviews}
    GROUP BY ${reviews.columns.product_id}
  `,
  selectTables: [reviews],
  targetTable: { name: "target_table", orderByFields: ["id"] },
  materializedViewName: "mv_to_target",
});
```

The ClickHouse `MATERIALIZED VIEW` object acts like a trigger: on new inserts into the source table(s), it runs the SELECT and writes the transformed rows to the destination.

### Quick Reference

```typescript filename="ViewOptions.ts"
interface MaterializedViewConfig<T> {
  // Static SELECT that computes the destination rows
  selectStatement: string | Sql;

  // Tables/views the query reads from
  selectTables: (OlapTable | View)[];

  // Name of the ClickHouse MATERIALIZED VIEW object
  materializedViewName: string;

  // Destination table where materialized rows are stored
  targetTable?:
    | OlapTable<T>
    | {
        name: string;
        engine?: ClickHouseEngines;
        orderByFields?: (keyof T & string)[];
      };

  /** @deprecated prefer targetTable */
  tableName?: string;
  /** @deprecated prefer targetTable */
  engine?: ClickHouseEngines;
  /** @deprecated prefer targetTable */
  orderByFields?: (keyof T & string)[];
}
```

## Modeling the Target Table

The destination table is where the transformed rows are written by the materialized view. You can model it in two ways:

### Option 1 — Define target table inside the MaterializedView (most cases)

- Simple, co-located lifecycle: the destination table is created/updated/dropped with the MV.
- Best for: projection/denormalization, filtered serving tables, enrichment joins, and most rollups.

```ts filename="InlineTarget.ts" copy
interface Dest { id: string; value: number }

new MaterializedView<Dest>({
  selectStatement: sql`SELECT id, toInt32(value) AS value FROM ${sourceTable}`,
  selectTables: [sourceTable],
  targetTable: { name: "serving_table", orderByFields: ["id"] }, // MergeTree by default
  materializedViewName: "mv_to_serving_table",
});
```

### Option 2 — Decoupled: reference a standalone `OlapTable`

Certain use cases may benefit from a separate lifecycle for the target table that is managed independently from the MV.
```ts filename="DecoupledTarget.ts" copy
interface Dest { id: string; value: number }

const targetTable = new OlapTable<Dest>("target_table", {
  engine: ClickHouseEngines.MergeTree,
  orderByFields: ["id"],
});

new MaterializedView<Dest>({
  selectStatement: sql`SELECT id, toInt32(value) AS value FROM ${sourceTable}`,
  selectTables: [sourceTable],
  targetTable: targetTable,
  materializedViewName: "mv_to_target_table",
});
```

### Basic Transformation, Cleaning, Filtering, Denormalization

Create a narrower, query-optimized table from a wide source. Apply light transforms (cast, rename, parse) at write time.

```ts filename="Denormalization.ts" copy
interface Dest { id: string; value: number; created_at: string }

new MaterializedView<Dest>({
  selectStatement: sql`
    SELECT id, toInt32(value) AS value, created_at
    FROM ${sourceTable}
    WHERE active = 1
  `,
  selectTables: [sourceTable],
  targetTable: { name: "proj_table" },
  materializedViewName: "mv_to_proj_table",
});
```

### Aggregations

#### Simple Additive Rollups

When you want to maintain running sums (counts, totals) that are additive per key, use the `SummingMergeTree` engine:

```ts filename="Summing.ts" copy
interface DailyCounts {
  day: string;
  user_id: string;
  events: number;
}

const stmt = sql`
  SELECT
    toDate(${events.columns.timestamp}) AS day,
    ${events.columns.user_id} AS user_id,
    count(*) AS events
  FROM ${events}
  GROUP BY day, user_id
`;

new MaterializedView<DailyCounts>({
  selectStatement: stmt,
  selectTables: [events],
  targetTable: {
    name: "daily_counts",
    engine: ClickHouseEngines.SummingMergeTree,
    orderByFields: ["day", "user_id"],
  },
  materializedViewName: "mv_to_daily_counts",
});
```

#### Complex Aggregations

When you want to compute complex aggregation metrics that are not just simple additive operations (sum, count, avg, etc.), but instead use more complex analytical functions (topK, percentile, etc.), create a target table with the `AggregatingMergeTree` engine.

```ts filename="AggTransform.ts" copy
interface MetricsById {
  id: string;
  avg_rating: number & Aggregated<"avg", [number]>;
  daily_uniques: number & ClickHouseInt<"uint64"> & Aggregated<"uniqExact", [string]>;
}

// All Aggregate Functions in this query have a [functionName][State]() suffix.
const stmt = sql`
  SELECT
    ${events.columns.id} AS id,
    avgState(${events.columns.rating}) AS avg_rating,
    uniqExactState(${events.columns.user_id}) AS daily_uniques
  FROM ${events}
  GROUP BY ${events.columns.id}
`;

new MaterializedView<MetricsById>({
  selectStatement: stmt,
  selectTables: [events],
  targetTable: {
    name: "metrics_by_id",
    engine: ClickHouseEngines.AggregatingMergeTree,
    orderByFields: ["id"],
  },
  materializedViewName: "mv_metrics_by_id",
});
```

Jump to the [Advanced: AggregatingMergeTree transformations](#advanced-aggregatingmergetree-transformations) section for more details.
### Fan-in Patterns When you have multiple sources that you want to merge into a single destination table, its best to create an OlapTable and reference it in each MV that needs to write to it: ```ts filename="FanIn.ts" copy interface DailyCounts { day: string; user_id: string; events: number } // Create the destination table explicitly const daily = new OlapTable("daily_counts", { engine: ClickHouseEngines.SummingMergeTree, orderByFields: ["day", "user_id"], }); // MV 1 - write to the daily_counts table const webStmt = sql`SELECT toDate(ts) AS day, user_id, 1 AS events FROM ${webEvents}`; const mv1 = new MaterializedView({ selectStatement: webStmt, selectTables: [webEvents], targetTable: daily, materializedViewName: "mv_web_to_daily_counts", }); // MV 2 - write to the daily_counts table const mobileStmt = sql`SELECT toDate(ts) AS day, user_id, 1 AS events FROM ${mobileEvents}`; const mv2 = new MaterializedView({ selectStatement: mobileStmt, selectTables: [mobileEvents], targetTable: daily, materializedViewName: "mv_mobile_to_daily_counts", }); ``` ### Blue/green schema migrations Create a new table for a breaking schema change and use an MV to copy data from the old table; when complete, switch reads to the new table and drop just the MV and old table. For more information on how to use materialized views to perform blue/green schema migrations, see the [Schema Versioning](./schema-versioning) guide. ## Defining the transformation The `selectStatement` is a static SQL query that Moose runs to transform data from your source table(s) into rows for the destination table. Transformations are defined as ClickHouse SQL queries. We strongly recommend using the ClickHouse SQL reference and functions overview to help you develop your transformations. - Use the Moose `sql` template to interpolate tables and columns safely. This gives type-checked column references and prevents runtime parameters. - Reference tables and columns via objects in your project (e.g., `${sourceTable}`, `${sourceTable.columns.id}`) rather than string literals. ```ts filename="Transformation.ts" copy interface Dest { id: string; name: string; day: string } const transformation = sql` SELECT ${users.columns.id} AS id, ${users.columns.name} AS name, toDate(${events.columns.ts}) AS day FROM ${events} JOIN ${users} ON ${events.columns.user_id} = ${users.columns.id} WHERE ${events.columns.active} = 1 `; new MaterializedView({ selectStatement: transformation, selectTables: [events, users], targetTable: { name: "user_activity_by_day" }, materializedViewName: "mv_user_activity_by_day", }); ``` The columns returned by your `SELECT` must exactly match the destination table schema. - Use column aliases (`AS target_column_name`) to align names. - All destination columns must be present in the `SELECT`, or the materialized view won't be created. Adjust your transformation or table schema so they match. Go to the [Advanced: Writing SELECT statements to Aggregated tables](#writing-select-statements-to-aggregated-tables) section for more details. ## Backfill Destination Tables When the MaterializedView is created, Moose backfills the destination once by running your `SELECT` (so you start with a fully populated table). Materialized views that source from S3Queue tables are **not backfilled** automatically. S3Queue tables only process new files added to S3 after the table is created - there is no historical data to backfill from. The MV will start populating as new files arrive in S3. 
You can see the SQL that Moose will run to backfill the destination table when you generate the [Migration Plan](./migration-plan). During dev mode, as soon as you save the MaterializedView, Moose will run the backfill and you can see the results in the destination table by querying it in your local ClickHouse instance. ## Query Destination Tables You can query the destination table like any other table. For inline or decoupled target tables, you can reference target table columns and tables directly in your queries: ```ts filename="Query.ts" copy // Inline-defined destination table from earlier examples const q = sql` SELECT ${mv.targetTable.columns.id}, ${mv.targetTable.columns.value} FROM ${mv.targetTable} ORDER BY ${mv.targetTable.columns.id} LIMIT 10`; ``` If you define your target table outside of the MaterializedView, you can also just reference the table by its variable name in your queries: ```ts filename="QueryDecoupled.ts" copy const targetTable = new OlapTable<{ id: string; average_rating: number }>("target_table") // Assuming `targetTable` is the OlapTable you created explicitly const q = sql` SELECT ${targetTable.columns.id}, ${targetTable.columns.average_rating} FROM ${targetTable} WHERE ${targetTable.columns.id} = 'abc' `; ``` Go to the [Querying Aggregated tables](#querying-aggregated-tables) section for more details on how to query Aggregated tables. ## Advanced: Aggregations + Materialized Views This section dives deeper into advanced patterns and tradeoffs when building aggregated materialized views. ### Target Tables with `AggregatingMergeTree` When using an `AggregatingMergeTree` target table, you must use the `AggregateFunction` type to model the result of the aggregation functions: ```ts filename="AggTransform.ts" copy interface MetricsById { id: string; /** * Result of avgState(events.rating) * - avgState(number) returns number, so model the type as number * - Aggregated arg type is [number] because the column (events.rating) is a number * - Aggregated function name is "avg" */ avg_rating: number & Aggregated<"avg", [number]>; /** * Result of uniqExactState(events.user_id) * - uniqExact returns an integer; use number & ClickHouseInt<"uint64"> for precision * - Aggregated arg type is [string] because the column (events.user_id) is a string * - Aggregated function name is "uniqExact" */ daily_uniques: number & ClickHouseInt<"uint64"> & Aggregated<"uniqExact", [string]>; } // All Aggregate Functions in this query have a [functionName][State]() suffix. const stmt = sql` SELECT ${events.columns.id} AS id, avgState(${events.columns.rating}) AS avg_rating, uniqExactState(${events.columns.user_id}) AS daily_uniques FROM ${events} GROUP BY ${events.columns.id} `; new MaterializedView({ selectStatement: stmt, selectTables: [events], targetTable: { name: "metrics_by_id", engine: ClickHouseEngines.AggregatingMergeTree, orderByFields: ["id"], }, materializedViewName: "mv_metrics_by_id", }); ``` - Using `avg()`/`uniqExact()` in the SELECT instead of `avgState()`/`uniqExactState()` - Forgetting to annotate the schema with `Aggregated<...>` so the target table can be created correctly - Mismatch between `GROUP BY` keys in your `SELECT` and the `orderByFields` of your target table ### Modeling columns with `AggregateFunction` - Pattern: `U & Aggregated<"agg_func_name", [Types]>` - `U` is the read-time type (e.g., `number`, `string`) - `agg_func_name` is the aggregation name (e.g., `avg`, `uniqExact`) - `Types` are the argument types. 
These are the types of the columns that are being aggregated. ```ts filename="FunctionToTypeMapping.ts" copy number & Aggregated<"avg", [number]> // avgState(col: number) number & ClickHouseInt<"uint64"> & Aggregated<"uniqExact", [string]> // uniqExactState(col: string) number & ClickHouseInt<"uint64"> & Aggregated<"count", []> // countState(col: any) string & Aggregated<"argMax", [string, Date]> // argMaxState(col: string, value: Date) string & Aggregated<"argMin", [string, Date]> // argMinState(col: string, value: Date) number & Aggregated<"corr", [number, number]> // corrState(col1: number, col2: number) ``` ### Writing SELECT statements to Aggregated tables When you write to an `AggregatingMergeTree` table, you must add a `State` suffix to the aggregation functions in your `SELECT` statement. ```ts filename="AggTransform.ts" copy interface MetricsById { id: string; avg_rating: number & Aggregated<"avg", [number]>; total_reviews: number & Aggregated<"sum", [number]>; } const aggStmt = sql` SELECT ${reviews.columns.id} AS id, avgState(${reviews.columns.rating}) AS avg_rating, countState(${reviews.columns.id}) AS total_reviews FROM ${reviews} GROUP BY ${reviews.columns.id} `; const mv = new MaterializedView({ selectStatement: aggStmt, selectTables: [reviews], targetTable: { name: "metrics_by_id", engine: ClickHouseEngines.AggregatingMergeTree, orderByFields: ["id"], }, materializedViewName: "mv_metrics_by_id", }); ``` Why states? Finalized values (e.g., `avg()`) are not incrementally mergeable. Storing states lets ClickHouse maintain results efficiently as new data arrives. Docs: https://clickhouse.com/docs/en/sql-reference/aggregate-functions/index and https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators#-state ### Querying Aggregated Tables When you query a table with an `AggregatingMergeTree` engine, you must use aggregate functions with the `Merge` suffix (e.g., `avgMerge`) `or rely on Moose’s `Aggregated` typing plus `sql` to auto-finalize at query time. ```ts filename="QueryAgg.ts" copy // Auto-finalized via Aggregated + sql const cols = mv.targetTable.columns; // mv from earlier Agg example const autoFinalized = sql` SELECT ${cols.avg_rating}, ${cols.total_reviews} FROM ${mv.targetTable} WHERE ${cols.id} = '123' `; // Manual finalization (explicit ...Merge) const manual = sql` SELECT avgMerge(avg_rating) AS avg_rating, countMerge(total_reviews) AS total_reviews FROM metrics_by_id WHERE id = '123' `; ``` ## Choosing the right engine - Use `MergeTree` for copies/filters/enrichment without aggregation semantics. - Use `SummingMergeTree` when all measures are additive, and you want compact, eventually-consistent sums. - Use `AggregatingMergeTree` for non-additive metrics and advanced functions; store states and finalize on read. - Use `ReplacingMergeTree` for dedup/upserts or as an idempotent staging layer before rollups. --- ## Modeling Tables Source: moose/olap/model-table.mdx Model your database schema in code using native TypeScript/Python typing # Modeling Tables ## Overview Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing. You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more. 
```ts filename="FirstTable.ts" copy
interface MyFirstTable {
  id: Key<string>;
  name: string;
  age: number;
}

// Create a table named "first_table"
const firstTable = new OlapTable<MyFirstTable>("first_table");

// For deduplication, use the ReplacingMergeTree engine
const dedupTable = new OlapTable<MyFirstTable>("first_table", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],
});

// Now you can:
// - Write to this table from streams
// - Query it directly
// - Use it as a source for materialized views
```

### Creating Tables in Ingestion Pipelines

For end-to-end data flows, create tables as part of an ingestion pipeline:

```ts filename="PipelineTable.ts"
// Define your schema
interface UserEvent {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

// Create a complete ingestion pipeline with a table
const eventsPipeline = new IngestPipeline<UserEvent>("user_events", {
  ingestApi: true, // Creates a REST API endpoint at POST localhost:4000/ingest/user_events
  stream: true,    // Creates Kafka/Redpanda topic
  table: {         // Creates and configures the table named "user_events"
    orderByFields: ["id", "timestamp"]
  }
});

// Access the table component when needed
const eventsTable = eventsPipeline.table;
```

## Data Modeling

### Special ClickHouse Types (LowCardinality, Nullable, etc.)

```ts filename="ClickHouseTypes.ts" copy
export interface ClickHouseOptimizedExample {
  id: Key<string>;
  stringField: string;
  numberField: number;
  decimalField: Decimal<10, 2>; // Precise decimal storage
  // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works
  lowCardinalityField: string & LowCardinality; // Faster queries for enum-like data
  nestedObject: {
    innerString: string;
    innerNumber: number;
  };
  namedTupleField: {
    name: string;
    value: number;
  } & ClickHouseNamedTuple; // Optimized nested storage
  numberArray: number[];
  mapField: Record<string, string>;
  literalField: "optionA" | "optionB";
  optionalField?: string; // Nullable field
  dateField: Date;
}
```

### Default values

Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.

```ts filename="Defaults.ts" copy
interface Event {
  id: Key<string>;
  // Static defaults (ClickHouse expression as a string literal)
  status: string & ClickHouseDefault<"'pending'">; // DEFAULT 'pending'
  retries: number & ClickHouseDefault<"0">;        // DEFAULT 0
  // Server-side timestamps
  createdAt: Date & ClickHouseDefault<"now()">;    // DEFAULT now()
  // Decimal with default
  amount: Decimal<10, 2> & ClickHouseDefault<"0">;
  // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax
}
```

The value passed into the `ClickHouseDefault<"">` tag can either be a string literal or a stringified ClickHouse SQL expression. If you run into typing issues specifically on `Date` fields with `ClickHouseDefault`, use `WithDefault` as a fallback workaround.

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

- Optional without default (e.g., `field?: number`) → ClickHouse Nullable type.
- Optional with default (e.g., `field?: number & ClickHouseDefault<"18">` or `WithDefault`) → non-nullable column with default `18`.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

### Database Selection

By default, tables are created in the database specified in your `moose.config.toml` ClickHouse configuration.
You can override this on a per-table basis using the `database` field: ```ts filename="DatabaseOverride.ts" copy interface UserData { id: Key; name: string; email: string; } // Table in default database (from moose.config.toml) const defaultTable = new OlapTable("users"); // Table in specific database (e.g., "analytics") const analyticsTable = new OlapTable("users", { database: "analytics", orderByFields: ["id"] }); ``` To use custom databases, configure them in your `moose.config.toml`: ```toml [clickhouse_config] db_name = "local" additional_databases = ["analytics", "staging"] ``` The databases in `additional_databases` will be created automatically when you start your Moose application. ### Primary Keys and Sorting You must configure table indexing using one of these approaches: 1. Define at least one `Key` in your table schema 2. Specify `orderByFields` in the table config 3. Use both (all `Key` fields must come first in the `orderByFields` array) ```ts filename="PrimaryKeyConfig.ts" copy // Approach 1: Using primary key only interface Record1 { id: Key; // Primary key field field1: string; field2: number; } const table1 = new OlapTable("table1"); // id is the primary key ``` ### Order By Fields Only ```ts filename="OrderByFieldsOnly.ts" copy // Approach 2: Using orderByFields only interface SchemaWithoutPrimaryKey { field1: string; field2: number; field3: Date; } const tableWithOrderByFieldsOnly = new OlapTable("table2", { orderByFields: ["field1", "field2"] // Specify ordering without primary key }); ``` ### Order By Expression Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely. ```ts filename="OrderByExpression.ts" copy // Use a ClickHouse expression for ORDER BY interface Events { userId: string; createdAt: Date; eventType: string; } const tableWithOrderByExpression = new OlapTable("events", { // Equivalent to orderByFields: ["userId", "createdAt", "eventType"] orderByExpression: "(userId, createdAt, eventType)", }); // Advanced: functions inside expression const tableWithMonthBucketing = new OlapTable("events_by_month", { orderByExpression: "(userId, toYYYYMM(createdAt))", }); // No sorting (ClickHouse tuple() means empty ORDER BY) const unsortedTable = new OlapTable("events_unsorted", { orderByExpression: "tuple()", }); ``` ### Using Both Primary Key and Order By Fields ```ts filename="ComboKeyAndOrderByFields.ts" copy // Approach 3: Using both (primary key must be first) interface SchemaWithKey { id: Key; // Primary key field field1: string; field2: number; } const tableWithKeyAndOrderByFields = new OlapTable("table3", { orderByFields: ["id", "field1"] // Primary key must be first }); ``` ### Using Multiple Primary Keys ```ts filename="MultiKeyTable.ts" copy interface MultiKeyRecord { key1: Key; key2: Key; field1: string; } const multiKeyTable = new OlapTable("multi_key_table", { orderByFields: ["key1", "key2", "field1"] // Multiple keys must come first }); ``` ### Table engines By default, Moose will create tables with the `MergeTree` engine. You can use different engines by setting the `engine` in the table configuration. 
```ts filename="TableEngine.ts" copy // Default MergeTree engine const table = new OlapTable("table", { orderByFields: ["id"] }); // Use engine configuration for other engines const dedupTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "version", // Optional: keeps row with highest version isDeleted: "deleted" // Optional: soft delete when deleted=1 }); ``` #### Deduplication (`ReplacingMergeTree`) Use the `ReplacingMergeTree` engine to keep only the latest record for your designated sort key: ```ts filename="DeduplicatedTable.ts" copy // Basic deduplication const table = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"] }); // With version column (keeps record with highest version) const versionedTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at" // Column that determines which version to keep }); // With soft deletes (requires ver parameter) const softDeleteTable = new OlapTable("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at", isDeleted: "deleted" // UInt8 column: 1 marks row for deletion }); ``` ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately. **Version Column (`ver`)**: When specified, ClickHouse keeps the row with the maximum version value for each unique sort key. **Soft Deletes (`is_deleted`)**: When specified along with `ver`, rows where this column equals 1 are deleted during merges. This column must be UInt8 type. For more details, see the [ClickHouse documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree). #### Streaming from S3 (`S3Queue`) Use the `S3Queue` engine to automatically ingest data from S3 buckets as files are added: S3Queue tables only process **new files** added to S3 after table creation. When used as a source for materialized views, **no backfill occurs** - the MV will only start populating as new files arrive. See the [Materialized Views documentation](./model-materialized-view#backfill-destination-tables) for more details. ```ts filename="S3StreamingTable.ts" copy // Use direct configuration (S3Queue does not support orderByFields) ); ``` S3Queue is a streaming engine and does not support `orderByFields` or ORDER BY clauses. Configure only engine-specific parameters like `s3Path`, `format`, and `settings`. **Security Risk**: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities. **Solution**: Use `mooseRuntimeEnv` for runtime credential resolution: ```ts filename="SecureS3Streaming.ts" copy // ✅ RECOMMENDED: Runtime environment variable resolution ); ``` **Then set environment variables:** ```bash filename="Terminal" copy export AWS_ACCESS_KEY_ID="AKIA..." export AWS_SECRET_ACCESS_KEY="your-secret-key" moose prod up ``` **Benefits:** - Credentials never embedded in Docker images - Supports credential rotation (changing passwords triggers table recreation) - Different credentials per environment (dev/staging/prod) - Clear error messages if environment variables are missing S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas. 
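A minimal sketch of an S3Queue table, built only from the engine-specific parameters named in this section (`s3Path`, `format`, `settings`); verify the exact config keys against your moose-lib version, and resolve credentials via `mooseRuntimeEnv` as described above rather than hardcoding them:

```ts filename="S3QueueSketch.ts" copy
interface S3Event {
  id: string;
  payload: string;
}

// The bucket path, format, and settings below are placeholders for illustration only.
const s3Events = new OlapTable<S3Event>("s3_events", {
  engine: ClickHouseEngines.S3Queue,
  s3Path: "https://my-bucket.s3.amazonaws.com/events/*.json",
  format: "JSONEachRow",
  settings: { mode: "unordered" },
  // No orderByFields: S3Queue does not support ORDER BY
});
```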
#### Direct S3 Access (`S3`) Use the `S3` engine for direct read/write access to S3 storage without streaming semantics: ```ts filename="S3Table.ts" copy // S3 table with credentials (recommended with mooseRuntimeEnv) ); // Public S3 bucket (no authentication) ); ``` - **S3**: Direct read/write access to S3 files. Use for batch processing or querying static data. - **S3Queue**: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion. Both engines support the same credential management and format options. #### In-Memory Buffer (`Buffer`) The `Buffer` engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds: ```ts filename="BufferTable.ts" copy // First create the destination table ); // Then create buffer that writes to it ); ``` - Data in buffer is **lost if server crashes** before flush - Not suitable for critical data that must be durable - Best for high-throughput scenarios where minor data loss is acceptable - Buffer and destination table must have identical schemas - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on buffer tables For more details, see the [ClickHouse Buffer documentation](https://clickhouse.com/docs/en/engines/table-engines/special/buffer). #### Distributed Tables (`Distributed`) The `Distributed` engine creates a distributed table across a ClickHouse cluster for horizontal scaling: ```ts filename="DistributedTable.ts" copy // Distributed table across cluster ); ``` - Requires a configured ClickHouse cluster with remote_servers configuration - The local table must exist on all cluster nodes - Distributed tables are virtual - data is stored in local tables - Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on distributed tables - The `cluster` name must match a cluster defined in your ClickHouse configuration For more details, see the [ClickHouse Distributed documentation](https://clickhouse.com/docs/en/engines/table-engines/special/distributed). #### Replicated Engines Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants: - `ReplicatedMergeTree` - Replicated version of MergeTree - `ReplicatedReplacingMergeTree` - Replicated with deduplication - `ReplicatedAggregatingMergeTree` - Replicated with aggregation - `ReplicatedSummingMergeTree` - Replicated with summation ```ts filename="ReplicatedEngines.ts" copy // Basic replicated table with explicit paths const replicatedTable = new OlapTable("records", { engine: ClickHouseEngines.ReplicatedMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/records", replicaName: "{replica}", orderByFields: ["id"] }); // Replicated with deduplication const replicatedDedup = new OlapTable("dedup_records", { engine: ClickHouseEngines.ReplicatedReplacingMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records", replicaName: "{replica}", ver: "updated_at", isDeleted: "deleted", orderByFields: ["id"] }); // For ClickHouse Cloud or Boreal (no parameters needed) const cloudReplicated = new OlapTable("cloud_records", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"] }); ``` The `keeper_path` and `replica_name` parameters are **optional** for replicated engines: - **Omit both parameters** (recommended): Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments. 
The default path pattern `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}` works automatically with Atomic databases (the default in modern ClickHouse).
- **Provide custom paths**: You can still specify both parameters explicitly if you need custom replication paths for your self-managed cluster.

**Note**: Both parameters must be provided together, or both omitted. The `{uuid}`, `{shard}`, and `{replica}` macros are automatically substituted by ClickHouse at runtime.

For more details, see the [ClickHouse documentation on data replication](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication).

## Externally Managed Tables

If you have a table that is managed by an external system (e.g., a Change Data Capture service like ClickPipes), you can still use Moose to query it. Set the table's lifecycle to `EXTERNALLY_MANAGED` in its config.

```ts filename="ExternallyManagedTable.ts" copy
// Table managed by external system
const externalTable = new OlapTable("external_users", {
  orderByFields: ["id", "timestamp"],
  lifeCycle: LifeCycle.EXTERNALLY_MANAGED // Moose won't create or modify this table
});
```

Learn more about the different lifecycle options and how to use them in the [LifeCycle Management](/stack/olap/lifecycle) documentation.

## Invalid Configurations

```ts filename="InvalidConfig.ts" copy
// Error: No primary key or orderByFields
interface BadRecord1 {
  field1: string;
  field2: number;
}
const badTable1 = new OlapTable<BadRecord1>("bad_table1");

// Error: Primary key not first in orderByFields
interface BadRecord2 {
  id: Key<string>;
  field1: string;
}
const badTable2 = new OlapTable<BadRecord2>("bad_table2", {
  orderByFields: ["field1", "id"] // Wrong order - primary key must be first
});

// Error: Nullable field in orderByFields
interface BadRecord3 {
  id: Key<string>;
  field1: string;
  field2?: number;
}
const badTable3 = new OlapTable<BadRecord3>("bad_table3", {
  orderByFields: ["id", "field2"] // Can't have nullable field in orderByFields
});
```

## Development Workflow

### Local Development with Hot Reloading

One of the powerful features of Moose is its integration with the local development server:

1. Start your local development server with `moose dev`
2. When you define or modify an `OlapTable` in your code and save the file:
   - The changes are automatically detected
   - The TypeScript compiler plugin processes your schema definitions
   - The infrastructure is updated in real-time to match your code changes
   - Your tables are immediately available for testing

For example, if you add a new field to your schema:

```ts filename="HotReloading.ts" copy
// Before
interface BasicSchema {
  id: Key;
  name: string;
}

// After adding a field
interface BasicSchema {
  id: Key;
  name: string;
  createdAt: Date; // New field
}
```

The Moose framework will:
1. Detect the change when you save the file
2. Update the table schema in the local ClickHouse instance
3. Make the new field immediately available for use

### Verifying Your Tables

You can verify your tables were created correctly using:

```bash filename="Terminal" copy
# List all tables in your local environment
moose ls
```

#### Connecting to your local ClickHouse instance

You can connect to your local ClickHouse instance with your favorite database client.
Your credentials are located in your `moose.config.toml` file:

```toml filename="moose.config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```

---

## Modeling Views

Source: moose/olap/model-view.mdx
Define standard ClickHouse Views for read-time projections

# Modeling Views

## Overview

Views are read-time projections in ClickHouse. A static `SELECT` defines the view over one or more base tables or other views. Moose wraps [ClickHouse `VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view) with a simple `View` class in TypeScript. You provide the view name, the `SELECT`, and the list of source tables/views so Moose can order DDL correctly during migrations.

Use `View` when you want a virtual read-time projection and don't need write-time transformation or a separate storage table. For write-time pipelines and backfills, use a Materialized View instead.

## Basic Usage

```ts filename="BasicUsage.ts" copy
// A read-time projection over an existing table.
// (The config property names below mirror the description above — the view
// name, the SELECT statement, and the source tables — and are illustrative.)
const activeUsersView = new View("active_users", {
  selectStatement: sql`
    SELECT id, name, email
    FROM ${UserTable}
    WHERE status = 'active'
  `,
  baseTables: [UserTable],
});
```

---

## Planned Migrations (OLAP)

Source: moose/olap/planned-migrations.mdx
Generate, review, and safely execute ClickHouse DDL plans

# Planned Migrations

Migration planning gives you fine-grained control over how database schema changes are applied to your database when you deploy your code to production.

## Why planned migrations?

Most database migrations are designed under the assumption that your code is the sole owner of the database schema. In OLAP databases, we have to be more careful and assume that schema changes can happen at any time:

- The database schema is shared with other services (e.g. Change Data Capture services like ClickPipes)
- Other users (e.g. analysts) of the database may have credentials that let them change the schema

This is why the plan is generated from the remote environment and validated against the live state of the database at the time of deployment. If drift is detected, the deployment aborts and you must regenerate the plan, so that you don't drop data unintentionally.

Planned migrations apply only to OLAP (ClickHouse) schema changes. Streaming, APIs, and processes are unaffected by this flow.

## What this does

- Generates an ordered set of ClickHouse operations and writes them to `./migrations/plan.yaml`
- Saves two validation snapshots for drift detection:
  - `./migrations/remote_state.json` (state when plan was created)
  - `./migrations/local_infra_map.json` (desired state from your local code)
- When enabled, validates state and executes the exact reviewed operations

## Prerequisites

```toml file="moose.config.toml"
[features]
olap = true
ddl_plan = true
```

## Generating a Plan

Once you're done editing your code in your feature branch, you can generate a plan that diffs your local code against your live remote database:

**For Moose server deployments:**

```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```

**For serverless deployments:**

```bash filename="Terminal" copy
moose generate migration --clickhouse-url clickhouse://user:pass@host:port/db --save
```

Outputs:

```text
./migrations/plan.yaml
./migrations/remote_state.json
./migrations/local_infra_map.json
```

What each file contains:

- `remote_state.json`: The state of the remote database when the plan was generated.
- `local_infra_map.json`: The state of the local code when the plan was generated.
- `plan.yaml`: The plan to apply to the remote database, based on the diff between the two states.

You commit the entire `migrations/` directory to version control, and Moose automatically applies the plan when you deploy the code to production.

## Review and edit the plan

Moose makes some assumptions about your schema changes — for example, a renamed column is treated as a drop followed by an add. You can modify the plan to override these assumptions.

Open `plan.yaml` in your PR. Operations are ordered (teardown first, then setup) to avoid dependency issues. Review it like regular code. You can also edit the plan to override the default assumptions Moose makes.

```yaml filename="migrations/plan.yaml" copy
# Drop a deprecated column
- DropTableColumn:
    table: "events"
    column_name: "deprecated_field"

# Rename a column to match code
- RenameTableColumn:
    table: "events"
    before_column_name: "createdAt"
    after_column_name: "created_at"

# Add a new nullable column after created_at
- AddTableColumn:
    table: "events"
    column:
      name: "status"
      data_type: "String"
      required: false
      unique: false
      primary_key: false
      default: null
      annotations: []
      comment: null
    after_column: "created_at"

# Change a column type to Nullable(Float64)
- ModifyTableColumn:
    table: "events"
    before_column:
      name: "value"
      data_type: "Float64"
      required: false
      unique: false
      primary_key: false
      default: null
      annotations: []
      comment: null
    after_column:
      name: "value"
      data_type:
        Nullable:
          nullable: "Float64"
      required: false
      unique: false
      primary_key: false
      default: null
      annotations: []
      comment: null

# Create a simple view via raw SQL
- RawSql:
    sql:
      - "CREATE VIEW IF NOT EXISTS `events_by_user` AS SELECT user_id, count() AS c FROM events GROUP BY user_id"
    description: "Creating view events_by_user"
```

### When to edit the plan

There are two main reasons to edit the plan:

1. To override the default assumptions Moose makes when it cannot infer the intent of your schema changes, such as renaming a column instead of dropping and adding it.
2. To add new operations that are not covered by the default assumptions, such as adding a backfill operation for a new column.

#### Rename a column instead of drop/add

When you rename a column, Moose will default to dropping and adding the column. You can override this in the plan with the `RenameTableColumn` operation. Given this change:

```ts
interface SourceSchema {
  created_at: Date;
  color: string;
}

interface TargetSchema {
  createdAt: Date;
  color: string;
}

const sourceTable = new OlapTable<SourceSchema>("source_table");
const targetTable = new OlapTable<TargetSchema>("target_table");
```

Moose generates a drop-and-add by default:

```yaml filename="migrations/plan.yaml" copy
- DropTableColumn:
    table: source_table
    column_name: created_at
- AddTableColumn:
    table: source_table
    column:
      name: createdAt
      data_type: DateTime
      required: true
      unique: false
      primary_key: false
      default: null
      annotations: []
    after_column: color
```

In the plan, you can replace those operations with a `RenameTableColumn` operation:

```yaml filename="migrations/plan.yaml" copy
- RenameTableColumn:
    table: source_table
    before_column_name: created_at
    after_column_name: createdAt
```

#### Add a backfill operation to a new column

When you add a new column, Moose will default to backfilling the column based on the value in the `default` field.
If your field is a `DateTime`, you can edit the plan to set the default value to the current timestamp:

```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
    table: "source_table"
    column:
      name: "created_at"
      data_type: "DateTime"
      required: false
      unique: false
      default: NOW # Set the default value to the current timestamp
```

You can also override the default behavior by using the `RawSql` operation to define your own custom backfill logic:

```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
    table: "source_table"
    column:
      name: "created_at"
      data_type: "DateTime"
      required: false
      unique: false
      default: null
- RawSql:
    sql:
      - "ALTER TABLE events UPDATE created_at = toDateTime(created_at_ms / 1000) WHERE created_at IS NULL"
    description: "Backfill created_at from created_at_ms"
```

## Deployment Flows

### Moose Server Deployments

For Moose server deployments (with `moose prod` running), migrations are applied automatically on startup. Generate plans using:

```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```

When you deploy, Moose validates the plan and executes it automatically.

### Serverless Deployments

For serverless deployments (no Moose server), you manage migrations manually using the ClickHouse connection directly:

```toml file="moose.config.toml"
[state_config]
storage = "clickhouse"

[features]
olap = true
data_model_v2 = true
```

**Workflow:**

1. **Generate the plan** from your ClickHouse database:

```bash filename="Terminal" copy
moose generate migration --clickhouse-url --save
```

2. **Review** the generated `./migrations/` files in your PR

3. **Execute the plan** against your ClickHouse with CI/CD or manually:

```bash filename="Terminal" copy
moose migrate --clickhouse-url
```

Before applying the plan, Moose first validates that the snapshot taken when you generated the plan still matches the current database state. If it does not, Moose aborts the deployment. If it does, Moose executes the plan in `plan.yaml` against your production database.

Execution rules:
- If current tables in your live production database differ from `remote_state.json`, Moose aborts (remote drift since planning).
- If desired tables in your local code differ from `local_infra_map.json`, Moose aborts (code changed since planning).
- If both match, `plan.yaml` operations are executed in order against ClickHouse.

## Troubleshooting

- Can't connect to the remote database? Make sure you have [your admin API key set up correctly](./apis/auth#admin-endpoints)
- Plan rejected due to drift: Re-generate a plan against the current remote, review, and retry.
- No execution in Moose server deployments: Ensure `ddl_plan = true` and `./migrations/plan.yaml` exists.
- OLAP disabled: Ensure `[features].olap = true`.

---

## Querying Data

Source: moose/olap/read-data.mdx
Query OLAP tables using SQL with type safety

# Querying Data

Moose provides type-safe SQL querying for your `OlapTable` and `MaterializedView` instances. Use cases include:

- Building APIs to expose your data to client/frontend applications
- Building transformation pipelines inside your database with materialized views

## Querying with MooseClient

Use `MooseClient` to query data from existing tables and materialized views.
### Basic Querying ```ts filename="BasicQuerying.ts" const client = new MooseClient(); // Query existing table const query = sql` SELECT id, name, email FROM ${UserTable} WHERE status = 'active' LIMIT 10 `; const result = await client.query.execute(query); const data = await result.json(); ``` ### Querying Materialized Views ```ts filename="QueryMaterializedView.ts" const client = new MooseClient(); // Query existing materialized view const query = sql` SELECT user_id, total_orders, average_order_value FROM user_stats_view WHERE total_orders > 10 ORDER BY average_order_value DESC `; const result = await client.query.execute(query); ``` ## Select With Column and Table References ```ts filename="TypedReferences.ts" // Reference table columns with type safety const cols = UserTable.columns; const query = sql` SELECT ${cols.id}, ${cols.name}, ${cols.email} FROM ${UserTable} WHERE ${cols.status} = 'active' `; // Multiple table references const joinQuery = sql` SELECT ${UserTable.columns.id}, ${UserTable.columns.name}, ${OrderTable.columns.order_value} FROM ${UserTable} JOIN ${OrderTable} ON ${UserTable.columns.id} = ${OrderTable.columns.user_id} `; ``` When you query a materialized view, you reference the `MaterializedView.targetTable` to get the columns of the target table. ```ts filename="TypedReferences.ts" const query = sql` SELECT ${ExampleMaterializedView.targetTable.columns.id}, ${ExampleMaterializedView.targetTable.columns.name}, ${ExampleMaterializedView.targetTable.columns.email} FROM ${ExampleMaterializedView.targetTable} `; ``` In ClickHouse, when you query a Materialized View that has columns of type `AggregateFunction` in the result set, ordinarily you would need to run: ```sql SELECT sumMerge(amount) FROM {ExampleMaterializedView} ``` When querying this with Moose, you can just reference the column name in the `sql` template literal. 
The interpolation will be replaced with the correct ClickHouse function: ```ts filename="TypedReferences.ts" const query = sql` SELECT ${ExampleMaterializedView.targetTable.columns.amount} FROM ${ExampleMaterializedView.targetTable} `; // This will be replaced with: // SELECT sumMerge(amount) FROM {ExampleMaterializedView} ``` ## Filtering with WHERE Clauses ```ts filename="WhereClauses.ts" // Multiple WHERE conditions const filterQuery = sql` SELECT ${UserTable.columns.id}, ${UserTable.columns.name} FROM ${UserTable} WHERE ${UserTable.columns.status} = 'active' AND ${UserTable.columns.created_at} > '2024-01-01' AND ${UserTable.columns.email} ILIKE ${'%' + searchTerm + '%'} `; // Using IN clauses const inQuery = sql` SELECT * FROM ${UserTable} WHERE ${UserTable.columns.id} IN (${userIds}) `; // Using BETWEEN const rangeQuery = sql` SELECT * FROM ${UserTable} WHERE ${UserTable.columns.age} BETWEEN ${minAge} AND ${maxAge} `; ``` ## Dynamic Query Building Use the `sql` template literal to build safe queries: ```ts filename="SqlTemplateLiterals.ts" // Safe interpolation with sql template literal const status = 'active'; const limit = 10; const query = sql` SELECT id, name, email FROM ${UserTable} WHERE ${UserTable.columns.status} = ${status} LIMIT ${limit} `; // Conditional WHERE clauses interface FilterParams { minAge?: number; status?: "active" | "inactive"; searchText?: string; } const buildConditionalQuery = (filters: FilterParams) => { let conditions = []; if (filters.minAge !== undefined) { conditions.push(sql`age >= ${filters.minAge}`); } if (filters.status) { conditions.push(sql`status = ${filters.status}`); } if (filters.searchText) { conditions.push(sql`(name ILIKE ${'%' + filters.searchText + '%'} OR email ILIKE ${'%' + filters.searchText + '%'})`); } let query = sql`SELECT * FROM ${UserTable}`; if (conditions.length > 0) { query = sql`${query} WHERE ${conditions.join(' AND ')}`; } return sql`${query} ORDER BY created_at DESC`; }; ``` ## Building APIs To build REST APIs that expose your data, see the [Bring Your Own API Framework documentation](/moose/app-api-frameworks) for comprehensive examples and patterns using Express, Koa, Fastify, or FastAPI. ## Common Pitfalls - **Column name typos**: Use `UserTable.columns.columnName` for autocomplete - **Type mismatches**: Ensure your schema types match ClickHouse types - **Missing imports**: Import your table definitions before using them - **Template literal syntax**: Use backticks `sql` not regular strings - **Forgetting await**: Always await `client.query.execute()` ## Performance Optimization If your query is slower than expected, there are a few things you can check: - If using filters, try to filter on a column that is defined in the `orderByFields` of the table - For common queries, consider [creating a materialized view](/stack/olap/create-materialized-view) to pre-compute the result set ## Further Reading --- ## schema-change Source: moose/olap/schema-change.mdx # Handling Failed Migrations One of the main benefits of the Moose local development environment is that you can detect breaking schema changes before they happen in production. This can be specifically useful for identifying incompatible data type changes when you change a column's data type and the generated migration cannot cast the existing data to the new type. This page describes how to recover from a failed migration in dev and gives a playbook for safely achieving the desired type change. 
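For example, a column type change like the following looks harmless in code, but it cannot be applied in place if the existing `value` column contains non-numeric strings. This mirrors the `String -> Float64` failure shown in the terminal logs below; the surrounding `Event` fields are illustrative.

```ts filename="app/tables/events.ts" copy
// Before: value stored as a string
interface Event {
  id: Key<string>;
  name: string;
  value: string;
  createdAt: Date;
}

// After: value changed to a number (Float64) — dev mode generates an in-place
// ALTER TABLE ... MODIFY COLUMN, which fails on rows such as value = 'abc'
interface Event {
  id: Key<string>;
  name: string;
  value: number;
  createdAt: Date;
}
```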
## What happened

You changed a column's data type on a table that already has data. The dev migration tried to run an in-place ALTER and ClickHouse created a mutation that failed (incompatible cast, nullability, defaults, etc.).

Symptoms:
- Failed migration in dev
- A stuck mutation on the table
- Reverting your code type alone doesn't help until the mutation is cleared

## Quick recovery (dev)

Follow these steps to get unblocked quickly.

### View the terminal logs to see the failing mutation

In your terminal, you should see a message like this:

```txt
⢹ Processing Infrastructure changes from file watcher
  ~ Table events:
      Column changes:
        ~ value: String -> Float64
Applying: ALTER TABLE events MODIFY COLUMN value Float64
ClickHouse mutation created: mutation_id='00000001-0000-4000-8000-000000000123'
Error: Code: 368. Conversion failed: cannot parse 'abc' as Float64 (column: value)
Status: mutation failed; table may be partially transformed
```

Copy the mutation ID from the terminal logs and run the following command to kill the mutation.

### Kill the mutation

- If you have the `mutation_id`:

```sql
KILL MUTATION WHERE mutation_id = '<mutation_id>';
```

- If you didn't capture the ID, find it and kill by table:

```sql
SELECT mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE database = currentDatabase() AND table = '<table>'
ORDER BY create_time DESC;

KILL MUTATION WHERE database = currentDatabase() AND table = '<table>';
```

ClickHouse ALTERs are implemented as asynchronous mutations, not transactions. If a mutation fails mid-way, some parts may have been rewritten while others were not, leaving the table partially transformed. The failed mutation also remains queued until you kill it. Clear the mutation first, then proceed.

Soon, Moose will automatically generate a local DDL plan that kills the mutation and "rolls back" the changes that were applied before the failure occurred.

### Revert your code to match the current DB schema

- Change the column type in code back to the previous (working) type
- Save your changes; let `moose dev` resync. You should be able to query the table again

If the table only has disposable dev data, you can also `TRUNCATE TABLE <database>.<table>` or drop and recreate the table and let `moose dev` rebuild it. Only do this in dev.

## Safely achieving the desired type change

Instead of editing the column type in place, you can add a new column with the target type and backfill the data. This is the recommended approach.

### Add a new column + backfill

```ts filename="app/tables/events.ts" copy
enum StatusEnum {
  "active" = "active",
  "inactive" = "inactive",
}

interface Event {
  id: Key;
  name: string;
  createdAt: Date;
  status: string;
  status_v2: StatusEnum; // New field - will trigger migration
}
```

Then, generate a plan to add the new column and backfill the data.

```bash
moose generate migration --url --save --token
```

Open the generated `/migrations/plan.yaml` file. You'll see the `AddTableColumn` operation to add the new column. Right after it, you can add a `RawSql` operation to backfill the data.
Here you can write an `ALTER TABLE` statement to update the new column with the data from the old column:

```yaml filename="migrations/plan.yaml"
- AddTableColumn:
    table: "events"
    column:
      name: "status_v2"
      data_type:
        Nullable:
          nullable: "StatusEnum"
      default: null
- RawSql:
    sql:
      - "ALTER TABLE events UPDATE status_v2 = toStatusEnumOrNull(status) WHERE status_v2 IS NULL"
    description: "Backfill status_v2 from status"
```

Then, when writing to the table, double write to both columns. This allows all surrounding processes and applications that rely on the old column to keep working, and you can deprecate the old column and rename the new column when you are ready.

### Later, deprecate the old column and rename the new column

Once the column backfill is complete and you are ready to deprecate the old column, you can rename the new column to the old column name and apply this in a new, subsequent PR.

In your code, rename the column and deprecate the old one:

```ts filename="app/tables/events.ts" copy
interface Event {
  id: Key;
  name: string;
  createdAt: Date;
  status_old: string; // rename status to status_old
  status: StatusEnum; // rename status_v2 to status
}
```

Initially you'll see two `DeleteTableColumn` operations, followed by two `AddTableColumn` operations.

**IMPORTANT**: Replace all four generated `DeleteTableColumn` and `AddTableColumn` operations with the following:

```yaml filename="migrations/plan.yaml"
- RenameTableColumn:
    table: "events"
    before_column_name: "status"
    after_column_name: "status_old"
- RenameTableColumn:
    table: "events"
    before_column_name: "status_v2"
    after_column_name: "status"
```

Once the old column is no longer needed, you can drop it in a third PR.

```yaml filename="migrations/plan.yaml"
- DropTableColumn:
    table: "events"
    column_name: "status_old"
```

## Common breaking cases

- String -> Int/Float: can fail on non-numeric rows; prefer `toInt64OrNull(...)`/`toFloat64OrNull(...)` + backfill
- Nullable(T) -> T (NOT NULL): fails if any NULLs exist and no default is provided; backfill then drop nullability
- Narrowing types (e.g., Int64 -> Int32): fails if values overflow; validate and transform first

Read about migration planning and how to use it to safely manage schema changes in production.

---

## schema-optimization

Source: moose/olap/schema-optimization.mdx

# Schema Optimization

Choosing the right data types and column ordering for your tables is crucial for ClickHouse performance and storage efficiency. Poor schema design can lead to 10-100x slower queries and 2-5x larger storage requirements.

## Data Types

Keep the following best practices in mind when defining your column types:

### Avoid Nullable Columns

Nullable columns in ClickHouse have significant performance overhead. Instead of using `?` or `| undefined`, add `& ClickHouseDefault<"...">` to your column type.

```ts filename="AvoidNullable.ts"
// ❌ Bad: Using nullable columns
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  description?: string; // Nullable
  createdAt: Date;
}

// ✅ Good: Use default values instead
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  description: string & ClickHouseDefault<"''">; // DEFAULT ''
  createdAt: Date;
}

const userEventsTable = new OlapTable("user_events", {
  orderByFields: ["id", "createdAt"]
});
```

### Use `LowCardinality` where possible

`LowCardinality` is ClickHouse's most efficient string type for columns with limited unique values.
```ts filename="LowCardinality.ts" interface UserEvent { id: string; userId: string; eventType: string & LowCardinality; // ✅ Good for limited values status: "active" | "inactive" | "pending"; // ✅ Literals become LowCardinality automatically country: string & LowCardinality; // ✅ Good for country codes userAgent: string; // ❌ Keep as String for high cardinality createdAt: Date; } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ### Pick the right Integer types Choose the smallest integer type that fits your data range to save storage and improve performance. ```ts filename="IntegerTypes.ts" interface UserEvent { id: string; userId: string; age: UInt8; // ✅ 0-255 (1 byte) score: Int16; // ✅ -32,768 to 32,767 (2 bytes) viewCount: UInt32; // ✅ 0 to ~4 billion (4 bytes) timestamp: UInt64; // ✅ Unix timestamp (8 bytes) eventType: string; createdAt: Date; } // Integer type ranges: // UInt8: 0 to 255 // UInt16: 0 to 65,535 // UInt32: 0 to 4,294,967,295 // UInt64: 0 to 18,446,744,073,709,551,615 // Int8: -128 to 127 // Int16: -32,768 to 32,767 // Int32: -2,147,483,648 to 2,147,483,647 // Int64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 ``` ### Use the right precision for `DateTime` Choose appropriate DateTime precision based on your use case to balance storage and precision. ```ts filename="DateTimePrecision.ts" interface UserEvent { id: string; userId: string; eventType: string; createdAt: DateTime; // ✅ Second precision (default) updatedAt: DateTime64<3>; // ✅ Millisecond precision processedAt: DateTime64<6>; // ✅ Microsecond precision loggedAt: DateTime64<9>; // ✅ Nanosecond precision } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ### Use Decimal over Float Use `Decimal` for financial and precise calculations to avoid floating-point precision issues. ```ts filename="DecimalOverFloat.ts" interface Order { id: string; userId: string; amount: Decimal<10, 2>; // ✅ 10 total digits, 2 decimal places tax: Decimal<8, 2>; // ✅ 8 total digits, 2 decimal places discount: Decimal<5, 2>; // ✅ 5 total digits, 2 decimal places total: Decimal<12, 2>; // ✅ 12 total digits, 2 decimal places createdAt: Date; } // ❌ Bad: Using float for financial data interface BadOrder { id: string; amount: number; // Float - can cause precision issues tax: number; // Float - can cause precision issues } const ordersTable = new OlapTable("orders", { orderByFields: ["id", "createdAt"] }); ``` ### Use `NamedTuple` over `Nested` `NamedTuple` is more efficient than `Nested` for structured data in ClickHouse. ```ts filename="NamedTupleOverNested.ts" interface UserEvent { id: string; userId: string; eventType: string; location: { latitude: number; longitude: number; city: string; country: string; } & ClickHouseNamedTuple; metadata: { version: string; source: string; priority: number; } & ClickHouseNamedTuple; createdAt: Date; } // ❌ Bad: Using Nested (less efficient) interface BadUserEvent { id: string; location: { latitude: number; longitude: number; }; } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ## Ordering ### Choose columns that you will use in WHERE and GROUP BY clauses Optimize your `orderByFields` (or `orderByExpression`) for your most common query patterns. 
```ts filename="OrderByOptimization.ts" interface UserEvent { id: string; userId: string; eventType: string; status: string; createdAt: Date; country: string; } // ✅ Good: Optimized for common query patterns const userEventsTable = new OlapTable("user_events", { orderByFields: ["userId", "createdAt", "eventType"] // Most common filters first }); // Common queries this optimizes for: // - WHERE userId = ? AND createdAt > ? // - WHERE userId = ? AND eventType = ? // - GROUP BY userId, eventType ``` ### `ORDER BY` should prioritize LowCardinality columns first Place `LowCardinality` columns first in your `orderByFields` (or reflect this priority in your `orderByExpression`) for better compression and query performance. ```ts filename="LowCardinalityOrdering.ts" interface UserEvent { id: string; userId: string; eventType: LowCardinality; // ✅ Low cardinality status: LowCardinality; // ✅ Low cardinality country: LowCardinality; // ✅ Low cardinality createdAt: Date; // High cardinality sessionId: string; // High cardinality } // ✅ Good: LowCardinality columns first const userEventsTable = new OlapTable("user_events", { orderByFields: ["eventType", "status", "country", "createdAt", "sessionId"] }); // ❌ Bad: High cardinality columns first const badUserEventsTable = new OlapTable("user_events", { orderByFields: ["createdAt", "sessionId", "eventType", "status"] // Less efficient }); ``` --- ## Schema Versioning with Materialized Views Source: moose/olap/schema-versioning.mdx Use table versions and materialized views to migrate breaking schema changes safely # Table Versioning & Blue/Green Migrations ## Overview Changing a table's storage layout (engine or sorting key) in ClickHouse requires a full table rewrite. Doing it in-place can block or slow concurrent reads and writes due to heavy merges and metadata changes, creating real risk for production workloads. Blue/Green avoids this by creating a new versioned table and migrating data live via a materialized view, so traffic continues uninterrupted. **When to use it**: - Change the **table engine** (e.g., MergeTree → ReplacingMergeTree) - Update **ORDER BY fields** (sorting keys) to better match query patterns - Reshape **primary keys** or perform type changes that require a rewrite **How Moose does it**: 1. Define a new table with the same logical name and a bumped `version`, setting the new `orderByFields` and/or `engine` ([Table modeling](/moose/olap/model-table)). 2. Create a [Materialized view](/moose/olap/model-materialized-view) that selects from the old table and writes to the new one; Moose backfills once and keeps the view live for new inserts. 3. Later on, cut over readers/writers to the new export and clean up old resources ([Applying migrations](/moose/olap/apply-migrations)). Setting `config.version` on an `OlapTable` changes only the underlying table name (suffixes dots with underscores). Your code still refers to the logical table you exported. ## High-level workflow ## Example: change sorting key (ORDER BY) Assume the original `events` table orders by `id` only. We want to update the sorting key to optimize reads by ordering on `id, createdAt`. ### Original table (version 0.0) ```ts filename="app/tables/events.ts" copy interface EventV0 { id: string; name: string; createdAt: Date; } ); ``` ### New table (bump to version 0.1) Create a new table with the same logical name, but set `version: "0.1"` and update the ordering to `id, createdAt`. Moose will create `events_0_1` in ClickHouse. 
```ts filename="app/tables/events_v01.ts" copy interface EventV1 { id: Key; name: string; createdAt: Date; } ); ``` ### Create the materialized view to migrate data Create a materialized view that: - SELECTs from the old table (`events_v0`) - copies fields 1:1 to the new table - writes into the versioned target table (`events_v1`) Pass the versioned `OlapTable` instance as `targetTable`. If you only pass a `tableName`, Moose will create an unversioned target. ```ts filename="app/views/migrate_events_to_v01.ts" copy ); ``` What happens when you export this view: - Moose creates the versioned table if needed - Moose creates the MATERIALIZED VIEW and immediately runs a one-time backfill (`INSERT INTO ... SELECT ...`) - ClickHouse keeps the view active: any new inserts into `events` automatically flow into `events_0_1` ## Cutover and cleanup - Update readers to query the new table export (`eventsV1`). - Update writers/streams to produce to the new table if applicable. - After verifying parity and retention windows, drop the old table and the migration view. ## Notes and tips - Use semantic versions like `0.1`, `1.0`, `1.1`. Moose will render `events_1_1` as the physical name. - Keep the migration view simple and deterministic. If you need complex transforms, prefer explicit SQL in the `selectStatement`. - Very large backfills can take time. Consider deploying during low-traffic windows. --- ## Supported Column Types Source: moose/olap/supported-types.mdx Complete guide to defining columns for ClickHouse tables in Moose # Supported Column Types Moose supports a comprehensive set of ClickHouse column types across both TypeScript and Python libraries. This guide covers all supported types, their syntax, and best practices for defining table schemas. ## Basic Types ### String Types ```typescript interface User { string: string; // String lowCardinality: string & LowCardinality; // LowCardinality(String) uuid: string & tags.Format<"uuid">; // UUID (with typia tags) } ``` | ClickHouse Type | TypeScript | Description | |------|------------|--------| | `String` | `string` | Variable-length string | | `LowCardinality(String)` | `string & LowCardinality` | Optimized for repeated values | | `UUID` | `string & tags.Format<"uuid">` | UUID format strings | ### Numeric Types ### Integer Types ```typescript interface Metrics { user_id: Int32; // Int32 count: UInt64; // UInt64 small_value: Int8; // Int8 } // Alternative: You can still use the verbose syntax if preferred interface MetricsVerbose { user_id: number & ClickHouseInt<"int32">; count: number & ClickHouseInt<"uint64">; small_value: number & ClickHouseInt<"int8">; } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |------|------------|------------|--------| | `Int8` | `Int8` | `number & ClickHouseInt<"int8">` | -128 to 127 | | `Int16` | `Int16` | `number & ClickHouseInt<"int16">` | -32,768 to 32,767 | | `Int32` | `Int32` | `number & ClickHouseInt<"int32">` | -2,147,483,648 to 2,147,483,647 | | `Int64` | `Int64` | `number & ClickHouseInt<"int64">` | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | | `UInt8` | `UInt8` | `number & ClickHouseInt<"uint8">` | 0 to 255 | | `UInt16` | `UInt16` | `number & ClickHouseInt<"uint16">` | 0 to 65,535 | | `UInt32` | `UInt32` | `number & ClickHouseInt<"uint32">` | 0 to 4,294,967,295 | | `UInt64` | `UInt64` | `number & ClickHouseInt<"uint64">` | 0 to 18,446,744,073,709,551,615 | ### Floating Point Types ```typescript interface SensorData { temperature: Float32; // Float32 
humidity: Float64; // Float64 pressure: number; // Default Float64 } // Alternative: You can still use the verbose syntax if preferred interface SensorDataVerbose { temperature: number & tags.Type<"float">; // Float32 humidity: number; // Float64 } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |-----------------|------------|------------|---------------------| | `Float32` | `Float32` | `number & tags.Type<"float">` | 32-bit floating point | | `Float64` | `Float64` or `number` | `number` | 64-bit floating point (default) | ### Decimal Types ```typescript interface FinancialData { amount: Decimal<10, 2>; // Decimal(10,2) rate: Decimal<5, 4>; // Decimal(5,4) fee: Decimal<8, 3>; // Decimal(8,3) } // Alternative: You can still use the verbose syntax if preferred interface FinancialDataVerbose { amount: string & ClickHouseDecimal<10, 2>; // Decimal(10,2) rate: string & ClickHouseDecimal<5, 4>; // Decimal(5,4) } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |------|------------|------------|---------------------| | `Decimal(P,S)` | `Decimal` | `string & ClickHouseDecimal` | Fixed-point decimal with P total digits, S decimal places | ### Boolean Type ```typescript interface User { is_active: boolean; verified: boolean; } ``` | ClickHouse Type | TypeScript | Description | |------|------------|--------| | `Boolean` | `boolean` | `boolean` | ### Date and Time Types ```typescript interface Event { created_at: DateTime; // DateTime updated_at: DateTime64<3>; // DateTime(3) logged_at: DateTime64<6>; // DateTime(6) - microsecond precision birth_date: Date; // Date } // Alternative: You can still use the verbose syntax if preferred interface EventVerbose { created_at: Date; // DateTime updated_at: Date & ClickHousePrecision<3>; // DateTime(3) birth_date: Date; // Date } ``` | ClickHouse Type | TypeScript (New Helper) | TypeScript (Verbose) | Description | |-----------------|------------|------------|-------------| | `Date` | `Date` | `Date` | Date only | | `Date16` | `Date` | `Date` | Compact date format | | `DateTime` | `DateTime` | `Date` | Date and time | | `DateTime(P)` | `DateTime64` | `Date & ClickHousePrecision` | DateTime with precision (P=0-9) | ### Network Types ```typescript interface NetworkEvent { source_ip: string & tags.Format<"ipv4">; dest_ip: string & tags.Format<"ipv6">; } ``` | ClickHouse Type | TypeScript | Description | |------|------------|--------| | `IPv4` | `string & tags.Format<"ipv4">` | IPv4 addresses | | `IPv6` | `string & tags.Format<"ipv6">` | IPv6 addresses | ## Complex Types ### Geometry Types Moose supports ClickHouse geometry types. Use the helpers in each language to get type-safe models and correct ClickHouse mappings. 
```typescript
import {
  ClickHousePoint,
  ClickHouseRing,
  ClickHouseLineString,
  ClickHouseMultiLineString,
  ClickHousePolygon,
  ClickHouseMultiPolygon,
} from "@514labs/moose-lib";

interface GeoTypes {
  point: ClickHousePoint;                     // Point → [number, number]
  ring: ClickHouseRing;                       // Ring → Array<[number, number]>
  lineString: ClickHouseLineString;           // LineString → Array<[number, number]>
  multiLineString: ClickHouseMultiLineString; // MultiLineString → Array<Array<[number, number]>>
  polygon: ClickHousePolygon;                 // Polygon → Array<Array<[number, number]>>
  multiPolygon: ClickHouseMultiPolygon;       // MultiPolygon → Array<Array<Array<[number, number]>>>
}
```

| ClickHouse Type | TypeScript |
|------|------------|
| `Point` | `ClickHousePoint` |
| `Ring` | `ClickHouseRing` |
| `LineString` | `ClickHouseLineString` |
| `MultiLineString` | `ClickHouseMultiLineString` |
| `Polygon` | `ClickHousePolygon` |
| `MultiPolygon` | `ClickHouseMultiPolygon` |

Geometry coordinates are represented as numeric pairs `[x, y]` (TypeScript) or `tuple[float, float]` (Python).

### Array Types

Arrays are supported for all basic types and some complex types.

```typescript
interface User {
  tags: string[];                  // Array(String)
  scores: number[];                // Array(Float64)
  metadata: Record<string, any>[]; // Array(Json)
  tuple: ({ name: string; age: number } & ClickHouseNamedTuple)[]; // Array(Tuple(String, Int32))
}
```

### Map Types

Maps store key-value pairs with specified key and value types.

```typescript
interface User {
  preferences: Record<string, string>; // Map(String, String)
  metrics: Record<string, number>;     // Map(String, Float64)
}
```

### Nested Types

Nested types allow embedding complex objects within tables.

```typescript
interface Address {
  street: string;
  city: string;
  zip: string;
}

interface User {
  name: string;
  address: Address; // Nested type
}
```

### Named Tuple Types

Named tuples provide structured data with named fields.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Shape {
  center: Point & ClickHouseNamedTuple; // Named tuple
  radius: number;
}
```

### Enum Types

Enums map to ClickHouse enums with string or integer values.

```typescript
enum UserRole {
  ADMIN = "admin",
  USER = "user",
  GUEST = "guest"
}

interface User {
  role: UserRole; // Enum with string values
}
```

## Special Types

### JSON Type

The `Json` type stores arbitrary JSON data with optional schema configuration for performance and type safety.

#### Basic JSON (Unstructured)

For completely dynamic JSON data without any schema:

```typescript
interface Event {
  metadata: Record<string, any>; // Basic JSON - accepts any structure
  config: any;                   // Basic JSON - fully dynamic
}
```

#### Rich JSON with Type Configuration

For better performance and validation, you can define typed fields within your JSON using `ClickHouseJson`. This creates a ClickHouse `JSON` column with explicit type hints for specific paths.

```typescript
// Define the structure for your JSON payload
interface PayloadStructure {
  name: string;
  count: number;
  timestamp?: Date;
}

interface Event {
  id: string;

  // JSON with typed paths - better performance, allows extra fields
  payload: PayloadStructure & ClickHouseJson;

  // JSON with performance tuning options
  metadata: PayloadStructure & ClickHouseJson<
    256,          // max_dynamic_paths: limit tracked paths (default: no limit)
    16,           // max_dynamic_types: limit type variations (default: no limit)
    ["skip.me"],  // skip_paths: exclude specific paths
    ["^tmp\\."]   // skip_regexps: exclude paths matching regex
  >;
}
```

#### Configuration Options

| Option | Type | Description |
|--------|------|-------------|
| `max_dynamic_paths` | `number` | Maximum number of unique JSON paths to track. Helps control memory usage for highly variable JSON structures. |
| `max_dynamic_types` | `number` | Maximum number of type variations allowed per path. Useful when paths may contain different types. |
| `skip_paths` | `string[]` | Array of exact JSON paths to ignore during ingestion (e.g., `["temp", "debug.info"]`). |
| `skip_regexps` | `string[]` | Array of regex patterns for paths to exclude (e.g., `["^tmp\\.", ".*_internal$"]`). |

#### Benefits of Typed JSON

1. **Better Performance**: ClickHouse can optimize storage and queries for known paths
2. **Type Safety**: Validates that specified paths match expected types
3. **Flexible Schema**: Allows additional fields beyond typed paths
4. **Memory Control**: Configure limits to prevent unbounded resource usage

- **Basic JSON** (`any`, `Dict[str, Any]`): Use when JSON structure is completely unknown or rarely queried
- **Rich JSON** (`ClickHouseJson`): Use when you have known fields that need indexing/querying, but want to allow additional dynamic fields

#### Example: Product Event Tracking

```typescript
interface ProductProperties {
  category: string;
  price: number;
  inStock: boolean;
}

interface ProductEvent {
  eventId: Key;
  timestamp: DateTime;

  // Typed paths for common fields, but allows custom properties
  properties: ProductProperties & ClickHouseJson<
    128,           // Track up to 128 unique paths
    8,             // Allow up to 8 type variations per path
    ["_internal"], // Ignore internal fields
    ["^debug_"]    // Ignore debug fields
  >;
}
```

With this schema, you can send events like:

```json
{
  "eventId": "evt_123",
  "timestamp": "2025-10-22T12:00:00Z",
  "properties": {
    "category": "electronics",    // Typed field ✓
    "price": 99.99,               // Typed field ✓
    "inStock": true,              // Typed field ✓
    "customTag": "holiday-sale",  // Extra field - accepted ✓
    "brandId": 42,                // Extra field - accepted ✓
    "_internal": "ignored"        // Skipped by skip_paths ✓
  }
}
```

### Nullable Types

All types support nullable variants using optional types.

```typescript
interface User {
  name: string;   // Required
  email?: string; // Nullable
  age?: number;   // Nullable
}
```

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

- Optional without default (e.g., `field?: number`) → ClickHouse Nullable type.
- Optional with default (e.g., `field?: number & ClickHouseDefault<"18">` or `WithDefault`) → non-nullable column with default `18`.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

### SimpleAggregateFunction

`SimpleAggregateFunction` is designed for use with `AggregatingMergeTree` tables. It stores pre-aggregated values that are automatically merged when ClickHouse combines rows with the same primary key.

```typescript
interface DailyStats {
  date: DateTime;
  userId: string;
  totalViews: number & SimpleAggregated<"sum", number>;
  maxScore: number & SimpleAggregated<"max", number>;
  lastSeen: DateTime & SimpleAggregated<"anyLast", DateTime>;
}

const statsTable = new OlapTable("daily_stats", {
  engine: ClickHouseEngines.AggregatingMergeTree,
  orderByFields: ["date", "userId"],
});
```

See [ClickHouse docs](https://clickhouse.com/docs/en/sql-reference/data-types/simpleaggregatefunction) for the complete list of functions.
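When reading from such a table, apply the same aggregate function at query time so that rows ClickHouse has not yet merged in the background are combined correctly. A minimal sketch using the `statsTable` defined above, following the querying pattern from the Querying Data page:

```ts filename="QueryDailyStats.ts"
const client = new MooseClient();

// sum()/max() re-aggregate any partially merged rows, so totals are correct
// even before background merges complete.
const query = sql`
  SELECT
    userId,
    sum(totalViews) AS total_views,
    max(maxScore)   AS max_score
  FROM ${statsTable}
  GROUP BY userId
`;

const result = await client.query.execute(query);
const rows = await result.json();
```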
## Table Engines

Moose supports all common ClickHouse table engines:

| Engine | TypeScript | Description |
|--------|------------|-------------|
| `MergeTree` | `ClickHouseEngines.MergeTree` | Default engine |
| `ReplacingMergeTree` | `ClickHouseEngines.ReplacingMergeTree` | Deduplication |
| `SummingMergeTree` | `ClickHouseEngines.SummingMergeTree` | Aggregates numeric columns |
| `AggregatingMergeTree` | `ClickHouseEngines.AggregatingMergeTree` | Advanced aggregation |
| `ReplicatedMergeTree` | `ClickHouseEngines.ReplicatedMergeTree` | Replicated version of MergeTree |
| `ReplicatedReplacingMergeTree` | `ClickHouseEngines.ReplicatedReplacingMergeTree` | Replicated with deduplication |
| `ReplicatedSummingMergeTree` | `ClickHouseEngines.ReplicatedSummingMergeTree` | Replicated with aggregation |
| `ReplicatedAggregatingMergeTree` | `ClickHouseEngines.ReplicatedAggregatingMergeTree` | Replicated with advanced aggregation |

```typescript
const userTable = new OlapTable("users", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id", "updated_at"]
});
```

## Best Practices

### Type Selection

- **Use specific integer types** when you know the value ranges to save storage
- **Prefer `Float64`** for most floating-point calculations unless storage is critical
- **Use `LowCardinality`** for string columns with repeated values
- **Choose appropriate DateTime precision** based on your accuracy needs

### Performance Considerations

- **Order columns by cardinality** (low to high) for better compression
- **Use `ReplacingMergeTree`** for tables with frequent updates
- **Specify `orderByFields` or `orderByExpression`** for optimal query performance
- **Consider `LowCardinality`** for string columns with < 10,000 unique values

---

## ttl

Source: moose/olap/ttl.mdx

## TTL (Time-to-Live) for ClickHouse Tables

Moose lets you declare ClickHouse TTL directly in your data model:

- Table-level TTL via the `ttl` option on `OlapTable` config
- Column-level TTL via `ClickHouseTTL` on individual fields

### When to use TTL

- Automatically expire old rows to control storage cost
- Mask or drop sensitive columns earlier than the full row expiry

### TypeScript

```ts
interface Event {
  id: Key<string>;
  timestamp: DateTime;
  email: string & ClickHouseTTL<"timestamp + INTERVAL 30 DAY">; // column TTL
}

const events = new OlapTable<Event>("Events", {
  orderByFields: ["id", "timestamp"],
  ttl: "timestamp + INTERVAL 90 DAY DELETE", // table-level TTL
});
```

### Python

```python
from typing import Annotated
from moose_lib import OlapTable, OlapConfig, Key, ClickHouseTTL
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    timestamp: datetime
    email: Annotated[str, ClickHouseTTL("timestamp + INTERVAL 30 DAY")]

events = OlapTable[Event](
    "Events",
    OlapConfig(
        order_by_fields=["id", "timestamp"],
        ttl="timestamp + INTERVAL 90 DAY DELETE",
    ),
)
```

### Notes

- Expressions must be valid ClickHouse TTL expressions, but do not include the leading `TTL` keyword.
- Column TTLs are independent from the table TTL and can be used together.
- Moose will apply TTL changes via migrations using `ALTER TABLE ... MODIFY TTL` and `MODIFY COLUMN ... TTL`.

### Related

- See `Modeling Tables` for defining your schema
- See `Applying Migrations` to roll out TTL changes