# Moose / Olap / Model Table Documentation – Python

## Included Files
1. moose/olap/model-table/model-table.mdx

## Modeling Tables

Source: moose/olap/model-table/model-table.mdx

Model your database schema in code using native TypeScript/Python typing

# Modeling Tables

## Overview

Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.

You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.

```py filename="FirstTable.py" copy
from pydantic import BaseModel
from moose_lib import Key, OlapTable
from pydantic import BaseModel

class MyFirstTable(BaseModel):
    id: Key[str]
    name: str
    age: int

# Create a table named "first_table"
my_table = OlapTable[MyFirstTable]("first_table")

# No export needed - Python modules are automatically discovered
```

## Basic Usage

### Standalone Tables

Create a table directly for custom data flows or when you need fine-grained control:

```py filename="StandaloneTable.py"
from moose_lib import Key, OlapTable
from pydantic import BaseModel

class ExampleSchema(BaseModel):
    id: Key[str]
    date_field: Date
    numeric_field: float
    boolean_field: bool

# Create a standalone table named "example_table"
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine

example_table = OlapTable[ExampleSchema]("example_table", OlapConfig(
  order_by_fields=["id", "date_field"],
  engine=ReplacingMergeTreeEngine()
))
```

### Creating Tables in Ingestion Pipelines

For end-to-end data flows, create tables as part of an ingestion pipeline:

```py filename="PipelineTable.py"
from moose_lib import IngestPipeline, Key, OlapTable
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: Date
    event_type: str

from moose_lib import IngestPipeline, IngestPipelineConfig, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine

events_pipeline = IngestPipeline[UserEvent]("user_events", IngestPipelineConfig(
  ingest_api=True, # Creates a REST API endpoint at POST localhost:4000/ingest/user_events
  stream=True, # Creates a Kafka/Redpanda topic
  table=OlapConfig(      # Creates and configures the table named "user_events"
    order_by_fields=["id", "timestamp"],
    engine=ReplacingMergeTreeEngine()
  )
))

# Access the table component when needed:
events_table = events_pipeline.get_table()
```

## Data Modeling

### Special ClickHouse Types (LowCardinality, Nullable, etc)

```py filename="ClickHouseTypes.py" copy
from moose_lib import Key, clickhouse_decimal, ClickHouseNamedTuple
from typing import Annotated
from pydantic import BaseModel
from datetime import datetime

class Customer(BaseModel):
    name: str
    address: str

class Order(BaseModel):
    order_id: Key[str]
    amount: clickhouse_decimal(10, 2)
    status: Literal["Paid", "Shipped", "Delivered"] # translated to LowCardinality(String) in ClickHouse
    created_at: datetime
    customer: Annotated[Customer, "ClickHouseNamedTuple"]
```

### Default values

Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.

```py filename="Defaults.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import OlapTable, Key, clickhouse_default, clickhouse_decimal
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    # Static defaults
    status: Annotated[str, clickhouse_default("'pending'")]  # DEFAULT 'pending'
    retries: Annotated[int, clickhouse_default("0")]         # DEFAULT 0
    # Server-side timestamps
    created_at: Annotated[datetime, clickhouse_default("now()")]
    # Decimal with default
    amount: Annotated[float, clickhouse_decimal(10, 2)] = 0

events = OlapTable[Event]("events", {
    "orderByFields": ["id", "created_at"],
})
```
The value passed into the `clickhouse_default` function can either be a string literal or a stringified ClickHouse SQL expression.

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

  - Optional without default → ClickHouse Nullable type.
  - Optional with default (using `clickhouse_default("18")` in annotations) → non-nullable column with default `18`.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

### Database Selection

By default, tables are created in the database specified in your `moose.config.toml` ClickHouse configuration. You can override this on a per-table basis using the `database` field:

```py filename="DatabaseOverride.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel

class UserData(BaseModel):
    id: Key[str]
    name: str
    email: str

# Table in default database (from moose.config.toml)
default_table = OlapTable[UserData]("users")

# Table in specific database (e.g., "analytics")
analytics_table = OlapTable[UserData]("users", OlapConfig(
    database="analytics",
    order_by_fields=["id"]
))
```

To use custom databases, configure them in your `moose.config.toml`:

```toml
[clickhouse_config]
db_name = "local"
additional_databases = ["analytics", "staging"]
```

The databases in `additional_databases` will be created automatically when you start your Moose application.

### Primary Keys and Sorting

You must configure table indexing using one of these approaches:
1. Define at least one `Key` in your table schema
2. Specify `orderByFields` in the table config
3. Use both (all `Key` fields must come first in the `orderByFields` array)

```py filename="PrimaryKeyConfig.py" copy
from moose_lib import Key, OlapTable
from pydantic import BaseModel

class Record1(BaseModel):
    id: Key[str] # Primary key field
    field1: str
    field2: int

table1 = OlapTable[Record1]("table1") # id is the primary key
```

### Order By Fields Only

Leverage the `OlapConfig` class to configure your table:

```py filename="OrderByFieldsOnly.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
from datetime import datetime

class SchemaWithoutPrimaryKey(BaseModel):
    field1: str
    field2: int
    field3: datetime

table2 = OlapTable[SchemaWithoutPrimaryKey]("table2", OlapConfig(
  order_by_fields=["field1", "field2"] # Specify ordering without primary key
))
```

### Order By Expression

Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely.

```py filename="OrderByExpression.py" copy
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel
from datetime import datetime

class Events(BaseModel):
    user_id: str
    created_at: datetime
    event_type: str

# Equivalent to order_by_fields=["user_id", "created_at", "event_type"]
events = OlapTable[Events]("events", OlapConfig(
    order_by_expression="(user_id, created_at, event_type)",
))

# Advanced: functions inside expression
events_by_month = OlapTable[Events]("events_by_month", OlapConfig(
    order_by_expression="(user_id, toYYYYMM(created_at))",
))

# No sorting
unsorted = OlapTable[Events]("events_unsorted", OlapConfig(
    order_by_expression="tuple()",
))
```

### Using Both Primary Key and Order By Fields

```py filename="ComboKeyAndOrderByFields.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel

class SchemaWithKey(BaseModel):
    id: Key[str]
    field1: str
    field2: int

table3 = OlapTable[SchemaWithKey]("table3", OlapConfig(
  order_by_fields=["id", "field1"] # Primary key must be first
))
```

### Using Multiple Primary Keys

```py filename="MultiKeyTable.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel

class MultiKeyRecord(BaseModel):
    key1: Key[str]
    key2: Key[int]
    field1: str

multi_key_table = OlapTable[MultiKeyRecord]("multi_key_table", OlapConfig(
  order_by_fields=["key1", "key2", "field1"] # Multiple keys must come first
))
```

### Table engines
By default, Moose will create tables with the `MergeTree` engine. You can use different engines by setting the `engine` in the table configuration.

```py filename="TableEngine.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import MergeTreeEngine, ReplacingMergeTreeEngine

# Default MergeTree engine
table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"]
))

# Explicitly specify engine
dedup_table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine()
))
```

#### Deduplication (`ReplacingMergeTree`)
Use the `ReplacingMergeTree` engine to keep only the latest record for your designated sort key:

```py filename="DeduplicatedTable.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine

class Record(BaseModel):
    id: Key[str]
    updated_at: str  # Version column
    deleted: int = 0  # Soft delete marker (UInt8)

# Basic deduplication
table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine()
))

# With version column (keeps record with highest version)
versioned_table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine(ver="updated_at")
))

# With soft deletes (requires ver parameter)
soft_delete_table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine(
    ver="updated_at",
    is_deleted="deleted"  # UInt8 column: 1 marks row for deletion
  )
))
```

ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately.

**Version Column (`ver`)**: When specified, ClickHouse keeps the row with the maximum version value for each unique sort key.

**Soft Deletes (`is_deleted`)**: When specified along with `ver`, rows where this column equals 1 are deleted during merges. This column must be UInt8 type.

For more details, see the [ClickHouse documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree).

#### Streaming from S3 (`S3Queue`)
Use the `S3Queue` engine to automatically ingest data from S3 buckets as files are added:

S3Queue tables only process **new files** added to S3 after table creation. When used as a source for materialized views, **no backfill occurs** - the MV will only start populating as new files arrive. See the [Materialized Views documentation](./model-materialized-view#backfill-destination-tables) for more details.

```py filename="S3StreamingTable.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import S3QueueEngine

class S3Event(BaseModel):
    id: str
    timestamp: datetime
    data: dict

# Modern API using engine configuration
s3_events = OlapTable[S3Event]("s3_events", OlapConfig(
    engine=S3QueueEngine(
        s3_path="s3://my-bucket/data/*.json",
        format="JSONEachRow",
        # ⚠️ WARNING: See security callout below about credentials
        aws_access_key_id="AKIA...",
        aws_secret_access_key="secret..."
    ),
    settings={
        "mode": "unordered",
        "keeper_path": "/clickhouse/s3queue/events"
    }
))
```

**Security Risk**: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities.

**Solution**: Use `mooseRuntimeEnv` for runtime credential resolution:

```py filename="SecureS3Streaming.py" copy
from moose_lib import OlapTable, OlapConfig, moose_runtime_env
from moose_lib.blocks import S3QueueEngine

# ✅ RECOMMENDED: Runtime environment variable resolution
secure_s3_events = OlapTable[S3Event]("s3_events", OlapConfig(
    engine=S3QueueEngine(
        s3_path="s3://my-bucket/data/*.json",
        format="JSONEachRow",
        aws_access_key_id=moose_runtime_env.get("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=moose_runtime_env.get("AWS_SECRET_ACCESS_KEY")
    ),
    settings={
        "mode": "unordered",
        "keeper_path": "/clickhouse/s3queue/events"
    }
))
```

**Then set environment variables:**
```bash filename="Terminal" copy
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="your-secret-key"
moose prod up
```

**Benefits:**
- Credentials never embedded in Docker images
- Supports credential rotation (changing passwords triggers table recreation)
- Different credentials per environment (dev/staging/prod)
- Clear error messages if environment variables are missing

S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas.

#### Direct S3 Access (`S3`)
Use the `S3` engine for direct read/write access to S3 storage without streaming semantics:

```py filename="S3Table.py" copy
from moose_lib import OlapTable, OlapConfig, moose_runtime_env
from moose_lib.blocks import S3Engine

# S3 table with credentials (recommended with moose_runtime_env)
s3_data = OlapTable[DataRecord]("s3_data", OlapConfig(
    engine=S3Engine(
        path="s3://my-bucket/data/file.json",
        format="JSONEachRow",
        aws_access_key_id=moose_runtime_env.get("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=moose_runtime_env.get("AWS_SECRET_ACCESS_KEY"),
        compression="gzip"
    )
))

# Public S3 bucket (no authentication needed - just omit credentials)
public_s3 = OlapTable[DataRecord]("public_s3", OlapConfig(
    engine=S3Engine(
        path="s3://public-bucket/data/*.parquet",
        format="Parquet"
    )
))
```

- **S3**: Direct read/write access to S3 files. Use for batch processing or querying static data.
- **S3Queue**: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion.

Both engines support the same credential management and format options.

#### In-Memory Buffer (`Buffer`)
The `Buffer` engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds:

```py filename="BufferTable.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import MergeTreeEngine, BufferEngine

# First create the destination table
destination_table = OlapTable[Record]("destination", OlapConfig(
    engine=MergeTreeEngine(),
    order_by_fields=["id", "timestamp"]
))

# Then create buffer that writes to it
buffer_table = OlapTable[Record]("buffer", OlapConfig(
    engine=BufferEngine(
        target_database="local",
        target_table="destination",
        num_layers=16,
        min_time=10,        # Min 10 seconds before flush
        max_time=100,       # Max 100 seconds before flush
        min_rows=10000,     # Min 10k rows before flush
        max_rows=1000000,   # Max 1M rows before flush
        min_bytes=10485760, # Min 10MB before flush
        max_bytes=104857600 # Max 100MB before flush
    )
))
```

- Data in buffer is **lost if server crashes** before flush
- Not suitable for critical data that must be durable
- Best for high-throughput scenarios where minor data loss is acceptable
- Buffer and destination table must have identical schemas
- Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on buffer tables

For more details, see the [ClickHouse Buffer documentation](https://clickhouse.com/docs/en/engines/table-engines/special/buffer).

#### Distributed Tables (`Distributed`)
The `Distributed` engine creates a distributed table across a ClickHouse cluster for horizontal scaling:

```py filename="DistributedTable.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import DistributedEngine

# Distributed table across cluster
distributed_table = OlapTable[Record]("distributed_data", OlapConfig(
    engine=DistributedEngine(
        cluster="my_cluster",
        target_database="default",
        target_table="local_table",
        sharding_key="cityHash64(id)"  # Optional: how to distribute data
    )
))
```

- Requires a configured ClickHouse cluster with remote_servers configuration
- The local table must exist on all cluster nodes
- Distributed tables are virtual - data is stored in local tables
- Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on distributed tables
- The `cluster` name must match a cluster defined in your ClickHouse configuration

For more details, see the [ClickHouse Distributed documentation](https://clickhouse.com/docs/en/engines/table-engines/special/distributed).

#### Replicated Engines
Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants:

- `ReplicatedMergeTree` - Replicated version of MergeTree
- `ReplicatedReplacingMergeTree` - Replicated with deduplication
- `ReplicatedAggregatingMergeTree` - Replicated with aggregation
- `ReplicatedSummingMergeTree` - Replicated with summation

```py filename="ReplicatedEngines.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import (
    ReplicatedMergeTreeEngine,
    ReplicatedReplacingMergeTreeEngine,
    ReplicatedAggregatingMergeTreeEngine,
    ReplicatedSummingMergeTreeEngine
)

class Record(BaseModel):
    id: Key[str]
    updated_at: datetime
    deleted: int = 0

# Basic replicated table with explicit paths
replicated_table = OlapTable[Record]("records", OlapConfig(
    order_by_fields=["id"],
    engine=ReplicatedMergeTreeEngine(
        keeper_path="/clickhouse/tables/{database}/{shard}/records",
        replica_name="{replica}"
    )
))

# Replicated with deduplication
replicated_dedup = OlapTable[Record]("dedup_records", OlapConfig(
    order_by_fields=["id"],
    engine=ReplicatedReplacingMergeTreeEngine(
        keeper_path="/clickhouse/tables/{database}/{shard}/dedup_records",
        replica_name="{replica}",
        ver="updated_at",
        is_deleted="deleted"
    )
))

# For ClickHouse Cloud or Boreal (no parameters needed)
cloud_replicated = OlapTable[Record]("cloud_records", OlapConfig(
    order_by_fields=["id"],
    engine=ReplicatedMergeTreeEngine()
))
```

The `keeper_path` and `replica_name` parameters are **optional** for replicated engines:

- **Omit both parameters** (recommended): Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments. The default path pattern `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}` works automatically with Atomic databases (default in modern ClickHouse).

- **Provide custom paths**: You can still specify both parameters explicitly if you need custom replication paths for your self-managed cluster.

**Note**: Both parameters must be provided together, or both omitted. The `{uuid}`, `{shard}`, and `{replica}` macros are automatically substituted by ClickHouse at runtime.

For more details, see the [ClickHouse documentation on data replication](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication).

### Irregular column names and Python Aliases

If a ClickHouse column name isn't a valid Python identifier or starts with an underscore,
you can use a safe Python field name and set a Pydantic alias to the real column name.
MooseOLAP then uses the alias for ClickHouse DDL and data mapping,
so your model remains valid while preserving the true column name.

```python
from pydantic import BaseModel, Field

class CHUser(BaseModel):
    # ClickHouse: "_id" → safe Python attribute with alias
    UNDERSCORE_PREFIXED_id: str = Field(alias="_id")

    # ClickHouse: "user name" → replace spaces, keep alias
    user_name: str = Field(alias="user name")
```

## Externally Managed Tables
If you have a table that is managed by an external system (e.g Change Data Capture like ClickPipes), you can still use Moose to query it. You can set the config in the table config to set the lifecycle to `EXTERNALLY_MANAGED`.

```py filename="ExternallyManagedTable.py" copy
from moose_lib import OlapTable, OlapConfig, LifeCycle

# Table managed by external system
external_table = OlapTable[UserData]("external_users", OlapConfig(
    order_by_fields=["id", "timestamp"],
    life_cycle=LifeCycle.EXTERNALLY_MANAGED  # Moose won't create or modify this table in prod mode
))
```

Learn more about the different lifecycle options and how to use them in the [LifeCycle Management](/stack/olap/lifecycle) documentation.

## Invalid Configurations

```py filename="InvalidConfig.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from typing import Optional

class BadRecord1(BaseModel):
    field1: str
    field2: int

bad_table1 = OlapTable[BadRecord1]("bad_table1") ## No primary key or orderByFields

class BadRecord2(BaseModel):
    id: Key[str]
    field1: str

bad_table2 = OlapTable[BadRecord2]("bad_table2", OlapConfig(
  order_by_fields=["field1", "id"] # Wrong order - primary key must be first
))

class BadRecord3(BaseModel):
    id: Key[str]
    field1: str
    field2: Optional[int]

bad_table3 = OlapTable[BadRecord3]("bad_table3", OlapConfig(
  order_by_fields=["id", "field2"] # Can't have nullable field in orderByFields
))
```

## Development Workflow

### Local Development with Hot Reloading

One of the powerful features of Moose is its integration with the local development server:

1. Start your local development server with `moose dev`
2. When you define or modify an `OlapTable` in your code and save the file:
   - The changes are automatically detected
   - The TypeScript compiler plugin processes your schema definitions
   - The infrastructure is updated in real-time to match your code changes
   - Your tables are immediately available for testing

For example, if you add a new field to your schema:

```py filename="HotReloading.py" copy
# Before
class BasicSchema(BaseModel):
    id: Key[str]
    name: str

# After adding a field
class BasicSchema(BaseModel):
    id: Key[str]
    name: str
    created_at: datetime
```

The Moose framework will:
1. Detect the change when you save the file
2. Update the table schema in the local ClickHouse instance
3. Make the new field immediately available for use

### Verifying Your Tables

You can verify your tables were created correctly using:
```bash filename="Terminal" copy
# List all tables in your local environment
moose ls
```

#### Connecting to your local ClickHouse instance
You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your `moose.config.toml` file:

```toml filename="moose.config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```