# Moose / Olap / Supported Types Documentation – Python ## Included Files 1. moose/olap/supported-types/supported-types.mdx ## Supported Column Types Source: moose/olap/supported-types/supported-types.mdx Complete guide to defining columns for ClickHouse tables in Moose # Supported Column Types Moose supports a comprehensive set of ClickHouse column types across both TypeScript and Python libraries. This guide covers all supported types, their syntax, and best practices for defining table schemas. ## Basic Types ### String Types ```python from typing import Literal from uuid import UUID class User(BaseModel): string: str # String low_cardinality: Annotated[str, "LowCardinality"] # LowCardinality(String) uuid: UUID # UUID ``` | ClickHouse Type | Python | Description | |------|------------|--------| | `String` | `str` | Variable-length string | | `LowCardinality(String)` | `str` with `Literal[str]` | Optimized for repeated values | | `UUID` | `UUID` | UUID format strings | ### Numeric Types ### Integer Types ```python from typing import Annotated class Metrics(BaseModel): user_id: Annotated[int, "int32"] # Int32 count: Annotated[int, "int64"] # Int64 small_value: Annotated[int, "uint8"] # UInt8 ``` | ClickHouse Type | Python | Description | |------|------------|--------| | `Int8` | `Annotated[int, "int8"]` | -128 to 127 | | `Int16` | `Annotated[int, "int16"]` | -32,768 to 32,767 | | `Int32` | `Annotated[int, "int32"]` | -2,147,483,648 to 2,147,483,647 | | `Int64` | `Annotated[int, "int64"]` | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | | `UInt8` | `Annotated[int, "uint8"]` | 0 to 255 | | `UInt16` | `Annotated[int, "uint16"]` | 0 to 65,535 | | `UInt32` | `Annotated[int, "uint32"]` | 0 to 4,294,967,295 | | `UInt64` | `Annotated[int, "uint64"]` | 0 to 18,446,744,073,709,551,615 | ### Floating Point Types ```python from moose_lib import ClickhouseSize class SensorData(BaseModel): temperature: float # Float64 humidity: Annotated[float, ClickhouseSize(4)] # Float32 ``` | ClickHouse Type | Python | Description | |------|------------|--------| | `Float64` | `float` | floating point number | ### Decimal Types ```python from moose_lib import clickhouse_decimal class FinancialData(BaseModel): amount: clickhouse_decimal(10, 2) # Decimal(10,2) rate: clickhouse_decimal(5, 4) # Decimal(5,4) ``` | ClickHouse Type | Python | Description | |------|------------|--------| | `Decimal(P,S)` | `clickhouse_decimal(P,S)` | Fixed-point decimal | ### Boolean Type ```python class User(BaseModel): is_active: bool verified: bool ``` | ClickHouse Type | Python | Description | |------|------------|--------| | `Boolean` | `bool` | `bool` | ### Date and Time Types ```python from datetime import date, datetime from moose_lib import ClickhouseSize, clickhouse_datetime64 class Event(BaseModel): created_at: datetime # DateTime updated_at: clickhouse_datetime64(3) # DateTime(3) birth_date: date # Date compact_date: Annotated[date, ClickhouseSize(2)] # Date16 ``` | ClickHouse Type | Python | Description | |------|------------|--------| | `Date` | `date` | Date only | | `Date16` | `date` | `Annotated[date, ClickhouseSize(2)]` | Compact date format | | `DateTime` | `datetime` | Date and time | ### Network Types ```python from ipaddress import IPv4Address, IPv6Address class NetworkEvent(BaseModel): source_ip: IPv4Address dest_ip: IPv6Address ``` | ClickHouse Type | Python | Description | |------|------------|--------| | `IPv4` | `ipaddress.IPv4Address` | IPv4 addresses | | `IPv6` | `ipaddress.IPv6Address` | IPv6 addresses | ## Complex Types ### Geometry Types Moose supports ClickHouse geometry types. Use the helpers in each language to get type-safe models and correct ClickHouse mappings. ```python from moose_lib import Point, Ring, LineString, MultiLineString, Polygon, MultiPolygon class GeoTypes(BaseModel): point: Point # tuple[float, float] ring: Ring # list[tuple[float, float]] line_string: LineString # list[tuple[float, float]] multi_line_string: MultiLineString # list[list[tuple[float, float]]] polygon: Polygon # list[list[tuple[float, float]]] multi_polygon: MultiPolygon # list[list[list[tuple[float, float]]]] ``` | ClickHouse Type | Python | |------|------------| | `Point` | `Point` (tuple[float, float]) | | `Ring` | `Ring` (list[tuple[float, float]]) | | `LineString` | `LineString` (list[tuple[float, float]]) | | `MultiLineString` | `MultiLineString` (list[list[tuple[float, float]]]) | | `Polygon` | `Polygon` (list[list[tuple[float, float]]]) | | `MultiPolygon` | `MultiPolygon` (list[list[list[tuple[float, float]]]]) | Geometry coordinates are represented as numeric pairs `[x, y]` (TypeScript) or `tuple[float, float]` (Python). ### Array Types Arrays are supported for all basic types and some complex types. ```python from typing import List, Dict, Any class User(BaseModel): tags: List[str] # Array(String) scores: List[float] # Array(Float64) metadata: List[Dict[str, Any]] # Array(Json) tuple: List[Tuple[str, int]] # Array(Tuple(String, Int32)) ``` ### Map Types Maps store key-value pairs with specified key and value types. ```python from typing import Dict class User(BaseModel): preferences: Dict[str, str] # Map(String, String) metrics: Dict[str, float] # Map(String, Float64) ``` ### Nested Types Nested types allow embedding complex objects within tables. ```python class Address(BaseModel): street: str city: str zip: str class User(BaseModel): name: str address: Address # Nested type ``` ### Named Tuple Types Named tuples provide structured data with named fields. ```python from typing import Annotated class Point(BaseModel): x: float y: float class Shape(BaseModel): center: Annotated[Point, "ClickHouseNamedTuple"] # Named tuple radius: float ``` ### Enum Types Enums map to ClickHouse enums with string or integer values. ```python from enum import Enum class UserRole(str, Enum): ADMIN = "admin" USER = "user" GUEST = "guest" class User(BaseModel): role: UserRole # Enum with string values ``` ## Special Types ### JSON Type The `Json` type stores arbitrary JSON data with optional schema configuration for performance and type safety. #### Basic JSON (Unstructured) For completely dynamic JSON data without any schema: ```python from typing import Any, Dict class Event(BaseModel): metadata: Dict[str, Any] # Basic JSON - accepts any structure config: Any # Basic JSON - fully dynamic ``` #### Rich JSON with Type Configuration For better performance and validation, you can define typed fields within your JSON using `ClickHouseJson`. This creates a ClickHouse `JSON` column with explicit type hints for specific paths. ```python from typing import Annotated from pydantic import BaseModel, ConfigDict from moose_lib.data_models import ClickHouseJson # Define the structure for your JSON payload class PayloadStructure(BaseModel): model_config = ConfigDict(extra='allow') # Required for JSON types name: str count: int timestamp: Optional[datetime] = None class Event(BaseModel): id: str # JSON with typed paths - better performance, allows extra fields payload: Annotated[PayloadStructure, ClickHouseJson()] # JSON with performance tuning options metadata: Annotated[PayloadStructure, ClickHouseJson( max_dynamic_paths=256, # Limit tracked paths max_dynamic_types=16, # Limit type variations skip_paths=("skip.me",), # Exclude specific paths skip_regexes=(r"^tmp\\.",) # Exclude paths matching regex )] ``` #### Configuration Options | Option | Type | Description | |--------|------|-------------| | `max_dynamic_paths` | `number` | Maximum number of unique JSON paths to track. Helps control memory usage for highly variable JSON structures. | | `max_dynamic_types` | `number` | Maximum number of type variations allowed per path. Useful when paths may contain different types. | | `skip_paths` | `string[]` | Array of exact JSON paths to ignore during ingestion (e.g., `["temp", "debug.info"]`). | | `skip_regexps` | `string[]` | Array of regex patterns for paths to exclude (e.g., `["^tmp\\.", ".*_internal$"]`). | #### Benefits of Typed JSON 1. **Better Performance**: ClickHouse can optimize storage and queries for known paths 2. **Type Safety**: Validates that specified paths match expected types 3. **Flexible Schema**: Allows additional fields beyond typed paths 4. **Memory Control**: Configure limits to prevent unbounded resource usage - **Basic JSON** (`any`, `Dict[str, Any]`): Use when JSON structure is completely unknown or rarely queried - **Rich JSON** (`ClickHouseJson`): Use when you have known fields that need indexing/querying, but want to allow additional dynamic fields #### Example: Product Event Tracking ```python from typing import Annotated, Optional from pydantic import BaseModel, ConfigDict from moose_lib import Key, ClickHouseJson from datetime import datetime class ProductProperties(BaseModel): model_config = ConfigDict(extra='allow') category: str price: float in_stock: bool class ProductEvent(BaseModel): event_id: Key[str] timestamp: datetime # Typed paths for common fields, but allows custom properties properties: Annotated[ProductProperties, ClickHouseJson( max_dynamic_paths=128, # Track up to 128 unique paths max_dynamic_types=8, # Allow up to 8 type variations per path skip_paths=("_internal",), # Ignore internal fields skip_regexes=(r"^debug_",) # Ignore debug fields )] ``` With this schema, you can send events like: ```python { "event_id": "evt_123", "timestamp": "2025-10-22T12:00:00Z", "properties": { "category": "electronics", # Typed field ✓ "price": 99.99, # Typed field ✓ "in_stock": True, # Typed field ✓ "custom_tag": "holiday-sale", # Extra field - accepted ✓ "brand_id": 42, # Extra field - accepted ✓ "_internal": "ignored" # Skipped by skip_paths ✓ } } ``` ### Nullable Types All types support nullable variants using optional types. ```python from typing import Optional class User(BaseModel): name: str # Required email: Optional[str] = None # Nullable age: Optional[int] = None # Nullable ``` If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause. - Optional without default → ClickHouse Nullable type. - Optional with default (using `clickhouse_default("18")` in annotations) → non-nullable column with default `18`. This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists. ### SimpleAggregateFunction `SimpleAggregateFunction` is designed for use with `AggregatingMergeTree` tables. It stores pre-aggregated values that are automatically merged when ClickHouse combines rows with the same primary key. ```python from moose_lib import simple_aggregated, Key, OlapTable, OlapConfig, AggregatingMergeTreeEngine from pydantic import BaseModel from datetime import datetime class DailyStats(BaseModel): date: datetime user_id: str total_views: simple_aggregated('sum', int) max_score: simple_aggregated('max', float) last_seen: simple_aggregated('anyLast', datetime) stats_table = OlapTable[DailyStats]( "daily_stats", OlapConfig( engine=AggregatingMergeTreeEngine(), order_by_fields=["date", "user_id"] ) ) ``` See [ClickHouse docs](https://clickhouse.com/docs/en/sql-reference/data-types/simpleaggregatefunction) for the complete list of functions. ## Table Engines Moose supports all common ClickHouse table engines: | Engine | Python | Description | |--------|------------|-------------| | `MergeTree` | `ClickHouseEngines.MergeTree` | Default engine | | `ReplacingMergeTree` | `ClickHouseEngines.ReplacingMergeTree` | Deduplication | | `SummingMergeTree` | `ClickHouseEngines.SummingMergeTree` | Aggregates numeric columns | | `AggregatingMergeTree` | `ClickHouseEngines.AggregatingMergeTree` | Advanced aggregation | | `ReplicatedMergeTree` | `ClickHouseEngines.ReplicatedMergeTree` | Replicated version of MergeTree | | `ReplicatedReplacingMergeTree` | `ClickHouseEngines.ReplicatedReplacingMergeTree` | Replicated with deduplication | | `ReplicatedSummingMergeTree` | `ClickHouseEngines.ReplicatedSummingMergeTree` | Replicated with aggregation | | `ReplicatedAggregatingMergeTree` | `ClickHouseEngines.ReplicatedAggregatingMergeTree` | Replicated with advanced aggregation | ```python from moose_lib import ClickHouseEngines user_table = OlapTable("users", { "engine": ClickHouseEngines.ReplacingMergeTree, "orderByFields": ["id", "updated_at"] }) ``` ## Best Practices ### Type Selection - **Use specific integer types** when you know the value ranges to save storage - **Prefer `Float64`** for most floating-point calculations unless storage is critical - **Use `LowCardinality`** for string columns with repeated values - **Choose appropriate DateTime precision** based on your accuracy needs ### Performance Considerations - **Order columns by cardinality** (low to high) for better compression - **Use `ReplacingMergeTree`** for tables with frequent updates - **Specify `orderByFields` or `orderByExpression`** for optimal query performance - **Consider `LowCardinality`** for string columns with < 10,000 unique values