# Moose / Olap / Compression Documentation – Python

## Included Files

1. moose/olap/compression/compression.mdx

## compression

Source: moose/olap/compression/compression.mdx

# Compression Codecs

Moose lets you specify ClickHouse compression codecs per-column to optimize storage and query performance. Different codecs work better for different data types, and you can chain multiple codecs together.

## When to use compression codecs

- **Time series data**: Use `Delta` or `DoubleDelta` for timestamps and monotonically increasing values
- **Floating point metrics**: Use `Gorilla` codec for sensor data, temperatures, and other float values
- **Text and JSON**: Use `ZSTD` with compression levels (1-22) for large strings and JSON
- **High cardinality data**: Combine specialized codecs with general-purpose compression (e.g., `Delta, LZ4`)

## Basic Usage

```python
from typing import Annotated, Any
from moose_lib import OlapTable, OlapConfig, Key, ClickHouseCodec, UInt64
from pydantic import BaseModel
from datetime import datetime

class Metrics(BaseModel):
    id: Key[str]

    # Delta codec for timestamps (monotonically increasing)
    timestamp: Annotated[datetime, ClickHouseCodec("Delta, LZ4")]

    # Gorilla codec for floating point sensor data
    temperature: Annotated[float, ClickHouseCodec("Gorilla, ZSTD(3)")]

    # DoubleDelta for counters and metrics
    request_count: Annotated[float, ClickHouseCodec("DoubleDelta, LZ4")]

    # ZSTD for text/JSON with compression level
    log_data: Annotated[Any, ClickHouseCodec("ZSTD(3)")]
    user_agent: Annotated[str, ClickHouseCodec("ZSTD(3)")]

    # Compress array elements
    tags: Annotated[list[str], ClickHouseCodec("LZ4")]
    event_ids: Annotated[list[UInt64], ClickHouseCodec("ZSTD(1)")]

metrics_table = OlapTable[Metrics](
    "Metrics",
    OlapConfig(order_by_fields=["id", "timestamp"])
)
```

## Codec Chains

You can chain multiple codecs together. Data is processed by each codec in sequence (left-to-right).
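To build some intuition for why a chain such as `Delta, LZ4` helps on timestamps, here is a rough sketch in plain Python. It does not use Moose or ClickHouse: `delta_encode`, `delta_decode`, and `pack_u32` are hypothetical helpers written for this illustration, and `zlib` from the standard library stands in for LZ4, so the byte counts only indicate the direction of the effect and not what ClickHouse would actually store.

```python
import random
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous value."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def pack_u32(values):
    """Serialize integers as little-endian unsigned 32-bit values."""
    return struct.pack(f"<{len(values)}I", *values)


# Synthetic, monotonically increasing Unix timestamps with 1-3 second gaps.
rng = random.Random(0)
timestamps = []
current = 1_700_000_000
for _ in range(10_000):
    current += rng.randint(1, 3)
    timestamps.append(current)

# Step 1 only: general-purpose compression (zlib as a stand-in for LZ4).
general_only = zlib.compress(pack_u32(timestamps))

# Step 1 then step 2: delta-encode first, then compress the small differences.
chained = zlib.compress(pack_u32(delta_encode(timestamps)))

print(f"general-purpose only: {len(general_only)} bytes")
print(f"delta, then general:  {len(chained)} bytes")
```

The specialized step turns large, slowly changing numbers into small, repetitive ones, which the general-purpose step can then compress far more effectively. That is why the specialized codec comes first in the chain. In a Moose data model, the same left-to-right chain is written as a single codec string: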
```python
from typing import Annotated
from datetime import datetime
from pydantic import BaseModel
from moose_lib import ClickHouseCodec

class Events(BaseModel):
    # Delta compress timestamps, then apply LZ4
    timestamp: Annotated[datetime, ClickHouseCodec("Delta, LZ4")]

    # Gorilla for floats, then ZSTD for extra compression
    value: Annotated[float, ClickHouseCodec("Gorilla, ZSTD(3)")]
```

## Combining with Other Annotations

Codecs work alongside other ClickHouse annotations:

```python
from typing import Annotated
from datetime import datetime
from pydantic import BaseModel
from moose_lib import Key, UInt64, clickhouse_default, ClickHouseTTL, ClickHouseCodec

class UserEvents(BaseModel):
    id: Key[str]
    timestamp: Annotated[datetime, ClickHouseCodec("Delta, LZ4")]

    # Codec + Default value
    status: Annotated[str, clickhouse_default("'pending'"), ClickHouseCodec("ZSTD(3)")]

    # Codec + TTL
    email: Annotated[str, ClickHouseTTL("timestamp + INTERVAL 30 DAY"), ClickHouseCodec("ZSTD(3)")]

    # Codec + Numeric type
    event_count: Annotated[UInt64, ClickHouseCodec("DoubleDelta, LZ4")]
```

## Syncing from Remote Tables

When you use `moose init --from-remote` to introspect existing ClickHouse tables, Moose automatically captures codec definitions and generates the appropriate annotations in your data models.

## Notes

- Codec expressions must be valid ClickHouse codec syntax (without the `CODEC()` wrapper)
- ClickHouse may normalize codecs by adding default parameters (e.g., `Delta` becomes `Delta(4)`)
- Moose applies codec changes via migrations using `ALTER TABLE ... MODIFY COLUMN`
- Not all codecs work with all data types; ClickHouse validates the codec when the table is created

## Related

- See [Supported Types](/moose/olap/supported-types) for all available column types
- See [Schema Optimization](/moose/olap/schema-optimization) for other performance techniques
- See [Applying Migrations](/moose/olap/apply-migrations) to roll out codec changes
- See [ClickHouse Compression Codecs](https://clickhouse.com/docs/en/sql-reference/statements/create/table#column_compression_codec) for detailed codec documentation