Compression Codecs
Moose lets you specify ClickHouse compression codecs per-column to optimize storage and query performance. Different codecs work better for different data types, and you can chain multiple codecs together.
When to use compression codecs
- Time series data: Use Delta or DoubleDelta for timestamps and monotonically increasing values
- Floating point metrics: Use the Gorilla codec for sensor data, temperatures, and other float values
- Text and JSON: Use ZSTD with compression levels (1-22) for large strings and JSON
- High cardinality data: Combine specialized codecs with general-purpose compression (e.g., Delta, LZ4)
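To make the time series recommendation concrete, here is a minimal Python sketch of the idea behind delta and double-delta encoding. It is an illustration only, not ClickHouse's implementation, and the function names `delta_encode` and `double_delta_encode` are hypothetical. It shows how evenly spaced timestamps collapse into small, repetitive numbers that a general-purpose compressor handles well.

```python
def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def double_delta_encode(values: list[int]) -> list[int]:
    """Keep the first value and first delta, then store the change between deltas."""
    deltas = delta_encode(values)
    if len(deltas) < 2:
        return deltas
    return deltas[:2] + [b - a for a, b in zip(deltas[1:], deltas[2:])]


# Five timestamps, one per minute (illustrative data).
timestamps = [1700000000, 1700000060, 1700000120, 1700000180, 1700000240]

print(delta_encode(timestamps))         # [1700000000, 60, 60, 60, 60]
print(double_delta_encode(timestamps))  # [1700000000, 60, 0, 0, 0]
```

With a constant interval, the deltas are all identical and the double deltas are all zero, which is why these codecs suit timestamps and steadily increasing counters.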
Basic Usage
import { OlapTable, Key, DateTime, ClickHouseCodec, UInt64 } from "@514labs/moose-lib";

interface Metrics {
  id: Key<string>;
  // Delta codec for timestamps (monotonically increasing)
  timestamp: DateTime & ClickHouseCodec<"Delta, LZ4">;
  // Gorilla codec for floating point sensor data
  temperature: number & ClickHouseCodec<"Gorilla, ZSTD(3)">;
  // DoubleDelta for counters and metrics
  request_count: number & ClickHouseCodec<"DoubleDelta, LZ4">;
  // ZSTD for text/JSON with compression level
  log_data: Record<string, any> & ClickHouseCodec<"ZSTD(3)">;
  user_agent: string & ClickHouseCodec<"ZSTD(3)">;
  // Compress array elements
  tags: string[] & ClickHouseCodec<"LZ4">;
  event_ids: UInt64[] & ClickHouseCodec<"ZSTD(1)">;
}

export const MetricsTable = new OlapTable<Metrics>("Metrics", {
  orderByFields: ["id", "timestamp"]
});

from typing import Annotated, Any
from moose_lib import OlapTable, OlapConfig, Key, ClickHouseCodec, UInt64
from pydantic import BaseModel
from datetime import datetime

class Metrics(BaseModel):
    id: Key[str]
    # Delta codec for timestamps (monotonically increasing)
    timestamp: Annotated[datetime, ClickHouseCodec("Delta, LZ4")]
    # Gorilla codec for floating point sensor data
    temperature: Annotated[float, ClickHouseCodec("Gorilla, ZSTD(3)")]
    # DoubleDelta for counters and metrics
    request_count: Annotated[float, ClickHouseCodec("DoubleDelta, LZ4")]
    # ZSTD for text/JSON with compression level
    log_data: Annotated[Any, ClickHouseCodec("ZSTD(3)")]
    user_agent: Annotated[str, ClickHouseCodec("ZSTD(3)")]
    # Compress array elements
    tags: Annotated[list[str], ClickHouseCodec("LZ4")]
    event_ids: Annotated[list[UInt64], ClickHouseCodec("ZSTD(1)")]

metrics_table = OlapTable[Metrics](
    "Metrics",
    OlapConfig(order_by_fields=["id", "timestamp"])
)

Codec Chains
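You can chain multiple codecs together. On write, data passes through each codec in sequence (left-to-right), so place a specialized codec such as Delta before a general-purpose one such as LZ4 or ZSTD.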
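The effect of a chain can be sketched in plain Python. This is an illustration under stated assumptions, not what ClickHouse runs internally: `zlib` from the standard library stands in for LZ4/ZSTD, and all function names here are hypothetical. The write path applies the stages left to right, and the read path has to undo them in reverse.

```python
import struct
import zlib


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(values: list[int]) -> bytes:
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


def unpack(data: bytes) -> list[int]:
    """Inverse of pack()."""
    return list(struct.unpack(f"<{len(data) // 8}q", data))


def compress_chain(values: list[int]) -> bytes:
    # Write path: delta stage first, then the general-purpose compressor.
    return zlib.compress(pack(delta_encode(values)))


def decompress_chain(blob: bytes) -> list[int]:
    # Read path: undo the stages in reverse order.
    return delta_decode(unpack(zlib.decompress(blob)))


# 10,000 timestamps, one per minute (illustrative data).
timestamps = [1_700_000_000 + 60 * i for i in range(10_000)]

plain = zlib.compress(pack(timestamps))  # general-purpose compressor only
chained = compress_chain(timestamps)     # delta stage, then compressor

assert decompress_chain(chained) == timestamps
print(f"compressor only: {len(plain)} bytes, delta + compressor: {len(chained)} bytes")
```

On regular data like this, the delta stage turns the input into a long run of identical values, so the chained result is much smaller than what the compressor achieves alone.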
interface Events {
  // Delta compress timestamps, then apply LZ4
  timestamp: DateTime & ClickHouseCodec<"Delta, LZ4">;
  // Gorilla for floats, then ZSTD for extra compression
  value: number & ClickHouseCodec<"Gorilla, ZSTD(3)">;
}

class Events(BaseModel):
    # Delta compress timestamps, then apply LZ4
    timestamp: Annotated[datetime, ClickHouseCodec("Delta, LZ4")]
    # Gorilla for floats, then ZSTD for extra compression
    value: Annotated[float, ClickHouseCodec("Gorilla, ZSTD(3)")]

Combining with Other Annotations
Codecs work alongside other ClickHouse annotations:
import { ClickHouseDefault, ClickHouseTTL } from "@514labs/moose-lib";

interface UserEvents {
  id: Key<string>;
  timestamp: DateTime & ClickHouseCodec<"Delta, LZ4">;
  // Codec + Default value
  status: string & ClickHouseDefault<"'pending'"> & ClickHouseCodec<"ZSTD(3)">;
  // Codec + TTL
  email: string & ClickHouseTTL<"timestamp + INTERVAL 30 DAY"> & ClickHouseCodec<"ZSTD(3)">;
  // Codec + Numeric type
  event_count: UInt64 & ClickHouseCodec<"DoubleDelta, LZ4">;
}

from moose_lib import clickhouse_default, ClickHouseTTL, ClickHouseCodec

class UserEvents(BaseModel):
    id: Key[str]
    timestamp: Annotated[datetime, ClickHouseCodec("Delta, LZ4")]
    # Codec + Default value
    status: Annotated[str, clickhouse_default("'pending'"), ClickHouseCodec("ZSTD(3)")]
    # Codec + TTL
    email: Annotated[str, ClickHouseTTL("timestamp + INTERVAL 30 DAY"), ClickHouseCodec("ZSTD(3)")]
    # Codec + Numeric type
    event_count: Annotated[UInt64, ClickHouseCodec("DoubleDelta, LZ4")]

Syncing from Remote Tables
When using moose init --from-remote to introspect existing ClickHouse tables, Moose automatically captures codec definitions and generates the appropriate annotations in your data models.
Notes
- Codec expressions must be valid ClickHouse codec syntax (without the CODEC() wrapper)
- ClickHouse may normalize codecs by adding default parameters (e.g., Delta becomes Delta(4))
- Moose applies codec changes via migrations using ALTER TABLE ... MODIFY COLUMN
- Not all codecs work with all data types; ClickHouse will validate during table creation
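As a rough illustration of the first and third notes, the sketch below shows how a codec copied from existing ClickHouse DDL relates to the expression Moose expects, and the general shape of a ClickHouse statement that changes a column's codec. Both helpers (`strip_codec_wrapper`, `modify_column_sql`) are hypothetical and not part of moose_lib, and the exact SQL Moose generates in a migration may differ.

```python
import re


def strip_codec_wrapper(ddl_fragment: str) -> str:
    """Return the expression inside CODEC(...), or the input unchanged if there is no wrapper."""
    text = ddl_fragment.strip()
    match = re.fullmatch(r"CODEC\((.*)\)", text, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else text


def modify_column_sql(table: str, column: str, column_type: str, codec_expr: str) -> str:
    """Build a ClickHouse ALTER statement that sets a column's codec (illustrative only)."""
    return f"ALTER TABLE {table} MODIFY COLUMN {column} {column_type} CODEC({codec_expr})"


# The wrapper is removed; the inner expression is what goes into ClickHouseCodec(...).
print(strip_codec_wrapper("CODEC(Delta(4), LZ4)"))  # Delta(4), LZ4
print(strip_codec_wrapper("Delta, LZ4"))            # Delta, LZ4

print(modify_column_sql("Metrics", "timestamp", "DateTime", "Delta, LZ4"))
# ALTER TABLE Metrics MODIFY COLUMN timestamp DateTime CODEC(Delta, LZ4)
```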
Related
- See Supported Types for all available column types
- See Schema Optimization for other performance techniques
- See Applying Migrations to roll out codec changes
- See ClickHouse Compression Codecs for detailed codec documentation