# Moose / Olap / Schema Optimization Documentation – TypeScript ## Included Files 1. moose/olap/schema-optimization/schema-optimization.mdx ## schema-optimization Source: moose/olap/schema-optimization/schema-optimization.mdx # Schema Optimization Choosing the right data types and column ordering for your tables is crucial for ClickHouse performance and storage efficiency. Poor schema design can lead to 10-100x slower queries and 2-5x larger storage requirements. ## Data Types Keep the following best practices in mind when defining your column types: ### Avoid Nullable Columns Nullable columns in ClickHouse have significant performance overhead. Instead of using `?` or `| undefined`, add the `& ClickHouseDefault<"...">` to your column type . ```ts filename="AvoidNullable.ts" // ❌ Bad: Using nullable columns interface UserEvent { id: string; userId: string; eventType: string; description?: string; // Nullable createdAt: Date; } // ✅ Good: Use default values instead interface UserEvent { id: string; userId: string; eventType: string; description: string & ClickHouseDefault<"''"> // DEFAULT '' createdAt: Date; } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ### Use `LowCardinality` where possible `LowCardinality` is ClickHouse's most efficient string type for columns with limited unique values. ```ts filename="LowCardinality.ts" interface UserEvent { id: string; userId: string; eventType: string & LowCardinality; // ✅ Good for limited values status: "active" | "inactive" | "pending"; // ✅ Literals become LowCardinality automatically country: string & LowCardinality; // ✅ Good for country codes userAgent: string; // ❌ Keep as String for high cardinality createdAt: Date; } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ### Pick the right Integer types Choose the smallest integer type that fits your data range to save storage and improve performance. ```ts filename="IntegerTypes.ts" interface UserEvent { id: string; userId: string; age: UInt8; // ✅ 0-255 (1 byte) score: Int16; // ✅ -32,768 to 32,767 (2 bytes) viewCount: UInt32; // ✅ 0 to ~4 billion (4 bytes) timestamp: UInt64; // ✅ Unix timestamp (8 bytes) eventType: string; createdAt: Date; } // Integer type ranges: // UInt8: 0 to 255 // UInt16: 0 to 65,535 // UInt32: 0 to 4,294,967,295 // UInt64: 0 to 18,446,744,073,709,551,615 // Int8: -128 to 127 // Int16: -32,768 to 32,767 // Int32: -2,147,483,648 to 2,147,483,647 // Int64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 ``` ### Use the right precision for `DateTime` Choose appropriate DateTime precision based on your use case to balance storage and precision. ```ts filename="DateTimePrecision.ts" interface UserEvent { id: string; userId: string; eventType: string; createdAt: DateTime; // ✅ Second precision (default) updatedAt: DateTime64<3>; // ✅ Millisecond precision processedAt: DateTime64<6>; // ✅ Microsecond precision loggedAt: DateTime64<9>; // ✅ Nanosecond precision } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ### Use Decimal over Float Use `Decimal` for financial and precise calculations to avoid floating-point precision issues. ```ts filename="DecimalOverFloat.ts" interface Order { id: string; userId: string; amount: Decimal<10, 2>; // ✅ 10 total digits, 2 decimal places tax: Decimal<8, 2>; // ✅ 8 total digits, 2 decimal places discount: Decimal<5, 2>; // ✅ 5 total digits, 2 decimal places total: Decimal<12, 2>; // ✅ 12 total digits, 2 decimal places createdAt: Date; } // ❌ Bad: Using float for financial data interface BadOrder { id: string; amount: number; // Float - can cause precision issues tax: number; // Float - can cause precision issues } const ordersTable = new OlapTable("orders", { orderByFields: ["id", "createdAt"] }); ``` ### Use `NamedTuple` over `Nested` `NamedTuple` is more efficient than `Nested` for structured data in ClickHouse. ```ts filename="NamedTupleOverNested.ts" interface UserEvent { id: string; userId: string; eventType: string; location: { latitude: number; longitude: number; city: string; country: string; } & ClickHouseNamedTuple; metadata: { version: string; source: string; priority: number; } & ClickHouseNamedTuple; createdAt: Date; } // ❌ Bad: Using Nested (less efficient) interface BadUserEvent { id: string; location: { latitude: number; longitude: number; }; } const userEventsTable = new OlapTable("user_events", { orderByFields: ["id", "createdAt"] }); ``` ## Ordering ### Choose columns that you will use in WHERE and GROUP BY clauses Optimize your `orderByFields` (or `orderByExpression`) for your most common query patterns. ```ts filename="OrderByOptimization.ts" interface UserEvent { id: string; userId: string; eventType: string; status: string; createdAt: Date; country: string; } // ✅ Good: Optimized for common query patterns const userEventsTable = new OlapTable("user_events", { orderByFields: ["userId", "createdAt", "eventType"] // Most common filters first }); // Common queries this optimizes for: // - WHERE userId = ? AND createdAt > ? // - WHERE userId = ? AND eventType = ? // - GROUP BY userId, eventType ``` ### `ORDER BY` should prioritize LowCardinality columns first Place `LowCardinality` columns first in your `orderByFields` (or reflect this priority in your `orderByExpression`) for better compression and query performance. ```ts filename="LowCardinalityOrdering.ts" interface UserEvent { id: string; userId: string; eventType: LowCardinality; // ✅ Low cardinality status: LowCardinality; // ✅ Low cardinality country: LowCardinality; // ✅ Low cardinality createdAt: Date; // High cardinality sessionId: string; // High cardinality } // ✅ Good: LowCardinality columns first const userEventsTable = new OlapTable("user_events", { orderByFields: ["eventType", "status", "country", "createdAt", "sessionId"] }); // ❌ Bad: High cardinality columns first const badUserEventsTable = new OlapTable("user_events", { orderByFields: ["createdAt", "sessionId", "eventType", "status"] // Less efficient }); ```