MooseStack

Moose OLAP

Schema Optimization

Schema Optimization

Viewing:

Schema Optimization Matters

Choosing the right data types and column ordering for your tables is crucial for ClickHouse performance and storage efficiency. Poor schema design can lead to 10-100x slower queries and 2-5x larger storage requirements.

Data Types

Keep the following best practices in mind when defining your column types:

Avoid Nullable Columns

Nullable columns in ClickHouse have significant performance overhead.

Use ClickHouse defaults instead of nulls

Instead of using ? or | undefined, add the & ClickHouseDefault<"..."> to your column type .

AvoidNullable.ts
import { OlapTable } from "@514labs/moose-lib";
 
// ❌ Bad: Using nullable columns
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  description?: string;  // Nullable
  createdAt: Date;
}
 
// ✅ Good: Use default values instead
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  description: string & ClickHouseDefault<"''">  // DEFAULT ''
  createdAt: Date;
}
 
const userEventsTable = new OlapTable<UserEvent>("user_events", {
  orderByFields: ["id", "createdAt"]
});

Use LowCardinality where possible

LowCardinality is ClickHouse’s most efficient string type for columns with limited unique values.

LowCardinality.ts
import { OlapTable, LowCardinality } from "@514labs/moose-lib";
 
interface UserEvent {
  id: string;
  userId: string;
  eventType: string & LowCardinality;  // ✅ Good for limited values
  status: "active" | "inactive" | "pending"; // ✅ Literals become LowCardinality automatically
  country: string & LowCardinality;    // ✅ Good for country codes
  userAgent: string;                  // ❌ Keep as String for high cardinality
  createdAt: Date;
}
 
const userEventsTable = new OlapTable<UserEvent>("user_events", {
  orderByFields: ["id", "createdAt"]
});

Pick the right Integer types

Choose the smallest integer type that fits your data range to save storage and improve performance.

IntegerTypes.ts
import { OlapTable, UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64 } from "@514labs/moose-lib";
 
interface UserEvent {
  id: string;
  userId: string;
  age: UInt8;                             // ✅ 0-255 (1 byte)
  score: Int16;                           // ✅ -32,768 to 32,767 (2 bytes)
  viewCount: UInt32;                      // ✅ 0 to ~4 billion (4 bytes)
  timestamp: UInt64;                      // ✅ Unix timestamp (8 bytes)
  eventType: string;
  createdAt: Date;
}
 
// Integer type ranges:
// UInt8:  0 to 255
// UInt16: 0 to 65,535
// UInt32: 0 to 4,294,967,295
// UInt64: 0 to 18,446,744,073,709,551,615
// Int8:   -128 to 127
// Int16:  -32,768 to 32,767
// Int32:  -2,147,483,648 to 2,147,483,647
// Int64:  -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Use the right precision for DateTime

Choose appropriate DateTime precision based on your use case to balance storage and precision.

DateTimePrecision.ts
import { OlapTable, DateTime, DateTime64 } from "@514labs/moose-lib";
 
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  createdAt: DateTime;             // ✅ Second precision (default)
  updatedAt: DateTime64<3>;        // ✅ Millisecond precision
  processedAt: DateTime64<6>;      // ✅ Microsecond precision
  loggedAt: DateTime64<9>;         // ✅ Nanosecond precision
}
 
const userEventsTable = new OlapTable<UserEvent>("user_events", {
  orderByFields: ["id", "createdAt"]
});

Use Decimal over Float

Use Decimal for financial and precise calculations to avoid floating-point precision issues.

DecimalOverFloat.ts
import { OlapTable, Decimal } from "@514labs/moose-lib";
 
interface Order {
  id: string;
  userId: string;
  amount: Decimal<10, 2>;         // ✅ 10 total digits, 2 decimal places
  tax: Decimal<8, 2>;             // ✅ 8 total digits, 2 decimal places
  discount: Decimal<5, 2>;        // ✅ 5 total digits, 2 decimal places
  total: Decimal<12, 2>;          // ✅ 12 total digits, 2 decimal places
  createdAt: Date;
}
 
// ❌ Bad: Using float for financial data
interface BadOrder {
  id: string;
  amount: number;  // Float - can cause precision issues
  tax: number;     // Float - can cause precision issues
}
 
const ordersTable = new OlapTable<Order>("orders", {
  orderByFields: ["id", "createdAt"]
});

Use NamedTuple over Nested

NamedTuple is more efficient than Nested for structured data in ClickHouse.

NamedTupleOverNested.ts
import { OlapTable, ClickHouseNamedTuple } from "@514labs/moose-lib";
 
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  location: {
    latitude: number;
    longitude: number;
    city: string;
    country: string;
  } & ClickHouseNamedTuple;
  metadata: {
    version: string;
    source: string;
    priority: number;
  } & ClickHouseNamedTuple;
  createdAt: Date;
}
 
// ❌ Bad: Using Nested (less efficient)
interface BadUserEvent {
  id: string;
  location: {
    latitude: number;
    longitude: number;
  };
}
 
const userEventsTable = new OlapTable<UserEvent>("user_events", {
  orderByFields: ["id", "createdAt"]
});

Ordering

Choose columns that you will use in WHERE and GROUP BY clauses

Optimize your orderByFields (or orderByExpression) for your most common query patterns.

OrderByOptimization.ts
import { OlapTable } from "@514labs/moose-lib";
 
interface UserEvent {
  id: string;
  userId: string;
  eventType: string;
  status: string;
  createdAt: Date;
  country: string;
}
 
// ✅ Good: Optimized for common query patterns
const userEventsTable = new OlapTable<UserEvent>("user_events", {
  orderByFields: ["userId", "createdAt", "eventType"]  // Most common filters first
});
 
// Common queries this optimizes for:
// - WHERE userId = ? AND createdAt > ?
// - WHERE userId = ? AND eventType = ?
// - GROUP BY userId, eventType

ORDER BY should prioritize LowCardinality columns first

Place LowCardinality columns first in your orderByFields (or reflect this priority in your orderByExpression) for better compression and query performance.

LowCardinalityOrdering.ts
import { OlapTable, LowCardinality } from "@514labs/moose-lib";
 
interface UserEvent {
  id: string;
  userId: string;
  eventType: LowCardinality<string>;    // ✅ Low cardinality
  status: LowCardinality<string>;       // ✅ Low cardinality
  country: LowCardinality<string>;      // ✅ Low cardinality
  createdAt: Date;                      // High cardinality
  sessionId: string;                    // High cardinality
}
 
// ✅ Good: LowCardinality columns first
const userEventsTable = new OlapTable<UserEvent>("user_events", {
  orderByFields: ["eventType", "status", "country", "createdAt", "sessionId"]
});
 
// ❌ Bad: High cardinality columns first
const badUserEventsTable = new OlapTable<UserEvent>("user_events", {
  orderByFields: ["createdAt", "sessionId", "eventType", "status"]  // Less efficient
});