Materialized Views

In-Database Transformations

Viewing:

Overview

Materialized views allow you to precompute and store the results of complex queries. This can be useful for improving query performance, reducing the load on your database, and providing a more efficient way to access data.

How it works:

Model your target query result set

Write the schema of your materialized view result set as a data model

Define your transformation logic

Write a SELECT query to compute the result set

Moose generates the infrastructure

Moose turns your query into a materialized view and a physical table to store the results

Basic Example

MaterializedView.ts

import { MaterializedView, sql } from "@514labs/moose-lib";
import { sourceTable } from "path/to/SourceTable";
 
// 1. Define the schema of the materialized view result set
interface TargetSchema {
  id: string;
  average_rating: number;
  num_reviews: number;
}
 
// 2. Write a SELECT query to compute the result set
const exampleMaterializedView = new MaterializedView<TargetSchema>({
  selectStatement: sql` SELECT * FROM ${sourceTable}`,
  selectTables: [sourceTable],
  tableName: "my_table",
  materializedViewName: "example_materialized_view",
});

MooseTip:

Use the sql function to interpolate tables and columns from your project

MaterializedView.py

from moose_lib import MaterializedView, sql
 
# 1. Define the schema of the materialized view result set
class TargetSchema(BaseModel):
    id: str
    average_rating: float
    num_reviews: int
 
# 2. Write a SELECT query to compute the result set
example_materialized_view = MaterializedView[TargetSchema](
    select_statement=f"SELECT * FROM source_table_name",
    table_name="my_table",
    materialized_view_name="example_materialized_view",
    select_tables=[source_table_name],
)

Key benefits:

Automatic refresh on new data

Supports incremental aggregations

Type-safe schema definitions

Configuration Options

ViewOptions.ts

interface MaterializedViewOptions<T> {
  selectStatement: sql; // sql query to compute the materialized view
  selectTables: (OlapTable<any> | View)[]; // tables the query reads from
  tableName: string; // name of the table where materialized view result set will be stored
  materializedViewName: string; // name of the materialized view in the database
 
  engine?: ClickHouseEngines; // default is MergeTree
  orderByFields?: (keyof T & string)[]; // default is the primary key of the source table
}

ViewOptions.py

from moose_lib import MaterializedView, sql
from source_table import source_table
 
class MaterializedViewOptions(BaseModel):
    select_statement: str
    table_name: str
    materialized_view_name: str
    select_tables: List[OlapTable | View]
    engine: ClickHouseEngines = ClickHouseEngines.MergeTree
    order_by_fields: List[str] = []

Working with Aggregations

Materialized views support aggregations using the Aggregated type in your schema definition and the AggregatingMergeTree engine. This is crucial for efficiently maintaining pre-aggregated data in real-time systems.

Why Use ClickHouse Aggregate Functions

ClickHouse’s aggregate functions offer several advantages when used in materialized views:

Incremental computation: ClickHouse can update aggregations incrementally when new data arrives, rather than recomputing the entire result set.
Memory efficiency: Aggregate states are stored in a compact format optimized for the specific function.
High performance: Native implementation of complex statistical functions that scale with your data volume.
Type safety: The Aggregated type in Moose ensures correct mapping between your code and ClickHouse aggregation types.

When using the AggregatingMergeTree engine, ClickHouse maintains the state of your aggregations between updates, making it much more efficient than re-aggregating from raw data each time.

Common Aggregate Functions

In TypeScript, the Aggregated type is used to define fields in your schema that involve aggregation functions. This type ensures that the data is correctly aggregated and mapped to the appropriate ClickHouse function.

Aggregated.ts

Aggregated<AggregationFunction extends string, ArgTypes extends any[]>

AggregationFunction: A string that specifies the aggregation function to be applied. Common functions include sum, avg, count, uniq, min, and max.
ArgTypes: An optional array of types that define the parameters the aggregation function will operate on. This ensures type safety and helps in mapping the function to the correct data types.

In Python, the Annotated type is used to define fields in your schema that involve aggregation functions. This type ensures that the data is correctly aggregated and mapped to the appropriate ClickHouse function.

Aggregated.py

from moose_lib import Annotated, AggregateFunction
 
Annotated[U, AggregateFunction(...)]

U: Represents the type of the aggregated result. For example, float for sums and averages, int for counts.
AggregateFunction(...): Specifies the aggregation operation to be performed. It includes:
- agg_func: A string indicating the aggregation function, such as “sum”, “avg”, “count”, “uniq”, “min”, or “max”.
- param_types: An optional list that defines the types of parameters the aggregation function will operate on. This ensures type safety and clarity in the schema definition.

MooseTip:

View the ClickHouse documentation for more information on the aggregate functions:

ClickHouse Aggregate Functions

Aggregation Example

Aggregations.ts

import { MaterializedView, ClickHouseEngines, Aggregated, sql } from "@514labs/moose-lib";
import { sourceTable } from "./SourceTable";
 
interface TargetTableSchema {
  id: string;
  average_rating: number & Aggregated<"avg", [number]>;
  num_reviews: number & Aggregated<"sum", [number]>;
}
 
const query = sql`
  SELECT 
    id, 
    avgState(${sourceTable.columns.rating}) as average_rating, 
    countState(${sourceTable.columns.id}) as num_reviews 
  FROM ${sourceTable}
  GROUP BY ${sourceTable.columns.id}
  `;
 
const exampleMaterializedView = new MaterializedView<TargetTableSchema>({
  selectStatement: query,
  selectTables: [sourceTable],
  tableName: "my_table",
  materializedViewName: "example_materialized_view",
  engine: ClickHouseEngines.AggregatingMergeTree, 
  orderByFields: ["id"],
});

Aggregations.py

from moose_lib import MaterializedView, ClickHouseEngines, AggregateFunction, MaterializedViewOptions
from app.source_table import source_table
 
class TargetTableSchema(BaseModel):
    id: str
    average_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
    num_reviews: Annotated[int, AggregateFunction(agg_func="sum", param_types=[int])]
 
query = f"""
  SELECT 
    id, 
    avgState(rating) as average_rating, 
    countState(id) as num_reviews 
  FROM {source_table.name}
  GROUP BY id
"""
 
example_materialized_view = MaterializedView[TargetTableSchema](MaterializedViewOptions(
    select_statement=query,
    table_name="my_table",
    materialized_view_name="example_materialized_view",
    select_tables=[source_table],
    engine=ClickHouseEngines.AggregatingMergeTree,
    order_by_fields=["id"],
))

State and Merge Suffixes

When using the AggregatingMergeTree engine, you need to use the State suffix at the end of your aggregate function in your SQL query:

avgState(rating) as average_rating, 
countState(id) as num_reviews

When querying the materialized view, you need to use the Merge suffix at the end of your aggregate function:

SELECT avgMerge(average_rating), countMerge(num_reviews) FROM my_table

If you use the Aggregated type in your schema, you don’t need to use the Merge suffix if you reference the field in your query using the sql function.

Aggregated.ts

const cols = exampleMaterializedView.targetTable.columns;
const query = sql`SELECT ${cols.average_rating}, ${cols.num_reviews} FROM my_table`;

Best Practices for Aggregations

When working with aggregated materialized views:

Choose the right engine: Always use AggregatingMergeTree for views with aggregate functions
Match schema to query: Ensure your TargetSchema fields match the aggregate functions in your query
Consider data volume: Aggregations are most valuable for large datasets where full scans are expensive
Group wisely: Be thoughtful about your GROUP BY clauses as they affect both storage and query patterns
Combine with filtering: Pre-filter data in your select statement to improve performance

When these practices are followed, materialized views with aggregations can provide dramatic performance improvements for analytical queries while maintaining near real-time data freshness.

Querying Materialized Views

You can query materialized views just like any other table in Moose. When instantiating a MaterializedView, Moose will create an OlapTable for the materialized view result set. You can access the OlapTable from the MaterializedView object:

QueryingMaterializedView.ts

import { sql } from "@514labs/moose-lib";
 
// Query the materialized view
const query = sql`SELECT * FROM ${exampleMaterializedView.targetTable}`;
 
// Retrieve the materialized view result set as an OlapTable
const table: OlapTable<TargetTableSchema> = exampleMaterializedView.targetTable;

QueryingMaterializedView.py

from moose_lib import OlapTable
 
## Querying the materialized view
query = f"SELECT * FROM {example_materialized_view.target_table}"
 
## Or
table: OlapTable[TargetTableSchema] = example_materialized_view.target_table

How It Works

When you create a materialized view in Moose:

A new table is automatically created in your database using the provided schema
The selectStatement is used to compute the initial results
The results are stored in the new table
In ClickHouse, materialized views function as triggers on the source table
Whenever new data is inserted into the source table, the materialized view automatically updates
The update uses the selectStatementselect_statement to compute the latest results from the source data

This ensures your precomputed views stay in sync with your underlying data without manual intervention.

OLAP Tables

Consumption APIs