Modeling Materialized Views
Viewing:
Overview
Materialized views allow you to precompute and store the results of complex queries. This can be useful for improving query performance, reducing the load on your database, and providing a more efficient way to access data.
Basic Usage
import { MaterializedView, sql } from "@514labs/moose-lib";
import { sourceTable } from "path/to/SourceTable";
// 1. Define the schema of the materialized view result set
interface TargetSchema {
id: string;
average_rating: number;
num_reviews: number;
}
// 2. Write a SELECT query to compute the result set
const exampleMaterializedView = new MaterializedView<TargetSchema>({
selectStatement: sql`SELECT * FROM ${sourceTable}`,
selectTables: [sourceTable],
tableName: "my_table",
materializedViewName: "example_materialized_view",
});
MooseTip:
Use the sql
function to interpolate tables and columns from your project
from moose_lib import MaterializedView, sql
# 1. Define the schema of the materialized view result set
class TargetSchema(BaseModel):
id: str
average_rating: float
num_reviews: int
# 2. Write a SELECT query to compute the result set
example_materialized_view = MaterializedView[TargetSchema](
select_statement=f"SELECT * FROM source_table_name",
table_name="my_table",
materialized_view_name="example_materialized_view",
select_tables=[source_table_name],
)
Key benefits:
Automatic refresh on new data
Supports incremental aggregations
Type-safe schema definitions
Configuration Options
interface MaterializedViewOptions<T> {
selectStatement: sql; // sql query to compute the materialized view
selectTables: (OlapTable<any> | View)[]; // tables the query reads from
tableName: string; // name of the table where materialized view result set will be stored
materializedViewName: string; // name of the materialized view in the database
engine?: ClickHouseEngines; // default is MergeTree
orderByFields?: (keyof T & string)[]; // default is the primary key of the source table
}
from moose_lib import MaterializedView, sql
from source_table import source_table
class MaterializedViewOptions(BaseModel):
select_statement: str
table_name: str
materialized_view_name: str
select_tables: List[OlapTable | View]
engine: ClickHouseEngines = ClickHouseEngines.MergeTree
order_by_fields: List[str] = []
Using Aggregations
Materialized views support aggregations using the Aggregated
type in your schema definition and the AggregatingMergeTree
engine. This is crucial for efficiently maintaining pre-aggregated data in real-time systems.
Why Use ClickHouse Aggregate Functions
ClickHouse’s aggregate functions offer several advantages when used in materialized views:
- Incremental computation: ClickHouse can update aggregations incrementally when new data arrives, rather than recomputing the entire result set.
- Memory efficiency: Aggregate states are stored in a compact format optimized for the specific function.
- High performance: Native implementation of complex statistical functions that scale with your data volume.
- Type safety: The
Aggregated
type in Moose ensures correct mapping between your code and ClickHouse aggregation types.
When using the AggregatingMergeTree
engine, ClickHouse maintains the state of your aggregations between updates, making it much more efficient than re-aggregating from raw data each time.
Common Aggregate Functions
In TypeScript, the Aggregated
type is used to define fields in your schema that involve aggregation functions. This type ensures that the data is correctly aggregated and mapped to the appropriate ClickHouse function.
Aggregated<AggregationFunction extends string, ArgTypes extends any[]>
AggregationFunction
: A string that specifies the aggregation function to be applied. Common functions includesum
,avg
,count
,uniq
,min
, andmax
.ArgTypes
: An optional array of types that define the parameters the aggregation function will operate on. This ensures type safety and helps in mapping the function to the correct data types.
In Python, the Annotated
type is used to define fields in your schema that involve aggregation functions. This type ensures that the data is correctly aggregated and mapped to the appropriate ClickHouse function.
from moose_lib import Annotated, AggregateFunction
Annotated[U, AggregateFunction(...)]
U
: Represents the type of the aggregated result. For example, float for sums and averages, int for counts.AggregateFunction(...)
: Specifies the aggregation operation to be performed. It includes:agg_func
: A string indicating the aggregation function, such as “sum”, “avg”, “count”, “uniq”, “min”, or “max”.param_types
: An optional list that defines the types of parameters the aggregation function will operate on. This ensures type safety and clarity in the schema definition.
MooseTip:
View the ClickHouse documentation for more information on the aggregate functions:
Aggregation Example
import { MaterializedView, ClickHouseEngines, Aggregated, sql } from "@514labs/moose-lib";
import { sourceTable } from "./SourceTable";
interface TargetTableSchema {
id: string;
average_rating: number & Aggregated<"avg", [number]>;
num_reviews: number & Aggregated<"sum", [number]>;
}
const query = sql`
SELECT
id,
avgState(${sourceTable.columns.rating}) as average_rating,
countState(${sourceTable.columns.id}) as num_reviews
FROM ${sourceTable}
GROUP BY ${sourceTable.columns.id}
`;
const exampleMaterializedView = new MaterializedView<TargetTableSchema>({
selectStatement: query,
selectTables: [sourceTable],
tableName: "my_table",
materializedViewName: "example_materialized_view",
engine: ClickHouseEngines.AggregatingMergeTree,
orderByFields: ["id"],
});
from moose_lib import MaterializedView, ClickHouseEngines, AggregateFunction, MaterializedViewOptions
from app.source_table import source_table
class TargetTableSchema(BaseModel):
id: str
average_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
num_reviews: Annotated[int, AggregateFunction(agg_func="sum", param_types=[int])]
query = f"""
SELECT
id,
avgState(rating) as average_rating,
countState(id) as num_reviews
FROM {source_table.name}
GROUP BY id
"""
example_materialized_view = MaterializedView[TargetTableSchema](MaterializedViewOptions(
select_statement=query,
table_name="my_table",
materialized_view_name="example_materialized_view",
select_tables=[source_table],
engine=ClickHouseEngines.AggregatingMergeTree,
order_by_fields=["id"],
))
State and Merge Suffixes
When using the AggregatingMergeTree
engine, you need to use the State
suffix at the end of your aggregate function in your SQL query:
avgState(rating) as average_rating,
countState(id) as num_reviews
When querying the materialized view, you need to use the Merge
suffix at the end of your aggregate function:
SELECT avgMerge(average_rating), countMerge(num_reviews) FROM my_table
If you use the Aggregated
type in your schema, you don’t need to use the Merge
suffix if you reference the field in your query using the sql
function.
const cols = exampleMaterializedView.targetTable.columns;
const query = sql`SELECT ${cols.average_rating}, ${cols.num_reviews} FROM my_table`;
MooseTip:
The sql
template literal automatically handles the translation between State and Merge suffixes when you use the Aggregated
type in your schema.
Best Practices for Aggregations
When working with aggregated materialized views:
- Choose the right engine: Always use
AggregatingMergeTree
for views with aggregate functions - Match schema to query: Ensure your TargetSchema fields match the aggregate functions in your query
- Consider data volume: Aggregations are most valuable for large datasets where full scans are expensive
- Group wisely: Be thoughtful about your GROUP BY clauses as they affect both storage and query patterns
- Combine with filtering: Pre-filter data in your select statement to improve performance
When these practices are followed, materialized views with aggregations can provide dramatic performance improvements for analytical queries while maintaining near real-time data freshness.
How It Works
When you create a materialized view in Moose:
- A new table is automatically created in your database using the provided schema
- The
selectStatement
is used to compute the initial results - The results are stored in the new table
- In ClickHouse, materialized views function as triggers on the source table
- Whenever new data is inserted into the source table, the materialized view automatically updates
- The update uses the
selectStatement
select_statement
to compute the latest results from the source data
This ensures your precomputed views stay in sync with your underlying data without manual intervention.
Best Practices
Materialized View Best Practices
Use for complex aggregations
Materialized views are most valuable for expensive computations that are queried frequently
Choose appropriate engines
Use AggregatingMergeTree for aggregations, MergeTree for simple transformations
Design for your query patterns
Structure your materialized views based on how you'll query the data
Consider storage costs
Materialized views store data, so balance performance gains against storage requirements
Use State/Merge suffixes correctly
Use State suffix in creation queries, Merge suffix in query queries (or let sql handle it)