In-Database Transformations
Viewing typescript
switch to python
Overview
Materialized views allow you to precompute and store the results of complex queries. This can be useful for improving query performance, reducing the load on your database, and providing a more efficient way to access data.
How it works:
Model your target query result set
Write the schema of your materialized view result set as a data model
Define your transformation logic
Write a SELECT query to compute the result set
Moose generates the infrastructure
Moose turns your query into a materialized view and a physical table to store the results
Basic Example
import { MaterializedView, sql } from "@514labs/moose-lib";
// 1. Define the schema of the materialized view result set
interface TargetSchema {
id: string;
average_rating: number;
num_reviews: number;
}
// 2. Write a SELECT query to compute the result set
const exampleMaterializedView = new MaterializedView<TargetSchema>({
selectStatement: sql` SELECT * FROM source_table_name`,
tableName: "my_table",
materializedViewName: "example_materialized_view",
});
from moose_lib import MaterializedView, sql
# 1. Define the schema of the materialized view result set
class TargetSchema(BaseModel):
id: str
average_rating: float
num_reviews: int
# 2. Write a SELECT query to compute the result set
example_materialized_view = MaterializedView[TargetSchema](
select_statement=f"SELECT * FROM source_table_name",
table_name="my_table",
materialized_view_name="example_materialized_view",
)
Key benefits:
Automatic refresh on new data
Supports incremental aggregations
Type-safe schema definitions
Configuration Options
interface MaterializedViewOptions<T> {
selectStatement: string; // sql query to compute the materialized view
tableName: string; // name of the table where materialized view result set will be stored
materializedViewName: string; // name of the materialized view in the database
engine?: ClickHouseEngines; // default is MergeTree
orderByFields?: (keyof T & string)[]; // default is the primary key of the source table
}
from moose_lib import MaterializedView, sql
from source_table import source_table
class MaterializedViewOptions(BaseModel):
select_statement: str
table_name: str
materialized_view_name: str
engine: ClickHouseEngines = ClickHouseEngines.MergeTree
order_by_fields: List[str] = []
Working with Aggregations
Materialized views support aggregations using the Aggregated
type in your schema definition and the AggregatingMergeTree
engine. This is crucial for efficiently maintaining pre-aggregated data in real-time systems.
Materialized views support aggregations using the Annotated
type in your schema definition and the AggregatingMergeTree
engine. This is crucial for efficiently maintaining pre-aggregated data in real-time systems.
Why Use ClickHouse Aggregate Functions
ClickHouse’s aggregate functions offer several advantages when used in materialized views:
- Incremental computation: ClickHouse can update aggregations incrementally when new data arrives, rather than recomputing the entire result set
- Memory efficiency: Aggregate states are stored in a compact format optimized for the specific function
- High performance: Native implementation of complex statistical functions that scale with your data volume
- Type safety: The
Aggregated<T>
type in Moose ensures correct mapping between TypeScript and ClickHouse aggregation types
When using the AggregatingMergeTree
engine, ClickHouse maintains the state of your aggregations between updates, making it much more efficient than re-aggregating from raw data each time.
Common Aggregate Functions
Examples coming soon
Some of the most useful aggregate functions for materialized views include:
Function | Description | TypeScript Usage |
---|---|---|
sum | Sum of values | number & Aggregated<"sum"> |
avg | Average of values | number & Aggregated<"avg"> |
count | Count of records | number & Aggregated<"count"> |
uniq | Approximate unique count | number & Aggregated<"uniq"> |
min | Minimum value | number & Aggregated<"min"> |
max | Maximum value | number & Aggregated<"max"> |
Aggregation Example
import { MaterializedView, ClickHouseEngines } from "@514labs/moose-lib";
import { sourceTable } from "./SourceTable";
interface TargetSchema {
id: string;
average_rating: number & Aggregated<"avg">;
num_reviews: number & Aggregated<"sum">;
}
const query = sql`
SELECT
${sourceTable.columns["id"]},
AVG(${sourceTable.columns["rating"]}) as average_rating,
COUNT(${sourceTable.columns["id"]}) as num_reviews
FROM ${sourceTable.name}
GROUP BY ${sourceTable.columns["id"]}
`;
const exampleMaterializedView = new MaterializedView<TargetSchema>({
selectStatement: query,
tableName: "my_table",
materializedViewName: "example_materialized_view",
engine: ClickHouseEngines.AggregatingMergeTree,
orderByFields: ["id"],
});
Example coming soon
Best Practices for Aggregations
When working with aggregated materialized views:
- Choose the right engine: Always use
AggregatingMergeTree
for views with aggregate functions - Match schema to query: Ensure your TargetSchema fields match the aggregate functions in your query
- Consider data volume: Aggregations are most valuable for large datasets where full scans are expensive
- Group wisely: Be thoughtful about your GROUP BY clauses as they affect both storage and query patterns
- Combine with filtering: Pre-filter data in your select statement to improve performance
When these practices are followed, materialized views with aggregations can provide dramatic performance improvements for analytical queries while maintaining near real-time data freshness.
Querying Materialized Views
You can query materialized views just like any other table in Moose:
const query = sql`SELECT * FROM ${exampleMaterializedView.targetTable.name}`;
Example coming soon
How It Works
When you create a materialized view in Moose:
- A new table is automatically created in your database using the provided schema
- The
selectStatement
is used to compute the initial results - The results are stored in the new table
- In ClickHouse, materialized views function as triggers on the source table
- Whenever new data is inserted into the source table, the materialized view automatically updates
- The update uses the
selectStatement
select_statement
to compute the latest results from the source data
This ensures your precomputed views stay in sync with your underlying data without manual intervention.