ReplacingMergeTree extends MergeTree with automatic deduplication. During background merges, ClickHouse keeps only one row per unique ORDER BY key—useful for mutable data that receives updates.
import { OlapTable, ClickHouseEngines } from "@514labs/moose-lib"; interface User { id: string; name: string; email: string; updated_at: Date;} const users = new OlapTable<User>("users", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"]});| Option | Type | Description |
|---|---|---|
orderByFields | string[] | Columns that define uniqueness (required) |
ver | string | Version column—keeps the row with the highest value during merges |
isDeleted | string | Soft delete column (UInt8)—removes rows where value is 1 (requires ver) |
ver)Use the ver parameter to specify the column that determines which row survives during deduplication. The row with the highest version value is kept:
const users = new OlapTable<User>("users", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at" // Keep row with latest updated_at});Use isDeleted (requires ver) to specify the column that marks rows for deletion. Rows with isDeleted=1 are removed during merges:
import { OlapTable, ClickHouseEngines, UInt8 } from "@514labs/moose-lib"; interface ExampleModel { id: string; name: string; updated_at: Date; deleted: UInt8; // 0 = active, 1 = deleted} const table_with_soft_deletes = new OlapTable<ExampleModel>("table_with_soft_deletes", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at", isDeleted: "deleted"});The isDeleted column must be a UInt8 type. See Unsigned Integers for more information.
By default, deduplication happens automatically in the background after data is written into the table. If you need to guarantee deduplication at read time, you can use the FINAL modifier or the argMax function like this:
import { sql } from "@514labs/moose-lib"; // Force deduplication at query timeconst deduped = sql`SELECT * FROM ${users} FINAL WHERE ${users.columns.id} = '123'`; // Or use aggregation to pick the latest valuesconst latest = sql` SELECT argMax(${users.columns.name}, ${users.columns.updated_at}) AS name FROM ${users} WHERE ${users.columns.id} = '123'`;import { sql } from "@514labs/moose-lib"; // Force deduplication at query timeconst deduped = sql`SELECT * FROM ${users} FINAL WHERE ${users.columns.id} = '123'`; // Or use aggregation to pick the latest valuesconst latest = sql` SELECT argMax(${users.columns.name}, ${users.columns.updated_at}) AS name FROM ${users} WHERE ${users.columns.id} = '123'`;