# Transforms
Transforms are the T side of Frank. They turn Bronze tables, other transform outputs, or custom code into Silver and Gold data products.
## What a transform stores
A transform owns:
- Metadata: name, description, tags, tenant.
- Target: FIWARE Smart Data Model or custom schema.
- Sources: one or more input tables, aliases, join metadata, and ordering.
- Field mappings: source expressions, literals, runtime context fields, AI confidence, and ordering.
- Pattern config: optional transform pattern ID, version, and params.
- Materialization: table, view, incremental, merge keys, schedule config.
- Incremental config: full refresh, watermark, cursor field, tiebreaker, per-input read modes.
- Artifact reference: the currently hydrated runnable artifact.
- Runtime state: lifecycle stage, last run outcome, run stats, test results.
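As a rough sketch, the spec can be pictured as a structured record like the one below. The key names and nesting are illustrative assumptions, not Frank's actual storage schema.

```python
# Illustrative sketch only: key names and nesting are assumptions,
# not Frank's actual storage schema.
transform_spec = {
    "metadata": {"name": "stg_orders", "description": "Staged orders",
                 "tags": ["silver"], "tenant": "acme"},
    "target": {"kind": "custom", "schema": "stg_orders_v1"},
    "sources": [{"table": "iceberg.bronze.orders", "alias": "o", "order": 0}],
    "field_mappings": [
        {"kind": "source_expression", "target_field": "order_id",
         "source_field": "o.id"},
    ],
    "materialization": {"type": "table", "schedule": "0 * * * *"},
    "incremental": {"mode": "watermark", "cursor_field": "updated_at",
                    "tiebreaker": "id"},
    "current_artifact_id": None,  # populated once hydration succeeds
}
```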
The editable spec and the runnable artifact are separate: spec changes do not replace the executable artifact until hydration succeeds.
## Lifecycle and runtime outcome
Frank splits user intent from runtime truth:
| Field | Values | Meaning |
|---|---|---|
| lifecycle_stage | draft, ready, retired | Whether the transform is usable or intentionally withdrawn. |
| last_run_outcome | none, running, succeeded, failed | What the most recent run did. |
Readiness checks use both fields plus hydration state:
- A transform can run when it is hydrated, not retired, and no run is in flight.
- A transform can be scheduled after it has been hydrated and promoted to ready.
- Failed runs do not block manual retry.
A legacy status value is derived from these fields for older consumers.
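As a sketch, the readiness rules above might reduce to predicates like these (function and attribute names are hypothetical):

```python
def can_run(t) -> bool:
    # Runnable: hydrated, not retired, and no run currently in flight.
    # Note that a failed last run does not block a manual retry.
    return (t.current_artifact_id is not None
            and t.lifecycle_stage != "retired"
            and t.last_run_outcome != "running")

def can_schedule(t) -> bool:
    # Schedulable: hydrated and explicitly promoted to ready.
    return t.current_artifact_id is not None and t.lifecycle_stage == "ready"
```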
## Input sources
Transforms support:
- One Bronze table.
- Multiple Bronze tables with joins.
- Other transform outputs for chaining.
- Mixed source tables and transform outputs for staged Silver-to-Gold flows.
This supports common patterns:
```text
raw.postgres_orders -> stg_orders -> fct_daily_sales

raw.stripe_customers    \
raw.salesforce_contacts  -> dim_customer_360
raw.postgres_customers  /
```

## Mapping kinds
AI and UI mapping flows support three field mapping kinds:
| Kind | Use it for | Required fields |
|---|---|---|
| source_expression | A source column feeds a target field, optionally with SQL. | source_field |
| literal | A constant value such as source system, schema version, or boolean flag. | literal_value, literal_type |
| context | Runtime metadata such as tenant ID, pipeline ID, run start, transform name, or source name. | context_key |
Context keys are intentionally allowlisted:

```text
tenant.id
pipeline.id
pipeline.run_id
run.started_at
transform.name
source.name
```
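To make the three kinds concrete, a mapping list might look like this. The key names follow the required fields in the table above; the `expression` key is an illustrative assumption.

```python
field_mappings = [
    # source_expression: a source column feeds a target field, optionally via SQL.
    {"kind": "source_expression", "target_field": "order_total",
     "source_field": "o.amount",
     "expression": "CAST(o.amount AS DECIMAL(18, 2))"},  # hypothetical key
    # literal: a constant stamped onto every output row.
    {"kind": "literal", "target_field": "source_system",
     "literal_value": "postgres", "literal_type": "string"},
    # context: runtime metadata resolved at execution time from the allowlist.
    {"kind": "context", "target_field": "tenant_id", "context_key": "tenant.id"},
]
```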
## Transform patterns

Transform patterns live in `backend/config/transform_patterns`. They are synced at API startup and exposed through `/api/v1/transform-patterns`.
Families include:
- Projection: select and rename.
- Filtering: SQL predicates.
- Joining: left, inner, lookup.
- Aggregation: group by and window functions.
- Deduplication: first/latest row selection.
- Dimensions: upsert, SCD Type 1, SCD Type 2 merge.
- Validation: regex, enum, anomaly flags.
- Conversion: unit, currency, timezone.
- Geospatial: WKT parsing, H3 enrichment, H3 aggregation, point-in-polygon, nearest, distance, spatial joins.
- Python runner: containerized escape hatches such as `h3_enrich`, `fx_rate_ingest`, and test patterns.
Each pattern defines params, required fields, runtime, template files, and validation rules.
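As an illustration of that shape, a deduplication pattern might be declared roughly like this; the structure is a sketch, and the real files in `backend/config/transform_patterns` define their own layout.

```python
# Sketch of a pattern declaration; the real config layout may differ.
pattern = {
    "id": "deduplicate_latest",
    "version": 1,
    "family": "deduplication",
    "runtime": "trino_sql",
    "params": {
        "partition_by": {"required": True},  # columns that identify a logical row
        "order_by": {"required": True},      # recency column, e.g. updated_at
    },
    "templates": ["deduplicate_latest.sql.j2"],  # rendered during hydration
    "validation": ["partition_by columns must exist in the source schema"],
}
```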
## Runtimes
Transform artifacts can target multiple runtimes:
| Runtime | Value | Use it for |
|---|---|---|
| Trino SQL | trino_sql | Direct SQL execution over Iceberg. |
| dbt SQL | dbt_sql | dbt-style model rendering and execution. |
| Python runner | python_runner | Containerized custom logic using frank-sdk. |
| Flink SQL | flink_sql | Future streaming SQL surface. |
The renderer produces the runnable artifact; the executor runs it through the correct runtime path.
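Conceptually the executor is a dispatch on the runtime value; a minimal sketch, with the runner helpers as hypothetical placeholders:

```python
def execute(artifact):
    # Route the hydrated artifact to its runtime path.
    # run_trino_sql / run_dbt_model / run_container are placeholders here.
    if artifact.runtime == "trino_sql":
        return run_trino_sql(artifact.content)       # direct SQL over Iceberg
    if artifact.runtime == "dbt_sql":
        return run_dbt_model(artifact.content)       # dbt-style render and run
    if artifact.runtime == "python_runner":
        return run_container(artifact.image, artifact.config)  # frank-sdk job
    # flink_sql lands here until the streaming surface exists.
    raise NotImplementedError(f"runtime {artifact.runtime!r} not supported yet")
```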
## Hydration
Hydration turns an editable transform spec into a concrete artifact:
- Resolve input context and source schemas.
- Apply transform pattern or field mappings.
- Render SQL/dbt/Python runner content.
- Validate required params and schema assumptions.
- Persist a `TransformArtifact`.
- Update `current_artifact_id`.
Hydration is the boundary between design and execution.
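A sketch of hydration as one function; the step helpers mirror the list above and are otherwise hypothetical:

```python
def hydrate(transform):
    # Each helper below stands in for one step of the list above.
    ctx = resolve_input_context(transform.sources)      # input context + source schemas
    plan = apply_pattern_or_mappings(transform, ctx)    # pattern config or field mappings
    content = render(plan)                              # SQL / dbt / Python runner body
    validate(plan, ctx)                                 # required params, schema assumptions
    artifact = persist_artifact(transform.id, content)  # new TransformArtifact
    transform.current_artifact_id = artifact.id         # promote the runnable artifact
    return artifact
```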
## Running transforms
Manual run path:
```bash
frankctl transforms trigger <transform-id>
frankctl transforms runs <transform-id>
frankctl transforms logs <transform-id> <run-id>
```

Run statuses:

```text
pending -> running -> completed | failed | cancelled
```

The API stores summary metadata in Postgres and detailed logs in Loki / runtime logs, keeping list views fast while preserving drill-down.
## Incremental transforms
Transform-level incremental modes:
| Mode | Meaning |
|---|---|
| full_refresh | Reprocess all input rows. |
| watermark | Use a cursor field and tiebreaker for bounded incremental reads. |
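In watermark mode the cursor field plus tiebreaker give a total order, so each run reads only rows strictly past the last processed position. A sketch of how that bound could be rendered (illustrative, and the quoting is a placeholder, not injection-safe):

```python
def bounded_read_predicate(cursor_field, tiebreaker, last_cursor, last_tiebreaker):
    # Rows strictly beyond the last processed (cursor, tiebreaker) pair;
    # the tiebreaker breaks ties when cursor values collide.
    return (
        f"{cursor_field} > {last_cursor!r}"
        f" OR ({cursor_field} = {last_cursor!r} AND {tiebreaker} > {last_tiebreaker!r})"
    )
```

For example, `bounded_read_predicate("updated_at", "id", "2024-01-01", 41972)` yields `updated_at > '2024-01-01' OR (updated_at = '2024-01-01' AND id > 41972)`.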
Per-input read config lets fact tables run delta while dimension tables are read as full snapshots:
```json
{
  "iceberg.bronze.orders": { "mode": "delta" },
  "iceberg.bronze.customers": { "mode": "full" }
}
```

## Python runner transforms
Python runner patterns execute in containers and use frank-sdk to read runtime config, connect to Trino, emit metrics, emit lineage, and return structured results. They are the right choice when SQL would be forced or when a domain library is the real implementation.
Use the SDK guide for the authoring contract: SDK.
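As a rough shape only; the `frank_sdk` calls below are assumptions for illustration, not the real authoring contract, which the SDK guide defines:

```python
# Hypothetical frank-sdk usage: module, function names, and signatures
# are illustrative assumptions; see the SDK guide for the real contract.
import frank_sdk  # assumed import name

def transform_rows(rows):
    # Domain logic lives here; this is where a real Python library earns its keep.
    return rows

def main():
    ctx = frank_sdk.runtime_config()               # runtime config (assumed API)
    conn = frank_sdk.trino_connection(ctx)         # Trino access (assumed API)
    rows = conn.execute("SELECT * FROM iceberg.bronze.orders").fetchall()
    out = transform_rows(rows)
    frank_sdk.emit_metrics({"rows_out": len(out)})             # assumed API
    frank_sdk.emit_lineage(inputs=["iceberg.bronze.orders"])   # assumed API
    return frank_sdk.result(status="succeeded", rows=len(out)) # assumed API

if __name__ == "__main__":
    main()
```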
## When to use what
| Need | Use |
|---|---|
| Rename, cast, filter, map fields | Field mappings or select_rename / filter. |
| Join two or more tables | Join patterns or multi-source transform wizard. |
| Standardize into a known semantic schema | Target SDM + AI-suggested field mappings. |
| Generate custom logic | AI generate-transform, then review and publish. |
| Use domain Python libraries | Python runner pattern with frank-sdk. |
| Chain reusable steps | Pipeline DAG. |