Architecture Overview
Frank is a low-code EL/T platform for building governed lakehouse data products and publishing them into an ontology.
It combines four layers:
- Experience layer -- SvelteKit UI, frankctl, Python pattern CLI, and API.
- Control plane -- FastAPI, metadata models, pattern registries, schema libraries, and auth.
- Execution plane -- Temporal workers, Dagster assets, Trino execution, Kubernetes Python runner, and Martha AI workflows.
- Data plane -- Iceberg catalog, S3/MinIO object storage, Bronze/Silver/Gold tables, and ontology-core-v2.
The picture
Builders and operators
UI frankctl API / CI
| | |
+------------------+-------------------+
|
v
FastAPI control plane
+--------------------------+---------------------------+
| | |
v v v
Source registry Transform registry Pipeline registry
Pattern catalog Artifact hydration Versioned DAGs
Stream config Runtime metadata Sandbox/activation
| | |
v v v
Temporal source worker Dagster + transform worker Dagster / sandbox
Airbyte / dlt Trino / dbt / Python runner Step execution
| | |
+--------------------------+---------------------------+
|
v
Apache Iceberg lakehouse
Bronze -> Silver -> Gold datasets
|
v
Backing datasets and ontology sync
|
v
ontology-core-v2
Control plane
The FastAPI application owns product state:
- Sources and streams.
- Source pattern definitions.
- Transform specs, sources, mappings, artifacts, runs, and lineage.
- Transform pattern registry.
- Pipeline versions, steps, edges, and sandbox results.
- Schedules.
- Dataset browsing.
- Schema libraries.
- Ontology entity type proxy.
- Backing datasets and ontology sync history.
- Identity policies.
- AI routes backed by Martha.
Postgres stores metadata. Iceberg stores data. The API derives tenant scope from auth and passes service identity where workers need to call back into protected endpoints.
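The tenant-scoping behavior can be sketched as a small helper that derives scope from decoded auth claims. This is an illustrative sketch only; `TenantScope`, `scope_from_claims`, and the claim names are hypothetical, not Frank's actual auth schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    tenant_id: str
    is_service: bool  # True when a worker calls back with service identity


def scope_from_claims(claims: dict) -> TenantScope:
    """Derive the tenant scope attached to every metadata query (hypothetical shape)."""
    tenant = claims.get("tenant_id")
    if not tenant:
        raise PermissionError("token carries no tenant claim")
    # Workers calling back into protected endpoints present a service subject.
    return TenantScope(tenant, claims.get("sub", "").startswith("svc:"))
```

The key property is that scope derivation happens once at the API boundary, so registries and dataset browsing never see unscoped queries.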
Source execution
Source execution is asynchronous and handled by Temporal source workers.
Source pattern -> Source -> Discovery -> Streams -> Sync -> Bronze tables
Airbyte patterns use PyAirbyte and Dockerized source connectors. dlt patterns use Python-native source builders for REST, GraphQL, filesystem, Kafka, and related lightweight sources.
Both engines implement the same extraction contract:
- Discover stream schemas.
- Extract selected streams.
- Apply data envelope metadata.
- Track cursors.
- Batch writes.
- Write to Iceberg through shared naming helpers.
- Return structured sync status and logs.
Transform execution
Transforms separate design-time specs from runtime artifacts.
Transform spec -> Hydration -> Artifact -> Materialization -> TransformRun
The transform spec stores sources, mappings, pattern params, target schema, materialization, and incremental mode. Hydration renders executable content:
- Trino SQL.
- dbt-style SQL.
- Python-runner files and container config.
- Future runtime renderers such as Flink SQL.
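A minimal sketch of the spec-to-artifact step, assuming hypothetical field names (`TransformSpec`, `hydrate_trino_sql`): a design-time spec of mappings is rendered into executable Trino SQL that the runtime never edits directly.

```python
from dataclasses import dataclass


@dataclass
class TransformSpec:
    """Design-time spec (illustrative fields, not Frank's actual schema)."""
    source_table: str
    target_table: str
    mappings: dict[str, str]     # target column -> source expression
    materialization: str = "table"


def hydrate_trino_sql(spec: TransformSpec) -> str:
    """Render an executable Trino CREATE TABLE AS from the spec's mappings."""
    select = ",\n  ".join(f"{expr} AS {col}" for col, expr in spec.mappings.items())
    return (f"CREATE TABLE {spec.target_table} AS\n"
            f"SELECT\n  {select}\nFROM {spec.source_table}")
```

Because users edit only the spec, re-running hydration after a mapping change produces a fresh artifact without touching anything already materialized.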
The recommended execution path is Dagster-first:
API trigger -> Dagster materialization -> asset calls API /execute -> run status sync
This gives operators a Dagster run, asset key, logs, and consistent scheduling behavior.
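The round trip can be sketched without Dagster itself: the asset body triggers execution through the control plane and syncs the run status back. `FakeControlPlane` and the endpoint shapes are stand-ins for illustration, not Frank's real API.

```python
class FakeControlPlane:
    """Stand-in for the FastAPI control plane, just enough to show the flow."""

    def __init__(self):
        self.runs: dict[str, str] = {}

    def execute(self, transform_id: str) -> dict:
        # Corresponds to the asset calling API /execute.
        run_id = f"{transform_id}-run-1"
        self.runs[run_id] = "succeeded"
        return {"run_id": run_id}

    def run_status(self, run_id: str) -> str:
        return self.runs[run_id]


def materialize_transform(api: FakeControlPlane, transform_id: str) -> dict:
    """What a Dagster asset body might do: trigger execution, then sync status."""
    run = api.execute(transform_id)
    status = api.run_status(run["run_id"])   # run status syncs back to the TransformRun
    return {"run_id": run["run_id"], "status": status}
```

The point of the Dagster-first path is that this handshake happens inside a Dagster run, so the operator gets the asset key, logs, and schedule semantics for free.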
Pipeline execution
Pipelines are versioned DAGs over transform steps.
Pipeline -> PipelineVersion -> PipelineStep + PipelineStepEdge
Frank validates DAGs with topological sorting and cycle detection. Versions are immutable and content-hashed. Sandbox runs execute a version before activation. Activation links or creates transforms for each step and promotes the pipeline to an operational state.
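The validation step is standard topological sorting with cycle detection; a minimal sketch using Kahn's algorithm over step names and edges (the function name is illustrative):

```python
from collections import deque


def validate_dag(steps: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return a valid execution order for pipeline steps; raise on cycles."""
    indegree = {s: 0 for s in steps}
    downstream: dict[str, list[str]] = {s: [] for s in steps}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm: repeatedly emit steps with no unprocessed upstreams.
    ready = deque(s for s in steps if indegree[s] == 0)
    order: list[str] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in downstream[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(steps):
        raise ValueError("pipeline DAG contains a cycle")
    return order
```

Running this at version-creation time means a cyclic DAG is rejected before it is ever content-hashed or sandboxed.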
AI execution
Frank owns workflow definitions for AI assistance and seeds them into Martha.
Frank API -> Martha workflow -> LLM / tools -> structured result -> Frank spec
AI routes return typed payloads for schema matching, field mapping, pattern params, SQL review, code generation, CI fixes, transform publishing, and pipeline composition.
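"Typed payloads" means the raw workflow result is validated into a fixed shape before it can touch a spec. A sketch for field-mapping suggestions, with hypothetical names (`FieldMappingSuggestion`, `parse_mapping_result`) and an assumed result shape:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMappingSuggestion:
    """Illustrative typed payload; not Frank's actual AI-route schema."""
    source_column: str
    target_property: str
    confidence: float


def parse_mapping_result(raw: dict) -> list[FieldMappingSuggestion]:
    """Validate a Martha workflow result before it reaches a transform spec."""
    suggestions = []
    for item in raw.get("mappings", []):
        conf = float(item["confidence"])
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        suggestions.append(
            FieldMappingSuggestion(item["source"], item["target"], conf)
        )
    return suggestions
```

Gating on a typed payload keeps malformed or hallucinated LLM output from ever being written into a spec.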
Ontology execution
Ontology integration turns curated Iceberg tables into semantic entities.
Silver/Gold table -> BackingDataset -> OntologySyncRun -> ontology-core-v2 entities
Backing datasets map columns to entity properties and relationships. Sync runs track snapshots, cursors, rows synced, workflow IDs, logs, and drift signals.
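One of the drift signals can be sketched as a pure set comparison between the table's live columns and the backing dataset's mapped columns (the function and key names are illustrative):

```python
def detect_drift(table_columns: set[str], mapped_columns: set[str]) -> dict[str, set[str]]:
    """Compare live Iceberg columns against the backing-dataset mapping."""
    return {
        # Mapped in the backing dataset but no longer present in the table:
        "missing": mapped_columns - table_columns,
        # Present in the table but not yet mapped to any entity property:
        "unmapped": table_columns - mapped_columns,
    }
```

A sync run that finds a non-empty `missing` set can fail fast instead of silently dropping entity properties.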
Observability
Frank uses:
- Structured logs for API and worker events.
- Loki for persistent run logs.
- OpenTelemetry for API and worker traces.
- Dagster run IDs for materialization tracking.
- Temporal workflow IDs for async operations.
- TransformRun, SyncRun, and OntologySyncRun records for fast UI/API history.
Design principles
EL and T are separate
Sources sync raw data. Transforms model data. A source can be active without a transform; a transform can remain usable while an upstream source needs attention.
Specs and artifacts are separate
Users edit transform specs. Hydration produces artifacts. Execution uses artifacts. This keeps design iteration away from runtime stability.
Patterns are product surface
Source and transform patterns are not hidden implementation details. They are how Frank expands connector coverage and transform capability while keeping UI and API behavior consistent.
Ontology is a publication layer
Frank does not treat ontology sync as an afterthought. Backing datasets, identity policies, entity type browsing, mapping suggestions, health checks, and sync history are first-class product features.