How does OptimaFlo organize my data?

Three layers: raw data as-is, cleaned data (deduplicated and validated), and business-ready data (metrics and reports). All stored on Apache Iceberg. You get ACID transactions, schema changes, and time-travel queries out of the box.

How does the automatic engine selection work?

OptimaFlo picks the right engine for each query based on data size: DuckDB for up to 100 GB, a cloud warehouse from 100 GB to 10 TB, and Spark above 10 TB. Starter includes DuckDB. Growth adds Warehouse and Spark. Scale adds Dedicated Spark.

What security features does OptimaFlo provide?

Role-based access. Workspace-scoped data isolation. Append-only audit logs. And everything runs in your own cloud, so your data never leaves. Iceberg stores full snapshot history for compliance.

How quickly can I get started?

Hours, not months. Connect your first data source and the Manager walks you through everything: authentication, schema, pipeline, dashboard. No coding required.

What cloud platforms are supported?

Google Cloud Platform today. AWS and Azure on the roadmap. Pipelines run on Cloud Composer (Airflow). Data is stored in Apache Iceberg on Cloud Storage.

How does pipeline monitoring work?

Every pipeline run is tracked. Status updates, error reports, and execution history. You can monitor runs from the dashboard in real time.

What data sources can I connect?

Today: BigQuery, Cloud Storage, REST APIs, and GraphQL. Coming soon: PostgreSQL, S3, Snowflake, MySQL, Redshift, and SaaS connectors. Most sources connect with one-click OAuth or a service account key.

How does the AI work? Which models are supported?

You bring your own LLM key. Supported models: Claude, GPT, Gemini. The AI works as a seven-member data team: a Manager, an Ingestion Engineer, a Data Engineer, an Analytics Engineer, an Analyst, a BI Developer, and a Quality Engineer.

How does data quality scoring work?

Every table gets scored on five dimensions: completeness, accuracy, freshness, validity, and consistency. Scoring runs alongside your pipelines. Schema enforcement blocks bad changes. Self-healing SQL fixes errors at runtime.

Can I export data out of OptimaFlo?

Yes. Export to BigQuery and Cloud Storage today. PostgreSQL, MySQL, Snowflake, and S3 coming soon. Your business-ready (Gold) tables live in Apache Iceberg, so any Iceberg-compatible tool can read them directly, including Spark, Trino, and DuckDB.

Features

Everything You Need, Nothing You Don't

From raw data to clean dashboards in one platform. Here's everything OptimaFlo handles for you.

Start Building

AI Engine

Built on Your AI

The platform runs on your own LLM key, sees your full schema, and keeps its own SQL correct and safe.

Data Architecture

Your Data, Organized Automatically

Data flows through three clean layers: raw, cleaned, and business-ready. Each step is schema-enforced, auditable, and stored in open formats you own.

Ingest: Raw Data, Untouched

Every record from every source lands here exactly as-is. Full history, zero transforms. Stored with ACID transactions so nothing gets lost.

Clean: Validated & Transformed

AI-generated SQL cleans, deduplicates, and joins your raw tables. Every transform is validated and approved by you before it runs.

Model: Business-Ready Metrics

Aggregated KPIs, dimension tables, and the semantic layer your dashboards query directly. Ready for reporting out of the box.

Open Storage: No Lock-In

Per-workspace data catalogs, time-travel queries, and schema evolution. Your data lives in open formats, portable wherever you go.

Open Lakehouse

Built on Apache Iceberg

An open table format under everything. Your data versions itself, travels through time, and stays portable, with no lock-in.

time-travel.sql

SELECT * FROM orders FOR TIMESTAMP AS OF '2026-05-01 00:00'

↳ data exactly as it was

Time Travel

Query any table as it looked at any point in time. Audit changes, debug, or reproduce a past report.

Snapshots

Every write creates an immutable snapshot. Full history, with rollback to any version.

snapshots

● s-0a9f · now · 4.2M rows

○ s-77c2 · 1h ago · 4.1M rows

↺ rollback
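To make snapshots, time travel, and rollback concrete, here is a minimal toy model in Python. It is a sketch of the idea only, not Iceberg's implementation: the `VersionedTable` and `Snapshot` names are hypothetical, and rows are held in memory rather than in data files.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    committed_at: datetime
    rows: tuple  # immutable copy of the table contents at commit time


@dataclass
class VersionedTable:
    snapshots: list = field(default_factory=list)
    current: int = -1  # index of the snapshot that reads see by default

    def commit(self, snapshot_id, committed_at, rows):
        # Every write appends a new snapshot; earlier ones are never modified.
        self.snapshots.append(Snapshot(snapshot_id, committed_at, tuple(rows)))
        self.current = len(self.snapshots) - 1

    def read(self):
        return self.snapshots[self.current].rows

    def as_of(self, ts):
        # Time travel: the latest snapshot committed at or before ts.
        eligible = [s for s in self.snapshots if s.committed_at <= ts]
        if not eligible:
            raise LookupError("no snapshot exists at or before that time")
        return max(eligible, key=lambda s: s.committed_at).rows

    def rollback(self, snapshot_id):
        # Rollback only moves the current pointer; history stays intact.
        for i, s in enumerate(self.snapshots):
            if s.snapshot_id == snapshot_id:
                self.current = i
                return
        raise KeyError(snapshot_id)
```

Because a rollback only moves a pointer, the newer snapshot is still there afterwards and can be restored the same way.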

Schema Evolution

Add, rename, or drop columns without rewriting tables or breaking downstream queries.

schema.diff

+ add column region

~ rename amt → amount

✓ no table rewrite, no downtime
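Why can a rename skip the table rewrite? Iceberg tracks columns by stable field IDs rather than by name, so a rename changes metadata only. The Python sketch below illustrates that idea with plain dictionaries; the sample rows and helper names are hypothetical, and real field-ID assignment is more careful than `max + 1`.

```python
# Stored rows are keyed by a stable field ID, not by column name.
data_file = [{1: "o-1", 2: 19.99}, {1: "o-2", 2: 5.00}]

schema_v1 = {1: "order_id", 2: "amt"}


def rename_column(schema, old, new):
    """Return a new schema with one column renamed; stored rows are untouched."""
    if old not in schema.values():
        raise KeyError(old)
    return {fid: (new if name == old else name) for fid, name in schema.items()}


def add_column(schema, name):
    """Return a new schema with a fresh field ID for the added column."""
    if name in schema.values():
        raise ValueError(f"column {name!r} already exists")
    # Simplification: a real catalog never reuses the ID of a dropped column.
    new_id = max(schema, default=0) + 1
    return {**schema, new_id: name}


def read(data, schema):
    """Project stored rows through a schema; columns absent from old rows read as None."""
    return [{name: row.get(fid) for fid, name in schema.items()} for row in data]


# The two changes from the diff above: rename amt -> amount, add region.
schema_v2 = add_column(rename_column(schema_v1, "amt", "amount"), "region")
```

Reading `data_file` through `schema_v2` returns the old rows under the new names, with `region` empty, and `data_file` itself never changes.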

ACID Transactions

Concurrent reads and writes stay consistent. No half-written tables, no corrupt data.

transaction

✓ Atomic

✓ Consistent

✓ Isolated

✓ Durable

COMMIT;

Pipeline Canvas

See Your Entire Data Flow

The visual pipeline canvas shows every node in your data pipeline as an interactive graph. Add nodes, preview data at each layer, edit SQL, and connect sources to destinations.

Drag & Drop Nodes

Add source, transform, and destination nodes to your pipeline canvas with a click.

Live Data Preview

Preview data at any step in your pipeline before deploying to production.

Inline SQL Editor

Edit transform SQL directly on the canvas with AI copilot assistance and schema context.

Dependency Tracking

See upstream and downstream dependencies for every node in your pipeline.
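Dependency tracking comes down to walking a directed graph in both directions. Here is a minimal Python sketch, assuming the pipeline is given as a list of (source, destination) edges; the function names and any node names you pass in are illustrative, not part of OptimaFlo.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Index edges both ways so either direction can be walked."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(start, adjacency):
    """All nodes reachable from start by following adjacency (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `reachable(node, downstream)` answers "what breaks if this node changes?", and `reachable(node, upstream)` answers "where does this node's data come from?".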

Compute & Scheduling

The right engine, every time.

Small dataset? It runs instantly. Big dataset? It scales up automatically. You never think about infrastructure.

DuckDB

≤ 100 GB

Runs right in the app. No servers to manage. Answers in under a second.

No cluster to manage
Columnar storage
Sub-second queries

Warehouse

100 GB - 10 TB

A cloud warehouse that grows when you need it. Pay per query.

Serverless scaling
Pay-per-query pricing
Petabyte-capable

Apache Spark

> 10 TB

Splits big jobs across many servers. Runs in your own cloud.

Distributed compute
Full parallelism
Runs in your cloud
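The three tiers above amount to a simple threshold rule. Here is a minimal sketch in Python, assuming decimal units (1 GB = 10^9 bytes) and size as the only input. The `Engine` enum and `pick_engine` function are illustrative names, and the real selection logic may weigh more than size.

```python
from enum import Enum

GB = 10**9   # assumption: decimal units; the page does not say GB vs GiB
TB = 10**12


class Engine(Enum):
    DUCKDB = "duckdb"
    WAREHOUSE = "warehouse"
    SPARK = "spark"


def pick_engine(data_size_bytes: int) -> Engine:
    """Map a dataset size to an engine tier using the thresholds listed above."""
    if data_size_bytes < 0:
        raise ValueError("data size cannot be negative")
    if data_size_bytes <= 100 * GB:
        return Engine.DUCKDB
    if data_size_bytes <= 10 * TB:
        return Engine.WAREHOUSE
    return Engine.SPARK
```

For example, `pick_engine(50 * GB)` returns `Engine.DUCKDB` and `pick_engine(25 * TB)` returns `Engine.SPARK`.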

Automatic orchestration

Apache Airflow

Every pipeline runs on managed Apache Airflow in your own cloud. Scheduling, retries, and monitoring built in.

Scheduled Runs

Cron-based scheduling with configurable intervals. Set it once, and your pipelines run on autopilot.

Retries & Backfills

Failed tasks retry automatically. Run backfills across historical date ranges with sequential triggering.
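Airflow handles retries and backfills for you. To show the behavior itself, here is a plain-Python sketch that does not use the Airflow API. The function names, the retry count, and the daily interval are assumptions for illustration.

```python
import time
from datetime import date, timedelta


def run_with_retries(task, run_date, max_retries=3, delay_seconds=0.0):
    """Call task(run_date); on an exception, retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return task(run_date)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(delay_seconds)


def backfill(task, start, end, **retry_opts):
    """Run task once per day from start to end inclusive, one date at a time."""
    results = {}
    day = start
    while day <= end:
        # Sequential triggering: the next date starts only after this one succeeds.
        results[day] = run_with_retries(task, day, **retry_opts)
        day += timedelta(days=1)
    return results
```

Running dates in order matters when each day's output depends on the previous day's state.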

Semantic Layer & BI

One Source of Truth for Every Metric

Define metrics once, use them everywhere: dashboards, AI queries, exports. No more conflicting definitions across tools.

Certified Metrics

Define revenue, churn, ARR, and any custom metric once. Tag them as certified so everyone queries the same number.

Dimension Hierarchies

Organize dimensions into hierarchies like region → country → city. Drill-down and roll-up just work.

Table Relationships

Map joins between tables in your semantic layer. AI uses these relationships to write correct multi-table queries automatically.

Business Glossary

Define business terms and metrics in one shared glossary, so every person and every AI uses the same definitions.

Semantic Layer

Metrics

Certified Metrics

Monthly Revenue: SUM(amount)
Active Users: COUNT(DISTINCT user_id)
Churn Rate: churned / total
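"Define once, use everywhere" can be pictured as one shared registry of metric expressions that every query is built from. The Python sketch below uses the standard-library `sqlite3` module purely as a stand-in engine. The `orders` table, its columns, and `query_metric` are hypothetical, and Churn Rate is left out because its inputs are not defined here.

```python
import sqlite3

# Metric name -> SQL expression, taken from the panel above.
CERTIFIED_METRICS = {
    "monthly_revenue": "SUM(amount)",
    "active_users": "COUNT(DISTINCT user_id)",
}


def query_metric(conn, metric, table, group_by=None):
    """Build and run a query from the single shared definition of a metric."""
    expr = CERTIFIED_METRICS[metric]  # KeyError for anything not certified
    # table and group_by are trusted identifiers in this sketch, not user input.
    if group_by:
        sql = (
            f"SELECT {group_by}, {expr} FROM {table} "
            f"GROUP BY {group_by} ORDER BY {group_by}"
        )
    else:
        sql = f"SELECT {expr} FROM {table}"
    return conn.execute(sql).fetchall()


# Hypothetical sample data for the usage below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("u1", "2026-04", 100.0), ("u2", "2026-04", 50.0), ("u1", "2026-05", 75.0)],
)
```

A dashboard and an ad-hoc question that both call `query_metric(conn, "monthly_revenue", "orders", "month")` get the same number, because neither writes its own `SUM`.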

BI Developer

Describe the dashboard you need. The BI Developer analyzes your tables and builds widgets with the right chart types, filters, and metrics.

BI Digests

Scheduled LLM-narrated insights delivered to email, Slack, or webhooks. Your team gets actionable summaries, not raw data.

Dashboard Sharing

Share interactive dashboards with your team. Role-based access controls keep the right people on the right data.

Ad-hoc Querying

Ask a question in plain English or write SQL directly. Query any connected table and get instant charts and tables back.

Data Quality

Trust Your Data Before Anyone Sees It

Automatic quality scoring, AI-generated validation rules, and real-time alerts so bad data never reaches your dashboards.

Quality Gate Passed: 94/100

Completeness: 98%
Accuracy: 94%
Freshness: 100%
Validity: 87%
Consistency: 91%

5-Dimensional Scoring

Every table is scored on completeness, accuracy, freshness, validity, and consistency. Quality gates pause pipelines when scores drop below the threshold, so bad data never flows downstream.
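The page does not say how the five scores combine into one number or where the gate sits. As a minimal sketch, assume an unweighted mean and a threshold of 90; both are assumptions, as are the function names.

```python
DIMENSIONS = ("completeness", "accuracy", "freshness", "validity", "consistency")


def overall_score(scores):
    """Unweighted mean of the five dimension scores (each 0-100)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def gate_passes(scores, threshold=90.0):
    """True when the table may flow downstream; threshold is an assumed default."""
    return overall_score(scores) >= threshold


# Sample values from the quality panel on this page.
panel = {
    "completeness": 98,
    "accuracy": 94,
    "freshness": 100,
    "validity": 87,
    "consistency": 91,
}
```

With the panel's sample values, `overall_score(panel)` comes out to 94.0, which matches the 94/100 shown, though the real weighting may differ.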

Generating expectations…

customer_id: NOT NULL
order_total: RANGE [0, 10000]
user_email: FORMAT email
transaction_id: UNIQUE

LLM-Generated Expectations

AI analyzes your data and generates validation rules automatically: null checks, range bounds, format patterns, and custom SQL rules. Mix LLM-generated and hand-crafted rules.
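To show what the four sample expectations check, here is a minimal hand-written equivalent in Python. It is a sketch, not the generated rules themselves: `check_rows` is a hypothetical name, rows are plain dictionaries, and the email pattern is deliberately loose.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose shape check only


def check_rows(rows):
    """Return a list of (row_index, column, rule) violations."""
    violations = []
    seen_ids = set()
    for i, row in enumerate(rows):
        if row.get("customer_id") is None:
            violations.append((i, "customer_id", "NOT NULL"))
        total = row.get("order_total")
        # A missing total also fails the range rule in this sketch.
        if total is None or not 0 <= total <= 10000:
            violations.append((i, "order_total", "RANGE [0, 10000]"))
        if not EMAIL_RE.match(row.get("user_email") or ""):
            violations.append((i, "user_email", "FORMAT email"))
        txn = row.get("transaction_id")
        if txn in seen_ids:
            violations.append((i, "transaction_id", "UNIQUE"))
        seen_ids.add(txn)
    return violations
```

An empty result means every row passed. Each violation names the row, the column, and the rule that failed.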

Alert feed

Completeness dropped to 72% (orders · critical)
Freshness SLA breach: 4h stale (events · warning)
Validity restored to 99% (users · resolved)

Alerts & Data Profiling

Get notified when quality drops. Route alerts to Slack, email, or webhooks. Profile any table with one click: distributions, outliers, null rates, and cardinality.

Connectors

Connect, Transform, Export

One-click OAuth or service account auth, with automatic schema inference. Ingest from any source, export to any destination.

BigQuery

Google's serverless data warehouse

Available Now

Cloud Storage

Object storage for files and data lakes

Available Now

REST APIs

Any REST API with JSON or CSV responses

Available Now

Amazon S3

AWS object storage

Coming Soon

Redshift

AWS data warehouse

Coming Soon

PostgreSQL

Popular relational database

Coming Soon

Snowflake

Cloud data warehouse

Coming Soon

MySQL

Popular relational database

Coming Soon

GraphQL

Any GraphQL API endpoint

Coming Soon

Warehouse Export

Write processed results to your warehouse for BI tools

Export

Cloud Storage Export

Export as Parquet, CSV, or JSON to any GCS bucket

Export

Webhooks

Pipeline completion and schema change notifications via Slack, email, or custom webhooks

Export

Your Cloud

Your infrastructure. Our orchestration.

OptimaFlo sets up everything inside your own cloud. Your data never leaves. We manage the workflow around it.

Your GCP Project

Cloud Composer

Airflow DAGs

Cloud Run

Polaris Catalog

GCS Buckets

Iceberg Tables

BigQuery

Query Engine

Orchestrated by OptimaFlo

Your GCP project

Everything runs inside your own GCP project. We set it up. You own it.

Data never leaves

Your raw data, processed tables, and query results stay in your storage. We orchestrate, never store.

Managed orchestration

Pipeline scheduling set up and managed for you. New workflows sync automatically.

Polaris catalog

Each workspace gets its own data catalog. Full isolation between teams and projects.

Automated provisioning

One-click setup. Networking, permissions, storage, and compute configured automatically.

No data lock-in

Built on open standards so your data stays portable, wherever you run it.

Enterprise security

RBAC, workspace-level permissions, and encryption at rest. Built in from day one, not sold as an upgrade.

Execution audit trail

Every pipeline run is tracked with status, timing, and errors. Iceberg keeps full snapshot history and schema versions for compliance.

Ready to ship data work today?

From raw data to live dashboards in one conversation.

Start Building · Book a walkthrough

Now in early beta. Plans from $2,500 a month. Runs in your cloud. Your data never leaves.

OptimaFlo

Empowering data owners with a team of AI agents. From raw data to dashboards, all in your own cloud.
