Data Source

You have tabular data you want to bring into your pipeline - historical policies, external enrichment data, lookup tables. The Data Source node reads flat files (parquet or CSV) or Databricks tables.

Spreadsheet equivalent

Like opening a CSV or connecting to an external data source in Excel - it brings data into your workbook.

When to use

Loading historical data for analysis or model training.
Bringing in reference data to join with your quotes (e.g. postcode lookups, external scores).
Use Quote Input instead when building the live API entry point.

| Config | Description |
| --- | --- |
| `sourceType` | Required. `"flat_file"` or `"databricks"` |
| `path` | File path (parquet or CSV). Required when `sourceType` is `"flat_file"`. Relative to your project folder. |
| `table` | Databricks table name (`catalog.schema.table`). Required when `sourceType` is `"databricks"`. |
| `http_path` | Databricks SQL warehouse HTTP path (e.g. `/sql/1.0/warehouses/abc123`). Your Databricks administrator can provide this. |
| `query` | SQL query to filter or transform the data before it enters the pipeline. Only applies to Databricks sources. For flat files, use the `code` field to filter data after loading. |
| `code` | Polars code applied after loading - the loaded data is available as `df`. See Polars for code syntax. |

When to use code vs a Polars node

The `code` field is optional. You can also add a Polars node downstream for the same effect. Use `code` here when you want to filter a large dataset before it fully loads into memory, which can improve performance.
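
As a rough illustration, a flat file source that filters rows through `code` might be configured as below. This is a sketch under assumptions: the `policy_year` column is hypothetical, and it assumes the `code` field accepts a Polars expression on `df` with `pl` already available. Check the Polars page for the exact syntax before relying on it.

```json
{
  "sourceType": "flat_file",
  "path": "data/policies.parquet",
  "code": "df.filter(pl.col(\"policy_year\") >= 2020)"
}
```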

Example

A minimal flat file configuration:

{
  "sourceType": "flat_file",
  "path": "data/policies.parquet"
}
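
A Databricks source could look like the sketch below, built only from the fields in the config table. The table name and warehouse path are placeholders, not real values. Substitute the ones your Databricks administrator provides.

```json
{
  "sourceType": "databricks",
  "table": "catalog.schema.table",
  "http_path": "/sql/1.0/warehouses/abc123"
}
```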

Supported file formats

Flat file sources support parquet and CSV. Excel files (.xlsx) are not supported directly - export to CSV first.

See also: Polars for code syntax and Preparing Your Data for a walkthrough.