
DataFrame Module

Tabular data manipulation backed by Polars DataFrames.

The DataFrame module provides a pipe-friendly API for loading, transforming, filtering, and aggregating tabular data. DataFrames are opaque native objects that can only be manipulated through DataFrame module functions.

Common patterns

import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr

let data = DataFrame.readCsv "employees.csv"
let summary = data
    |> DataFrame.select ["name", "department", "salary"]
    |> DataFrame.filter (@salary |> Expr.gt (lit 50000))
    |> DataFrame.sort [@salary]
    |> DataFrame.head 20
IO.println (DataFrame.shape summary)

Display

DataFrames render as formatted, column-aligned tables when printed or displayed in the REPL. Output includes shape, column names, dtypes, and data rows. Large DataFrames (>10 rows) show the first 5 and last 5 rows with a separator:

shape: (1000, 3)
  name | age |     city
   str | i64 |      str
-------+-----+---------
 Alice |  30 | New York
   Bob |  25 |   London
     … |   … |        …
  Yara |  31 |   Berlin
  Zach |  22 |    Tokyo
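The truncation rule can be modelled in plain Python. This sketch covers only the documented behaviour (more than 10 rows shows the first 5 and last 5 around a separator) and is not Keel code. `preview_rows` and its `limit`/`edge` parameters are hypothetical names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

ELLIPSIS = "…"  # separator row marker, as in the printed table


def preview_rows(rows: Sequence[T], limit: int = 10, edge: int = 5) -> List[object]:
    """Return the rows a printed DataFrame would show (hypothetical model).

    Frames with at most `limit` rows are shown in full. Larger frames show
    the first and last `edge` rows with one separator marker between them.
    """
    if len(rows) <= limit:
        return list(rows)
    return list(rows[:edge]) + [ELLIPSIS] + list(rows[-edge:])


# A 1000-row frame prints 5 + 1 + 5 = 11 data lines.
shown = preview_rows(list(range(1000)))
print(len(shown))  # 11
```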

Security

Variable                      | Effect
KEEL_DATAFRAME_DISABLED=1     | Disable DataFrame operations
KEEL_DATAFRAME_SANDBOX=/path  | Restrict file I/O to the given directory
KEEL_DATAFRAME_MAX_ROWS=10000 | Limit the number of rows loaded from files

Functions

I/O

DataFrame.readCsv

String -> Result DataFrame DataFrameError

Read a CSV file into a DataFrame. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

-- Local file
DataFrame.readCsv "data.csv"

-- Remote file
DataFrame.readCsv "https://example.com/data.csv"

Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
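The two-mode rule for local paths can be modelled with Python's standard posixpath module. `resolve_data_path` and the example paths are hypothetical. Remote URLs, absolute paths and the sandbox restriction are not modelled.

```python
import posixpath


def resolve_data_path(path: str, calling_file: str, project_root: str) -> str:
    """Hypothetical model of two-mode path resolution for local files.

    "./x" or "../x" resolves against the directory of the calling file.
    Any other relative path resolves against the project root
    (the directory containing keel.toml).
    """
    if path.startswith(("./", "../")):
        base = posixpath.dirname(calling_file)
    else:
        base = project_root
    # normpath collapses "." and ".." segments into a clean path.
    return posixpath.normpath(posixpath.join(base, path))


print(resolve_data_path("./data.csv", "/proj/src/report.keel", "/proj"))  # /proj/src/data.csv
print(resolve_data_path("data.csv", "/proj/src/report.keel", "/proj"))    # /proj/data.csv
```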

See also: DataFrame.readCsvColumns, DataFrame.readJson, DataFrame.readParquet

DataFrame.readCsvColumns

[DataFrameColumn] -> String -> Result DataFrame DataFrameError

Read only the specified columns from a CSV file. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

DataFrame.readCsvColumns [@name, @age] "data.csv"
DataFrame.readCsvColumns [@name, @age] "https://example.com/data.csv"

Notes: Column projection is applied at the I/O level for efficient reading. With a literal path and column list, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readCsv

DataFrame.readJson

String -> Result DataFrame DataFrameError

Read a JSON file into a DataFrame. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

-- Local file
DataFrame.readJson "data.json"

-- Remote file
DataFrame.readJson "https://example.com/data.json"

Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readJsonColumns, DataFrame.readJsonl, DataFrame.readCsv

DataFrame.readJsonColumns

[DataFrameColumn] -> String -> Result DataFrame DataFrameError

Read only the specified columns from a JSON file. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

DataFrame.readJsonColumns [@x, @y] "data.json"
DataFrame.readJsonColumns [@x, @y] "https://example.com/data.json"

Notes: The full file is read first and the requested columns are selected afterwards; there is no I/O-level projection. With a literal path and column list, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readJson

DataFrame.readJsonl

String -> Result DataFrame DataFrameError

Read an NDJSON (newline-delimited JSON / JSON Lines) file into a DataFrame. Each line must be a JSON object. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

-- Local file
DataFrame.readJsonl "events.jsonl"

-- Remote file
DataFrame.readJsonl "https://example.com/events.jsonl"

Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory). Use readJson for JSON array files ([{...}, ...]); use readJsonl for NDJSON files where each line is an independent JSON object.

Struct flattening: Nested JSON object columns are automatically expanded into _-separated top-level columns after reading. For example, a column address: {city, zip} becomes @address_city and @address_zip. This matches Jsonl.parseDataFrame behaviour. List<Struct> columns (arrays of objects) are not flattened — they remain as list columns.
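The flattening rule can be sketched at the level of a single record. The reference describes flattening columns after reading; this plain-Python `flatten_record` (a hypothetical helper) applies the same naming to one decoded JSON object. It recurses into objects nested more than one level deep. That is an assumption, since the reference only shows a single level.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Expand nested objects into "_"-joined top-level keys (hypothetical model).

    Lists, including lists of objects, are left untouched. This mirrors the
    note that List<Struct> columns are not flattened.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name))
        else:
            flat[name] = value
    return flat


row = {"id": 1, "address": {"city": "Berlin", "zip": "10115"}, "tags": [{"k": "a"}]}
print(flatten_record(row))
# {'id': 1, 'address_city': 'Berlin', 'address_zip': '10115', 'tags': [{'k': 'a'}]}
```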

Memory warning: Files with deeply nested columns (e.g. a column that is a list of structs with variable keys) can cause very large transient memory allocations during parsing — potentially many times the file size — because Polars materialises the full Arrow representation of every nested value in memory before returning. If your file has such columns and you do not need them, use readJsonlColumns to select only the flat scalar columns you need. This avoids parsing the heavy nested fields entirely.

See also: DataFrame.readJsonlColumns, DataFrame.readJson, DataFrame.readCsv

DataFrame.readJsonlColumns

[DataFrameColumn] -> String -> Result DataFrame DataFrameError

Read only the specified columns from an NDJSON file. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

DataFrame.readJsonlColumns [@id, @timestamp, @value] "events.jsonl"
DataFrame.readJsonlColumns [@id, @timestamp, @value] "https://example.com/events.jsonl"

Notes: Unselected columns — including large nested fields — are never parsed. The reader infers the full schema from the first 100 rows, builds a subset schema containing only the requested columns, and passes it to the NDJSON byte-level reader so that simd-json never allocates Arrow buffers for the skipped fields. On files with heavy List<Struct> columns this can reduce peak memory from tens of GB to a few hundred MB.

Column name forms: Selected column names may be literal top-level field names (@address) or flattened sub-field names (@address_city). A literal struct name expands to all its sub-columns after flattening (e.g. @address yields @address_city and @address_zip). A flattened sub-field name resolves back to the ancestor struct, which is loaded and then trimmed to only the requested sub-column. List<Struct> columns (arrays of objects) are left as-is and must be named by their literal top-level key.
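The name-resolution rules can be modelled as a lookup against the inferred schema. In this sketch, `resolve_columns` is hypothetical and the schema is a dict from each top-level key to its flattened sub-columns. The first result is the set of top-level fields that must be parsed; the second is the list of columns kept in the output. Trying an exact top-level match before a sub-field lookup is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical schema shape: top-level key -> flattened sub-columns,
# or None for scalar and List<Struct> columns (which are not flattened).
Schema = Dict[str, Optional[List[str]]]


def resolve_columns(requested: List[str], schema: Schema) -> Tuple[Set[str], List[str]]:
    """Return (top-level keys to parse, output columns to keep)."""
    to_load: Set[str] = set()
    output: List[str] = []
    for name in requested:
        if name in schema:
            # Literal top-level key: a struct expands to all of its sub-columns.
            to_load.add(name)
            output.extend(schema[name] or [name])
            continue
        # Flattened sub-field: load the ancestor struct, keep only this column.
        owner = next((key for key, subs in schema.items() if subs and name in subs), None)
        if owner is None:
            raise KeyError(f"column not found: {name}")
        to_load.add(owner)
        output.append(name)
    return to_load, output


schema: Schema = {"id": None, "address": ["address_city", "address_zip"], "tags": None}
print(resolve_columns(["id", "address_city"], schema))
```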

With a literal path and column list, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readJsonl, DataFrame.readJsonColumns

DataFrame.readParquet

String -> Result DataFrame DataFrameError

Read a Parquet file into a DataFrame. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

-- Local file
DataFrame.readParquet "data.parquet"

-- Remote file
DataFrame.readParquet "https://example.com/data.parquet"

Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readParquetColumns, DataFrame.readCsv

DataFrame.readParquetColumns

[DataFrameColumn] -> String -> Result DataFrame DataFrameError

Read only the specified columns from a Parquet file. Accepts both local file paths and remote URLs (http://, https://).

Example:
import DataFrame

DataFrame.readParquetColumns [@id, @score] "data.parquet"
DataFrame.readParquetColumns ["id", "score"] "https://example.com/data.parquet"

Notes: True columnar projection: unneeded columns are never read from disk. With a literal path and column list, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readParquet

DataFrame.readDta

String -> Result DataFrame DataFrameError

Read a Stata .dta file into a DataFrame, including its metadata. With a string literal path, column names and types are validated at compile time.

Example:
import DataFrame
DataFrame.readDta "data.dta"

Notes: Preserves variable labels, value labels, and dataset label as metadata. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readDtaColumns, DataFrame.writeDta, DataFrame.readCsv

DataFrame.readDtaColumns

[DataFrameColumn] -> String -> Result DataFrame DataFrameError

Read only the specified columns from a Stata .dta file.

Example:
import DataFrame
DataFrame.readDtaColumns ["var1", "var2"] "data.dta"

Notes: The full file is read first and the requested columns are selected afterwards. With a literal path and column list, column names and types are validated at compile time. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readDta

DataFrame.writeCsv

String -> DataFrame -> Result Unit DataFrameError

Write a DataFrame to a CSV file.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.writeCsv "out.csv"

Notes: File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.writeJson

DataFrame.writeJson

String -> DataFrame -> Result Unit DataFrameError

Write a DataFrame to a JSON file.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.writeJson "out.json"

Notes: File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.writeCsv

DataFrame.writeParquet

String -> DataFrame -> Result Unit DataFrameError

Write a DataFrame to a Parquet file, including metadata.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.writeParquet "data.parquet"

Notes: Metadata is persisted in the Parquet file and restored by readParquet. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readParquet, DataFrame.writeCsv

DataFrame.writeDta

String -> DataFrame -> Result Unit DataFrameError

Write a DataFrame to a Stata .dta file, including its metadata.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
df |> DataFrame.writeDta "data.dta"

Notes: Preserves variable labels, value labels, and dataset label from metadata. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).

See also: DataFrame.readDta, DataFrame.writeCsv

Column Ops

DataFrame.select

[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError

Select columns by name. Returns Ok with the narrowed DataFrame, or Err if a column is not found.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]

df
    |> DataFrame.select [@name, @age]

Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type is narrowed to only the selected columns.

See also: DataFrame.remove, DataFrame.columns

DataFrame.remove

[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError

Remove columns by name. Returns Ok with the updated DataFrame, or Err if a column is not found.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ id = 1, name = "Alice" }, { id = 2, name = "Bob" }]

df
    |> DataFrame.remove [@id]

Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type excludes the removed columns.

See also: DataFrame.select

DataFrame.rename

DataFrameColumn -> String -> DataFrame -> Result DataFrame DataFrameError

Rename a column. Returns Ok with the updated DataFrame, or Err if the column is not found.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ old = 1 }, { old = 2 }]

df
    |> DataFrame.rename @old "new"

Notes: Column names are strings ("name") or column literals (@name). The old column name is validated at compile time on typed DataFrames. The resulting DataFrame's type reflects the rename.

DataFrame.applyExprs

[(DataFrameColumn, Expr)] -> DataFrame -> Result DataFrame DataFrameError

Add or replace multiple columns using a list of (column, expr) tuples. Returns Ok with the updated DataFrame, or Err if an expression fails.

Each tuple's first element names the output column, and the second is the expression to evaluate. If the name matches an existing column, that column is replaced in place; otherwise a new column is added. Names can be column literals (@total) or strings.

Example:
import DataFrame

let df = DataFrame.fromRecords [{ price = 10, quantity = 3 }]

-- Update an existing column and add a new column in one call
df |> DataFrame.applyExprs [(@price, @price * 2), (@total, @price * @quantity)]

Notes: Cross-expression dependencies (where one expression references a column produced by an earlier expression in the same list) are handled automatically by batching into sequential Polars passes.
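The batching idea in the note above can be sketched as a greedy pass over the list. This illustrates the technique and is not Keel's implementation. `batch_assignments` is hypothetical, and each assignment is reduced to its output name and the set of columns its expression reads. The sketch also treats a reference to an overwritten column as a dependency (as `@total` reads `@price` in the example); whether Keel does the same is an assumption.

```python
from typing import List, Set, Tuple

# (output column, columns read by its expression)
Assignment = Tuple[str, Set[str]]


def batch_assignments(assignments: List[Assignment]) -> List[List[str]]:
    """Group assignments into sequential passes (hypothetical model).

    Within one pass every expression sees only the columns that existed
    before the pass. An expression that reads a column produced earlier in
    the current pass therefore starts a new pass.
    """
    batches: List[List[str]] = []
    produced_in_batch: Set[str] = set()
    for output, inputs in assignments:
        if not batches or inputs & produced_in_batch:
            batches.append([])
            produced_in_batch = set()
        batches[-1].append(output)
        produced_in_batch.add(output)
    return batches


# The applyExprs example: @total reads @price, which the first tuple rewrites.
print(batch_assignments([("price", {"price"}), ("total", {"price", "quantity"})]))
# [['price'], ['total']]
```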

See also: DataFrame.agg

DataFrame.column

DataFrameColumn -> DataFrame -> Result [Maybe a] DataFrameError

Extract a column as a list of Maybe values (Just x for values, Nothing for nulls).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

case DataFrame.column @name df of
    Ok values -> values
    Err _ -> []

Notes: Column name as a string ("name") or column literal (@name). Every value is wrapped in Maybe since DataFrame columns are nullable. On typed DataFrames the column name is validated at compile time. On untyped DataFrames a missing column returns Err(DataFrameError::ColumnNotFound).

See also: DataFrame.columns

DataFrame.columns

DataFrame -> [String]

Get the column names of a DataFrame.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.columns df

See also: DataFrame.dtypes

DataFrame.dtypes

DataFrame -> [(String, String)]

Get column names and their data types.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.dtypes df

See also: DataFrame.columns

DataFrame.checkSchema

SchemaType -> DataFrame -> Result DataFrame DataFrameError

Validate a DataFrame's schema at runtime. The schema argument is a type alias name or inline record type. An open schema ({ col: T, .. }) allows extra columns; a closed schema requires an exact match. Returns Ok(df) on success or Err(DataFrameError::SchemaMismatch) with a message listing each failing column.

Example:
import DataFrame

type InputSchema = { id: Int, amount: Float, .. }

let df = DataFrame.fromRecords [{ id = 1, amount = 9.99 }]
DataFrame.checkSchema "InputSchema" df

Notes: The schema name must be passed as a string literal (e.g. "MySchema"). Enum-backed columns are matched against their underlying primitive type (Int or String). SchemaMismatch details list all failing columns with expected vs. actual types.
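The open/closed matching rule, and the behaviour of reporting every failing column rather than stopping at the first, can be modelled in plain Python. `check_schema` is a hypothetical helper and its message format is invented. Enum-backed columns are not modelled.

```python
from typing import Dict, List


def check_schema(actual: Dict[str, str], expected: Dict[str, str], open_schema: bool) -> List[str]:
    """Return one message per failing column; an empty list means a match."""
    problems: List[str] = []
    for column, want in expected.items():
        got = actual.get(column)
        if got is None:
            problems.append(f"{column}: missing (expected {want})")
        elif got != want:
            problems.append(f"{column}: expected {want}, got {got}")
    if not open_schema:
        # A closed schema also rejects columns the schema does not list.
        for column in sorted(actual.keys() - expected.keys()):
            problems.append(f"{column}: unexpected column")
    return problems


df_types = {"id": "Int", "amount": "Float", "note": "String"}
print(check_schema(df_types, {"id": "Int", "amount": "Float"}, open_schema=True))   # []
print(check_schema(df_types, {"id": "Int", "amount": "Float"}, open_schema=False))  # ['note: unexpected column']
```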

See also: DataFrame.dtypes, DataFrame.columns, DataFrame.shape

Row Ops

DataFrame.tail

Int -> DataFrame -> Result DataFrame DataFrameError

Take the last n rows. Returns Err(DataFrameError::InvalidArgument) if n is negative.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.tail 5

See also: DataFrame.head

DataFrame.slice

Int -> Int -> DataFrame -> Result DataFrame DataFrameError

Take a slice of rows: the first argument is the starting offset and the second is the number of rows to take. Returns Err(DataFrameError::InvalidArgument) if the length is negative.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.slice 0 1

See also: DataFrame.head, DataFrame.tail

DataFrame.sort

[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError

Sort by one or more columns in ascending order. Pass a list of column names; the first column is the primary sort key, subsequent columns break ties. Returns Err(DataFrameError::ColumnNotFound) if any column does not exist at runtime.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Bob", age = 30 }, { name = "Alice", age = 25 }]

df
    |> DataFrame.sort [@name, @age]

Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. Returns Result rather than a bare DataFrame because Polars can reject a sort at runtime for reasons beyond column existence — for example, sorting a column whose element type does not implement a total order (such as a nested list column). This failure is not preventable at compile time even on a fully typed DataFrame.

See also: DataFrame.sortDesc

DataFrame.sortDesc

[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError

Sort by one or more columns in descending order. Pass a list of column names; the first column is the primary sort key, subsequent columns break ties. Returns Err(DataFrameError::ColumnNotFound) if any column does not exist at runtime.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ salary = 50000 }, { salary = 70000 }]

df
    |> DataFrame.sortDesc [@salary]

Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. Returns Result rather than a bare DataFrame because Polars can reject a sort at runtime for reasons beyond column existence — for example, sorting a column whose element type does not implement a total order (such as a nested list column). This failure is not preventable at compile time even on a fully typed DataFrame.

See also: DataFrame.sort

DataFrame.unique

[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError

Keep unique rows based on specified columns. Returns Err(DataFrameError::OperationFailed) on failure.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.unique [@name]

Notes: Column names are validated at compile time on typed DataFrames. Returns Result rather than a bare DataFrame because Polars deduplication can fail at runtime when a column contains a type that does not support equality comparison (such as a floating-point column with NaN values in certain configurations). This failure is not preventable at compile time even on a fully typed DataFrame.

DataFrame.sample

Int -> DataFrame -> Result DataFrame DataFrameError

Randomly sample n rows. Returns Err(DataFrameError::InvalidArgument) if n is negative.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.sample 1

See also: DataFrame.head

Filters

DataFrame.filter

Expr -> DataFrame -> Result DataFrame DataFrameError

Filter rows using a DataFrame.Expr boolean expression. Returns Ok with the filtered DataFrame, or Err if the expression fails.

Example:
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]

df
    |> DataFrame.filter (@x |> Expr.gt (lit 2))

Notes: This is the recommended way to filter DataFrames. The expression always compiles to Polars for optimal performance.

See also: DataFrame.Expr.col, DataFrame.Expr.gt

Aggregation

DataFrame.groupBy

[DataFrameColumn] -> DataFrame -> GroupedDataFrame

Group a DataFrame by the given columns.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "Sales", salary = 60000 }]

df
    |> DataFrame.groupBy [@department]

Notes: Returns a GroupedDataFrame. Use DataFrame.agg to aggregate.

See also: DataFrame.agg

DataFrame.agg

[Expr] -> GroupedDataFrame -> DataFrame

Aggregate a grouped DataFrame using a list of DataFrame.Expr expressions.

Example:
import DataFrame
import DataFrame.Expr exposing col
import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ group = "A", value = 10 }, { group = "A", value = 20 }, { group = "B", value = 30 }]
let totalExpr = @value |> Expr.sum |> Expr.named "total"
let avgExpr = @value |> Expr.mean |> Expr.named "average"

df |> DataFrame.groupBy [@group] |> DataFrame.agg [totalExpr, avgExpr]

Notes: Each expression should have an alias set using Expr.named. This defines the output column name.
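For reference, the aggregation in the example above computes the following per group. This plain-Python model uses `sum_and_mean_by`, a hypothetical helper. It returns a dict, so it says nothing about the row order of the DataFrame that DataFrame.agg returns.

```python
from collections import defaultdict
from typing import Dict, List


def sum_and_mean_by(rows: List[dict], key: str, value: str) -> Dict[str, Dict[str, float]]:
    """Group rows by `key`, then compute the "total" and "average" of `value`."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        group: {"total": sum(vals), "average": sum(vals) / len(vals)}
        for group, vals in groups.items()
    }


rows = [{"group": "A", "value": 10}, {"group": "A", "value": 20}, {"group": "B", "value": 30}]
print(sum_and_mean_by(rows, "group", "value"))
# {'A': {'total': 30, 'average': 15.0}, 'B': {'total': 30, 'average': 30.0}}
```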

See also: DataFrame.groupBy, DataFrame.Expr.sum

DataFrame.count

DataFrame -> Int

Get the number of rows in a DataFrame.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

DataFrame.count df

See also: DataFrame.shape

DataFrame.summary

DataFrame -> DataFrame

Compute summary statistics for all columns. Returns a 10-row DataFrame with a statistic column and one column per source column.

Row order: count, mean, min, max, std, var, median, q25, q75, iqr.

Non-numeric columns have "null" for numeric stat rows.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]

DataFrame.summary df

Notes: std and var use Bessel's correction (ddof=1). Quantiles use linear interpolation.

See also: DataFrame.mean, DataFrame.std, DataFrame.median, DataFrame.quantile, DataFrame.quantiles

Statistics

DataFrame.mean

DataFrame -> DataFrame

Column-wise arithmetic mean (numeric columns only). Returns a 1-row DataFrame.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ v = 10 }, { v = 20 }, { v = 30 }]
df |> DataFrame.mean

Notes: Non-numeric columns are excluded. Uses all rows.

See also: DataFrame.median, DataFrame.std, DataFrame.summary

DataFrame.median

DataFrame -> DataFrame

Column-wise median (numeric columns only). Returns a 1-row DataFrame.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ v = 1 }, { v = 3 }, { v = 2 }]
df |> DataFrame.median

Notes: Non-numeric columns are excluded.

See also: DataFrame.mean, DataFrame.quantile

DataFrame.std

DataFrame -> DataFrame

Column-wise sample standard deviation (ddof=1, numeric columns only). Returns a 1-row DataFrame.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ v = 2 }, { v = 4 }, { v = 6 }]
df |> DataFrame.std

Notes: Uses Bessel's correction (ddof=1). Non-numeric columns are excluded.
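Here is Bessel's correction worked through for the example data [2, 4, 6]: the sum of squared deviations is divided by n - 1 rather than n. Under that formula, DataFrame.std on this column gives 2 and DataFrame.var gives 4. Dividing by n instead would give a variance of 8/3.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}

% Example column v = [2, 4, 6]: n = 3, \bar{x} = 4
s^2 = \frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3-1} = \frac{8}{2} = 4, \qquad s = 2
```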

See also: DataFrame.var, DataFrame.mean

DataFrame.var

DataFrame -> DataFrame

Column-wise sample variance (ddof=1, numeric columns only). Returns a 1-row DataFrame.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ v = 2 }, { v = 4 }, { v = 6 }]
df |> DataFrame.var

Notes: Uses Bessel's correction (ddof=1). Non-numeric columns are excluded.

See also: DataFrame.std, DataFrame.mean

DataFrame.mode

DataFrame -> DataFrame

Column-wise mode (all columns). Returns a 1-row DataFrame with the most frequent value per column. Ties broken by the smallest value.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ g = "A" }, { g = "A" }, { g = "B" }]
df |> DataFrame.mode

Notes: Operates on all columns, including non-numeric.
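The tie-breaking rule amounts to taking the highest count, then the smallest value among those with that count. This plain-Python sketch uses `column_mode`, a hypothetical name. Nulls are not modelled, and the values are assumed to be mutually comparable.

```python
from collections import Counter
from typing import Hashable, List


def column_mode(values: List[Hashable]):
    """Most frequent value; ties go to the smallest value."""
    counts = Counter(values)
    best = max(counts.values())
    return min(value for value, count in counts.items() if count == best)


print(column_mode(["A", "A", "B"]))  # A
print(column_mode([3, 1, 3, 1, 2]))  # 1 (1 and 3 tie; 1 is smaller)
```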

See also: DataFrame.mean, DataFrame.median

DataFrame.quantile

Float -> DataFrame -> Result DataFrame DataFrameError

Column-wise quantile (numeric columns only). Returns a 1-row DataFrame. Returns Err(DataFrameError::InvalidArgument) if p is outside [0.0, 1.0].

Example:
import DataFrame
let df = DataFrame.fromRecords [{ v = 1 }, { v = 2 }, { v = 3 }, { v = 4 }]
df |> DataFrame.quantile 0.75

Notes: Uses linear interpolation. Non-numeric columns are excluded.
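The reference does not spell out which linear-interpolation rule is used. A common definition (NumPy's default method, for instance) places the quantile at position p * (n - 1) in the sorted data and interpolates between the neighbouring values. The sketch below assumes that rule. `quantile_linear` is a hypothetical name, and ValueError stands in for DataFrameError::InvalidArgument.

```python
import math
from typing import Sequence


def quantile_linear(values: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between the two nearest sorted values."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0.0, 1.0]")  # stands in for InvalidArgument
    data = sorted(values)
    pos = p * (len(data) - 1)  # fractional index into the sorted data
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


# Example column [1, 2, 3, 4], p = 0.75: position 2.25, between 3 and 4.
print(quantile_linear([1, 2, 3, 4], 0.75))  # 3.25
```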

See also: DataFrame.median, DataFrame.summary, DataFrame.quantiles

DataFrame.corr

DataFrame -> DataFrame

Pairwise Pearson correlation matrix (numeric columns, ddof=1). Returns a DataFrame with a String column named variable, holding one row label per numeric column, and one Float column per numeric column.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ x = 1, y = 2 }, { x = 2, y = 4 }, { x = 3, y = 6 }]
df |> DataFrame.corr

Notes: Diagonal values are 1.0. Non-numeric columns are excluded.

See also: DataFrame.cov, DataFrame.std, DataFrame.corrSpearman

DataFrame.cov

DataFrame -> DataFrame

Pairwise covariance matrix (numeric columns, ddof=1). Returns a DataFrame with a String column named variable, holding one row label per numeric column, and one Float column per numeric column.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ x = 1, y = 2 }, { x = 2, y = 4 }, { x = 3, y = 6 }]
df |> DataFrame.cov

Notes: Uses Bessel's correction (ddof=1). Non-numeric columns are excluded.
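The covariance and correlation matrices for the example data can be reproduced with a short plain-Python model. Rows are keyed by column name here, standing in for the variable column of the real result. `covariance`, `pearson` and `pairwise` are hypothetical helpers, and nulls are not modelled.

```python
import math
from typing import Callable, Dict, List


def covariance(xs: List[float], ys: List[float]) -> float:
    """Sample covariance with Bessel's correction (divide by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; the n - 1 factors cancel, so ddof does not change it."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def pairwise(columns: Dict[str, List[float]], fn: Callable) -> Dict[str, Dict[str, float]]:
    """Build a {row label: {column: value}} matrix over every column pair."""
    return {a: {b: fn(columns[a], columns[b]) for b in columns} for a in columns}


data = {"x": [1, 2, 3], "y": [2, 4, 6]}
print(pairwise(data, covariance))  # {'x': {'x': 1.0, 'y': 2.0}, 'y': {'x': 2.0, 'y': 4.0}}
print(pairwise(data, pearson))     # {'x': {'x': 1.0, 'y': 1.0}, 'y': {'x': 1.0, 'y': 1.0}}
```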

See also: DataFrame.corr, DataFrame.var

Window

DataFrame.partitionBy

[DataFrameColumn] -> DataFrame -> WindowedDataFrame

Create a windowed DataFrame partitioned by the given columns. This is the entry point for all window function operations.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "HR", salary = 60000 }]

df
    |> DataFrame.partitionBy [@department]

Notes: Column names are strings ("name") or column literals (@name). Partition columns define independent groups for window calculations. Chain with orderBy, ranking, lag/lead, rolling, or cumulative functions.

See also: DataFrame.orderBy, DataFrame.collect

DataFrame.orderBy

[DataFrameColumn] -> WindowedDataFrame -> WindowedDataFrame

Set the ordering columns for a windowed DataFrame. Required before rank, lag, lead, and rolling functions.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1 }, { dept = "Sales", date = 2 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]

Notes: Column names are strings ("name") or column literals (@name). Ordering determines how rows are sequenced within each partition. Must be called before withRank, withDenseRank, withLag, withLead, or rolling functions.

See also: DataFrame.partitionBy, DataFrame.withRank

DataFrame.collect

WindowedDataFrame -> DataFrame

Collect a windowed DataFrame back into a regular DataFrame, materializing all window computations.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.withRowNumber "row_num"
    |> DataFrame.collect

Notes: Must be called at the end of a window function chain to produce a usable DataFrame.

See also: DataFrame.partitionBy

DataFrame.withRowNumber

String -> WindowedDataFrame -> WindowedDataFrame

Add a sequential row number column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.withRowNumber "row_num"
    |> DataFrame.collect

Notes: Argument is the name of the new column (symbol or string). Row numbers start at 1. Does not require orderBy.

See also: DataFrame.withRank, DataFrame.withDenseRank

DataFrame.withRank

String -> WindowedDataFrame -> WindowedDataFrame

Add a rank column within each partition. Ties receive the same rank, with gaps after ties (e.g., 1, 2, 2, 4).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@score]
    |> DataFrame.withRank "rank"
    |> DataFrame.collect

Notes: Argument is the name of the new column (symbol or string). Requires orderBy to be set first.

See also: DataFrame.withDenseRank, DataFrame.withRowNumber, DataFrame.orderBy

DataFrame.withDenseRank

String -> WindowedDataFrame -> WindowedDataFrame

Add a dense rank column within each partition. Ties receive the same rank, with no gaps (e.g., 1, 2, 2, 3).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@score]
    |> DataFrame.withDenseRank "dense_rank"
    |> DataFrame.collect

Notes: Argument is the name of the new column (symbol or string). Requires orderBy to be set first.
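The tie rules for withRank and withDenseRank can be checked against a plain-Python model of a single partition. The function names are hypothetical. The sketch assumes ascending order (smallest value ranked 1), which the reference does not state explicitly.

```python
from typing import List


def rank_with_gaps(values: List[float]) -> List[int]:
    """Ties share a rank and leave gaps: 1 + number of strictly smaller values."""
    return [1 + sum(other < v for other in values) for v in values]


def dense_rank(values: List[float]) -> List[int]:
    """Ties share a rank with no gaps: 1-based position among the distinct values."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


scores = [10, 20, 20, 30]  # one partition, in orderBy order
print(rank_with_gaps(scores))  # [1, 2, 2, 4]
print(dense_rank(scores))      # [1, 2, 2, 3]
```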

See also: DataFrame.withRank, DataFrame.withRowNumber, DataFrame.orderBy

DataFrame.withLag

String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a column with the value from a previous row within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withLag "prev_sales" @sales 1
    |> DataFrame.collect

Notes: Args: new column name, source column, offset (number of rows back). Column names are strings ("name") or column literals (@name). Produces Nothing for rows without a previous value. Requires orderBy.

See also: DataFrame.withLead, DataFrame.orderBy

DataFrame.withLead

String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a column with the value from a subsequent row within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withLead "next_sales" @sales 1
    |> DataFrame.collect

Notes: Args: new column name, source column, offset (number of rows forward). Column names are strings ("name") or column literals (@name). Produces Nothing for rows without a subsequent value. Requires orderBy.
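Here is a plain-Python model of lag and lead within partitions. It assumes rows are already grouped by partition and sorted by the orderBy column. `lag`, `lead` and `shift_within_partitions` are hypothetical helpers, and None stands in for Nothing.

```python
from typing import Hashable, List, Optional, Sequence


def shift_within_partitions(
    keys: Sequence[Hashable], values: Sequence[float], offset: int
) -> List[Optional[float]]:
    """Take the value `offset` rows back (negative offset looks forward).

    Rows must be contiguous per partition. A row whose source row falls
    outside its own partition gets None.
    """
    out: List[Optional[float]] = []
    for i, key in enumerate(keys):
        j = i - offset
        in_range = 0 <= j < len(values) and keys[j] == key
        out.append(values[j] if in_range else None)
    return out


def lag(keys, values, n: int):
    return shift_within_partitions(keys, values, n)


def lead(keys, values, n: int):
    return shift_within_partitions(keys, values, -n)


depts = ["Sales", "Sales", "HR", "HR"]
sales = [100, 200, 50, 70]
print(lag(depts, sales, 1))   # [None, 100, None, 50]
print(lead(depts, sales, 1))  # [200, None, 70, None]
```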

See also: DataFrame.withLag, DataFrame.orderBy

DataFrame.withRollingSum

String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling sum column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withRollingSum "sum_3d" @sales 3
    |> DataFrame.collect

Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). The window ends at the current row: a window size of 3 covers the current row and the two rows before it. Requires orderBy.
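This rough Python model shows the trailing window shared by withRollingSum, withRollingMean, withRollingMin and withRollingMax, applied to one partition that is already ordered. The docs above do not say how Keel fills the first rows, where fewer than window-size values exist. The sketch assumes those rows yield None unless a smaller min_periods is passed, which mirrors the Polars default for rolling aggregations. The function name rolling and the min_periods parameter are illustrative only.

```python
def rolling(values, window, agg, min_periods=None):
    """Trailing window: the current value plus the window - 1 values before it.
    Windows holding fewer than min_periods values yield None; min_periods
    defaults to the window size (an assumption, see the lead-in)."""
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1) : i + 1]
        out.append(agg(frame) if len(frame) >= min_periods else None)
    return out


sales = [100, 200, 300, 400]  # one partition, already ordered by date
print(rolling(sales, 3, sum))                              # [None, None, 600, 900]
print(rolling(sales, 3, lambda w: sum(w) / len(w), 1))     # [100.0, 150.0, 200.0, 300.0]
print(rolling(sales, 3, min))                              # [None, None, 100, 200]
```

Passing min_periods=1, as in the mean example, produces partial-window results for the first rows. Check Keel's real output to confirm which behavior it uses.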

See also: DataFrame.withRollingMean, DataFrame.withCumSum

DataFrame.withRollingMean

String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling mean column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withRollingMean "avg_3d" @sales 3
    |> DataFrame.collect

Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). The window ends at the current row: a window size of 3 covers the current row and the two rows before it. Requires orderBy.

See also: DataFrame.withRollingSum, DataFrame.withCumMean

DataFrame.withRollingMin

String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling minimum column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withRollingMin "min_3d" @sales 3
    |> DataFrame.collect

Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). The window ends at the current row: a window size of 3 covers the current row and the two rows before it. Requires orderBy.

See also: DataFrame.withRollingMax, DataFrame.withCumMin

DataFrame.withRollingMax

String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling maximum column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withRollingMax "max_3d" @sales 3
    |> DataFrame.collect

Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). The window ends at the current row: a window size of 3 covers the current row and the two rows before it. Requires orderBy.

See also: DataFrame.withRollingMin, DataFrame.withCumMax

DataFrame.withCumSum

String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative sum column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withCumSum "running_total" @sales
    |> DataFrame.collect

Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Computes a running total over the current row and all preceding rows in the partition.
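The cumulative variants differ from the rolling ones in one way: their window starts at the first row of the partition and grows by one row at each step. Python's itertools.accumulate gives a compact model of all four for a single partition that is already ordered by date. This illustrates the semantics only and is not Keel code.

```python
from itertools import accumulate

# One partition, already ordered by date.
sales = [100, 200, 60, 440]

running_total = list(accumulate(sales))       # withCumSum: includes the current row
running_min = list(accumulate(sales, min))    # withCumMin: smallest value so far
running_max = list(accumulate(sales, max))    # withCumMax: largest value so far
# withCumMean: running total divided by the number of rows seen so far.
running_avg = [total / (i + 1) for i, total in enumerate(running_total)]

print(running_total)  # [100, 300, 360, 800]
print(running_min)    # [100, 100, 60, 60]
print(running_max)    # [100, 200, 200, 440]
print(running_avg)    # [100.0, 150.0, 120.0, 200.0]
```

In a real partitioned frame, the accumulation restarts at the first row of each partition.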

See also: DataFrame.withCumMean, DataFrame.withRollingSum

DataFrame.withCumMean

String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative mean column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withCumMean "running_avg" @sales
    |> DataFrame.collect

Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Computes a running average over the current row and all preceding rows in the partition.

See also: DataFrame.withCumSum, DataFrame.withRollingMean

DataFrame.withCumMin

String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative minimum column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withCumMin "running_min" @sales
    |> DataFrame.collect

Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Tracks the minimum value seen so far in the partition.

See also: DataFrame.withCumMax, DataFrame.withRollingMin

DataFrame.withCumMax

String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative maximum column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withCumMax "running_max" @sales
    |> DataFrame.collect

Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Tracks the maximum value seen so far in the partition.

See also: DataFrame.withCumMin, DataFrame.withRollingMax

Lazy

DataFrame.lazy

DataFrame -> LazyFrame

Convert a DataFrame to a LazyFrame for deferred, optimized execution.

Example:
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]

df
    |> DataFrame.lazy
    |> DataFrame.lazyFilter (@x |> Expr.gt (lit 2))
    |> DataFrame.lazyCollect

Notes: LazyFrame enables Polars query optimization: predicate pushdown, projection pushdown, and parallel execution.

See also: DataFrame.lazyCollect, DataFrame.lazyFilter

DataFrame.lazyCollect

LazyFrame -> DataFrame

Materialize a LazyFrame back to a DataFrame, executing the optimized query plan.

Example:
import DataFrame
DataFrame.fromRecords [{ x = 1 }, { x = 2 }] |> DataFrame.lazy |> DataFrame.lazyCollect

Notes: This triggers the actual computation. Until lazyCollect is called, all operations are deferred.

See also: DataFrame.lazy

DataFrame.lazyFilter

Expr -> LazyFrame -> LazyFrame

Filter a LazyFrame using a DataFrame.Expr boolean expression.

Example:
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr
let filterExpr = @x |> Expr.gt (lit 5)
DataFrame.fromRecords [{ x = 1 }, { x = 10 }] |> DataFrame.lazy |> DataFrame.lazyFilter filterExpr |> DataFrame.lazyCollect

Notes: The filter is added to the query plan and optimized with other operations.

See also: DataFrame.lazy, DataFrame.filter

DataFrame.lazySelect

[Expr] -> LazyFrame -> LazyFrame

Select columns from a LazyFrame using a list of Expr expressions.

Example:
import DataFrame
import DataFrame.Expr exposing col
import DataFrame.Expr as Expr
let yRenamed = @y |> Expr.named "y_renamed"
DataFrame.fromRecords [{ x = 1, y = 2 }] |> DataFrame.lazy |> DataFrame.lazySelect [yRenamed] |> DataFrame.lazyCollect

Notes: Enables projection pushdown: only the selected columns are read from files.

See also: DataFrame.lazy, DataFrame.select

DataFrame.lazyApplyExprs

[Expr] -> LazyFrame -> LazyFrame

Add or replace columns in a LazyFrame using a list of Expr expressions.

Example:
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr
let doubledExpr = @x |> Expr.mul (lit 2) |> Expr.named "x_doubled"
DataFrame.fromRecords [{ x = 5 }] |> DataFrame.lazy |> DataFrame.lazyApplyExprs [doubledExpr] |> DataFrame.lazyCollect

Notes: Each expression should have an alias set using Expr.named.

See also: DataFrame.lazy, DataFrame.applyExprs

Multi-DataFrame

DataFrame.join

[DataFrameColumn] -> [DataFrameColumn] -> JoinType -> DataFrame -> DataFrame -> Result DataFrame DataFrameError

Inner join two DataFrames on the given key columns with explicit cardinality validation.

The third argument declares the expected relationship between the join keys:

  • JoinType::OneToOne — each key value appears at most once on both sides (recommended default; raises a runtime error if either side has duplicates)
  • JoinType::OneToMany — each left key value is unique; right side may have duplicates (e.g. joining a parent table to a child table)
  • JoinType::ManyToOne — left side may have duplicates; each right key value is unique (the reverse of OneToMany)
  • JoinType::ManyToMany — both sides may have duplicates (produces a Cartesian product for matching keys; use with care)

Cardinality is enforced at runtime by Polars. A violation raises a JoinCardinalityViolation error instead of silently producing an unexpectedly large result.
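Each cardinality rule in the list above reduces to a uniqueness check on one or both sides' key values. The Python sketch below models that check for illustration only. The helper check_cardinality, the plain-string names for JoinType, and the error message format are all hypothetical, and in Keel the check itself is performed by Polars. Multi-column keys work the same way if each key is a tuple, such as (country, year).

```python
from collections import Counter

# For each join type: (left keys must be unique, right keys must be unique).
UNIQUE_SIDES = {
    "OneToOne": (True, True),
    "OneToMany": (True, False),
    "ManyToOne": (False, True),
    "ManyToMany": (False, False),
}


def check_cardinality(left_keys, right_keys, join_type):
    """Return None if the keys satisfy join_type, else an error message."""
    left_unique, right_unique = UNIQUE_SIDES[join_type]
    for side, keys, must_be_unique in (
        ("left", left_keys, left_unique),
        ("right", right_keys, right_unique),
    ):
        if must_be_unique:
            dupes = [k for k, n in Counter(keys).items() if n > 1]
            if dupes:
                return f"JoinCardinalityViolation: duplicate {side} keys {dupes}"
    return None


users = [1, 2]     # users.id
roles = [1, 1, 2]  # roles.user_id: user 1 has two roles
print(check_cardinality(users, roles, "OneToOne"))
print(check_cardinality(users, roles, "OneToMany"))
```

Under OneToOne, the duplicate user_id 1 in roles is reported. Declaring OneToMany accepts the same data, which matches joining a parent table (users) to a child table (roles).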

Example:
import DataFrame

let users =
    DataFrame.fromRecords
        [ { id = 1, name = "Alice" }
        , { id = 2, name = "Bob" }
        ]

let roles =
    DataFrame.fromRecords
        [ { user_id = 1, role = "admin" }
        , { user_id = 2, role = "viewer" }
        ]

-- OneToOne: each user_id appears exactly once on both sides
users
    |> DataFrame.join [@id] [@user_id] JoinType::OneToOne roles

-- Multi-column join: match on both country and year
let pop =
    DataFrame.fromRecords
        [ { country = "DE", year = 2020, population = 83000000 }
        ]

let gdp =
    DataFrame.fromRecords
        [ { country = "DE", year = 2020, gdp = 3800000000000 }
        ]

pop
    |> DataFrame.join [@country, @year] [@country, @year] JoinType::OneToOne gdp

Notes: Pass column names as column literals ([@id]) or string lists (["id"]). For single-column joins use a one-element list: [@id]. JoinType is available after import DataFrame.

See also: DataFrame.concat

DataFrame.concat

[DataFrame] -> Result DataFrame DataFrameError

Concatenate a list of DataFrames vertically. Returns Err(DataFrameError::OperationFailed) on failure.

Example:
import DataFrame

let df1 =
    DataFrame.fromRecords [{ name = "Alice" }]

let df2 =
    DataFrame.fromRecords [{ name = "Bob" }]

DataFrame.concat [df1, df2]

See also: DataFrame.join

DataFrame.pivot

DataFrameColumn -> DataFrameColumn -> DataFrameColumn -> DataFrame -> Result DataFrame DataFrameError

Pivot a DataFrame: spread values from one column into new columns. Returns Err(DataFrameError::OperationFailed) on failure.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ category = "A", date = "Jan", amount = 100 }, { category = "B", date = "Jan", amount = 200 }]

df
    |> DataFrame.pivot @category @date @amount

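Notes: Args: on (the column whose values become the new column names), index (the column whose values identify the output rows), values (the column that fills the new cells). Column names are strings ("name") or column literals (@name).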
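The following Python model uses plain dicts as rows to show how on, index, and values interact. Two behaviors in the sketch are assumptions that the docs above do not describe. First, (index, on) combinations that never occur become None. Second, a duplicate (index, on) pair raises an error instead of being aggregated. New columns appear in the order their on values are first seen.

```python
def pivot(rows, on, index, values):
    """Model of pivot: one output row per distinct `index` value and one new
    column per distinct `on` value, with cells taken from `values`."""
    # New columns appear in the order their `on` values are first seen.
    new_cols = list(dict.fromkeys(row[on] for row in rows))
    cells = {}  # index value -> {on value: cell value}
    for row in rows:
        bucket = cells.setdefault(row[index], {})
        if row[on] in bucket:
            # Assumption: duplicate (index, on) pairs are an error in this sketch.
            raise ValueError(f"duplicate cell for {row[index]!r}/{row[on]!r}")
        bucket[row[on]] = row[values]
    # Combinations that never occur become None.
    return [
        {index: key, **{col: row_cells.get(col) for col in new_cols}}
        for key, row_cells in cells.items()
    ]


rows = [
    {"category": "A", "date": "Jan", "amount": 100},
    {"category": "B", "date": "Jan", "amount": 200},
    {"category": "A", "date": "Feb", "amount": 150},
]
for row in pivot(rows, on="category", index="date", values="amount"):
    print(row)
# {'date': 'Jan', 'A': 100, 'B': 200}
# {'date': 'Feb', 'A': 150, 'B': None}
```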

Conversion

DataFrame.toRecords

DataFrame -> [Record]

Convert a DataFrame to a list of Keel records.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.toRecords df

Notes: Each row becomes a record with column names as field names.

See also: DataFrame.fromRecords

DataFrame.fromRecords

[Record] -> DataFrame

Create a DataFrame from a list of Keel records.

Example:
import DataFrame

-- Inline record list
DataFrame.fromRecords [{ name = "Alice", age = 30 }]

-- Named schema via type alias — recommended for multi-field records
type alias Row = { name : String, age : Int }
let rows : [Row] = [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]
DataFrame.fromRecords rows

Notes: All records should have the same fields. Use type alias to name a reusable row schema.

See also: DataFrame.toRecords

DataFrame.fromLists

[(String, [a])] -> DataFrame

Create a multi-column DataFrame from a list of (column name, values) tuples.

Example:
import DataFrame

DataFrame.fromLists [("age", [30, 40]), ("name", ["Alice", "Bob"])]

Notes: Column-oriented data construction. All value lists must have the same length. Supports Maybe-wrapped values. Composes well with List.zip for programmatic column creation. For single-column DataFrames, pass a single-element list: [("col", [values])].

See also: DataFrame.fromRecords

DataFrame.recode

DataFrameColumn -> [(Int, Int)] -> DataFrame -> DataFrame

Recode values in a column according to a mapping. Automatically updates value labels.

Example:
import DataFrame
import ValueLabelSet

let labels = ValueLabelSet.fromList [(1, "Low"), (2, "Medium"), (3, "High")]
let df = (DataFrame.fromRecords [{ score = 1 }, { score = 2 }, { score = 3 }]
    |> DataFrame.setValueLabels @score labels)?

-- Collapse categories: 1 stays 1, 2->1, 3->2
df |> DataFrame.recode @score [(2, 1), (3, 2)]
-- Labels are automatically remapped: 1->"Low", 2->"High"

Notes: Value labels are automatically updated based on the recode mapping. Values not in the mapping remain unchanged.
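The remapping rule can be modeled in Python with plain dicts. One detail is an assumption, not documented behavior: when several old codes collapse onto the same new code, this sketch keeps the label of the lowest old code. That choice reproduces the 1->"Low", 2->"High" result in the example above. The function name recode is illustrative.

```python
def recode(values, labels, mapping):
    """Model of recode: remap codes and carry labels along with them.
    Codes not in the mapping are left unchanged."""
    lookup = dict(mapping)
    new_values = [lookup.get(v, v) for v in values]
    new_labels = {}
    # Visit old codes in ascending order; the first label to reach a new code
    # wins (an assumption for collapsed categories).
    for old_code in sorted(labels):
        new_code = lookup.get(old_code, old_code)
        new_labels.setdefault(new_code, labels[old_code])
    return new_values, new_labels


values, labels = recode([1, 2, 3], {1: "Low", 2: "Medium", 3: "High"}, [(2, 1), (3, 2)])
print(values)  # [1, 1, 2]
print(labels)  # {1: 'Low', 2: 'High'}
```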

See also: DataFrame.setValueLabels

Inspection

DataFrame.shape

DataFrame -> (Int, Int)

Get the shape of a DataFrame as (rows, columns).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.shape df

See also: DataFrame.count, DataFrame.columns

Metadata

DataFrame.setMeta

String -> a -> DataFrame -> DataFrame

Set a dataset-level metadata key.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ x = 1 }]

df
    |> DataFrame.setMeta "name" "PISA 2022"

Notes: Metadata values can be String, Int, Float, Bool, List, or Record.

See also: DataFrame.meta, DataFrame.allMeta

DataFrame.meta

String -> DataFrame -> Maybe a

Get a dataset-level metadata value by key.

Example:
import DataFrame

DataFrame.fromRecords [{ x = 1 }]
    |> DataFrame.setMeta "name" "test"
    |> DataFrame.meta "name"

See also: DataFrame.setMeta, DataFrame.allMeta

DataFrame.allMeta

DataFrame -> Record

Get all dataset-level metadata as a record.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ x = 1 }]

DataFrame.allMeta df

See also: DataFrame.meta, DataFrame.setMeta

DataFrame.setColumnMeta

DataFrameColumn -> String -> a -> DataFrame -> DataFrame

Set a column-level metadata key.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ score = 500 }]

df
    |> DataFrame.setColumnMeta @score "label" "Math score"

Notes: First arg is column name (symbol or string), second is metadata key.

See also: DataFrame.columnMeta, DataFrame.allColumnMeta

DataFrame.columnMeta

DataFrameColumn -> String -> DataFrame -> Maybe a

Get a column-level metadata value by column and key.

Example:
import DataFrame

DataFrame.fromRecords [{ score = 500 }]
    |> DataFrame.setColumnMeta @score "label" "Math"
    |> DataFrame.columnMeta @score "label"

See also: DataFrame.setColumnMeta, DataFrame.allColumnMeta

DataFrame.allColumnMeta

DataFrameColumn -> DataFrame -> Record

Get all metadata for a specific column as a record.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ score = 500 }]

DataFrame.allColumnMeta @score df

See also: DataFrame.columnMeta, DataFrame.setColumnMeta

DataFrame.describeMeta

DataFrame -> DataFrame

Get a summary DataFrame of all metadata (dataset and column level).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ x = 1 }]

DataFrame.describeMeta df

Notes: Returns a DataFrame with columns: level, key, value.

See also: DataFrame.allMeta, DataFrame.allColumnMeta

DataFrame.describe

DataFrame -> DataFrame

Stata-style variable overview: returns a DataFrame with one row per column showing its name, type, label, value labels, and metadata.

Example:
import DataFrame
import ValueLabelSet
import Result
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df =
    (DataFrame.fromRecords [{ name = "Alice", gender = 1 }]
        |> DataFrame.setVarLabel @name "Person's name"
        |> Result.andThen (DataFrame.setValueLabels @gender gender))?
DataFrame.describe df

Notes: Returns a DataFrame with columns: name, type, label, values, metadata. Rows are in column order. The values column abbreviates value labels: {1=Male, 2=Female} when a column has at most 5 labels, or "N labels" (where N is the label count) when it has more.
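The abbreviation rule for the values column can be sketched in Python. Only the five-label cutoff and the two output shapes come from the documentation. The ascending code order and the empty cell for unlabeled columns are assumptions.

```python
def abbreviate_labels(labels):
    """Model of the values cell in describe: list up to five labels inline,
    otherwise report only how many labels there are."""
    if not labels:
        return ""  # assumption: unlabeled columns show an empty cell
    if len(labels) <= 5:
        # Assumption: codes are listed in ascending order.
        inner = ", ".join(f"{code}={text}" for code, text in sorted(labels.items()))
        return "{" + inner + "}"
    return f"{len(labels)} labels"


print(abbreviate_labels({1: "Male", 2: "Female"}))               # {1=Male, 2=Female}
print(abbreviate_labels({i: f"Level {i}" for i in range(1, 8)}))  # 7 labels
```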

See also: DataFrame.describeMeta, DataFrame.describeLabels, DataFrame.varLabels, DataFrame.search

Labels

DataFrame.setVarLabel

DataFrameColumn -> String -> DataFrame -> Result DataFrame DataFrameError

Set a variable label (description) for a column.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
case df |> DataFrame.setVarLabel @name "Person's full name" of
    Ok labeled -> DataFrame.varLabel @name labeled
    Err e -> Nothing

Notes: Variable labels describe what a column represents. They are preserved in Stata files. On typed DataFrames the column name is validated at compile time; on untyped DataFrames a missing column returns Err(DataFrameError::ColumnNotFound).

See also: DataFrame.varLabel, DataFrame.removeVarLabel

DataFrame.varLabel

DataFrameColumn -> DataFrame -> Maybe String

Get the variable label for a column, if any.

Example:
import DataFrame
let df = (DataFrame.fromRecords [{ name = "Alice" }] |> DataFrame.setVarLabel @name "Person's name")?
DataFrame.varLabel @name df  -- Just "Person's name"

See also: DataFrame.setVarLabel, DataFrame.varLabels

DataFrame.varLabels

DataFrame -> { String : String }

Get all variable labels as a record.

Example:
import DataFrame
import Result
let df =
    (DataFrame.fromRecords [{ name = "Alice", age = 30 }]
        |> DataFrame.setVarLabel @name "Person's name"
        |> Result.andThen (DataFrame.setVarLabel @age "Age in years"))?
DataFrame.varLabels df  -- { name = "Person's name", age = "Age in years" }

See also: DataFrame.varLabel, DataFrame.setVarLabel

DataFrame.removeVarLabel

DataFrameColumn -> DataFrame -> DataFrame

Remove the variable label from a column.

Example:
import DataFrame
let df = (DataFrame.fromRecords [{ name = "Alice" }] |> DataFrame.setVarLabel @name "Person's name")?
df |> DataFrame.removeVarLabel @name

See also: DataFrame.setVarLabel, DataFrame.varLabel

DataFrame.setValueLabels

DataFrameColumn -> ValueLabelSet -> DataFrame -> Result DataFrame DataFrameError

Attach value labels to a column (lenient: values without a label are allowed).

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
case df |> DataFrame.setValueLabels @gender gender of
    Ok labeled -> labeled
    Err e -> df

Notes: Value labels map numeric codes to human-readable labels. Use setValueLabelsStrict for exhaustive validation. Returns Err(DataFrameError::ColumnNotFound) if the column does not exist.

See also: DataFrame.setValueLabelsStrict, DataFrame.valueLabels

DataFrame.setValueLabelsStrict

DataFrameColumn -> ValueLabelSet -> DataFrame -> Result DataFrame DataFrameError

Attach value labels to a column (strict: every value in the column must have a label).

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
case df |> DataFrame.setValueLabelsStrict @gender gender of
    Ok labeled -> labeled
    Err e -> df

Notes: Returns Err(DataFrameError::OperationFailed) if any value in the column lacks a corresponding label. Returns Err(DataFrameError::ColumnNotFound) if the column does not exist.

See also: DataFrame.setValueLabels, DataFrame.valueLabels

DataFrame.valueLabels

DataFrameColumn -> DataFrame -> Maybe ValueLabelSet

Get the value labels for a column, if any.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.valueLabels @gender df  -- Just (ValueLabelSet)

See also: DataFrame.setValueLabels, DataFrame.allValueLabels

DataFrame.allValueLabels

DataFrame -> { String : ValueLabelSet }

Get all value labels as a record mapping column names to ValueLabelSets.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.allValueLabels df

See also: DataFrame.valueLabels, DataFrame.setValueLabels

DataFrame.removeValueLabels

DataFrameColumn -> DataFrame -> DataFrame

Remove value labels from a column.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
df |> DataFrame.removeValueLabels @gender

See also: DataFrame.setValueLabels, DataFrame.valueLabels

DataFrame.setDisplayMode

DataFrameColumn -> String -> DataFrame -> Result DataFrame DataFrameError

Set how a column's values should be displayed (Raw, Labeled, or Both).

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
case df |> DataFrame.setValueLabels @gender gender of
    Ok labeled ->
        case labeled |> DataFrame.setDisplayMode @gender "Labeled" of
            Ok final -> final
            Err _ -> labeled
    Err _ -> df

Notes: Display modes: "Raw" shows only the value, "Labeled" shows only the label, "Both" (default) shows "value (label)". Returns Err(DataFrameError::InvalidArgument) if the mode string is not one of these three values.
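The following Python sketch shows how a single cell renders under each mode, with the value labels held in a plain dict. Falling back to the raw value when a code has no label is an assumption. setValueLabels allows unlabeled values, but the docs above do not say how they are displayed.

```python
MODES = ("Raw", "Labeled", "Both")


def render(value, labels, mode="Both"):
    """Model of how one cell is shown under each display mode."""
    if mode not in MODES:
        # Keel returns Err(DataFrameError::InvalidArgument) for other strings.
        raise ValueError(f"invalid display mode: {mode!r}")
    label = labels.get(value)
    if mode == "Raw" or label is None:
        # Assumption: values without a label fall back to the raw value.
        return str(value)
    if mode == "Labeled":
        return label
    return f"{value} ({label})"


gender = {1: "Male", 2: "Female"}
print([render(1, gender, m) for m in MODES])  # ['1', 'Male', '1 (Male)']
```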

See also: DataFrame.displayMode, DataFrame.setValueLabels

DataFrame.displayMode

DataFrameColumn -> DataFrame -> String

Get the display mode for a column.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ gender = 1 }]
DataFrame.displayMode @gender df  -- "Both"

Notes: Returns "Both" (the default) if no display mode has been set.

See also: DataFrame.setDisplayMode, DataFrame.setValueLabels

DataFrame.describeLabel

DataFrameColumn -> DataFrame -> String

Describe value labels for a single column as a formatted table.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.describeLabel @gender df

Notes: Returns a formatted table with Value and Label columns. Returns a message if the column has no value labels.

See also: DataFrame.describeLabels, DataFrame.valueLabels

DataFrame.describeLabels

DataFrame -> String

Describe all value labels in a DataFrame as a formatted string.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.describeLabels df

Notes: Lists all columns with value labels, sorted by column name. Returns "(no value labels)" if none are set.

See also: DataFrame.describeLabel, DataFrame.valueLabels, DataFrame.allValueLabels, DataFrame.describeMeta

Lineage

DataFrame.lineage

DataFrame -> Record

Get complete lineage metadata for all columns, including origins and transformations.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.lineage

Notes: Returns a Record with 'columns' (per-column lineage) and 'globalOperations' (operations affecting all columns).

See also: DataFrame.columnLineage

DataFrame.columnLineage

DataFrameColumn -> DataFrame -> Maybe Record

Get lineage metadata for a specific column.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

DataFrame.columnLineage @name df

Notes: Returns Just Record with origin, transformations, and dependencies, or Nothing if column not found.

See also: DataFrame.lineage

DataFrame.sourcePath

DataFrame -> Maybe String

Get the source file path of a DataFrame, if it was read from a file.

Example:
import DataFrame

let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.sourcePath df

Notes: Returns Just path for file-sourced DataFrames, Nothing for DataFrames created from records or other sources.

See also: DataFrame.lineage, DataFrame.parents

DataFrame.parents

DataFrame -> [Record]

Get the parent DataFrames in the lineage DAG.

Example:
import DataFrame

let df = DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
let selected = case (df |> DataFrame.select [@name]) of
    Ok d -> d
    Err _ -> DataFrame.fromRecords []
DataFrame.parents selected

Notes: Returns a list of records with 'id', 'name', and 'operation' fields. Root DataFrames (from file reads or fromRecords) have no parents.

See also: DataFrame.lineage, DataFrame.sourcePath

DataFrame.lineageById

String -> Maybe Record

Look up a DataFrame's lineage by its UUID.

Example:
import DataFrame

-- Look up a DataFrame by its ID (returns Nothing if not found)
DataFrame.lineageById "00000000-0000-0000-0000-000000000000"

Notes: Returns Just Record if a DataFrame with that ID has been created in this session, Nothing otherwise.

See also: DataFrame.lineageByName, DataFrame.lineage

DataFrame.lineageByName

String -> [Record]

Look up DataFrames by display name (case-insensitive substring match).

Example:
import DataFrame

let df = DataFrame.fromRecords [{ x = 1 }, { x = 2 }]
DataFrame.lineageByName "fromRecords"

Notes: Returns a list of lineage records whose display name contains the search string (case-insensitive).

See also: DataFrame.lineageById, DataFrame.lineage

Other

DataFrame._checkSchemaImpl

CheckSchemaSpec -> DataFrame -> Result DataFrame DataFrameError

Internal: runtime schema validation. Use DataFrame.checkSchema instead.

DataFrame._fromRecordsWithLabels

ValueLabelsMap -> [Record] -> DataFrame

Internal: like fromRecords but with a compile-time value label map injected by the compiler.

DataFrame.corrSpearman

DataFrame -> DataFrame

Pairwise Spearman rank correlation matrix over the numeric columns. Returns a DataFrame with a leading String column named variable, followed by one Float column per numeric column. Spearman correlation measures monotonic association between columns and is more robust to outliers than Pearson correlation.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ x = 1, y = 2 }, { x = 2, y = 4 }, { x = 3, y = 5 }]
df |> DataFrame.corrSpearman

Notes: Diagonal values are 1.0. Non-numeric columns are excluded. NaN values in rank computation are treated as the largest rank value.
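To build intuition, the Python sketch below computes Spearman correlation as the Pearson correlation of ranks, with tied values sharing the average of their positions. Average ranking for ties is the usual convention, but applying it to Keel is an assumption. The NaN rule from the note above is not modeled.

```python
def average_ranks(xs):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(spearman([1, 2, 3], [2, 4, 5]))    # 1.0 (same ordering)
print(spearman([1, 2, 3], [2, 4, 500]))  # 1.0 (the outlier does not change the ranks)
print(spearman([1, 2, 3], [3, 2, 1]))    # -1.0
```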

See also: DataFrame.corr, DataFrame.cov

DataFrame.melt

[DataFrameColumn] -> [String] -> String -> String -> DataFrame -> Result DataFrame DataFrameError

Reshape a wide DataFrame to long format. Provide id columns to keep, column prefixes (one per output value column), the separator, and the name for the new index column. Returns Err(DataFrameError::UnsupportedOperation) if no matching columns are found.

Example:
import DataFrame

let wide =
    DataFrame.fromRecords
        [ { nr = 1, var1_2020 = 10, var1_2021 = 20, var2_2020 = 30, var2_2021 = 40 }
        , { nr = 2, var1_2020 = 50, var1_2021 = 60, var2_2020 = 70, var2_2021 = 80 }
        ]

-- Result: columns nr, year (Int), var1, var2 -- 4 rows
wide |> DataFrame.melt [@nr] ["var1_", "var2_"] "_" "year"

Notes: Stem name is the prefix with the trailing separator stripped ("var1_" → "var1"). Suffix is parsed as Int if all values are numeric, otherwise String.
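The reshaping rule can be modeled in Python, with rows as dicts. The sketch follows the documented rules for stems and suffix types. Three details are assumptions: the column set is read from the first row, output rows are grouped by input row, and suffixes appear in order of first appearance. With suffixes such as year1 and year2, the same rule would keep the index column as String, because those suffixes are not numeric.

```python
def melt(rows, id_cols, prefixes, sep, index_name):
    """Model of melt: one output row per (input row, suffix) pair."""
    # Stem = prefix with the trailing separator stripped ("var1_" -> "var1").
    stems = {p: p[: -len(sep)] if sep and p.endswith(sep) else p for p in prefixes}
    # Suffixes, in order of first appearance, from columns matching any prefix.
    suffixes = list(dict.fromkeys(
        col[len(p):] for col in rows[0] for p in prefixes if col.startswith(p)
    ))
    if not suffixes:
        raise ValueError("no columns match the given prefixes")
    # Suffixes become Int only if every one of them is numeric.
    numeric = all(s.isdigit() for s in suffixes)
    out = []
    for row in rows:
        for s in suffixes:
            rec = {c: row[c] for c in id_cols}
            rec[index_name] = int(s) if numeric else s
            for p in prefixes:
                rec[stems[p]] = row.get(p + s)  # None if this stem lacks the suffix
            out.append(rec)
    return out


wide = [
    {"nr": 1, "var1_2020": 10, "var1_2021": 20, "var2_2020": 30, "var2_2021": 40},
    {"nr": 2, "var1_2020": 50, "var1_2021": 60, "var2_2020": 70, "var2_2021": 80},
]
for row in melt(wide, ["nr"], ["var1_", "var2_"], "_", "year"):
    print(row)
# {'nr': 1, 'year': 2020, 'var1': 10, 'var2': 30}
# {'nr': 1, 'year': 2021, 'var1': 20, 'var2': 40}
# {'nr': 2, 'year': 2020, 'var1': 50, 'var2': 70}
# {'nr': 2, 'year': 2021, 'var1': 60, 'var2': 80}
```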

See also: DataFrame.pivot

DataFrame.quantiles

[Float] -> DataFrame -> Result DataFrame DataFrameError

Column-wise quantiles (numeric columns only). Accepts a list of quantile levels and returns a DataFrame with one row per level. The result has a leading quantile Float column recording each level, followed by one Float column per numeric column. Returns Err(DataFrameError::InvalidArgument) if any level is outside [0.0, 1.0].

Example:
import DataFrame
let df = DataFrame.fromRecords [{ v = 1 }, { v = 2 }, { v = 3 }, { v = 4 }]
case df |> DataFrame.quantiles [0.25, 0.5, 0.75] of
    Ok q  -> q
    Err _ -> DataFrame.fromRecords []

Notes: Uses linear interpolation. Non-numeric columns are excluded. Quantile rows appear in the same order as the input list; duplicates are preserved.
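The linear interpolation can be sketched in Python in three steps: sort the values, compute the position q * (n - 1), and interpolate between the two neighboring values. NumPy and Polars call this convention "linear". Treating it as identical to Keel's method is an assumption, and quantile_linear is a hypothetical name.

```python
def quantile_linear(values, q):
    """Quantile at level q using linear interpolation between closest ranks."""
    if not 0.0 <= q <= 1.0:
        # Mirrors the documented Err(DataFrameError::InvalidArgument) case.
        raise ValueError("quantile level outside [0.0, 1.0]")
    xs = sorted(values)
    pos = q * (len(xs) - 1)       # fractional index into the sorted values
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


print([quantile_linear([1, 2, 3, 4], q) for q in [0.25, 0.5, 0.75]])  # [1.75, 2.5, 3.25]
```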

See also: DataFrame.quantile, DataFrame.median, DataFrame.summary