DataFrame Module
Tabular data manipulation backed by Polars DataFrames.
The DataFrame module provides a pipe-friendly API for loading, transforming, filtering, and aggregating tabular data. DataFrames are opaque native objects that can only be manipulated through DataFrame module functions.
Common patterns
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr
let data = DataFrame.readCsv "employees.csv"
let summary = data
    |> DataFrame.select ["name", "department", "salary"]
    |> DataFrame.filter (@salary |> Expr.gt (lit 50000))
    |> DataFrame.sort ["salary"]
    |> DataFrame.head 20
IO.println (DataFrame.shape summary)
Display
DataFrames render as formatted, column-aligned tables when printed or displayed in the REPL. Output includes shape, column names, dtypes, and data rows. Large DataFrames (>10 rows) show the first 5 and last 5 rows with a … separator:
shape: (1000, 3)
 name  | age | city
 str   | i64 | str
-------+-----+---------
 Alice | 30  | New York
 Bob   | 25  | London
 …     | …   | …
 Yara  | 31  | Berlin
 Zach  | 22  | Tokyo
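The row-truncation rule can be sketched in plain Python (not Keel). The function name `preview_rows` and its parameters are hypothetical, and the sketch covers only which rows are shown, not column alignment or dtype headers.

```python
def preview_rows(rows, threshold=10, edge=5):
    """Return the rows a truncated display would show.

    Inputs with more than `threshold` rows keep the first and last
    `edge` rows, with a single "…" marker between them.
    """
    if len(rows) <= threshold:
        return list(rows)
    return list(rows[:edge]) + ["…"] + list(rows[-edge:])


print(preview_rows(list(range(1, 13))))
# [1, 2, 3, 4, 5, '…', 8, 9, 10, 11, 12]
```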
Security
| Variable | Effect |
|---|---|
| KEEL_DATAFRAME_DISABLED=1 | Disable DataFrame operations |
| KEEL_DATAFRAME_SANDBOX=/path | Restrict file I/O to directory |
| KEEL_DATAFRAME_MAX_ROWS=10000 | Limit rows loaded from files |
Functions
I/O
DataFrame.readCsv
String -> Result DataFrame DataFrameError
Read a CSV file into a DataFrame. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
-- Local file
DataFrame.readCsv "data.csv"
-- Remote file
DataFrame.readCsv "https://example.com/data.csv"
Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readCsvColumns, DataFrame.readJson, DataFrame.readParquet
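The two-mode path resolution described in the notes can be illustrated with a small Python sketch (not Keel). The function `resolve_data_path`, its parameters, and the example file name `main.keel` are hypothetical; remote URLs and absolute paths are outside its scope.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_data_path(path, calling_file, project_root):
    """Resolve a data path using the two-mode rule from the notes.

    `calling_file` is the source file containing the read call and
    `project_root` is the directory holding keel.toml (both assumed inputs).
    """
    if path.startswith(("./", "../")):
        base = PurePosixPath(calling_file).parent  # relative to the caller
    else:
        base = PurePosixPath(project_root)  # bare path: project root
    # normpath collapses "." and ".." segments without touching the disk.
    return posixpath.normpath(str(base / path))


print(resolve_data_path("../data.csv", "/proj/src/main.keel", "/proj"))
# /proj/data.csv
```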
DataFrame.readCsvColumns
[DataFrameColumn] -> String -> Result DataFrame DataFrameError
Read only the specified columns from a CSV file. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
DataFrame.readCsvColumns [@name, @age] "data.csv"
DataFrame.readCsvColumns [@name, @age] "https://example.com/data.csv"
Notes: Column projection is applied at the I/O level for efficient reading. With a literal path and literal columns, the selection is validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readCsv
DataFrame.readJson
String -> Result DataFrame DataFrameError
Read a JSON file into a DataFrame. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
-- Local file
DataFrame.readJson "data.json"
-- Remote file
DataFrame.readJson "https://example.com/data.json"
Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readJsonColumns, DataFrame.readJsonl, DataFrame.readCsv
DataFrame.readJsonColumns
[DataFrameColumn] -> String -> Result DataFrame DataFrameError
Read only the specified columns from a JSON file. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
DataFrame.readJsonColumns [@x, @y] "data.json"
DataFrame.readJsonColumns [@x, @y] "https://example.com/data.json"
Notes: Reads the full file, then selects columns. With a literal path and literal columns, the selection is validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readJson
DataFrame.readJsonl
String -> Result DataFrame DataFrameError
Read an NDJSON (newline-delimited JSON / JSON Lines) file into a DataFrame. Each line must be a JSON object. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
-- Local file
DataFrame.readJsonl "events.jsonl"
-- Remote file
DataFrame.readJsonl "https://example.com/events.jsonl"
Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory). Use readJson for JSON array files ([{...}, ...]); use readJsonl for NDJSON files where each line is an independent JSON object.
Struct flattening: Nested JSON object columns are automatically expanded into _-separated top-level columns after reading. For example, a column address: {city, zip} becomes @address_city and @address_zip. This matches Jsonl.parseDataFrame behaviour. List<Struct> columns (arrays of objects) are not flattened — they remain as list columns.
Memory warning: Files with deeply nested columns (e.g. a column that is a list of structs with variable keys) can cause very large transient memory allocations during parsing — potentially many times the file size — because Polars materialises the full Arrow representation of every nested value in memory before returning. If your file has such columns and you do not need them, use readJsonlColumns to select only the flat scalar columns you need. This avoids parsing the heavy nested fields entirely.
See also: DataFrame.readJsonlColumns, DataFrame.readJson, DataFrame.readCsv
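The struct flattening described above can be sketched in plain Python (not Keel). The helper `flatten_record` is hypothetical, and applying the rule recursively to deeper nesting is an assumption of this sketch rather than documented behaviour.

```python
def flatten_record(record, prefix=""):
    """Expand nested dicts into "_"-separated top-level keys.

    Lists (including lists of dicts) are left untouched, mirroring the
    rule that List<Struct> columns are not flattened.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name))
        else:
            flat[name] = value
    return flat


row = {"id": 1, "address": {"city": "Berlin", "zip": "10115"}, "tags": [{"k": "a"}]}
print(flatten_record(row))
# {'id': 1, 'address_city': 'Berlin', 'address_zip': '10115', 'tags': [{'k': 'a'}]}
```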
DataFrame.readJsonlColumns
[DataFrameColumn] -> String -> Result DataFrame DataFrameError
Read only the specified columns from an NDJSON file. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
DataFrame.readJsonlColumns [@id, @timestamp, @value] "events.jsonl"
DataFrame.readJsonlColumns [@id, @timestamp, @value] "https://example.com/events.jsonl"
Notes: Unselected columns, including large nested fields, are never parsed. The reader infers the full schema from the first 100 rows, builds a subset schema containing only the requested columns, and passes it to the NDJSON byte-level reader so that simd-json never allocates Arrow buffers for the skipped fields. On files with heavy List<Struct> columns this can reduce peak memory from tens of GB to a few hundred MB.
Column name forms: Selected column names may be literal top-level field names (@address) or flattened sub-field names (@address_city). A literal struct name expands to all its sub-columns after flattening (e.g. @address yields @address_city and @address_zip). A flattened sub-field name resolves back to the ancestor struct, which is loaded and then trimmed to only the requested sub-column. List<Struct> columns (arrays of objects) are left as-is and must be named by their literal top-level key.
With a literal path and literal columns, the selection is validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readJsonl, DataFrame.readJsonColumns
DataFrame.readParquet
String -> Result DataFrame DataFrameError
Read a Parquet file into a DataFrame. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
-- Local file
DataFrame.readParquet "data.parquet"
-- Remote file
DataFrame.readParquet "https://example.com/data.parquet"
Notes: With a string literal path, column names and types are validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readParquetColumns, DataFrame.readCsv
DataFrame.readParquetColumns
[DataFrameColumn] -> String -> Result DataFrame DataFrameError
Read only the specified columns from a Parquet file. Accepts both local file paths and remote URLs (http://, https://).
import DataFrame
DataFrame.readParquetColumns [@id, @score] "data.parquet"
DataFrame.readParquetColumns ["id", "score"] "https://example.com/data.parquet"
Notes: True columnar projection: unneeded columns are never read from disk. With a literal path and literal columns, the selection is validated at compile time. Remote URLs require KEEL_HTTP_DISABLED=0. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readParquet
DataFrame.readDta
String -> Result DataFrame DataFrameError
Read a Stata .dta file into a DataFrame with metadata. With a string literal path, column names and types are validated at compile time.
import DataFrame
DataFrame.readDta "data.dta"
Notes: Preserves variable labels, value labels, and dataset label as metadata. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readDtaColumns, DataFrame.writeDta, DataFrame.readCsv
DataFrame.readDtaColumns
[DataFrameColumn] -> String -> Result DataFrame DataFrameError
Read only the specified columns from a Stata .dta file.
import DataFrame
DataFrame.readDtaColumns ["var1", "var2"] "data.dta"
Notes: Reads the full file, then selects columns. With a literal path and literal columns, the selection is validated at compile time. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readDta
DataFrame.writeCsv
String -> DataFrame -> Result Unit DataFrameError
Write a DataFrame to a CSV file.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }]
df
    |> DataFrame.writeCsv "out.csv"
Notes: File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.writeJson
DataFrame.writeJson
String -> DataFrame -> Result Unit DataFrameError
Write a DataFrame to a JSON file.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }]
df
    |> DataFrame.writeJson "out.json"
Notes: File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.writeCsv
DataFrame.writeParquet
String -> DataFrame -> Result Unit DataFrameError
Write a DataFrame to a Parquet file, including metadata.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }]
df
    |> DataFrame.writeParquet "data.parquet"
Notes: Metadata is persisted in the Parquet file and restored by readParquet. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readParquet, DataFrame.writeCsv
DataFrame.writeDta
String -> DataFrame -> Result Unit DataFrameError
Write a DataFrame to a Stata .dta file with metadata.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
df |> DataFrame.writeDta "data.dta"
Notes: Preserves variable labels, value labels, and dataset label from metadata. File paths use two-mode resolution: a path starting with ./ or ../ resolves relative to the calling file; a bare path resolves from the project root (keel.toml directory).
See also: DataFrame.readDta, DataFrame.writeCsv
Column Ops
DataFrame.select
[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError
Select columns by name. Returns Ok with the narrowed DataFrame, or Err if a column is not found.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]
df
    |> DataFrame.select [@name, @age]
Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type is narrowed to only the selected columns.
See also: DataFrame.remove, DataFrame.columns
DataFrame.remove
[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError
Remove columns by name. Returns Ok with the updated DataFrame, or Err if a column is not found.
import DataFrame
let df =
    DataFrame.fromRecords [{ id = 1, name = "Alice" }, { id = 2, name = "Bob" }]
df
    |> DataFrame.remove [@id]
Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type excludes the removed columns.
See also: DataFrame.select
DataFrame.rename
DataFrameColumn -> String -> DataFrame -> Result DataFrame DataFrameError
Rename a column. Returns Ok with the updated DataFrame, or Err if the column is not found.
import DataFrame
let df =
    DataFrame.fromRecords [{ old = 1 }, { old = 2 }]
df
    |> DataFrame.rename @old "new"
Notes: Column names are strings ("name") or column literals (@name). The old column name is validated at compile time on typed DataFrames. The resulting DataFrame's type reflects the rename.
DataFrame.applyExprs
[(DataFrameColumn, Expr)] -> DataFrame -> Result DataFrame DataFrameError
Add or replace multiple columns using a list of (column, expr) tuples. Returns Ok with the updated DataFrame, or Err if an expression fails.
Each tuple's first element names the output column; the second is the expression to evaluate. Use @col syntax to update an existing column in-place, or a string variable to add a new column.
import DataFrame
let df = DataFrame.fromRecords [{ price = 10, quantity = 3 }]
-- Update an existing column and add a new column in one call
df |> DataFrame.applyExprs [(@price, @price * 2), ("total", @price * @quantity)]
Notes: Cross-expression dependencies (where one expression references a column produced by an earlier expression in the same list) are handled automatically by batching into sequential Polars passes.
See also: DataFrame.agg
DataFrame.column
DataFrameColumn -> DataFrame -> Result [Maybe a] DataFrameError
Extract a column as a list of Maybe values (Just x for values, Nothing for nulls).
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
case DataFrame.column @name df of
    Ok values -> values
    Err _ -> []
Notes: Column name as a string ("name") or column literal (@name). Every value is wrapped in Maybe since DataFrame columns are nullable. On typed DataFrames the column name is validated at compile time. On untyped DataFrames a missing column returns Err(DataFrameError::ColumnNotFound).
See also: DataFrame.columns
DataFrame.columns
DataFrame -> [String]
Get the column names of a DataFrame.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.columns df
See also: DataFrame.dtypes
DataFrame.dtypes
DataFrame -> [(String, String)]
Get column names and their data types.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.dtypes df
See also: DataFrame.columns
DataFrame.checkSchema
SchemaType -> DataFrame -> Result DataFrame DataFrameError
Validate a DataFrame's schema at runtime. The schema argument is a type alias name or inline record type. An open schema ({ col: T, .. }) allows extra columns; a closed schema requires an exact match. Returns Ok(df) on success or Err(DataFrameError::SchemaMismatch) with a message listing each failing column.
import DataFrame
type InputSchema = { id: Int, amount: Float, .. }
let df = DataFrame.fromRecords [{ id = 1, amount = 9.99 }]
DataFrame.checkSchema "InputSchema" df
Notes: The schema name must be passed as a string literal (e.g. "MySchema"). Enum-backed columns are matched against their underlying primitive type (Int or String). SchemaMismatch details list all failing columns with expected vs. actual types.
See also: DataFrame.dtypes, DataFrame.columns, DataFrame.shape
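The difference between open and closed schemas can be illustrated with a plain Python sketch (not Keel). The function `check_schema`, its dict-based inputs, and the message wording are hypothetical; the sketch only shows that an open schema tolerates extra columns and that every failing column is reported.

```python
def check_schema(expected, actual, open_schema):
    """Return a list of mismatch messages; an empty list means success.

    `expected` and `actual` map column names to dtype names.
    """
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"{name}: expected {dtype}, column missing")
        elif actual[name] != dtype:
            problems.append(f"{name}: expected {dtype}, found {actual[name]}")
    if not open_schema:
        # A closed schema also rejects columns it does not declare.
        for name in actual:
            if name not in expected:
                problems.append(f"{name}: unexpected column")
    return problems


schema = {"id": "Int", "amount": "Float"}
frame = {"id": "Int", "amount": "Float", "note": "String"}
print(check_schema(schema, frame, open_schema=True))   # []
print(check_schema(schema, frame, open_schema=False))  # ['note: unexpected column']
```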
Row Ops
DataFrame.head
Int -> DataFrame -> Result DataFrame DataFrameError
Take the first n rows. Returns Err(DataFrameError::InvalidArgument) if n is negative.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
    |> DataFrame.head 10
See also: DataFrame.tail, DataFrame.slice
DataFrame.tail
Int -> DataFrame -> Result DataFrame DataFrameError
Take the last n rows. Returns Err(DataFrameError::InvalidArgument) if n is negative.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
    |> DataFrame.tail 5
See also: DataFrame.head
DataFrame.slice
Int -> Int -> DataFrame -> Result DataFrame DataFrameError
Take a slice of rows from offset with length. Returns Err(DataFrameError::InvalidArgument) if length is negative.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
    |> DataFrame.slice 0 1
See also: DataFrame.head, DataFrame.tail
DataFrame.sort
[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError
Sort by one or more columns in ascending order. Pass a list of column names; the first column is the primary sort key, subsequent columns break ties. Returns Err(DataFrameError::ColumnNotFound) if any column does not exist at runtime.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Bob", age = 30 }, { name = "Alice", age = 25 }]
df
    |> DataFrame.sort [@name, @age]
Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. Returns Result rather than a bare DataFrame because Polars can reject a sort at runtime for reasons beyond column existence, for example sorting a column whose element type does not implement a total order (such as a nested list column). This failure is not preventable at compile time even on a fully typed DataFrame.
See also: DataFrame.sortDesc
DataFrame.sortDesc
[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError
Sort by one or more columns in descending order. Pass a list of column names; the first column is the primary sort key, subsequent columns break ties. Returns Err(DataFrameError::ColumnNotFound) if any column does not exist at runtime.
import DataFrame
let df =
    DataFrame.fromRecords [{ salary = 50000 }, { salary = 70000 }]
df
    |> DataFrame.sortDesc [@salary]
Notes: Column names are strings ("name") or column literals (@name). Column names are validated at compile time on typed DataFrames. Returns Result rather than a bare DataFrame because Polars can reject a sort at runtime for reasons beyond column existence, for example sorting a column whose element type does not implement a total order (such as a nested list column). This failure is not preventable at compile time even on a fully typed DataFrame.
See also: DataFrame.sort
DataFrame.unique
[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError
Keep unique rows based on specified columns. Returns Err(DataFrameError::OperationFailed) on failure.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Alice" }, { name = "Bob" }]
df
    |> DataFrame.unique [@name]
Notes: Column names are validated at compile time on typed DataFrames. Returns Result rather than a bare DataFrame because Polars deduplication can fail at runtime when a column contains a type that does not support equality comparison (such as a floating-point column with NaN values in certain configurations). This failure is not preventable at compile time even on a fully typed DataFrame.
DataFrame.sample
Int -> DataFrame -> Result DataFrame DataFrameError
Randomly sample n rows. Returns Err(DataFrameError::InvalidArgument) if n is negative.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
    |> DataFrame.sample 1
See also: DataFrame.head
Filters
DataFrame.filter
Expr -> DataFrame -> Result DataFrame DataFrameError
Filter rows using a DataFrame.Expr boolean expression. Returns Ok with the filtered DataFrame, or Err if the expression fails. Always uses the fast Polars path.
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]
df
    |> DataFrame.filter (@x |> Expr.gt (lit 2))
Notes: This is the recommended way to filter DataFrames. The expression always compiles to Polars for optimal performance.
See also: DataFrame.Expr.col, DataFrame.Expr.gt
Aggregation
DataFrame.groupBy
[DataFrameColumn] -> DataFrame -> GroupedDataFrame
Group a DataFrame by the given columns.
import DataFrame
let df =
    DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "Sales", salary = 60000 }]
df
    |> DataFrame.groupBy [@department]
Notes: Returns a GroupedDataFrame. Use DataFrame.agg to aggregate.
See also: DataFrame.agg
DataFrame.agg
[Expr] -> GroupedDataFrame -> DataFrame
Aggregate a grouped DataFrame using a list of DataFrame.Expr expressions.
import DataFrame
import DataFrame.Expr exposing col
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ group = "A", value = 10 }, { group = "A", value = 20 }, { group = "B", value = 30 }]
let totalExpr = @value |> Expr.sum |> Expr.named "total"
let avgExpr = @value |> Expr.mean |> Expr.named "average"
df |> DataFrame.groupBy [@group] |> DataFrame.agg [totalExpr, avgExpr]
Notes: Each expression should have an alias set using Expr.named. This defines the output column name.
See also: DataFrame.groupBy, DataFrame.Expr.sum
DataFrame.count
DataFrame -> Int
Get the number of rows in a DataFrame.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
DataFrame.count df
See also: DataFrame.shape
DataFrame.summary
DataFrame -> DataFrame
Compute summary statistics for all columns. Returns a 10-row DataFrame with a statistic column and one column per source column.
Row order: count, mean, min, max, std, var, median, q25, q75, iqr.
Non-numeric columns have "null" for numeric stat rows.
import DataFrame
let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]
DataFrame.summary df
Notes: std and var use Bessel's correction (ddof=1). Quantiles use linear interpolation.
See also: DataFrame.mean, DataFrame.std, DataFrame.median, DataFrame.quantile, DataFrame.quantiles
Statistics
DataFrame.mean
DataFrame -> DataFrame
Column-wise arithmetic mean (numeric columns only). Returns a 1-row DataFrame.
import DataFrame
let df = DataFrame.fromRecords [{ v = 10 }, { v = 20 }, { v = 30 }]
df |> DataFrame.mean
Notes: Non-numeric columns are excluded. Uses all rows.
See also: DataFrame.median, DataFrame.std, DataFrame.summary
DataFrame.median
DataFrame -> DataFrame
Column-wise median (numeric columns only). Returns a 1-row DataFrame.
import DataFrame
let df = DataFrame.fromRecords [{ v = 1 }, { v = 3 }, { v = 2 }]
df |> DataFrame.median
Notes: Non-numeric columns are excluded.
See also: DataFrame.mean, DataFrame.quantile
DataFrame.std
DataFrame -> DataFrame
Column-wise sample standard deviation (ddof=1, numeric columns only). Returns a 1-row DataFrame.
import DataFrame
let df = DataFrame.fromRecords [{ v = 2 }, { v = 4 }, { v = 6 }]
df |> DataFrame.std
Notes: Uses Bessel's correction (ddof=1). Non-numeric columns are excluded.
See also: DataFrame.var, DataFrame.mean
DataFrame.var
DataFrame -> DataFrame
Column-wise sample variance (ddof=1, numeric columns only). Returns a 1-row DataFrame.
import DataFrame
let df = DataFrame.fromRecords [{ v = 2 }, { v = 4 }, { v = 6 }]
df |> DataFrame.var
Notes: Uses Bessel's correction (ddof=1). Non-numeric columns are excluded.
See also: DataFrame.std, DataFrame.mean
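As a worked illustration of Bessel's correction (ddof=1), the following plain Python sketch (not Keel) computes the sample variance and standard deviation for the values 2, 4, 6 used in the examples above. The helper name `sample_variance` is hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """Variance with Bessel's correction: divide by n - 1, not n."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


values = [2, 4, 6]
print(sample_variance(values))             # 4.0
print(math.sqrt(sample_variance(values)))  # 2.0
# Cross-check against the standard library's sample variance.
print(statistics.variance(values) == sample_variance(values))  # True
```

Dividing by n instead of n - 1 would give a variance of about 2.67 for the same data, which is why the ddof setting matters on small samples.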
DataFrame.mode
DataFrame -> DataFrame
Column-wise mode (all columns). Returns a 1-row DataFrame with the most frequent value per column. Ties broken by the smallest value.
import DataFrame
let df = DataFrame.fromRecords [{ g = "A" }, { g = "A" }, { g = "B" }]
df |> DataFrame.mode
Notes: Operates on all columns, including non-numeric.
See also: DataFrame.mean, DataFrame.median
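The tie-breaking rule for the mode can be sketched in plain Python (not Keel). The helper `mode_smallest` is hypothetical and works on a single column of values; how nulls are treated is not covered here.

```python
from collections import Counter


def mode_smallest(values):
    """Most frequent value; ties are broken by the smallest value."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


print(mode_smallest(["A", "A", "B"]))   # A
print(mode_smallest([3, 1, 3, 1, 2]))   # 1 (1 and 3 tie, smallest wins)
```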
DataFrame.quantile
Float -> DataFrame -> Result DataFrame DataFrameError
Column-wise quantile (numeric columns only). Returns a 1-row DataFrame. Returns Err(DataFrameError::InvalidArgument) if p is outside [0.0, 1.0].
import DataFrame
let df = DataFrame.fromRecords [{ v = 1 }, { v = 2 }, { v = 3 }, { v = 4 }]
df |> DataFrame.quantile 0.75
Notes: Uses linear interpolation. Non-numeric columns are excluded. Returns InvalidArgument error if p is outside [0.0, 1.0].
See also: DataFrame.median, DataFrame.summary, DataFrame.quantiles
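A quantile with linear interpolation is commonly defined by locating the fractional position p * (n - 1) in the sorted data and interpolating between its two neighbours. The plain Python sketch below (not Keel) follows that common definition; that the module uses exactly this formula is an assumption, and `quantile_linear` is a hypothetical name. It expects a non-empty list.

```python
def quantile_linear(values, p):
    """Quantile of a non-empty list using linear interpolation."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be within [0.0, 1.0]")
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)            # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


print(quantile_linear([1, 2, 3, 4], 0.75))  # 3.25
print(quantile_linear([1, 2, 3, 4], 0.5))   # 2.5
```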
DataFrame.corr
DataFrame -> DataFrame
Pairwise Pearson correlation matrix (numeric columns, ddof=1). Returns a DataFrame with a variable String column and one Float column per numeric column.
import DataFrame
let df = DataFrame.fromRecords [{ x = 1, y = 2 }, { x = 2, y = 4 }, { x = 3, y = 6 }]
df |> DataFrame.corr
Notes: Diagonal values are 1.0. Non-numeric columns are excluded.
See also: DataFrame.cov, DataFrame.std, DataFrame.corrSpearman
DataFrame.cov
DataFrame -> DataFrame
Pairwise covariance matrix (numeric columns, ddof=1). Returns a DataFrame with a variable String column and one Float column per numeric column.
import DataFrame
let df = DataFrame.fromRecords [{ x = 1, y = 2 }, { x = 2, y = 4 }, { x = 3, y = 6 }]
df |> DataFrame.cov
Notes: Uses Bessel's correction (ddof=1). Non-numeric columns are excluded.
See also: DataFrame.corr, DataFrame.var
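For a single pair of columns, sample covariance (ddof=1) and Pearson correlation can be worked through in plain Python (not Keel). The helpers `sample_cov` and `pearson` are hypothetical names; the data matches the x and y columns from the examples above.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance with Bessel's correction (divide by n - 1)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


x = [1, 2, 3]
y = [2, 4, 6]
print(sample_cov(x, y))  # 2.0
print(pearson(x, y))     # 1.0 (y is an exact multiple of x)
```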
Window
DataFrame.partitionBy
[DataFrameColumn] -> DataFrame -> WindowedDataFrame
Create a windowed DataFrame partitioned by the given columns. This is the entry point for all window function operations.
import DataFrame
let df =
    DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "HR", salary = 60000 }]
df
    |> DataFrame.partitionBy [@department]
Notes: Column names are strings ("name") or column literals (@name). Partition columns define independent groups for window calculations. Chain with orderBy, ranking, lag/lead, rolling, or cumulative functions.
See also: DataFrame.orderBy, DataFrame.collect
DataFrame.orderBy
[DataFrameColumn] -> WindowedDataFrame -> WindowedDataFrame
Set the ordering columns for a windowed DataFrame. Required before rank, lag, lead, and rolling functions.
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1 }, { dept = "Sales", date = 2 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
Notes: Column names are strings ("name") or column literals (@name). Ordering determines how rows are sequenced within each partition. Must be called before withRank, withDenseRank, withLag, withLead, or rolling functions.
See also: DataFrame.partitionBy, DataFrame.withRank
DataFrame.collect
WindowedDataFrame -> DataFrame
Collect a windowed DataFrame back into a regular DataFrame, materializing all window computations.
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.withRowNumber "row_num"
    |> DataFrame.collect
Notes: Must be called at the end of a window function chain to produce a usable DataFrame.
See also: DataFrame.partitionBy
DataFrame.withRowNumber
String -> WindowedDataFrame -> WindowedDataFrame
Add a sequential row number column within each partition.
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.withRowNumber "row_num"
    |> DataFrame.collect
Notes: Argument is the name of the new column (symbol or string). Row numbers start at 1. Does not require orderBy.
See also: DataFrame.withRank, DataFrame.withDenseRank
DataFrame.withRank
String -> WindowedDataFrame -> WindowedDataFrame
Add a rank column within each partition. Ties receive the same rank, with gaps after ties (e.g., 1, 2, 2, 4).
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@score]
    |> DataFrame.withRank "rank"
    |> DataFrame.collect
Notes: Argument is the name of the new column (symbol or string). Requires orderBy to be set first.
See also: DataFrame.withDenseRank, DataFrame.withRowNumber, DataFrame.orderBy
DataFrame.withDenseRank
String -> WindowedDataFrame -> WindowedDataFrame
Add a dense rank column within each partition. Ties receive the same rank, with no gaps (e.g., 1, 2, 2, 3).
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@score]
    |> DataFrame.withDenseRank "dense_rank"
    |> DataFrame.collect
Notes: Argument is the name of the new column (symbol or string). Requires orderBy to be set first.
See also: DataFrame.withRank, DataFrame.withRowNumber, DataFrame.orderBy
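The difference between the two ranking schemes is easiest to see on a small ordered list. The plain Python sketch below (not Keel) ranks the values of one partition that is already sorted; `gap_ranks` and `dense_ranks` are hypothetical helper names.

```python
def gap_ranks(ordered):
    """Ties share a rank; the next distinct value skips ahead (1, 2, 2, 4)."""
    ranks = []
    for i, value in enumerate(ordered):
        if i > 0 and value == ordered[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)  # position in the ordering, 1-based
    return ranks


def dense_ranks(ordered):
    """Ties share a rank; the next distinct value gets rank + 1 (1, 2, 2, 3)."""
    ranks = []
    for i, value in enumerate(ordered):
        if i == 0:
            ranks.append(1)
        elif value == ordered[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


scores = [70, 85, 85, 90]
print(gap_ranks(scores))    # [1, 2, 2, 4]
print(dense_ranks(scores))  # [1, 2, 2, 3]
```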
DataFrame.withLag
String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a column with the value from a previous row within each partition.
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withLag "prev_sales" @sales 1
    |> DataFrame.collect
Notes: Args: new column name, source column, offset (number of rows back). Column names are strings ("name") or column literals (@name). Produces Nothing for rows without a previous value. Requires orderBy.
See also: DataFrame.withLead, DataFrame.orderBy
DataFrame.withLead
String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a column with the value from a subsequent row within each partition.
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withLead "next_sales" @sales 1
    |> DataFrame.collect
Notes: Args: new column name, source column, offset (number of rows forward). Column names are strings ("name") or column literals (@name). Produces Nothing for rows without a subsequent value. Requires orderBy.
See also: DataFrame.withLag, DataFrame.orderBy
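Lag and lead are mirror images of each other: both shift a column within one ordered partition and fill the vacated slots with a missing value. A plain Python sketch (not Keel), using None in place of Nothing and assuming a non-negative offset; `lag` and `lead` are hypothetical helper names.

```python
def lag(values, offset):
    """Value from `offset` rows back; None where no such row exists."""
    return [values[i - offset] if i >= offset else None for i in range(len(values))]


def lead(values, offset):
    """Value from `offset` rows ahead; None where no such row exists."""
    n = len(values)
    return [values[i + offset] if i + offset < n else None for i in range(n)]


sales = [100, 200, 300]
print(lag(sales, 1))   # [None, 100, 200]
print(lead(sales, 1))  # [200, 300, None]
```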
DataFrame.withRollingSum
String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling sum column computed over a fixed-size window within each partition.
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withRollingSum "sum_3d" @sales 3
    |> DataFrame.collect
Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingMean, DataFrame.withCumSum
DataFrame.withRollingMean
String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling mean column computed over a fixed-size window within each partition.
import DataFrame
let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@date]
    |> DataFrame.withRollingMean "avg_3d" @sales 3
    |> DataFrame.collect
Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingSum, DataFrame.withCumMean
DataFrame.withRollingMin
String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling minimum column computed over a fixed-size window within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy [@dept]
|> DataFrame.orderBy [@date]
|> DataFrame.withRollingMin "min_3d" @sales 3
|> DataFrame.collect
Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingMax, DataFrame.withCumMin
DataFrame.withRollingMax
String -> DataFrameColumn -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling maximum column computed over a fixed-size window within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy [@dept]
|> DataFrame.orderBy [@date]
|> DataFrame.withRollingMax "max_3d" @sales 3
|> DataFrame.collect
Notes: Args: new column name, source column, window size. Column names are strings ("name") or column literals (@name). Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingMin, DataFrame.withCumMax
DataFrame.withCumSum
String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative sum column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy [@dept]
|> DataFrame.orderBy [@date]
|> DataFrame.withCumSum "running_total" @sales
|> DataFrame.collect
Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Computes a running total over the current row and all preceding rows in the partition.
See also: DataFrame.withCumMean, DataFrame.withRollingSum
DataFrame.withCumMean
String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative mean column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy [@dept]
|> DataFrame.orderBy [@date]
|> DataFrame.withCumMean "running_avg" @sales
|> DataFrame.collect
Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Computes a running average over the current row and all preceding rows in the partition.
See also: DataFrame.withCumSum, DataFrame.withRollingMean
DataFrame.withCumMin
String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative minimum column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy [@dept]
|> DataFrame.orderBy [@date]
|> DataFrame.withCumMin "running_min" @sales
|> DataFrame.collect
Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Tracks the minimum value seen so far in the partition.
See also: DataFrame.withCumMax, DataFrame.withRollingMin
DataFrame.withCumMax
String -> DataFrameColumn -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative maximum column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy [@dept]
|> DataFrame.orderBy [@date]
|> DataFrame.withCumMax "running_max" @sales
|> DataFrame.collect
Notes: Args: new column name, source column. Column names are strings ("name") or column literals (@name). Tracks the maximum value seen so far in the partition.
See also: DataFrame.withCumMin, DataFrame.withRollingMax
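The four cumulative aggregates share one pattern: each output row summarizes the values seen so far within a single ordered partition. A plain-Python sketch, assuming the current row is included in each running value; the function name `cumulative` is hypothetical and this is not the library's implementation:

```python
from itertools import accumulate


def cumulative(values):
    """Return running sum, mean, min, and max for one ordered partition."""
    sums = list(accumulate(values))
    # Running mean: running sum divided by the number of rows seen so far.
    means = [total / (i + 1) for i, total in enumerate(sums)]
    mins = list(accumulate(values, min))
    maxs = list(accumulate(values, max))
    return sums, means, mins, maxs


sums, means, mins, maxs = cumulative([100, 200, 60])
print(sums)   # [100, 300, 360]
print(means)  # [100.0, 150.0, 120.0]
print(mins)   # [100, 100, 60]
print(maxs)   # [100, 200, 200]
```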
Lazy
DataFrame.lazy
DataFrame -> LazyFrame
Convert a DataFrame to a LazyFrame for deferred, optimized execution.
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]
df
|> DataFrame.lazy
|> DataFrame.lazyFilter (@x |> Expr.gt (lit 2))
|> DataFrame.lazyCollect
Notes: LazyFrame enables Polars query optimization: predicate pushdown, projection pushdown, and parallel execution.
See also: DataFrame.lazyCollect, DataFrame.lazyFilter
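Deferred execution means that building a query only records steps; work happens when the result is collected. The toy class below illustrates that idea in plain Python. `LazyRows` is a hypothetical stand-in and does not model the Polars optimizer (no pushdown, no parallelism); it only shows that nothing runs before collection.

```python
class LazyRows:
    """Toy lazy frame: records steps and runs them only on collect()."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Nothing is evaluated here; the predicate is appended to the plan.
        return LazyRows(self._rows, self._steps + (("filter", predicate),))

    def select(self, columns):
        return LazyRows(self._rows, self._steps + (("select", list(columns)),))

    def collect(self):
        # Execute the recorded plan in order.
        rows = self._rows
        for kind, arg in self._steps:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


calls = []


def gt_two(row):
    calls.append(row["x"])  # record every evaluation of the predicate
    return row["x"] > 2


plan = LazyRows([{"x": 1}, {"x": 5}, {"x": 10}]).filter(gt_two)
calls_before_collect = list(calls)
print(calls_before_collect)  # []
collected = plan.collect()
print(collected)  # [{'x': 5}, {'x': 10}]
```

Because the whole plan is known before it runs, a real engine can reorder or merge steps, which is what the optimizations named in the notes rely on.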
DataFrame.lazyCollect
LazyFrame -> DataFrame
Materialize a LazyFrame back to a DataFrame, executing the optimized query plan.
import DataFrame
DataFrame.fromRecords [{ x = 1 }, { x = 2 }] |> DataFrame.lazy |> DataFrame.lazyCollect
Notes: This triggers the actual computation. Until lazyCollect is called, all operations are deferred.
See also: DataFrame.lazy
DataFrame.lazyFilter
Expr -> LazyFrame -> LazyFrame
Filter a LazyFrame using a DataFrame.Expr boolean expression.
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr
let filterExpr = @x |> Expr.gt (lit 5)
DataFrame.fromRecords [{ x = 1 }, { x = 10 }] |> DataFrame.lazy |> DataFrame.lazyFilter filterExpr |> DataFrame.lazyCollect
Notes: The filter is added to the query plan and optimized with other operations.
See also: DataFrame.lazy, DataFrame.filter
DataFrame.lazySelect
[Expr] -> LazyFrame -> LazyFrame
Select columns from a LazyFrame using a list of Expr expressions.
import DataFrame
import DataFrame.Expr exposing col
import DataFrame.Expr as Expr
let yRenamed = @y |> Expr.named "y_renamed"
DataFrame.fromRecords [{ x = 1, y = 2 }] |> DataFrame.lazy |> DataFrame.lazySelect [yRenamed] |> DataFrame.lazyCollect
Notes: Enables projection pushdown - only selected columns are read from files.
See also: DataFrame.lazy, DataFrame.select
DataFrame.lazyApplyExprs
[Expr] -> LazyFrame -> LazyFrame
Add or replace columns in a LazyFrame using a list of Expr expressions.
import DataFrame
import DataFrame.Expr exposing col, lit
import DataFrame.Expr as Expr
let doubledExpr = @x |> Expr.mul (lit 2) |> Expr.named "x_doubled"
DataFrame.fromRecords [{ x = 5 }] |> DataFrame.lazy |> DataFrame.lazyApplyExprs [doubledExpr] |> DataFrame.lazyCollect
Notes: Each expression should have an alias set using Expr.named.
See also: DataFrame.lazy, DataFrame.applyExprs
Multi-DataFrame
DataFrame.join
[DataFrameColumn] -> [DataFrameColumn] -> JoinType -> DataFrame -> DataFrame -> Result DataFrame DataFrameError
Inner join two DataFrames on the given key columns with explicit cardinality validation.
The third argument declares the expected relationship between the join keys:
JoinType::OneToOne - each key value appears at most once on both sides (recommended default; raises a runtime error if either side has duplicates)
JoinType::OneToMany - each left key value is unique; the right side may have duplicates (e.g. joining a parent table to a child table)
JoinType::ManyToOne - the left side may have duplicates; each right key value is unique (the reverse of OneToMany)
JoinType::ManyToMany - both sides may have duplicates (produces a Cartesian product for matching keys; use with care)
Cardinality is enforced at runtime by Polars. A violation raises a JoinCardinalityViolation error instead of silently producing an unexpectedly large result.
import DataFrame
let users =
DataFrame.fromRecords
[ { id = 1, name = "Alice" }
, { id = 2, name = "Bob" }
]
let roles =
DataFrame.fromRecords
[ { user_id = 1, role = "admin" }
, { user_id = 2, role = "viewer" }
]
-- OneToOne: each user_id appears exactly once on both sides
users
|> DataFrame.join [@id] [@user_id] JoinType::OneToOne roles
-- Multi-column join: match on both country and year
let pop =
DataFrame.fromRecords
[ { country = "DE", year = 2020, population = 83000000 }
]
let gdp =
DataFrame.fromRecords
[ { country = "DE", year = 2020, gdp = 3800000000000 }
]
pop
|> DataFrame.join [@country, @year] [@country, @year] JoinType::OneToOne gdp
Notes: Pass column names as column literals ([@id]) or string lists (["id"]). For single-column joins use a one-element list: [@id]. JoinType is available after import DataFrame.
See also: DataFrame.concat
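The cardinality rules for the four join types reduce to two uniqueness checks, one per side. The sketch below expresses them in plain Python. The function `cardinality_violations` is hypothetical, and it assumes uniqueness is checked over all key values on a side (not only those that find a match); the actual check is performed by Polars.

```python
from collections import Counter


def cardinality_violations(left_keys, right_keys, join_type):
    """Return a list of violated constraints; an empty list means the join is allowed."""
    left_has_dups = any(n > 1 for n in Counter(left_keys).values())
    right_has_dups = any(n > 1 for n in Counter(right_keys).values())

    # Which sides must be unique for each declared relationship.
    left_unique = join_type in ("OneToOne", "OneToMany")
    right_unique = join_type in ("OneToOne", "ManyToOne")

    problems = []
    if left_unique and left_has_dups:
        problems.append("left keys are not unique")
    if right_unique and right_has_dups:
        problems.append("right keys are not unique")
    return problems


# Multi-column keys are modeled as tuples, one tuple per row.
users = [(1,), (2,)]
roles = [(1,), (1,), (2,)]
print(cardinality_violations(users, roles, "OneToOne"))   # ['right keys are not unique']
print(cardinality_violations(users, roles, "OneToMany"))  # []
```

Declaring the expected relationship up front turns an accidental row explosion into an explicit error.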
DataFrame.concat
[DataFrame] -> Result DataFrame DataFrameError
Concatenate a list of DataFrames vertically. Returns Err(DataFrameError::OperationFailed) on failure.
import DataFrame
let df1 =
DataFrame.fromRecords [{ name = "Alice" }]
let df2 =
DataFrame.fromRecords [{ name = "Bob" }]
DataFrame.concat [df1, df2]
See also: DataFrame.join
DataFrame.pivot
DataFrameColumn -> DataFrameColumn -> DataFrameColumn -> DataFrame -> Result DataFrame DataFrameError
Pivot a DataFrame: spread values from one column into new columns. Returns Err(DataFrameError::OperationFailed) on failure.
import DataFrame
let df =
DataFrame.fromRecords [{ category = "A", date = "Jan", amount = 100 }, { category = "B", date = "Jan", amount = 200 }]
df
|> DataFrame.pivot @category @date @amount
Notes: Args: on, index, values. Column names are strings ("name") or column literals (@name).
Conversion
DataFrame.toRecords
DataFrame -> [Record]
Convert a DataFrame to a list of Keel records.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.toRecords df
Notes: Each row becomes a record with column names as field names.
See also: DataFrame.fromRecords
DataFrame.fromRecords
[Record] -> DataFrame
Create a DataFrame from a list of Keel records.
import DataFrame
-- Inline record list
DataFrame.fromRecords [{ name = "Alice", age = 30 }]
-- Named schema via type alias — recommended for multi-field records
type alias Row = { name : String, age : Int }
let rows : [Row] = [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]
DataFrame.fromRecords rows
Notes: All records should have the same fields. Use type alias to name a reusable row schema.
See also: DataFrame.toRecords
DataFrame.fromLists
[(String, [a])] -> DataFrame
Create a multi-column DataFrame from a list of (column name, values) tuples.
import DataFrame
DataFrame.fromLists [("age", [30, 40]), ("name", ["Alice", "Bob"])]
Notes: Column-oriented data construction. All value lists must have the same length. Supports Maybe-wrapped values. Composes well with List.zip for programmatic column creation. For single-column DataFrames, pass a single-element list: [("col", [values])].
See also: DataFrame.fromRecords
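Column-oriented construction and the equal-length requirement can be pictured with a small plain-Python model. The helper `from_lists` is hypothetical and returns a list of row dicts instead of a DataFrame; it only illustrates how (name, values) pairs line up into rows.

```python
def from_lists(columns):
    """Turn [(name, values), ...] into a list of row dicts, checking lengths first."""
    lengths = {len(values) for _, values in columns}
    if len(lengths) > 1:
        raise ValueError(f"all value lists must have the same length, got {sorted(lengths)}")
    names = [name for name, _ in columns]
    # zip(*...) walks the columns in parallel, producing one tuple per row.
    return [dict(zip(names, row)) for row in zip(*(values for _, values in columns))]


table = from_lists([("age", [30, 40]), ("name", ["Alice", "Bob"])])
print(table)  # [{'age': 30, 'name': 'Alice'}, {'age': 40, 'name': 'Bob'}]
```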
DataFrame.recode
DataFrameColumn -> [(Int, Int)] -> DataFrame -> DataFrame
Recode values in a column according to a mapping. Automatically updates value labels.
import DataFrame
import ValueLabelSet
let labels = ValueLabelSet.fromList [(1, "Low"), (2, "Medium"), (3, "High")]
let df = (DataFrame.fromRecords [{ score = 1 }, { score = 2 }, { score = 3 }]
|> DataFrame.setValueLabels @score labels)?
-- Collapse categories: 1 stays 1, 2->1, 3->2
df |> DataFrame.recode @score [(2, 1), (3, 2)]
-- Labels are automatically remapped: 1->"Low", 2->"High"
Notes: Value labels are automatically updated based on the recode mapping. Values not in the mapping remain unchanged.
See also: DataFrame.setValueLabels
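One way to read the label remapping in the example above is sketched below in plain Python. The helper `recode` is hypothetical, and the collision rule is an assumption: when two old codes land on the same new code, this sketch keeps the label of the lowest old code. That rule was chosen only because it reproduces the documented outcome (1->"Low", 2->"High"); the library's actual rule is not stated here.

```python
def recode(values, labels, mapping):
    """Apply an old->new code mapping to values and carry the labels along."""
    lookup = dict(mapping)
    # Values without an entry in the mapping are left unchanged.
    new_values = [lookup.get(v, v) for v in values]

    new_labels = {}
    for code in sorted(labels):
        target = lookup.get(code, code)
        # Assumed collision rule: the first label assigned to a target code wins.
        new_labels.setdefault(target, labels[code])
    return new_values, new_labels


new_values, new_labels = recode(
    [1, 2, 3],
    {1: "Low", 2: "Medium", 3: "High"},
    [(2, 1), (3, 2)],
)
print(new_values)  # [1, 1, 2]
print(new_labels)  # {1: 'Low', 2: 'High'}
```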
Inspection
DataFrame.shape
DataFrame -> (Int, Int)
Get the shape of a DataFrame as (rows, columns).
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.shape df
See also: DataFrame.count, DataFrame.columns
Metadata
DataFrame.setMeta
String -> a -> DataFrame -> DataFrame
Set a dataset-level metadata key.
import DataFrame
let df =
DataFrame.fromRecords [{ x = 1 }]
df
|> DataFrame.setMeta "name" "PISA 2022"
Notes: Metadata values can be String, Int, Float, Bool, List, or Record.
See also: DataFrame.meta, DataFrame.allMeta
DataFrame.meta
String -> DataFrame -> Maybe a
Get a dataset-level metadata value by key.
import DataFrame
DataFrame.fromRecords [{ x = 1 }]
|> DataFrame.setMeta "name" "test"
|> DataFrame.meta "name"
See also: DataFrame.setMeta, DataFrame.allMeta
DataFrame.allMeta
DataFrame -> Record
Get all dataset-level metadata as a record.
import DataFrame
let df =
DataFrame.fromRecords [{ x = 1 }]
DataFrame.allMeta df
See also: DataFrame.meta, DataFrame.setMeta
DataFrame.setColumnMeta
DataFrameColumn -> String -> a -> DataFrame -> DataFrame
Set a column-level metadata key.
import DataFrame
let df =
DataFrame.fromRecords [{ score = 500 }]
df
|> DataFrame.setColumnMeta @score "label" "Math score"
Notes: The first argument is the column (column literal or string), the second is the metadata key, and the third is the value.
See also: DataFrame.columnMeta, DataFrame.allColumnMeta
DataFrame.columnMeta
DataFrameColumn -> String -> DataFrame -> Maybe a
Get a column-level metadata value by column and key.
import DataFrame
DataFrame.fromRecords [{ score = 500 }]
|> DataFrame.setColumnMeta @score "label" "Math"
|> DataFrame.columnMeta @score "label"
See also: DataFrame.setColumnMeta, DataFrame.allColumnMeta
DataFrame.allColumnMeta
DataFrameColumn -> DataFrame -> Record
Get all metadata for a specific column as a record.
import DataFrame
let df =
DataFrame.fromRecords [{ score = 500 }]
DataFrame.allColumnMeta @score df
See also: DataFrame.columnMeta, DataFrame.setColumnMeta
DataFrame.describeMeta
DataFrame -> DataFrame
Get a summary DataFrame of all metadata (dataset and column level).
import DataFrame
let df =
DataFrame.fromRecords [{ x = 1 }]
DataFrame.describeMeta df
Notes: Returns a DataFrame with columns: level, key, value.
See also: DataFrame.allMeta, DataFrame.allColumnMeta
DataFrame.describe
DataFrame -> DataFrame
Stata-style variable overview: returns a DataFrame with one row per column showing name, type, label, value labels, and metadata.
import DataFrame
import ValueLabelSet
import Result
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df =
(DataFrame.fromRecords [{ name = "Alice", gender = 1 }]
|> DataFrame.setVarLabel @name "Person's name"
|> Result.andThen (DataFrame.setValueLabels @gender gender))?
DataFrame.describe df
Notes: Returns a DataFrame with columns: name, type, label, values, metadata. Rows are in column order. Value labels show abbreviated form: {1=Male, 2=Female} for ≤5 labels, or "N labels" for more.
See also: DataFrame.describeMeta, DataFrame.describeLabels, DataFrame.varLabels, DataFrame.search
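The abbreviation rule for the values column can be written out as a short plain-Python function. The name `abbreviate_labels` is hypothetical, ordering entries by code is an assumption, and the rendering of a column with no labels at all is not covered by the notes above, so the sketch does not address it.

```python
def abbreviate_labels(labels, limit=5):
    """Render value labels inline when there are few, otherwise as a count."""
    if len(labels) <= limit:
        # Assumption: entries are listed in ascending code order.
        pairs = ", ".join(f"{code}={text}" for code, text in sorted(labels.items()))
        return "{" + pairs + "}"
    return f"{len(labels)} labels"


print(abbreviate_labels({1: "Male", 2: "Female"}))            # {1=Male, 2=Female}
print(abbreviate_labels({i: f"Level {i}" for i in range(6)}))  # 6 labels
```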
DataFrame.search
String -> DataFrame -> DataFrame
Search for variables by name, label, value labels, or metadata. Returns matching variables as a DataFrame.
import DataFrame
import ValueLabelSet
import Result
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df =
(DataFrame.fromRecords [{ name = "Alice", gender = 1, income = 50000 }]
|> DataFrame.setVarLabel @name "Person's name"
|> Result.andThen (DataFrame.setVarLabel @income "Annual income in USD")
|> Result.andThen (DataFrame.setValueLabels @gender gender))?
-- Search by variable name
DataFrame.search "name" df
Notes: Case-insensitive substring search across all variable metadata: name, label (description), value labels, and column metadata. Returns a DataFrame with columns: name, type, label, values, metadata.
See also: DataFrame.describe, DataFrame.describeMeta, DataFrame.varLabels
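The matching rule (case-insensitive substring search over several metadata fields) can be sketched in plain Python. The function `search_variables` and the dict layout of each variable are hypothetical, and searching metadata values instead of metadata keys is an assumption.

```python
def search_variables(variables, query):
    """Return names of variables whose name, label, value labels, or metadata match."""
    needle = query.lower()
    hits = []
    for var in variables:
        haystack = [var["name"], var.get("label", "")]
        haystack += list(var.get("value_labels", {}).values())
        # Assumption: metadata values (not keys) are searched.
        haystack += [str(v) for v in var.get("metadata", {}).values()]
        if any(needle in text.lower() for text in haystack):
            hits.append(var["name"])
    return hits


variables = [
    {"name": "name", "label": "Person's name"},
    {"name": "gender", "value_labels": {1: "Male", 2: "Female"}},
    {"name": "income", "label": "Annual income in USD"},
]
print(search_variables(variables, "name"))  # ['name']
print(search_variables(variables, "usd"))   # ['income']
print(search_variables(variables, "MALE"))  # ['gender']
```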
Labels
DataFrame.setVarLabel
DataFrameColumn -> String -> DataFrame -> Result DataFrame DataFrameError
Set a variable label (description) for a column.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
case df |> DataFrame.setVarLabel @name "Person's full name" of
Ok labeled -> DataFrame.varLabel @name labeled
Err e -> Nothing
Notes: Variable labels describe what a column represents. They are preserved in Stata files. On typed DataFrames the column name is validated at compile time. On untyped DataFrames a missing column returns Err(DataFrameError::ColumnNotFound).
See also: DataFrame.varLabel, DataFrame.removeVarLabel
DataFrame.varLabel
DataFrameColumn -> DataFrame -> Maybe String
Get the variable label for a column, if any.
import DataFrame
let df = (DataFrame.fromRecords [{ name = "Alice" }] |> DataFrame.setVarLabel @name "Person's name")?
DataFrame.varLabel @name df -- Just "Person's name"
See also: DataFrame.setVarLabel, DataFrame.varLabels
DataFrame.varLabels
DataFrame -> { String : String }
Get all variable labels as a record.
import DataFrame
import Result
let df =
(DataFrame.fromRecords [{ name = "Alice", age = 30 }]
|> DataFrame.setVarLabel @name "Person's name"
|> Result.andThen (DataFrame.setVarLabel @age "Age in years"))?
DataFrame.varLabels df -- { name = "Person's name", age = "Age in years" }
See also: DataFrame.varLabel, DataFrame.setVarLabel
DataFrame.removeVarLabel
DataFrameColumn -> DataFrame -> DataFrame
Remove the variable label from a column.
import DataFrame
let df = (DataFrame.fromRecords [{ name = "Alice" }] |> DataFrame.setVarLabel @name "Person's name")?
df |> DataFrame.removeVarLabel @name
See also: DataFrame.setVarLabel, DataFrame.varLabel
DataFrame.setValueLabels
DataFrameColumn -> ValueLabelSet -> DataFrame -> Result DataFrame DataFrameError
Attach value labels to a column (lenient - allows unlabeled values).
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
case df |> DataFrame.setValueLabels @gender gender of
Ok labeled -> labeled
Err e -> df
Notes: Value labels map numeric codes to human-readable labels. Use setValueLabelsStrict for exhaustive validation. Returns Err(DataFrameError::ColumnNotFound) if the column does not exist.
See also: DataFrame.setValueLabelsStrict, DataFrame.valueLabels
DataFrame.setValueLabelsStrict
DataFrameColumn -> ValueLabelSet -> DataFrame -> Result DataFrame DataFrameError
Attach value labels to a column (strict - all values must have labels).
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
case df |> DataFrame.setValueLabelsStrict @gender gender of
Ok labeled -> labeled
Err e -> df
Notes: Returns Err(DataFrameError::OperationFailed) if any value in the column lacks a corresponding label. Returns Err(DataFrameError::ColumnNotFound) if the column does not exist.
See also: DataFrame.setValueLabels, DataFrame.valueLabels
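The difference between the lenient and strict variants comes down to whether unlabeled values are tolerated. A plain-Python sketch of that check follows. The helper `attach_labels` and its tuple result are hypothetical stand-ins for the Result type, and the handling of missing values in the column is not modeled.

```python
def attach_labels(column_values, labels, strict=False):
    """Return ("Ok", labels) or ("Err", reason), mimicking a Result."""
    # Distinct values in the column that have no label.
    missing = sorted({v for v in column_values if v not in labels})
    if strict and missing:
        return ("Err", f"OperationFailed: no label for values {missing}")
    # Lenient mode accepts the labels even if some values stay unlabeled.
    return ("Ok", labels)


gender = {1: "Male", 2: "Female"}
lenient = attach_labels([1, 2, 3], gender)
strict = attach_labels([1, 2, 3], gender, strict=True)
print(lenient[0])  # Ok
print(strict)      # ('Err', 'OperationFailed: no label for values [3]')
```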
DataFrame.valueLabels
DataFrameColumn -> DataFrame -> Maybe ValueLabelSet
Get the value labels for a column, if any.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.valueLabels @gender df -- Just (ValueLabelSet)
See also: DataFrame.setValueLabels, DataFrame.allValueLabels
DataFrame.allValueLabels
DataFrame -> { String : ValueLabelSet }
Get all value labels as a record mapping column names to ValueLabelSets.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.allValueLabels df
See also: DataFrame.valueLabels, DataFrame.setValueLabels
DataFrame.removeValueLabels
DataFrameColumn -> DataFrame -> DataFrame
Remove value labels from a column.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
df |> DataFrame.removeValueLabels @gender
See also: DataFrame.setValueLabels, DataFrame.valueLabels
DataFrame.setDisplayMode
DataFrameColumn -> String -> DataFrame -> Result DataFrame DataFrameError
Set how a column's values should be displayed (Raw, Labeled, or Both).
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
case df |> DataFrame.setValueLabels @gender gender of
Ok labeled ->
case labeled |> DataFrame.setDisplayMode @gender "Labeled" of
Ok final -> final
Err _ -> labeled
Err _ -> df
Notes: Display modes: "Raw" shows only the value, "Labeled" shows only the label, "Both" (default) shows "value (label)". Returns Err(DataFrameError::InvalidArgument) if the mode string is not one of these three values.
See also: DataFrame.displayMode, DataFrame.setValueLabels
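The three display modes can be summarized as a small formatting function. This plain-Python sketch uses a hypothetical `render` helper; falling back to the raw value when a value has no label is an assumption, since the notes above only describe labeled values.

```python
def render(value, labels, mode="Both"):
    """Format one cell according to the column's display mode."""
    if mode not in ("Raw", "Labeled", "Both"):
        raise ValueError(f"invalid display mode: {mode}")
    label = labels.get(value)
    # Assumption: values without a label are shown raw in every mode.
    if mode == "Raw" or label is None:
        return str(value)
    if mode == "Labeled":
        return label
    return f"{value} ({label})"


gender = {1: "Male", 2: "Female"}
print(render(1, gender, "Raw"))      # 1
print(render(1, gender, "Labeled"))  # Male
print(render(1, gender))             # 1 (Male)
```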
DataFrame.displayMode
DataFrameColumn -> DataFrame -> String
Get the display mode for a column.
import DataFrame
let df = DataFrame.fromRecords [{ gender = 1 }]
DataFrame.displayMode @gender df -- "Both"
Notes: Returns "Both" (the default) if no display mode has been set.
See also: DataFrame.setDisplayMode, DataFrame.setValueLabels
DataFrame.describeLabel
DataFrameColumn -> DataFrame -> String
Describe value labels for a single column as a formatted table.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.describeLabel @gender df
Notes: Returns a formatted table with Value and Label columns. Returns a message if the column has no value labels.
See also: DataFrame.describeLabels, DataFrame.valueLabels
DataFrame.describeLabels
DataFrame -> String
Describe all value labels in a DataFrame as a formatted string.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = (DataFrame.fromRecords [{ gender = 1 }] |> DataFrame.setValueLabels @gender gender)?
DataFrame.describeLabels df
Notes: Lists all columns with value labels, sorted by column name. Returns "(no value labels)" if none are set.
See also: DataFrame.describeLabel, DataFrame.valueLabels, DataFrame.allValueLabels, DataFrame.describeMeta
Lineage
DataFrame.lineage
DataFrame -> Record
Get complete lineage metadata for all columns, including origins and transformations.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }]
df
|> DataFrame.lineage
Notes: Returns a Record with 'columns' (per-column lineage) and 'globalOperations' (operations affecting all columns).
See also: DataFrame.columnLineage
DataFrame.columnLineage
DataFrameColumn -> DataFrame -> Maybe Record
Get lineage metadata for a specific column.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }]
DataFrame.columnLineage @name df
Notes: Returns Just Record with origin, transformations, and dependencies, or Nothing if the column is not found.
See also: DataFrame.lineage
DataFrame.sourcePath
DataFrame -> Maybe String
Get the source file path of a DataFrame, if it was read from a file.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.sourcePath df
Notes: Returns Just path for file-sourced DataFrames, Nothing for DataFrames created from records or other sources.
See also: DataFrame.lineage, DataFrame.parents
DataFrame.parents
DataFrame -> [Record]
Get the parent DataFrames in the lineage DAG.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
let selected = case (df |> DataFrame.select [@name]) of
Ok d -> d
Err _ -> DataFrame.fromRecords []
DataFrame.parents selected
Notes: Returns a list of records with 'id', 'name', and 'operation' fields. Root DataFrames (from file reads or fromRecords) have no parents.
See also: DataFrame.lineage, DataFrame.sourcePath
DataFrame.lineageById
String -> Maybe Record
Look up a DataFrame's lineage by its UUID.
import DataFrame
-- Look up a DataFrame by its ID (returns Nothing if not found)
DataFrame.lineageById "00000000-0000-0000-0000-000000000000"
Notes: Returns Just Record if a DataFrame with that ID has been created in this session, Nothing otherwise.
See also: DataFrame.lineageByName, DataFrame.lineage
DataFrame.lineageByName
String -> [Record]
Look up DataFrames by display name (case-insensitive substring match).
import DataFrame
let df = DataFrame.fromRecords [{ x = 1 }, { x = 2 }]
DataFrame.lineageByName "fromRecords"
Notes: Returns a list of lineage records whose display name contains the search string (case-insensitive).
See also: DataFrame.lineageById, DataFrame.lineage
Other
DataFrame._checkSchemaImpl
CheckSchemaSpec -> DataFrame -> Result DataFrame DataFrameError
Internal: runtime schema validation. Use DataFrame.checkSchema instead.
DataFrame._fromRecordsWithLabels
ValueLabelsMap -> [Record] -> DataFrame
Internal: like fromRecords but with a compile-time value label map injected by the compiler.
DataFrame.corrSpearman
DataFrame -> DataFrame
Pairwise Spearman rank correlation matrix (numeric columns only). Returns a DataFrame with a String column named variable and one Float column per numeric column. Diagonal values are 1.0. Measures monotone association between columns; more robust to outliers than Pearson correlation.
import DataFrame
let df = DataFrame.fromRecords [{ x = 1, y = 2 }, { x = 2, y = 4 }, { x = 3, y = 5 }]
df |> DataFrame.corrSpearman
Notes: Diagonal values are 1.0. Non-numeric columns are excluded. NaN values in rank computation are treated as the largest rank value.
See also: DataFrame.corr, DataFrame.cov
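Spearman correlation is Pearson correlation applied to ranks, which is why it responds to monotone association and not to the size of individual values. The plain-Python sketch below shows that definition for a single pair of columns. The function names are hypothetical, giving tied values their average rank is an assumption about tie handling, and constant columns (zero variance) and NaN values are not handled.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


print(spearman([1, 2, 3], [2, 4, 5]))  # 1.0
# An extreme value lowers Pearson but leaves Spearman unchanged.
print(spearman([1, 2, 3, 4], [1, 2, 3, 100]))  # 1.0
print(pearson([1, 2, 3, 4], [1, 2, 3, 100]) < 1.0)  # True
```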
DataFrame.melt
[DataFrameColumn] -> [String] -> String -> String -> DataFrame -> Result DataFrame DataFrameError
Reshape a wide DataFrame to long format. Provide id columns to keep, column prefixes (one per output value column), the separator, and the name for the new index column. Returns Err(DataFrameError::UnsupportedOperation) if no matching columns are found.
import DataFrame
let wide =
DataFrame.fromRecords
[ { nr = 1, var1_year1 = 10, var1_year2 = 20, var2_year1 = 30, var2_year2 = 40 }
, { nr = 2, var1_year1 = 50, var1_year2 = 60, var2_year1 = 70, var2_year2 = 80 }
]
-- Result: columns nr, year (Int), var1, var2 -- 4 rows
wide |> DataFrame.melt [@nr] ["var1_", "var2_"] "_" "year"
Notes: Stem name is the prefix with the trailing separator stripped ("var1_" → "var1"). Suffix is parsed as Int if all values are numeric, otherwise String.
See also: DataFrame.pivot
DataFrame.quantiles
[Float] -> DataFrame -> Result DataFrame DataFrameError
Column-wise quantiles (numeric columns only). Accepts a list of quantile levels and returns a DataFrame with one row per level. The result has a leading Float column named quantile that records each level, followed by one Float column per numeric column. Returns Err(DataFrameError::InvalidArgument) if any level is outside [0.0, 1.0].
import DataFrame
let df = DataFrame.fromRecords [{ v = 1 }, { v = 2 }, { v = 3 }, { v = 4 }]
case df |> DataFrame.quantiles [0.25, 0.5, 0.75] of
Ok q -> q
Err _ -> DataFrame.fromRecords []
Notes: Uses linear interpolation. Non-numeric columns are excluded. Quantile rows appear in the same order as the input list; duplicates are preserved.
See also: DataFrame.quantile, DataFrame.median, DataFrame.summary
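Linear interpolation between neighboring sorted values can be worked through by hand for the example column above. The plain-Python sketch below assumes the common convention in which a level q maps to position q * (n - 1) in the sorted data; the helper `quantile` is hypothetical and the library's exact convention may differ.

```python
def quantile(values, level):
    """Quantile with linear interpolation between the two nearest sorted values."""
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"quantile level {level} is outside [0.0, 1.0]")
    ordered = sorted(values)
    # Assumed convention: fractional position in the sorted data.
    pos = level * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


v = [1, 2, 3, 4]
levels = [0.25, 0.5, 0.75]
# One output per requested level, in the order the levels were given.
print([quantile(v, q) for q in levels])  # [1.75, 2.5, 3.25]
```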