DataFrame Expressions
Column expressions let you describe operations on DataFrame columns — arithmetic, comparisons, aggregations, and more. They compile directly to Polars' optimized SIMD engine, so they're fast even on large datasets.
Getting Started
Use @name to reference a column in an expression. Standard Keel operators work directly on column references:
import DataFrame
import Result
import DataFrame.Expr as Expr
let source =
DataFrame.fromRecords
[ { name = "Alice", value = 10 }
, { name = "Bob", value = 20 }
]
source
|> DataFrame.applyExprs [(@doubled, @value * 2)]
|> Result.andThen (DataFrame.column @doubled)
|> Result.withDefault []
- @value references the column named "value"
- Infix operators (*, +, ==, etc.) work on column references
- applyExprs takes a list of (@outputCol, expr) tuples; the first element names the output column
For column names with spaces or special characters, use @"My Column" (quoted column reference). Expr.col "name" is also available for dynamically constructed column names.
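As a minimal sketch of the dynamic case, the snippet below builds the column reference from a string held in a variable. It assumes Expr.col accepts a plain String, as the sentence above states; the variable name target is hypothetical, and Expr.mul is used in the pipe form listed in the Pipe API section.

```keel
import DataFrame
import Result
import DataFrame.Expr as Expr

-- Hypothetical: the column name is only known as a string value
let target = "value"

DataFrame.fromRecords
    [ { name = "Alice", value = 10 }
    , { name = "Bob", value = 20 }
    ]
    |> DataFrame.applyExprs [(@doubled, Expr.col target |> Expr.mul 2)]
    |> Result.andThen (DataFrame.column @doubled)
    |> Result.withDefault []
```

When the column name is fixed, @value remains the shorter choice; Expr.col is mainly useful when the name comes from configuration or user input.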
Column references (@name) are used both in applyExprs/filter expressions and as DataFrameColumn arguments to select, sort, groupBy, rename, and column. The type system distinguishes these contexts: @name in an expression context produces Expr, while @name as a direct argument to select or sort produces DataFrameColumn (checked against the schema). For example, DataFrame.select [@name, @age] and DataFrame.sort [@name]. Functions like sort and sortDesc accept a list, so you can sort by multiple columns: DataFrame.sort [@department, @salary].
Infix Operators vs Expr.* Functions
There are two ways to build column expressions — infix operators and Expr.* pipe functions — and they serve different roles.
Infix operators (+, -, *, /, %, ^, ==, !=, <, <=, >, >=, &&, ||, not) handle arithmetic, comparison, and boolean logic. Use them whenever you can — they read like normal Keel code:
@price * 0.9 -- arithmetic
@age >= 18 && @active -- comparison + boolean logic
@x + @y -- column-to-column
Expr.* functions cover everything else: operations that don't have a natural operator symbol. This includes conditionals (Expr.cond), null handling (Expr.fillNull, Expr.isNull), string operations (Expr.strUpper, Expr.strContains), math (Expr.abs, Expr.sqrt), aggregations (Expr.sum, Expr.mean), type casting, and window functions. These are always called through the pipe:
@score |> Expr.fillNull 0
@name |> Expr.strUpper
In practice, most expressions use both together — infix operators for the computation, pipe functions for null handling and domain-specific transforms:
@price * @qty
@score >= 90
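A combined form might look like the following sketch, which wraps infix arithmetic in a pipe call and feeds a pipe result into an infix comparison. The column names and sample data are made up for illustration, and the parentheses are added defensively because the relative precedence of |> and the infix operators is not stated on this page.

```keel
import DataFrame
import Result
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { price = 10.0, qty = 3, score = 95 }
    , { price = 20.0, qty = 2, score = 72 }
    ]
    |> DataFrame.applyExprs
        -- infix arithmetic first, then a pipe function for null handling
        [ (@revenue, (@price * @qty) |> Expr.fillNull 0.0)
        -- pipe function first, then an infix comparison
        , (@passed, (@score |> Expr.fillNull 0) >= 90)
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []
```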
Every infix operator also has a pipe equivalent (Expr.add, Expr.mul, Expr.gte, Expr.and, etc.) for cases where you prefer a uniform pipe style. The full mapping is in the Pipe API section below.
Arithmetic
Use standard arithmetic operators on column references. Scalars are automatically converted — no wrapping needed:
-- Infix arithmetic operators work directly on column references
import DataFrame
import Result
import DataFrame.Expr as Expr
let df =
DataFrame.fromRecords
[ { price = 10.0, quantity = 3 }
, { price = 20.0, quantity = 2 }
]
df
|> DataFrame.applyExprs
[ (@total, @price * @quantity)
]
|> Result.andThen (DataFrame.column @total)
|> Result.withDefault []
-- Scalars are auto-coerced to Expr when used with infix operators
import DataFrame
import Result
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ x = 1 }, { x = 2 }, { x = 3 }]
df
|> DataFrame.applyExprs [(@plus10, @x + 10)]
|> Result.andThen (DataFrame.column @plus10)
|> Result.withDefault []
Operator precedence works as expected, with * binding tighter than +:
-- Chained infix operations respect operator precedence
import DataFrame
import Result
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ x = 1 }, { x = 2 }, { x = 3 }]
df
|> DataFrame.applyExprs [(@result, @x * 10 + 2)]
|> Result.andThen (DataFrame.column @result)
|> Result.withDefault []
| Operator | Description |
|---|---|
| + | Addition |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| % | Modulo |
| ^ | Exponentiation |
| - | Negation (unary) |
Comparison and Filtering
Comparison operators on Expr values produce boolean expressions. Use them with filter to select rows:
-- Infix comparison operators produce boolean Expr for filtering
import DataFrame
import Result
let df =
DataFrame.fromRecords
[ { name = "Alice", age = 30 }
, { name = "Bob", age = 17 }
, { name = "Carol", age = 25 }
]
df
|> DataFrame.filter (@age >= 18)
|> Result.map DataFrame.count
|> Result.withDefault 0
| Operator | Description |
|---|---|
| == | Equal |
| != | Not equal |
| < | Less than |
| <= | Less than or equal |
| > | Greater than |
| >= | Greater than or equal |
| && | Logical AND |
| \|\| | Logical OR |
| not | Logical negation |
Use Expr.in to test membership — whether a column value is in a given list:
-- Membership test: Expr.in filters rows where the column value is in a list
import DataFrame
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { city = "Berlin" }
, { city = "Munich" }
, { city = "Hamburg" }
]
|> DataFrame.filter (@city |> Expr.in ["Berlin", "Munich"])
|> Result.map DataFrame.count
|> Result.withDefault 0
Combine multiple conditions with && and ||:
-- Boolean logic: && and || work with Expr values
import DataFrame
import Result
let df =
DataFrame.fromRecords
[ { name = "Alice", age = 30, active = True }
, { name = "Bob", age = 17, active = True }
, { name = "Carol", age = 25, active = False }
]
-- age >= 18 AND active
let filterExpr = @age >= 18 && @active
df
|> DataFrame.filter filterExpr
|> Result.map DataFrame.count
|> Result.withDefault 0
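The same data can illustrate || and negation. This is a sketch only: it assumes Expr.not is a one-argument pipe function (it is listed among the pipe-only functions in the Pipe API section, but its signature is not shown on this page), and the parentheses guard against unstated precedence rules.

```keel
import DataFrame
import Result
import DataFrame.Expr as Expr

let df =
    DataFrame.fromRecords
        [ { name = "Alice", age = 30, active = True }
        , { name = "Bob", age = 17, active = True }
        , { name = "Carol", age = 25, active = False }
        ]

-- age < 18 OR not active
let filterExpr = @age < 18 || (@active |> Expr.not)

df
    |> DataFrame.filter filterExpr
    |> Result.map DataFrame.count
    |> Result.withDefault 0
```

Under those assumptions the filter should keep Bob (under 18) and Carol (inactive) and drop Alice.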
Applying Expressions
applyExprs takes a list of (@outputCol, expr) tuples and adds or replaces columns in the DataFrame:
-- Add a computed column with applyExprs using tuple syntax
import DataFrame
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { name = "Widget", price = 100.0 }
, { name = "Gadget", price = 250.0 }
]
|> DataFrame.applyExprs
[ (@discounted, @price * 0.9)
]
|> Result.map DataFrame.columns
|> Result.withDefault []
The first element of each tuple (@outputCol) names the output column. The second element is any Expr. When you pass multiple expressions, they are batched so that later expressions can reference columns added by earlier ones; see Cross-Column References below.
Expr.named is used to name output columns in agg; see Aggregations.
Conditional Expressions
Expr.cond takes a list of (condition, value) pairs and a default value. The first matching condition wins:
-- Simple if/else conditional with Expr.cond
import DataFrame
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { name = "Alice", age = 30 }
, { name = "Bob", age = 15 }
]
|> DataFrame.applyExprs
[ (@category, Expr.cond [(@age >= 18, "adult")] "minor")
]
|> Result.map DataFrame.columns
|> Result.withDefault []
Multiple branches work the same way:
-- Multi-branch conditional with Expr.cond
import DataFrame
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { name = "Alice", score = 95 }
, { name = "Bob", score = 72 }
, { name = "Carol", score = 88 }
]
|> DataFrame.applyExprs
[ (@grade, Expr.cond [(@score >= 90, "A"), (@score >= 80, "B"), (@score >= 70, "C")] "F")
]
|> Result.map DataFrame.columns
|> Result.withDefault []
All branch values and the default must share the same type. The compiler catches mismatches: for example, Expr.cond [(@x > 5, "high")] 42 is a compile error because the branch value is a String but the default is an Int.
Aggregations
Reduce columns to summary values:
-- Aggregation with DataFrame.Expr
import DataFrame
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { group = "A", value = 10 }
, { group = "A", value = 20 }
, { group = "B", value = 30 }
]
|> DataFrame.groupBy [@group]
|> DataFrame.agg
[ @value |> Expr.sum |> Expr.named "total"
]
|> DataFrame.columns
Available: sum, mean, min, max, count, first, last, std, var, median, quantile q.
quantile takes the quantile level (0.0–1.0) as its first argument:
import DataFrame
import DataFrame.Expr as Expr
-- Interquartile range components
let q1 =
@salary
|> Expr.quantile 0.25
|> Expr.named "p25"
let q3 =
@salary
|> Expr.quantile 0.75
|> Expr.named "p75"
DataFrame.fromRecords
[ { dept = "Sales", salary = 100 }
, { dept = "Sales", salary = 200 }
, { dept = "Sales", salary = 300 }
, { dept = "HR", salary = 150 }
, { dept = "HR", salary = 250 }
]
|> DataFrame.groupBy [@dept]
|> DataFrame.agg [q1, q3]
|> DataFrame.columns
The agg function takes a list of expressions and a GroupedDataFrame (from groupBy).
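Because agg takes a list, several summaries can be computed in one pass over each group. The sketch below follows the same pattern as the examples above and assumes that mean, max, and count are used exactly like sum (piped from a column reference, then named with Expr.named); the output column names are illustrative.

```keel
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { dept = "Sales", salary = 100 }
    , { dept = "Sales", salary = 300 }
    , { dept = "HR", salary = 150 }
    ]
    |> DataFrame.groupBy [@dept]
    |> DataFrame.agg
        -- one named expression per output column
        [ @salary |> Expr.mean |> Expr.named "avg_salary"
        , @salary |> Expr.max |> Expr.named "top_salary"
        , @salary |> Expr.count |> Expr.named "headcount"
        ]
    |> DataFrame.columns
```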
String Operations
Transform string columns:
-- String operations with DataFrame.Expr
import DataFrame
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords [{ name = "alice" }, { name = "bob" }]
|> DataFrame.applyExprs
[ (@upper_name, @name |> Expr.strUpper)
]
|> Result.map DataFrame.columns
|> Result.withDefault []
Available: strLength, strUpper, strLower, strTrim, strContains, strStartsWith, strEndsWith, strReplace, strSplit, strMatches, strCapture, strCaptureAll, strReplaceAll, strCount.
-- String transformation chain
import DataFrame
import DataFrame.Expr exposing col
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { email = "alice@example.com" }
, { email = "bob@test.org" }
]
|> DataFrame.applyExprs
[ (@domain, col @email |> Expr.strReplace ".*@" "")
]
|> Result.map DataFrame.columns
|> Result.withDefault []
import DataFrame
import Result
import DataFrame.Expr as Expr
-- Whitespace trimming (both sides)
let trimmed = @code |> Expr.strTrim
-- Upper case transformation
let upper = @label |> Expr.strUpper
-- String contains predicate
let found = @code |> Expr.strContains "AB"
-- String replacement
let replaced = @code |> Expr.strReplace "AB" "XY"
DataFrame.fromRecords
[ { code = " AB123 ", label = "short" }
, { code = " CD456 ", label = "long" }
]
|> DataFrame.applyExprs [(@code_trimmed, trimmed), (@label_upper, upper), (@found_abc, found), (@replaced, replaced)]
|> Result.map DataFrame.columns
|> Result.withDefault []
strSplit produces a column of lists.
Regex String Operations
strMatches, strCapture, strCaptureAll, strReplaceAll, and strCount provide regex-based string operations:
- strMatches pattern expr: boolean column, true if the cell matches the regex
- strCapture pattern groupIndex expr: extracts the nth capture group (1-indexed)
- strCaptureAll pattern expr: extracts all non-overlapping matches into a list column
- strReplaceAll pattern replacement expr: replaces all regex matches
- strCount pattern expr: counts the number of non-overlapping matches
-- Extract year from date strings using regex capture
import DataFrame
import Result
import DataFrame.Expr as Expr
let df =
DataFrame.fromRecords
[ { date_str = "2024-01-15" }
, { date_str = "2023-07-04" }
, { date_str = "no date" }
]
let filtered =
df |> DataFrame.filter (@date_str |> Expr.strMatches "[0-9]{4}-[0-9]{2}-[0-9]{2}")
case filtered of
Ok f ->
f
|> DataFrame.applyExprs [ (@year, @date_str |> Expr.strCapture "([0-9]{4})" 1) ]
|> Result.map DataFrame.columns
|> Result.withDefault []
Err _ -> []
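The remaining regex functions follow the argument order listed above. As a rough sketch with made-up phone data, strReplaceAll can strip every non-digit character and strCount can count the digits; the column names here are hypothetical.

```keel
import DataFrame
import Result
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { phone = "555-123-4567" }
    , { phone = "(555) 987 6543" }
    ]
    |> DataFrame.applyExprs
        -- replace every non-digit match with the empty string
        [ (@digits_only, @phone |> Expr.strReplaceAll "[^0-9]" "")
        -- count non-overlapping single-digit matches
        , (@digit_count, @phone |> Expr.strCount "[0-9]")
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []
```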
Math Functions
-- Math functions: abs
import DataFrame
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords [{ value = -3.7 }, { value = 4.2 }]
|> DataFrame.applyExprs
[ (@abs_value, Expr.col @value |> Expr.abs)
]
|> Result.map DataFrame.columns
|> Result.withDefault []
Available: abs, sqrt, pow, floor, ceil, round.
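The other math functions are presumably used the same way as abs. The sketch below assumes sqrt, floor, and ceil take no extra arguments and that pow takes the exponent first, as in the Pipe API table; round is left out because its arguments are not shown on this page.

```keel
import DataFrame
import Result
import DataFrame.Expr as Expr

DataFrame.fromRecords [{ value = 2.25 }, { value = 6.76 }]
    |> DataFrame.applyExprs
        [ (@root, @value |> Expr.sqrt)
        , (@lower, @value |> Expr.floor)
        , (@upper, @value |> Expr.ceil)
        , (@squared, @value |> Expr.pow 2)
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []
```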
Null Handling
-- Null handling: fillNull
import DataFrame
import DataFrame.Expr exposing col
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords [{ score = 85 }, { score = 90 }]
|> DataFrame.applyExprs
[ (@score_filled, col @score |> Expr.fillNull 0)
]
|> Result.map DataFrame.columns
|> Result.withDefault []
fillNull replaces null values with a constant. isNull and isNotNull test whether a value is null and can be used with filter:
import DataFrame
import DataFrame.Expr exposing col
import Result
import DataFrame.Expr as Expr
-- Test for null values in float columns
let nullFlag = col @score |> Expr.isNull
let hasValue = col @value |> Expr.isNotNull
DataFrame.fromRecords
[ { value = Just 1.0, score = Just 95.0 }
, { value = Just 2.0, score = Nothing }
]
|> DataFrame.applyExprs [(@is_null_score, nullFlag), (@has_value, hasValue)]
|> Result.map DataFrame.columns
|> Result.withDefault []
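Since isNull and isNotNull produce boolean expressions, they can also be passed straight to filter. A minimal sketch, reusing the records from the example above, that keeps only rows with a non-null score:

```keel
import DataFrame
import Result
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { value = Just 1.0, score = Just 95.0 }
    , { value = Just 2.0, score = Nothing }
    ]
    -- drop rows whose score is null
    |> DataFrame.filter (@score |> Expr.isNotNull)
    |> Result.map DataFrame.count
    |> Result.withDefault 0
```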
Window Functions
Apply expressions over partitions (SQL-style window functions):
import DataFrame
import Result
import DataFrame.Expr as Expr
let df =
DataFrame.fromRecords
[ { region = "East", revenue = 100 }
, { region = "East", revenue = 200 }
, { region = "West", revenue = 150 }
]
-- Sum per partition using over
let e =
Expr.col @revenue
|> Expr.sum
|> Expr.over ["region"]
df
|> DataFrame.applyExprs [(@region_total, e)]
|> Result.map DataFrame.columns
|> Result.withDefault []
Cumulative Operations
Cumulative operations accumulate a value row-by-row within a partition. These are DataFrame module functions (not Expr functions) — use DataFrame.partitionBy + DataFrame.orderBy + DataFrame.withCumSum to scope the accumulation to each group:
import DataFrame
-- Cumulative sum and mean per region using window functions
DataFrame.fromRecords
[ { region = "East", revenue = 100 }
, { region = "East", revenue = 200 }
, { region = "East", revenue = 150 }
, { region = "West", revenue = 300 }
]
|> DataFrame.partitionBy [@region]
|> DataFrame.orderBy [@revenue]
|> DataFrame.withCumSum "running_total" @revenue
|> DataFrame.withCumMean "running_avg" @revenue
|> DataFrame.collect
|> DataFrame.columns
Available: DataFrame.withCumSum, DataFrame.withCumMean, DataFrame.withCumMin, DataFrame.withCumMax.
import DataFrame
-- Running sum per region using window functions
DataFrame.fromRecords
[ { region = "East", revenue = 100 }
, { region = "East", revenue = 200 }
, { region = "West", revenue = 150 }
]
|> DataFrame.partitionBy [@region]
|> DataFrame.orderBy [@revenue]
|> DataFrame.withCumSum "running_total" @revenue
|> DataFrame.collect
|> DataFrame.columns
Rolling Window Operations
Rolling operations compute aggregates over a fixed-size sliding window. These are also DataFrame module functions — use DataFrame.partitionBy + DataFrame.orderBy + DataFrame.withRollingSum. Rows where the window is not yet full return null:
import DataFrame
-- Rolling window operations using partitionBy + withRollingSum/withRollingMean
DataFrame.fromRecords
[ { region = "East", value = 1 }
, { region = "East", value = 2 }
, { region = "East", value = 3 }
, { region = "East", value = 4 }
, { region = "East", value = 5 }
]
|> DataFrame.partitionBy [@region]
|> DataFrame.orderBy [@value]
|> DataFrame.withRollingSum "roll3_sum" @value 3
|> DataFrame.withRollingMean "roll3_mean" @value 3
|> DataFrame.collect
|> DataFrame.columns
Available: DataFrame.withRollingSum, DataFrame.withRollingMean, DataFrame.withRollingMin, DataFrame.withRollingMax.
import DataFrame
-- 7-row rolling average (a 7-day average when each row is one day of sales)
DataFrame.fromRecords
[ { dept = "East", sales = 10 }
, { dept = "East", sales = 20 }
, { dept = "East", sales = 30 }
, { dept = "East", sales = 40 }
, { dept = "East", sales = 50 }
, { dept = "East", sales = 60 }
, { dept = "East", sales = 70 }
, { dept = "East", sales = 80 }
]
|> DataFrame.partitionBy [@dept]
|> DataFrame.orderBy [@sales]
|> DataFrame.withRollingMean "sales_7d_avg" @sales 7
|> DataFrame.collect
|> DataFrame.columns
Date and DateTime Operations
ISO 8601 date and datetime strings in CSV files are automatically parsed. Use strContains, strStartsWith, or strEndsWith to test date string patterns, or use strReplace to transform them:
import DataFrame
import DataFrame.Expr exposing col, lit
import Result
import DataFrame.Expr as Expr
-- Flag rows by matching substrings of the ISO 8601 date ("YYYY-MM-DD")
-- Both expressions are boolean predicates: born in 1990, and born in June
let yearStr = col @birth_date |> Expr.strContains "1990"
let monthStr = col @birth_date |> Expr.strContains "-06-"
case DataFrame.readCsv "content/examples/guide/dataframe/dates.csv" of
Ok df ->
df
|> DataFrame.applyExprs [(@year_str, yearStr), (@month_str, monthStr)]
|> Result.map DataFrame.columns
|> Result.withDefault []
Err _ -> []
import DataFrame
import DataFrame.Expr exposing col
import Result
import DataFrame.Expr as Expr
-- Detect ISO 8601 datetime structure using string predicates
let hasT = col @created_at |> Expr.strContains "T"
let hasColon = col @created_at |> Expr.strContains ":"
case DataFrame.readCsv "content/examples/guide/dataframe/datetimes.csv" of
Ok df ->
df
|> DataFrame.applyExprs [(@has_T, hasT), (@has_colon, hasColon)]
|> Result.map DataFrame.columns
|> Result.withDefault []
Err _ -> []
Cross-Column References
DataFrame.applyExprs automatically batches expressions so that each expression can reference columns added by earlier expressions in the same call. You do not need to chain multiple applyExprs calls when one column depends on another:
-- applyExprs batches exprs so later ones can reference columns added earlier
import DataFrame
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords [{ x = 10 }, { x = 20 }]
|> DataFrame.applyExprs [(@y, @x + 1), (@z, @y * 2)]
|> Result.map DataFrame.columns
|> Result.withDefault []
In the example above, y is produced by the first expression and then consumed by the second expression within the same applyExprs call. This works because the runtime applies expressions sequentially within a single batch, making each new column visible to subsequent expressions immediately.
Pipe API
Every infix operator has a corresponding pipe-style function in DataFrame.Expr. These are useful for chaining operations or when the infix syntax is less readable:
| Infix | Pipe equivalent | Description |
|---|---|---|
| @x + 5 | @x \|> Expr.add 5 | Addition |
| @x - 3 | @x \|> Expr.sub 3 | Subtraction |
| @x * 2 | @x \|> Expr.mul 2 | Multiplication |
| @x / 10 | @x \|> Expr.div 10 | Division |
| @x % 3 | @x \|> Expr.mod 3 | Modulo |
| @x ^ 2 | @x \|> Expr.pow 2 | Exponentiation |
| @x == 5 | @x \|> Expr.eq 5 | Equal |
| @x != 5 | @x \|> Expr.neq 5 | Not equal |
| @x > 3 | @x \|> Expr.gt 3 | Greater than |
| @x >= 3 | @x \|> Expr.gte 3 | Greater or equal |
| @x < 10 | @x \|> Expr.lt 10 | Less than |
| @x <= 10 | @x \|> Expr.lte 10 | Less or equal |
| expr1 && expr2 | expr1 \|> Expr.and expr2 | Logical AND |
| expr1 \|\| expr2 | expr1 \|> Expr.or expr2 | Logical OR |
Additional pipe-only functions: Expr.in, Expr.not, Expr.named (for agg), Expr.cond, Expr.fillNull, Expr.isNull, Expr.isNotNull, Expr.over, and all string/math/aggregation/window functions listed above.
-- Pipe-style arithmetic: Expr.mul for column * column
import DataFrame
import DataFrame.Expr exposing col
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { price = 10, quantity = 3 }
, { price = 20, quantity = 2 }
]
|> DataFrame.applyExprs
[ (@total, col @price |> Expr.mul (col @quantity))
]
|> Result.map DataFrame.columns
|> Result.withDefault []
-- Pipe-style comparison: Expr.gte for column >= literal
import DataFrame
import DataFrame.Expr exposing col
import Result
import DataFrame.Expr as Expr
DataFrame.fromRecords
[ { name = "Alice", age = 25 }
, { name = "Bob", age = 15 }
]
|> DataFrame.applyExprs
[ (@is_adult, col @age |> Expr.gte 18)
]
|> Result.map DataFrame.columns
|> Result.withDefault []
Expression Reuse and Advanced Features
Expression reuse — bind an expression to a variable and apply it to multiple DataFrames or operations:
import DataFrame
import Result
import DataFrame.Expr as Expr
let revenue = @price * @qty
let sales =
DataFrame.fromRecords
[ { name = "A", price = 10, qty = 3 }
, { name = "B", price = 20, qty = 2 }
]
-- Reuse the same expression on different DataFrames
sales
|> DataFrame.applyExprs [(@revenue, revenue)]
|> Result.map DataFrame.columns
|> Result.withDefault []
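To show the reuse across more than one DataFrame, the sketch below applies the same bound expression to two separate inputs. The DataFrame names online and retail and their contents are invented for illustration; both simply need price and qty columns.

```keel
import DataFrame
import Result
import DataFrame.Expr as Expr

-- One expression, defined once
let revenue = @price * @qty

let online =
    DataFrame.fromRecords [{ price = 10, qty = 3 }, { price = 20, qty = 2 }]

let retail =
    DataFrame.fromRecords [{ price = 15, qty = 4 }]

let onlineCols =
    online
        |> DataFrame.applyExprs [(@revenue, revenue)]
        |> Result.map DataFrame.columns
        |> Result.withDefault []

let retailCols =
    retail
        |> DataFrame.applyExprs [(@revenue, revenue)]
        |> Result.map DataFrame.columns
        |> Result.withDefault []

[onlineCols, retailCols]
```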
Captured outer-scope variables: expressions can reference any Keel value from the surrounding scope.
-- Outer-scope variables are captured in expressions
import DataFrame
import Result
import DataFrame.Expr as Expr
let multiplier = 2
DataFrame.fromRecords [{ price = 10 }, { price = 20 }]
|> DataFrame.applyExprs
[ (@doubled, @price * multiplier)
]
|> Result.andThen (DataFrame.column @doubled)
|> Result.withDefault []
Multi-column string composition: Expr.concatMany joins multiple columns into one string column.
import DataFrame
import Result
import DataFrame.Expr as Expr
let full = Expr.concatMany " " [@first, @last]
DataFrame.fromRecords
[ { first = "Alice", last = "Smith" }
, { first = "Bob", last = "Jones" }
]
|> DataFrame.applyExprs [(@full_name, full)]
|> Result.map DataFrame.columns
|> Result.withDefault []
List-typed column output: Expr.strSplit produces a column of lists.
import DataFrame
import DataFrame.Expr exposing col
import Result
import DataFrame.Expr as Expr
let tags = col @tag_str |> Expr.strSplit ","
DataFrame.fromRecords
[ { tag_str = "red,green,blue" }
, { tag_str = "alpha,beta" }
]
|> DataFrame.applyExprs [(@tags, tags)]
|> Result.map DataFrame.columns
|> Result.withDefault []
Type Safety
The Expr API is fully type-safe: every Expr.* function compiles directly to a Polars operation, so there is no unsupported-operation path. The only errors that can still occur at runtime are column-not-found errors (e.g., @typo), and even these are caught at compile time when the DataFrame has a known schema (from readCsv, readParquet, or similar).
Next Steps
See the DataFrame stdlib page for all DataFrame functions.