
DataFrame Expressions

Column expressions let you describe operations on DataFrame columns, such as arithmetic, comparisons, and aggregations. Each expression compiles directly to a Polars operation and runs in Polars' optimized, SIMD-vectorized engine, so expressions stay fast even on large datasets.

Getting Started

Use @name to reference a column in an expression. Standard Keel operators work directly on column references:

import DataFrame
import Result

import DataFrame.Expr as Expr

let source =
    DataFrame.fromRecords
        [ { name = "Alice", value = 10 }
        , { name = "Bob", value = 20 }
        ]

source
    |> DataFrame.applyExprs [(@doubled, @value * 2)]
    |> Result.andThen (DataFrame.column @doubled)
    |> Result.withDefault []

  • @value references the column named "value"
  • Infix operators (*, +, ==, etc.) work on column references
  • applyExprs takes a list of (@outputCol, expr) tuples — the first element names the output column

For column names with spaces or special characters, use @"My Column" (quoted column reference). Expr.col "name" is also available for dynamically constructed column names.
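The sketch below shows a column name supplied as an ordinary String value. It assumes Expr.col takes a String, as the sentence above states, and that its result combines with infix operators like any other column expression. The `field` binding is hypothetical and stands in for a name computed at runtime.

```keel
-- Build a column reference from a String value with Expr.col
import DataFrame
import Result

import DataFrame.Expr as Expr

-- Hypothetical: in real code this name might come from configuration or user input
let field = "price"

let result =
    DataFrame.fromRecords [{ price = 10 }, { price = 20 }]
        |> DataFrame.applyExprs [(@doubled, Expr.col field * 2)]
        |> Result.andThen (DataFrame.column @doubled)
        |> Result.withDefault []

result
```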

Column references (@name) are used both in applyExprs/filter expressions and as DataFrameColumn arguments to select, sort, groupBy, rename, and column. The type system distinguishes these contexts: @name in an expression context produces Expr, while @name as a direct argument to select or sort produces DataFrameColumn (checked against the schema). For example, DataFrame.select [@name, @age] and DataFrame.sort [@name]. Functions like sort and sortDesc accept a list, so you can sort by multiple columns: DataFrame.sort [@department, @salary].

Infix Operators vs Expr.* Functions

There are two ways to build column expressions — infix operators and Expr.* pipe functions — and they serve different roles.

Infix operators (+, -, *, /, %, ^, ==, !=, <, <=, >, >=, &&, ||, not) handle arithmetic, comparison, and boolean logic. Use them whenever you can — they read like normal Keel code:

@price * 0.9                   -- arithmetic
@age >= 18 && @active           -- comparison + boolean logic
@x + @y                        -- column-to-column

Expr.* functions cover everything else: operations that have no natural operator symbol. This includes conditionals (Expr.cond), null handling (Expr.fillNull, Expr.isNull), string operations (Expr.strUpper, Expr.strContains), math (Expr.abs, Expr.sqrt), aggregations (Expr.sum, Expr.mean), type casting, and window functions (Expr.over). Most of them are called through the pipe, with the column expression as the last argument. A few, such as Expr.cond and Expr.concatMany, build a new expression from their arguments and are called directly:

@score |> Expr.fillNull 0
@name |> Expr.strUpper

In practice, most expressions use both together. Pipe functions handle nulls and domain-specific transforms, and infix operators do the computation around them:

(@price * @qty) |> Expr.fillNull 0
(@score |> Expr.fillNull 0) >= 90

Every infix operator also has a pipe equivalent (Expr.add, Expr.mul, Expr.gte, Expr.and, etc.) for cases where you prefer a uniform pipe style. The full mapping is in the Pipe API section below.
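Here is a complete sketch that mixes the two styles in one filter. The pipe function fillNull turns missing scores into 0, and the infix >= does the comparison. It assumes, as in the Null Handling section, that Just/Nothing record fields become a nullable column.

```keel
-- Pipe for null handling, infix for the comparison
import DataFrame
import Result

import DataFrame.Expr as Expr

-- Missing scores count as 0; only scores of 90 or more pass the filter
let result =
    DataFrame.fromRecords
        [ { name = "Alice", score = Just 95 }
        , { name = "Bob", score = Nothing }
        , { name = "Carol", score = Just 80 }
        ]
        |> DataFrame.filter ((@score |> Expr.fillNull 0) >= 90)
        |> Result.map DataFrame.count
        |> Result.withDefault 0

result
```

Only Alice's row passes. Bob's missing score becomes 0 before the comparison, so it is excluded rather than producing a null.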

Arithmetic

Use standard arithmetic operators on column references. Scalars are automatically converted — no wrapping needed:

-- Infix arithmetic operators work directly on column references
import DataFrame
import Result

import DataFrame.Expr as Expr

let df =
    DataFrame.fromRecords
        [ { price = 10.0, quantity = 3 }
        , { price = 20.0, quantity = 2 }
        ]

df
    |> DataFrame.applyExprs
        [ (@total, @price * @quantity)
        ]
    |> Result.andThen (DataFrame.column @total)
    |> Result.withDefault []

-- Scalars are auto-coerced to Expr when used with infix operators
import DataFrame
import Result

import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ x = 1 }, { x = 2 }, { x = 3 }]

df
    |> DataFrame.applyExprs [(@plus10, @x + 10)]
    |> Result.andThen (DataFrame.column @plus10)
    |> Result.withDefault []

Operator precedence works as expected — * binds tighter than +:

-- Chained infix operations respect operator precedence
import DataFrame
import Result

import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ x = 1 }, { x = 2 }, { x = 3 }]

df
    |> DataFrame.applyExprs [(@result, @x * 10 + 2)]
    |> Result.andThen (DataFrame.column @result)
    |> Result.withDefault []

Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
%          Modulo
^          Exponentiation
- (unary)  Negation

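Because % and == compose like ordinary Keel arithmetic, a modulo test can drive a filter directly. This short sketch keeps only the even values. The parentheses simply make the grouping explicit.

```keel
-- Keep rows whose x is even, using % inside a filter expression
import DataFrame
import Result

let result =
    DataFrame.fromRecords [{ x = 1 }, { x = 2 }, { x = 3 }, { x = 4 }, { x = 5 }, { x = 6 }]
        |> DataFrame.filter ((@x % 2) == 0)
        |> Result.map DataFrame.count
        |> Result.withDefault 0

result
```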
Comparison and Filtering

Comparison operators on Expr values produce boolean expressions. Use them with filter to select rows:

-- Infix comparison operators produce boolean Expr for filtering
import DataFrame
import Result

let df =
    DataFrame.fromRecords
        [ { name = "Alice", age = 30 }
        , { name = "Bob", age = 17 }
        , { name = "Carol", age = 25 }
        ]

df
    |> DataFrame.filter (@age >= 18)
    |> Result.map DataFrame.count
    |> Result.withDefault 0

Operator   Description
==         Equal
!=         Not equal
<          Less than
<=         Less than or equal
>          Greater than
>=         Greater than or equal
&&         Logical AND
||         Logical OR
not        Logical negation

Use Expr.in to test membership — whether a column value is in a given list:

-- Membership test: Expr.in filters rows where the column value is in a list
import DataFrame
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { city = "Berlin" }
    , { city = "Munich" }
    , { city = "Hamburg" }
    ]
    |> DataFrame.filter (@city |> Expr.in ["Berlin", "Munich"])
    |> Result.map DataFrame.count
    |> Result.withDefault 0

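To exclude the listed values instead of keeping them, negate the membership test. This sketch assumes Expr.not, which the Pipe API section lists as a pipe function, takes the boolean expression as its only argument.

```keel
-- Keep rows whose city is NOT in the list by negating Expr.in
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords
        [ { city = "Berlin" }
        , { city = "Munich" }
        , { city = "Hamburg" }
        ]
        |> DataFrame.filter (@city |> Expr.in ["Berlin", "Munich"] |> Expr.not)
        |> Result.map DataFrame.count
        |> Result.withDefault 0

result
```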
Combine multiple conditions with && and ||:

-- Boolean logic: && and || work with Expr values
import DataFrame
import Result

let df =
    DataFrame.fromRecords
        [ { name = "Alice", age = 30, active = True }
        , { name = "Bob", age = 17, active = True }
        , { name = "Carol", age = 25, active = False }
        ]

-- age >= 18 AND active
let filterExpr = @age >= 18 && @active

df
    |> DataFrame.filter filterExpr
    |> Result.map DataFrame.count
    |> Result.withDefault 0

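The example above only uses &&, but || works the same way. This sketch keeps rows that match either condition: minors or people aged 65 and over.

```keel
-- Boolean OR: keep rows where either comparison holds
import DataFrame
import Result

let result =
    DataFrame.fromRecords
        [ { name = "Alice", age = 30 }
        , { name = "Bob", age = 17 }
        , { name = "Dora", age = 70 }
        ]
        |> DataFrame.filter (@age < 18 || @age >= 65)
        |> Result.map DataFrame.count
        |> Result.withDefault 0

result
```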
Applying Expressions

applyExprs takes a list of (@outputCol, expr) tuples and adds or replaces columns in the DataFrame:

-- Add a computed column with applyExprs using tuple syntax
import DataFrame
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Widget", price = 100.0 }
    , { name = "Gadget", price = 250.0 }
    ]
    |> DataFrame.applyExprs
        [ (@discounted, @price * 0.9)
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

The first element of each tuple (@outputCol) names the output column. The second element is any Expr. When you pass multiple expressions, they are batched so that later expressions can reference columns added by earlier ones — see Cross-Column References below.
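Because applyExprs adds or replaces columns, naming an existing column as the output overwrites it. This sketch doubles price in place, using only functions from the examples above.

```keel
-- Reusing an existing column name as the output replaces that column
import DataFrame
import Result

let result =
    DataFrame.fromRecords
        [ { name = "Widget", price = 100 }
        , { name = "Gadget", price = 250 }
        ]
        |> DataFrame.applyExprs [(@price, @price * 2)]
        |> Result.andThen (DataFrame.column @price)
        |> Result.withDefault []

result
```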

Inside agg, output columns are named with Expr.named instead of tuples; see Aggregations.

Conditional Expressions

Expr.cond takes a list of (condition, value) pairs and a default value. The first matching condition wins:

-- Simple if/else conditional with Expr.cond
import DataFrame
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Alice", age = 30 }
    , { name = "Bob", age = 15 }
    ]
    |> DataFrame.applyExprs
        [ (@category, Expr.cond [(@age >= 18, "adult")] "minor")
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

Multiple branches work the same way:

-- Multi-branch conditional with Expr.cond
import DataFrame
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Alice", score = 95 }
    , { name = "Bob", score = 72 }
    , { name = "Carol", score = 88 }
    ]
    |> DataFrame.applyExprs
        [ (@grade, Expr.cond [(@score >= 90, "A"), (@score >= 80, "B"), (@score >= 70, "C")] "F")
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

All branch values and the default must share the same type. The compiler catches mismatches — for example, Expr.cond [(@x > 5, "high")] 42 is a compile error because the branch value is a String but the default is an Int.
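Since the first matching condition wins, branch order matters. In this sketch a score of 95 satisfies both conditions, but the earlier >= 70 branch decides the result. Every score of 90 or more is also at least 70, so the >= 90 branch can never produce its value.

```keel
-- Branch order matters: the first true condition determines the value
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords [{ score = 95 }, { score = 72 }, { score = 50 }]
        |> DataFrame.applyExprs
            [ (@tier, Expr.cond [(@score >= 70, 1), (@score >= 90, 2)] 0)
            ]
        |> Result.andThen (DataFrame.column @tier)
        |> Result.withDefault []

result
```

To give high scores their own tier, list the most specific condition first.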

Aggregations

Reduce columns to summary values:

-- Aggregation with DataFrame.Expr
import DataFrame

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { group = "A", value = 10 }
    , { group = "A", value = 20 }
    , { group = "B", value = 30 }
    ]
    |> DataFrame.groupBy [@group]
    |> DataFrame.agg
        [ @value |> Expr.sum |> Expr.named "total"
        ]
    |> DataFrame.columns

Available: sum, mean, min, max, count, first, last, std, var, median, and quantile.

quantile takes the quantile level (0.0–1.0) as its first argument:

import DataFrame

import DataFrame.Expr as Expr

-- Interquartile range components
let q1 =
    @salary
        |> Expr.quantile 0.25
        |> Expr.named "p25"

let q3 =
    @salary
        |> Expr.quantile 0.75
        |> Expr.named "p75"

DataFrame.fromRecords
    [ { dept = "Sales", salary = 100 }
    , { dept = "Sales", salary = 200 }
    , { dept = "Sales", salary = 300 }
    , { dept = "HR", salary = 150 }
    , { dept = "HR", salary = 250 }
    ]
    |> DataFrame.groupBy [@dept]
    |> DataFrame.agg [q1, q3]
    |> DataFrame.columns

The agg function takes a list of expressions and a GroupedDataFrame (from groupBy).
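agg accepts several expressions at once and produces one row per group. This sketch computes three aggregates per group and then counts the resulting rows. It assumes DataFrame.count works on the agg result just as it does on a filtered DataFrame.

```keel
-- Several aggregations in one agg call; the result has one row per group
import DataFrame

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords
        [ { group = "A", value = 10 }
        , { group = "A", value = 20 }
        , { group = "B", value = 30 }
        ]
        |> DataFrame.groupBy [@group]
        |> DataFrame.agg
            [ @value |> Expr.mean |> Expr.named "avg"
            , @value |> Expr.max |> Expr.named "max"
            , @value |> Expr.count |> Expr.named "n"
            ]
        |> DataFrame.count

result
```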

String Operations

Transform string columns:

-- String operations with DataFrame.Expr
import DataFrame
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords [{ name = "alice" }, { name = "bob" }]
    |> DataFrame.applyExprs
        [ (@upper_name, @name |> Expr.strUpper)
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

Available: strLength, strUpper, strLower, strTrim, strContains, strStartsWith, strEndsWith, strReplace, strSplit, strMatches, strCapture, strCaptureAll, strReplaceAll, strCount.
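String predicates such as strStartsWith return boolean expressions, so they can drive filter directly. This sketch assumes strStartsWith takes the prefix first and the column expression last, the same argument order strContains uses on this page.

```keel
-- Filter rows with a string predicate
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }, { name = "Anna" }]
        |> DataFrame.filter (@name |> Expr.strStartsWith "A")
        |> Result.map DataFrame.count
        |> Result.withDefault 0

result
```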

-- Extract the domain part of an email address with a regex replacement
import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { email = "alice@example.com" }
    , { email = "bob@test.org" }
    ]
    |> DataFrame.applyExprs
        [ (@domain, col @email |> Expr.strReplaceAll ".*@" "")
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

import DataFrame
import Result

import DataFrame.Expr as Expr

-- Whitespace trimming (both sides)
let trimmed = @code |> Expr.strTrim

-- Upper case transformation
let upper = @label |> Expr.strUpper

-- String contains predicate
let found = @code |> Expr.strContains "AB"

-- String replacement
let replaced = @code |> Expr.strReplace "AB" "XY"

DataFrame.fromRecords
    [ { code = "  AB123  ", label = "short" }
    , { code = "  CD456  ", label = "long" }
    ]
    |> DataFrame.applyExprs [(@code_trimmed, trimmed), (@label_upper, upper), (@found_ab, found), (@replaced, replaced)]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

strSplit produces a column of lists.

Regex String Operations

strMatches, strCapture, strCaptureAll, strReplaceAll, and strCount provide regex-based string operations:

  • strMatches pattern expr — boolean column: true if the cell matches the regex
  • strCapture pattern groupIndex expr — extracts the nth capture group (1-indexed)
  • strCaptureAll pattern expr — extracts all non-overlapping matches into a list column
  • strReplaceAll pattern replacement expr — replaces all regex matches
  • strCount pattern expr — counts the number of non-overlapping matches

-- Extract year from date strings using regex capture
import DataFrame
import Result

import DataFrame.Expr as Expr

let df =
    DataFrame.fromRecords
        [ { date_str = "2024-01-15" }
        , { date_str = "2023-07-04" }
        , { date_str = "no date" }
        ]

let filtered =
    df |> DataFrame.filter (@date_str |> Expr.strMatches "[0-9]{4}-[0-9]{2}-[0-9]{2}")

case filtered of
    Ok f ->
        f
            |> DataFrame.applyExprs [ (@year, @date_str |> Expr.strCapture "([0-9]{4})" 1) ]
            |> Result.map DataFrame.columns
            |> Result.withDefault []
    Err _ -> []

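Here is a sketch of strCount and strReplaceAll, using the argument order given in the list above (pattern first, column expression last). It counts the digits in each cell and, in a second column, masks them.

```keel
-- Count regex matches per cell and mask them in a second column
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords [{ text = "a1b22" }, { text = "none" }]
        |> DataFrame.applyExprs
            [ (@digit_count, @text |> Expr.strCount "[0-9]")
            , (@masked, @text |> Expr.strReplaceAll "[0-9]" "#")
            ]
        |> Result.andThen (DataFrame.column @digit_count)
        |> Result.withDefault []

result
```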
Math Functions

-- Math functions: abs
import DataFrame
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords [{ value = -3.7 }, { value = 4.2 }]
    |> DataFrame.applyExprs
        [ (@abs_value, Expr.col @value |> Expr.abs)
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

Available: abs, sqrt, pow, floor, ceil, round.
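The other math functions follow the same pipe pattern as abs. This sketch uses sqrt on a float column.

```keel
-- Square root of each value, in the same pipe style as abs
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords [{ area = 4.0 }, { area = 9.0 }]
        |> DataFrame.applyExprs [(@side, @area |> Expr.sqrt)]
        |> Result.andThen (DataFrame.column @side)
        |> Result.withDefault []

result
```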

Null Handling

-- Null handling: fillNull
import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords [{ score = Just 85 }, { score = Nothing }]
    |> DataFrame.applyExprs
        [ (@score_filled, col @score |> Expr.fillNull 0)
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

fillNull replaces null values with a constant. isNull and isNotNull test for nullability and can be used with filter:

import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

-- Test for null values in float columns
let nullFlag = col @score |> Expr.isNull

let hasValue = col @value |> Expr.isNotNull

DataFrame.fromRecords
    [ { value = Just 1.0, score = Just 95.0 }
    , { value = Just 2.0, score = Nothing }
    ]
    |> DataFrame.applyExprs [(@is_null_score, nullFlag), (@has_value, hasValue)]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

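As noted above, isNull and isNotNull can be used with filter. This sketch keeps only the rows whose score is missing, using the same nullable records as the previous example.

```keel
-- Keep only rows with a missing score
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords
        [ { value = Just 1.0, score = Just 95.0 }
        , { value = Just 2.0, score = Nothing }
        ]
        |> DataFrame.filter (@score |> Expr.isNull)
        |> Result.map DataFrame.count
        |> Result.withDefault 0

result
```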
Window Functions

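Expr.over evaluates an expression within each partition and broadcasts the result back to every row of that partition, like an SQL window function with OVER (PARTITION BY ...):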

import DataFrame
import Result

import DataFrame.Expr as Expr

let df =
    DataFrame.fromRecords
        [ { region = "East", revenue = 100 }
        , { region = "East", revenue = 200 }
        , { region = "West", revenue = 150 }
        ]

-- Sum per partition using over
let e =
    Expr.col @revenue
        |> Expr.sum
        |> Expr.over ["region"]

df
    |> DataFrame.applyExprs [(@region_total, e)]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

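Because over broadcasts the partition result back to every row, you can combine it with row-level infix arithmetic. This sketch computes how far each row's revenue falls below the best revenue in its region. It passes the partition column as a String, as in the example above.

```keel
-- Compare each row with its partition's maximum
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords
        [ { region = "East", revenue = 100 }
        , { region = "East", revenue = 200 }
        , { region = "West", revenue = 150 }
        ]
        |> DataFrame.applyExprs
            [ (@gap_to_best, @revenue - (@revenue |> Expr.max |> Expr.over ["region"]))
            ]
        |> Result.andThen (DataFrame.column @gap_to_best)
        |> Result.withDefault []

result
```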
Cumulative Operations

Cumulative operations accumulate a value row-by-row within a partition. These are DataFrame module functions (not Expr functions) — use DataFrame.partitionBy + DataFrame.orderBy + DataFrame.withCumSum to scope the accumulation to each group:

import DataFrame

-- Cumulative sum and mean per region using window functions
DataFrame.fromRecords
    [ { region = "East", revenue = 100 }
    , { region = "East", revenue = 200 }
    , { region = "East", revenue = 150 }
    , { region = "West", revenue = 300 }
    ]
    |> DataFrame.partitionBy [@region]
    |> DataFrame.orderBy [@revenue]
    |> DataFrame.withCumSum "running_total" @revenue
    |> DataFrame.withCumMean "running_avg" @revenue
    |> DataFrame.collect
    |> DataFrame.columns

Available: DataFrame.withCumSum, DataFrame.withCumMean, DataFrame.withCumMin, DataFrame.withCumMax.

import DataFrame

-- Running sum per region using window functions
DataFrame.fromRecords
    [ { region = "East", revenue = 100 }
    , { region = "East", revenue = 200 }
    , { region = "West", revenue = 150 }
    ]
    |> DataFrame.partitionBy [@region]
    |> DataFrame.orderBy [@revenue]
    |> DataFrame.withCumSum "running_total" @revenue
    |> DataFrame.collect
    |> DataFrame.columns

Rolling Window Operations

Rolling operations compute aggregates over a fixed-size sliding window. These are also DataFrame module functions — use DataFrame.partitionBy + DataFrame.orderBy + DataFrame.withRollingSum. Rows where the window is not yet full return null:

import DataFrame

-- Rolling window operations using partitionBy + withRollingSum/withRollingMean
DataFrame.fromRecords
    [ { region = "East", value = 1 }
    , { region = "East", value = 2 }
    , { region = "East", value = 3 }
    , { region = "East", value = 4 }
    , { region = "East", value = 5 }
    ]
    |> DataFrame.partitionBy [@region]
    |> DataFrame.orderBy [@value]
    |> DataFrame.withRollingSum "roll3_sum" @value 3
    |> DataFrame.withRollingMean "roll3_mean" @value 3
    |> DataFrame.collect
    |> DataFrame.columns

Available: DataFrame.withRollingSum, DataFrame.withRollingMean, DataFrame.withRollingMin, DataFrame.withRollingMax.

import DataFrame

-- Rolling mean over a 7-row window per department (a 7-day average if each row were one day)
DataFrame.fromRecords
    [ { dept = "East", sales = 10 }
    , { dept = "East", sales = 20 }
    , { dept = "East", sales = 30 }
    , { dept = "East", sales = 40 }
    , { dept = "East", sales = 50 }
    , { dept = "East", sales = 60 }
    , { dept = "East", sales = 70 }
    , { dept = "East", sales = 80 }
    ]
    |> DataFrame.partitionBy [@dept]
    |> DataFrame.orderBy [@sales]
    |> DataFrame.withRollingMean "sales_7d_avg" @sales 7
    |> DataFrame.collect
    |> DataFrame.columns

Date and DateTime Operations

ISO 8601 date and datetime strings in CSV files are automatically parsed. Use strContains, strStartsWith, or strEndsWith to test date string patterns, or use strReplace to transform them:

import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

-- Flag rows by birth year and month with substring tests on ISO 8601 "YYYY-MM-DD" strings
-- The year is the only 4-digit run, and the month sits between the two dashes
let bornIn1990 = col @birth_date |> Expr.strContains "1990"

let bornInJune = col @birth_date |> Expr.strContains "-06-"

case DataFrame.readCsv "content/examples/guide/dataframe/dates.csv" of
    Ok df ->
        df
            |> DataFrame.applyExprs [(@born_1990, bornIn1990), (@born_june, bornInJune)]
            |> Result.map DataFrame.columns
            |> Result.withDefault []
    Err _ -> []

import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

-- Detect ISO 8601 datetime structure using string predicates
let hasT = col @created_at |> Expr.strContains "T"

let hasColon = col @created_at |> Expr.strContains ":"

case DataFrame.readCsv "content/examples/guide/dataframe/datetimes.csv" of
    Ok df ->
        df
            |> DataFrame.applyExprs [(@has_T, hasT), (@has_colon, hasColon)]
            |> Result.map DataFrame.columns
            |> Result.withDefault []
    Err _ -> []

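The same predicates work on date strings built in code. This sketch counts dates that fall on the first day of a month. It assumes fromRecords keeps these values as plain strings, since the automatic parsing described above applies to CSV files. It also assumes strEndsWith takes the suffix first, like strContains.

```keel
-- Count ISO 8601 dates that fall on the first day of a month
import DataFrame
import Result

import DataFrame.Expr as Expr

let result =
    DataFrame.fromRecords
        [ { day = "1990-06-01" }
        , { day = "1985-12-24" }
        , { day = "2000-01-01" }
        ]
        |> DataFrame.filter (@day |> Expr.strEndsWith "-01")
        |> Result.map DataFrame.count
        |> Result.withDefault 0

result
```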
Cross-Column References

DataFrame.applyExprs automatically batches expressions so that each expression can reference columns added by earlier expressions in the same call. You do not need to chain multiple applyExprs calls when one column depends on another:

-- applyExprs batches exprs so later ones can reference columns added earlier
import DataFrame
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords [{ x = 10 }, { x = 20 }]
    |> DataFrame.applyExprs [(@y, @x + 1), (@z, @y * 2)]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

In the example above, y is produced by the first expression and consumed by the second within the same applyExprs call. This works because applyExprs splits the list into sequential batches, so a column added by one expression is already present when the expressions after it are evaluated.

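Dependencies can chain through more than one step. In this sketch w depends on z, which depends on y, all within a single applyExprs call. For x = 10 the chain gives y = 11, z = 22, and w = 12.

```keel
-- A three-step dependency chain inside one applyExprs call
import DataFrame
import Result

let result =
    DataFrame.fromRecords [{ x = 10 }, { x = 20 }]
        |> DataFrame.applyExprs [(@y, @x + 1), (@z, @y * 2), (@w, @z - @x)]
        |> Result.andThen (DataFrame.column @w)
        |> Result.withDefault []

result
```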
Pipe API

Every infix operator has a corresponding pipe-style function in DataFrame.Expr. These are useful for chaining operations or when the infix syntax is less readable:

Infix             Pipe equivalent            Operation
@x + 5            @x |> Expr.add 5           Addition
@x - 3            @x |> Expr.sub 3           Subtraction
@x * 2            @x |> Expr.mul 2           Multiplication
@x / 10           @x |> Expr.div 10          Division
@x % 3            @x |> Expr.mod 3           Modulo
@x ^ 2            @x |> Expr.pow 2           Exponentiation
@x == 5           @x |> Expr.eq 5            Equal
@x != 5           @x |> Expr.neq 5           Not equal
@x > 3            @x |> Expr.gt 3            Greater than
@x >= 3           @x |> Expr.gte 3           Greater than or equal
@x < 10           @x |> Expr.lt 10           Less than
@x <= 10          @x |> Expr.lte 10          Less than or equal
expr1 && expr2    expr1 |> Expr.and expr2    Logical AND
expr1 || expr2    expr1 |> Expr.or expr2     Logical OR

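Functions with no infix operator: Expr.in, Expr.not, Expr.named (for agg), Expr.cond, Expr.fillNull, Expr.isNull, Expr.isNotNull, Expr.over, and the string, math, and aggregation functions listed above. The cumulative and rolling operations are DataFrame functions rather than Expr functions, so they are not part of this API.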

-- Pipe-style arithmetic: Expr.mul for column * column
import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { price = 10, quantity = 3 }
    , { price = 20, quantity = 2 }
    ]
    |> DataFrame.applyExprs
        [ (@total, col @price |> Expr.mul (col @quantity))
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

-- Pipe-style comparison: Expr.gte for column >= literal
import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Alice", age = 25 }
    , { name = "Bob", age = 15 }
    ]
    |> DataFrame.applyExprs
        [ (@is_adult, col @age |> Expr.gte 18)
        ]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

Expression Reuse and Advanced Features

Expression reuse — bind an expression to a variable and apply it to multiple DataFrames or operations:

import DataFrame
import Result

import DataFrame.Expr as Expr

let revenue = @price * @qty

let sales =
    DataFrame.fromRecords
        [ { name = "A", price = 10, qty = 3 }
        , { name = "B", price = 20, qty = 2 }
        ]

-- Apply the bound expression; the same binding can also be passed to filter or used on another DataFrame
sales
    |> DataFrame.applyExprs [(@revenue, revenue)]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

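This sketch reuses one expression in several places: the same `revenue` binding drives a filter on two different DataFrames. The `january` and `february` names and their data are hypothetical sample values.

```keel
-- One bound expression, used in filters on two different DataFrames
import DataFrame
import Result

let revenue = @price * @qty

-- Hypothetical monthly sales data
let january =
    DataFrame.fromRecords [{ price = 10, qty = 3 }, { price = 20, qty = 2 }]

let february =
    DataFrame.fromRecords [{ price = 5, qty = 4 }, { price = 50, qty = 1 }]

-- Count sales with revenue of at least 40 in each month
let januaryCount =
    january
        |> DataFrame.filter (revenue >= 40)
        |> Result.map DataFrame.count
        |> Result.withDefault 0

let februaryCount =
    february
        |> DataFrame.filter (revenue >= 40)
        |> Result.map DataFrame.count
        |> Result.withDefault 0

let result = januaryCount + februaryCount

result
```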
Captured outer-scope variables — reference any Keel value from the surrounding scope:

-- Outer-scope variables are captured in expressions
import DataFrame
import Result

import DataFrame.Expr as Expr

let multiplier = 2

DataFrame.fromRecords [{ price = 10 }, { price = 20 }]
    |> DataFrame.applyExprs
        [ (@doubled, @price * multiplier)
        ]
    |> Result.andThen (DataFrame.column @doubled)
    |> Result.withDefault []

Multi-column string composition: Expr.concatMany joins multiple columns into one string column, placing the given separator between the values:

import DataFrame
import Result

import DataFrame.Expr as Expr

let full = Expr.concatMany " " [@first, @last]

DataFrame.fromRecords
    [ { first = "Alice", last = "Smith" }
    , { first = "Bob", last = "Jones" }
    ]
    |> DataFrame.applyExprs [(@full_name, full)]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

List-typed column output: Expr.strSplit produces a column of lists:

import DataFrame
import DataFrame.Expr exposing col
import Result

import DataFrame.Expr as Expr

let tags = col @tag_str |> Expr.strSplit ","

DataFrame.fromRecords
    [ { tag_str = "red,green,blue" }
    , { tag_str = "alpha,beta" }
    ]
    |> DataFrame.applyExprs [(@tags, tags)]
    |> Result.map DataFrame.columns
    |> Result.withDefault []

Type Safety

The Expr API is type-checked, and every Expr.* function compiles directly to a Polars operation, so there is no unsupported-operation fallback at runtime. The errors that can still occur at runtime are column-not-found errors (e.g., @typo), and these are caught at compile time when the DataFrame has a known schema (from readCsv, readParquet, or similar).

Next Steps

See the DataFrame stdlib page for all DataFrame functions.