Changelog

All notable changes to Keel are documented here. This project follows Keep a Changelog and Semantic Versioning.

Unreleased

167 changes

Added

67 items
  • DataFrame.checkSchema — runtime schema validation — New DataFrame.checkSchema : Schema -> DataFrame -> Result DataFrame DataFrameError validates a DataFrame's column types against a declared schema at runtime. Schema is a record where each field name is a column name and the value is the expected type tag (e.g. { age : Int, name : String }). Returns Ok df when all declared columns exist with matching types; returns Err (DataFrameError "SchemaError" ...) listing every mismatch. Columns not mentioned in the schema are ignored. Useful for asserting postconditions on loaded or transformed data.
  • == and != for Tuple, List, and Record values — Structural equality now works for tuples, lists, and records at both compile time and runtime. The type checker (is_eq_comparable) and VM comparison handler (EqRegRegReg) both accept these types. Equality is element-wise recursive: (1, "a") == (1, "a") → true, [1, 2] == [1, 3] → false, { x = 1 } == { x = 1 } → true. Mismatched types (e.g. comparing a Tuple to a DataFrame) are rejected at compile time.
  • vm::estimate_register_size and VM::heap() — heap-aware size estimation — New pub fn estimate_register_size(rv: &RegisterValue, heap: &[HeapSlot]) -> usize in keel_core::vm walks the VM heap recursively with a cycle guard to compute the true in-memory size of any register value, including nested lists, records, tuples, closures, and polars DataFrames (via estimated_size()). VM::heap() accessor exposes the heap slice to downstream crates. inspect::list_variables now uses heap-aware sizes instead of the OutputValue estimate, giving accurate sizes for local sessions.
  • DataFrame.readJsonl and DataFrame.readJsonlColumns — New I/O functions for NDJSON (newline-delimited JSON / JSON Lines) files. readJsonl reads a full NDJSON file into a DataFrame; readJsonlColumns reads and projects only the specified columns. Both accept local file paths and remote URLs and return Result DataFrame DataFrameError. Backed by Polars' LazyJsonLineReader. Compile-time schema inference works for literal paths (.jsonl, .ndjson extensions). LSP path completion and pre-scan updated. Use readJsonl for NDJSON files where each line is an independent JSON object; use readJson for JSON array files ([{...}, ...]).
  • DataFrame.Expr regex string functions — Five new expression functions for pattern-based string operations on DataFrame columns: strMatches (Bool column — true if cell matches the regex), strCapture (extract nth capture group as String), strCaptureAll (all non-overlapping matches as a List column), strReplaceAll (replace all regex matches), strCount (count non-overlapping matches). All map to Polars regex string methods and use the regex crate syntax. Invalid patterns are handled gracefully: strMatches/strCapture/strCaptureAll fill the column with null; strReplaceAll/strCount propagate an error through applyExprs.
  • ServerPerms permissions snapshot and effective_perms() — New serialisable struct ServerPerms (with sub-structs ServerIoPerms, ServerHttpPerms, ServerDataframePerms) captures the effective security restrictions at process startup by reading the KEEL_* env vars. effective_perms() is the public API; effective_perms_from(closure) is the testable inner function used in unit tests to avoid process-global env mutation.
  • Regex stdlib module — new module for compiled regular expression patterns. Regex.compile takes a pattern string and returns Result Regex String, placing invalid-pattern errors explicitly in the type system. Matching functions: Regex.test (Bool), Regex.find (Maybe String), Regex.findAll ([String]), Regex.captures (Maybe [Maybe String]). Transformation functions: Regex.replace, Regex.replaceAll (first/all match substitution), Regex.split. Regex values are opaque NativeObject wrappers — they compose through Result/Maybe chains and work with List.map partial application. Supports the full regex crate syntax including Unicode properties (\p{L}).
  • Process stdlib module — new module collects all process-control operations: Process.exit (exit with code), Process.wait (sleep N ms), Process.pid (process ID), Process.hostname (machine hostname), Process.uptime (ms since program start), Process.env (get env var), Process.setEnv (set env var), Process.args (command-line args). The four operations previously in IO (IO.exit, IO.env, IO.setEnv, IO.args) are removed from IO entirely — use Process.* instead. IO now covers only file, directory, console, and path operations.
  • type alias transparent type synonyms — type alias Name = Type declares a transparent structural synonym. The alias name and the underlying type are fully interchangeable: values flow between them without any constructor or pattern match. Re-enables the previously stubbed parser for Declaration::TypeAlias; adds Token::Alias as a keyword. DataFrame.fromRecords accepts [AliasName] lists once the alias expands to the matching record type. Parameterized aliases (type alias Box a = { value : a }) and module-scoped aliases are supported. 52 integration tests added.
  • Http.responseText — New function Response -> String that extracts the raw response body as a string without any parsing. Companion to Http.responseJson for APIs that return plain text, HTML, CSV, or other non-JSON formats. Pipe-friendly: Http.get url |> Http.send |> Result.map Http.responseText.
  • Matrix.rangeVec — Creates a column vector [start, start+1, …, end] as Matrix Float with no interpreter overhead. Both ends inclusive. Use together with Matrix.outerAdd to build index-formula matrices.
  • Matrix.outerAdd — Computes the outer sum of two column vectors: M[i,j] = a[i] + b[j]. Both inputs must be n×1 column vectors. Runs in pure Rust with no per-element interpreter calls, making it significantly faster than Matrix.fromFn for index-formula matrices.
  • Matrix.tridiag — Creates an n×n tridiagonal matrix with a specified diagonal value and off-diagonal value. Replaces fromFn with a conditional for the common tridiagonal pattern.
  • Ordering enum — Ordering::Lt, Ordering::Eq, Ordering::Gt. Import with import Ordering. Used as the return type of all compare functions.
  • DateTime.fromYmd — Constructs a DateTime from year, month, day integers. Returns Result DateTime DateTimeError.
  • Json.encode error propagation — Json.encode now returns Result String JsonError instead of silently returning "null" for un-encodable values.
  • Math.sqrt, Math.pow, Math.log, etc. return Maybe Float — Functions that can produce non-finite results now return Nothing instead of NaN/Infinity.
  • DataFrame.applyExprs now takes [(DataFrameColumn, Expr)] instead of [Expr] — Each entry is a tuple of a column reference and an expression. The old Expr.named "colname" expr pattern is no longer required for applyExprs; use (@colname, expr) instead. Expr.named remains available for aggregation contexts. This is a breaking change for any code calling applyExprs with a plain [Expr] list.
  • DataFrame.sort and DataFrame.sortDesc now accept a list of columns — signatures changed from DataFrameColumn -> DataFrame -> Result DataFrame DataFrameError to [DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError. Passing a bare column reference (e.g. DataFrame.sort @name) is now a compile-time type error; wrap it in a list: DataFrame.sort [@name]. Multiple columns are sorted left-to-right: DataFrame.sort [@department, @salary]. All column names in the list are validated at compile time against typed DataFrames.
  • ? postfix operator — expr? unwraps Ok v to v; short-circuits with Err e if the result is an error. Applying ? to a non-Result type is a compile-time TypeMismatch error. Works at the top level and inside function bodies, with pipes, and with let bindings.
  • Result.isOk / Result.isErr — Predicate functions that return Bool. Useful in List.filter Result.isOk patterns.
  • Result.filter — (a -> Bool) -> e -> Result a e -> Result a e. Keep Ok only if the predicate passes; replace with the given Err value otherwise.
  • Result.flatten — Result (Result a e) e -> Result a e. Removes one layer of Result nesting.
  • Result.zip — Result a e -> Result b e -> Result (a, b) e. Combines two Result values into a Result of a tuple; the first Err wins.
  • Result.mapOr — b -> (a -> b) -> Result a e -> b. Applies the function to Ok, or returns the default for Err.
  • Result.all — [Result a e] -> Result [a] e. Collects a list of Result values into a single Result of a list; the first Err short-circuits.
  • Maybe.isJust / Maybe.isNothing — Predicate functions that return Bool. Useful in List.filter Maybe.isJust patterns.
  • Maybe.filter — (a -> Bool) -> Maybe a -> Maybe a. Keep Just only if the predicate passes; replace with Nothing otherwise.
  • Maybe.flatten — Maybe (Maybe a) -> Maybe a. Removes one layer of Maybe nesting.
  • Maybe.zip — Maybe a -> Maybe b -> Maybe (a, b). Combines two Maybe values into a Maybe of a tuple; Nothing wins.
  • Maybe.mapOr — b -> (a -> b) -> Maybe a -> b. Applies the function to Just, or returns the default for Nothing.
  • Maybe.all — [Maybe a] -> Maybe [a]. Collects a list of Maybe values into a single Maybe of a list; the first Nothing short-circuits.
  • DataFrame functions return Result DataFrame DataFrameError — All DataFrame operations that can fail (sort, unique, head, tail, slice, sample, join, concat, pivot, melt, quantile, mean, median, std, var, mode, corr, cov, describe, collect, all window functions, and label operations) now return Result DataFrame DataFrameError instead of crashing at runtime. Use case expressions or the ? operator to handle errors. The DataFrameError enum is available in the stdlib scope.
  • DataFrame.corr / DataFrame.cov — Compute the Pearson correlation matrix and covariance matrix for all numeric columns. Return a square DataFrame indexed by column name.
  • DataFrame.mean / DataFrame.median / DataFrame.std / DataFrame.var — Column-wise descriptive statistics: mean, median, standard deviation, and variance for all numeric columns. Each returns a single-row DataFrame.
  • DataFrame.mode — Column-wise mode (most frequent value) for all columns. Returns a DataFrame with one row per mode value per column.
  • DataFrame.quantile — Column-wise quantile computation. Accepts a Float quantile (0.0–1.0) and returns a single-row DataFrame. Validates that the quantile is in range; returns InvalidArgument otherwise.
  • DataFrame.summary extensions — Six additional statistical rows appended to the summary output: median, standard deviation, variance, number of unique values, count of nulls, and count of non-null values.
  • @name column reference syntax — @name (DataFrameColumn) is now the only accepted syntax for DataFrame column references in expressions. String literals (col "name") and Symbol literals (:name) used as column references are rejected at compile time with a TypeMismatch error.
  • DataFrame.melt — Reshapes a wide DataFrame into long format. Takes id columns to keep, a list of column prefixes (one per output value column), a separator character, and a name for the new index column. All columns matching each prefix are stacked; the suffix after the prefix+separator is parsed as Int if all values are numeric, otherwise kept as String. Lineage tracking records a Melt operation on each affected column and a Melted origin for the new index and value columns.
  • Opaque newtypes — type Name = BaseType declares a nominally distinct type that cannot be mixed with its base type or other newtypes over the same base. Constructed as Name value and destructured via pattern matching. Generic newtypes are supported: type Validated a = a.
  • enum keyword for sum types — Sum-type declarations now use enum: enum Direction = North | South | East | West. The previous type keyword for sum types is removed.
  • VM::check_file — Type-checks a source file without executing it. Performs the same module root discovery, user-module registration, and compilation as VM::compile_file but stops before running the bytecode. Used by keel check to support project-aware type checking.
  • Let-polymorphism — let-bound lambdas without type annotations are automatically generalized to work with any type. let identity = |x| x infers ∀a. a -> a, so identity 1 and identity "hello" can both appear in the same program.
  • Recursive let bindings — A lambda bound with let can call itself by name. let fac = |n: Int| if n == 0 then 1 else n * fac (n - 1) compiles and runs correctly; no fn declaration is needed for local recursive helpers.
  • Body-constraint lambda type inference — Lambda parameters whose types can be determined from the body no longer require explicit annotations. |x| x + 1 infers x: Int from the arithmetic; only truly unconstrained parameters (like |x| x) still require annotation or context.
  • call_closure now handles PartialNativeFunction — Result.map, Maybe.map, and other higher-order stdlib functions can now accept partially-applied native functions as the mapper argument. Previously this produced a NotCallable runtime error.
  • Record spread expression — { ..base } copies all fields from a record, and { ..base, field = val } copies and overrides or adds fields. Compiles via CopyRecord + StoreNamedField. Type inference merges the extra fields into the base record type.
  • Type aliases in task signatures — task expecting MyInputs exposing MyOutputs now works when MyInputs/MyOutputs are type aliases that resolve to record types. Fields are extracted at compile time from the resolved alias.
  • Bare task declaration — task with no expecting or exposing clauses is now valid syntax. Useful as a marker that a file is a task entry point with no declared contract.
  • Aliased arguments in run expressions — run "file.kl" { param = expr } lets callers pass a value under a different name than their local variable. Bare { x } remains shorthand for { x = x }. Mixed forms like { a, b = myval } are also supported. The type checker validates the aliased expression's type against the callee's declared parameter type.
  • VariableInfo::binding_id — VariableInfo now carries a binding_id: usize field set to the VM register index for the binding. This uniquely identifies each binding even when two variables share the same name (e.g. shadowed variables in nested scopes). Consumers use it to correlate VariableInfo across updates.
  • Enum expansion in inspect — has_children, value_children, and navigate now handle OutputValue::Enum variants whose argument is a Record. Such enum values are expandable in the variable inspector: children are the record fields (flattened), and field paths are navigable directly. Scalar and tuple enum arguments remain non-expandable.
  • inspect::list_types — New list_types(scope, interner) -> Vec function in keel_core::inspect. Returns all user-defined enum types visible in scope (filtering out internal/stdlib enums whose names start with a lowercase letter). Consumers (keel-tui env inspector, keel-jupyter-kernel Positron variables pane) can display the variant list for a column's ValueLabelSet.
  • ChildInfo value-label fields — ChildInfo now carries value_label_strings: Vec and value_label_name: Option. For DataFrame columns that have a ValueLabelSet, value_label_strings contains the label strings and value_label_name contains the enum type name (e.g. "Origin"). Non-DataFrame children have empty/None values.
  • DataFrame.fromRecords compile-time value-label embedding — When DataFrame.fromRecords is called with a literal list of records that contain ValueLabel.value SomeEnum::Variant fields, the compiler now emits a compile-time HashMap constant and redirects to an internal _fromRecordsWithLabels function. Value label metadata is preserved in the resulting DataFrame without any runtime overhead.
  • inspect module — Shared runtime inspection utilities (list_variables, get_children) extracted into keel_core::inspect. Returns neutral VariableInfo / ChildInfo structs consumed by keel-repl (:env), keel-jupyter-kernel (Positron variables pane), and keel-tui (environment inspector pane). Removes duplicated variable-enumeration code from each consumer.
  • Enum shorthand for ValueLabel DataFrame column types — Enums whose variants all carry ValueLabel metadata can now be used directly as a column type: score: Score is equivalent to the verbose score: [(1, Score::Low), (2, Score::Mid), (3, Score::High)]. The compiler derives the full int+label contract from the enum's ValueLabel declarations at compile time. Plain enums (no ValueLabel) on integer columns still produce a type-mismatch error.
  • Multiline task declarations — task expecting (...) exposing (...) now supports Elm-style multiline syntax. Keywords (expecting, exposing) and their parameter lists can break across lines with indentation. Useful for tasks with many parameters. Single-line syntax remains supported.
  • DataFrame passthrough type inference for metadata operations — setVarLabel, removeVarLabel, removeValueLabels, setMeta, setColumnMeta, and setDisplayMode now preserve the input DataFrame schema in type inference (passthrough), so task expose type validation works correctly after these operations.
  • setValueLabels/setValueLabelsStrict retype column to enum in task expose contracts — When the second argument is ValueLabelSet.fromType SomeEnum, the type checker now updates the target column's type from Float to SomeEnum in the resulting DataFrame schema. Previously the column stayed Float, causing a spurious type mismatch on task expose declarations like result : DataFrame { :CO: Origin }.
  • Symbol-syntax field names in type annotations — Record and DataFrame type annotations now accept symbol-syntax field names for columns that don't conform to lowercase identifier syntax: { :Name : String }, { :"Total Revenue" : Int }, or mixed { id : Int, :Name : String }. Bare symbols also support uppercase starts (:Name), previously only lowercase (:name) was allowed. Type display renders non-lowercase fields with symbol syntax.
  • Symbol ↔ String type compatibility — Symbols (:name) are now compatible with String in the type system, so DataFrame column-name parameters accept both :name and "name". This applies to select, remove, rename, column, sort, sortDesc, unique, groupBy, join, pivot, partitionBy, orderBy, Expr.col, Expr.named, and all window/cumulative/rolling functions. Mixed lists like [:name, "age"] also work. Compile-time column validation and schema propagation recognize Symbol literals alongside String literals. 21 new tests covering type compatibility, column validation, schema propagation, error cases, and String regression.
  • Compile-time type checking for Expr.cond branch consistency — The type signature changed from [(Expr, a)] -> b -> Expr to [(Expr, a)] -> a -> Expr, so all branch values and the default must share the same type. Mismatches like Expr.cond [(cond, "minor")] 42 (String vs Int) are now caught at compile time instead of failing at Polars runtime.
  • Broader scalar ↔ Expr coercion — unify_types, match_types, and are_types_compatible now accept all scalar types (Int, Float, String, Boolean, Decimal, Symbol) as compatible with Expr, not just Symbol. This enables mixed scalar/Expr lists like [(cond, col "x" * 2), (cond, 99)] to unify correctly.
  • Infix operators for DataFrame.Expr — Arithmetic (+, -, *, /, %, ^), comparison (==, !=, <, <=, >, >=), boolean (&&, ||), negation (-), and logical not (not) now work directly on Expr values. When either operand is an Expr, scalars are automatically coerced to literal expressions. && and || automatically detect Expr operands and compile to Polars' .and() / .or() instead of short-circuit evaluation. This enables natural syntax like col "price" * 1.1, col "age" >= 18 && col "active", and col "x" + col "y" instead of requiring the pipe API. 71 tests covering arithmetic, comparison, boolean logic, chained operations, filters, edge cases, and backward compatibility.
  • SPSS and pandas column labels from Parquet — DataFrame.readParquet and DataFrame.readParquetColumns now also read variable labels from the spss_meta Parquet key written by pyreadstat (format: {"column_labels": {"col": "label"}}), and from pandas column metadata embedded in the pandas Parquet key. 19 tests pass including a live NIS2NL.parquet test.
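
The Matrix.rangeVec and Matrix.outerAdd entries in the list above describe an index-formula pattern: build two inclusive ranges, then combine them so that M[i,j] = a[i] + b[j]. The Rust sketch below only illustrates those semantics. The function names range_vec and outer_add, the integer arguments, and the nested Vec representation are assumptions made for illustration; they are not Keel's actual implementation or its Matrix type.

```rust
/// Hypothetical stand-in for Matrix.rangeVec: the values
/// start, start+1, ..., end as floats, with both ends inclusive.
pub fn range_vec(start: i64, end: i64) -> Vec<f64> {
    (start..=end).map(|v| v as f64).collect()
}

/// Hypothetical stand-in for Matrix.outerAdd: returns a row-major
/// matrix m with m[i][j] = a[i] + b[j].
pub fn outer_add(a: &[f64], b: &[f64]) -> Vec<Vec<f64>> {
    a.iter()
        .map(|&x| b.iter().map(|&y| x + y).collect())
        .collect()
}
```

With a = range_vec(0, 2) and b = range_vec(1, 2), outer_add produces a 3×2 matrix whose entry (i, j) is i + (j + 1), which is the kind of matrix that would otherwise need a per-element callback.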

Changed

32 items
  • Two-mode run path resolution — run "file.kl" paths now follow an explicit two-mode convention. A path starting with ./ or ../ resolves relative to the calling file's directory (caller-relative). A bare path (no ./ prefix) resolves relative to the project root — the directory containing keel.toml. Using a bare path in a file with no keel.toml ancestor produces a new RunPathRequiresProjectRoot compile error. This change makes run paths unambiguous: bare paths are portable across the project, and ./ paths are anchored to the local file. All integration test fixtures updated to use ./ where appropriate.
  • Matrix stdlib module — New Matrix Float type with full arithmetic (+, -, *, .*, ./), decompositions (det, inv, solve, svd, qr, chol), reductions (trace, norm, rank), structural ops (transpose, shape, row, col, submatrix, hstack, vstack, map, fill, zeros, ones, identity, fromList, toList), and DataFrame interop (fromDataFrame, toDataFrame). Element-wise operators use .* and ./ tokens. Operations that can fail (e.g. inv, solve, chol) return Result Matrix MatrixError.
  • Stdlib renames (breaking) — The following functions were renamed for API consistency:
  • List.skip → List.drop
  • List.flatMap → List.andThen
  • List.variance → List.var
  • Maybe.orElse → Maybe.withDefault
  • Maybe.mapOr → Maybe.mapWithDefault
  • Result.orElse → Result.withDefault
  • Result.mapOr → Result.mapWithDefault
  • Http.jsonBody → Http.responseJson
  • Decimal.signum → Decimal.sign
  • Table.sdOf → Table.stdOf
  • DateTime.fromDate (3-int form) → DateTime.fromYmd; existing DateTime.fromDate now takes a Date value and returns DateTime directly
  • DateTime.parseRfc3339 removed — use DateTime.fromYmd / DateTime.fromDate instead
  • String.split, String.contains, String.startsWith, String.endsWith argument order flipped — The needle/delimiter is now the *first* argument and the string is the *second*. String.split "," "a,b,c" replaces the old String.split "a,b,c" ",". This enables partial application: String.split "," is now a reusable tokeniser.
  • List.tail now returns Maybe [a] — Returns Nothing for an empty list instead of []. Callers must handle Just/Nothing.
  • compare functions now return Ordering — Date.compare, Time.compare, DateTime.compare, and Duration.compare previously returned Int (-1, 0, 1). They now return the Ordering enum (Ordering::Lt, Ordering::Eq, Ordering::Gt). Use case result of Ordering::Lt -> … | Ordering::Eq -> … | Ordering::Gt -> … with exhaustiveness checking.
  • Duration.negate now returns Result Duration DurationError — Negation of the minimum representable duration would overflow; the function now returns Err DurationError::Overflow in that case. Duration.abs similarly returns Result Duration DurationError for API symmetry (though overflow is impossible in practice).
  • Typed error enums replace String errors — MatrixError, IOError, HttpError, DateTimeError, DurationError enums are now available as first-class types. Functions that previously returned Result … String now return Result … with the corresponding error enum in the error position. Use case on the error variant for exhaustive handling.
  • @name replaces :name and col "name" for DataFrame column references — All DataFrame API functions that previously accepted String or Symbol column arguments now require DataFrameColumn (@name syntax). This is a breaking change for any code using :symbol or string column references in DataFrame expressions.
  • Newtype construction uses constructor syntax — type UserId = Int values are now constructed as UserId 42 (applying the type name as a function) and destructured via pattern matching (case id of UserId n -> n). The previously auto-generated .wrap and .unwrap accessor functions are removed. Record newtypes use Name { field = val } and tuple newtypes use Name (a, b). This aligns newtypes with enum single-variant style and removes the inconsistency with the stdlib.
  • Declaration::Enum gains is_newtype: bool — The separate Declaration::Newtype AST variant is merged into Declaration::Enum. is_newtype = true signals type Name = Base (newtype syntax); is_newtype = false signals enum Name = Variant | .... Consumers (keel-fmt, keel-lsp) use this flag instead of matching the removed variant.
  • exposing list no longer requires parentheses — import List exposing map, filter replaces import List exposing (map, filter). module Math exposing add, multiply replaces module Math exposing (add, multiply). exposing .. replaces exposing (..). The empty exposing () form remains valid.
  • Task expecting/exposing clauses use braces — task expecting { x : Int } exposing { result : Int } replaces the old parenthesis syntax task expecting (x : Int) exposing (result : Int). Both clauses now accept any type expression (record literal, type alias, or parameterised type). Declaration::Task AST changed: params: Vec<(String, Type)> + outputs: Vec replaced by expecting: Option + exposing: Option.
  • run argument is any record expression — The argument to run "file.kl" is now parsed as a general expression rather than the ImportFileVars enum. Any expression that produces a record is accepted: run "f.kl" { x, y }, run "f.kl" inputs (variable), run "f.kl" { ..base } (spread), or run "f.kl" (no argument). Expr::RunFile.pass_vars: ImportFileVars is replaced by arg: Option>.
  • types::format_field_name is now pub — The function that renders record field names with symbol syntax (:Name, :"name with spaces") is now publicly accessible so keel-fmt can use it when formatting type aliases.
  • Stable Rust toolchain — Removed nightly compiler requirement. Moved to stable Rust (1.85+). Deleted vendored polars-ops and ethnum patches (they were nightly-only workarounds). Dev shells updated accordingly.
  • Task syntax redesign — Task.run/Task.define replaced with keyword-based syntax. Declarations use task expecting (...) exposing (...) instead of Task.define (...) -> (...). Caller uses run "file.kl" { x, y } with record-style braces instead of Task.run "file.kl" (x, y). Three keywords added: task, run, expecting (plus existing exposing).
  • Task/module-first ordering — task declarations and file-level module exposing (...) must now be the first declaration in the file, before any imports. Imports go inside the task/module scope. Previously, task declarations were allowed after imports.
  • DataFrame stdlib examples use symbol syntax — All FunctionDoc examples for column-name parameters updated to use idiomatic :name symbol syntax instead of "name" strings.
  • Module export quick-parser extracts enum variants — quick_parse_module_exports now returns ModuleExport enum (with Function and Enum variants) instead of (String, bool) tuples. Enum exports include full variant info (ModuleEnumVariant::Simple, Tuple, Record) extracted by scanning the token stream for type definitions matching exposed names. This enables downstream tools (LSP, compiler) to resolve enum constructors from user modules without full compilation.
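
The two-mode run path rule described in the list above (./ and ../ paths are caller-relative, bare paths are relative to the project root) can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the compiler's code: the function resolve_run_path, its parameters, and the RunPathError enum are hypothetical names, and only the RunPathRequiresProjectRoot error name comes from the entry itself.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical error type; the single variant mirrors the
/// RunPathRequiresProjectRoot compile error named in the changelog.
#[derive(Debug, PartialEq)]
pub enum RunPathError {
    RequiresProjectRoot,
}

/// Resolve a `run "..."` path under the two-mode convention.
/// `project_root` is None when no keel.toml ancestor was found.
pub fn resolve_run_path(
    path: &str,
    caller_dir: &Path,
    project_root: Option<&Path>,
) -> Result<PathBuf, RunPathError> {
    if path.starts_with("./") || path.starts_with("../") {
        // Caller-relative mode: anchor at the calling file's directory.
        Ok(caller_dir.join(path))
    } else {
        // Bare-path mode: anchor at the project root, or fail without one.
        project_root
            .map(|root| root.join(path))
            .ok_or(RunPathError::RequiresProjectRoot)
    }
}
```

The sketch does not normalise .. segments or check that the target exists; it only shows which directory each mode anchors to and when the error arises.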

Fixed

60 items
  • Pipe type errors now point at the failing function call, not the whole pipe expression — When a type mismatch occurred on the rhs of |> (e.g. passing a Result where a DataFrame is expected), the error span covered the entire pipe chain. The span is now narrowed to the rhs function call (right.span) so the LSP highlights only the function that received the wrong type.
  • Function type mismatch error now says "a function" — When a non-function value was passed where a function was expected (e.g. List.map list lambda with arguments swapped), the error read expected _ -> _, found [String]. It now reads expected a function (_ -> _), found [String], making clear that a callable is required. The change is in Type::display_as_expected(), used only in TypeMismatch error formatting.
  • Internal type variables no longer leak into error messages — Type errors produced inside TypeInferenceContext (via apply_function_type and other inference helpers) were drained into the compiler's error list without sanitization, causing raw names like $t1 -> $t2 to appear in user-facing messages instead of _ -> _. The drain path now applies the same sanitize_error transformation used by add_type_error.
  • Maybe equality (== / !=) now works correctly — Comparing two Maybe values with == or != previously crashed at runtime with a cryptic Enum(470, 581, Some(12)) == Enum(470, 581, Some(13)) error showing raw VM-internal interned indices. Both structural equality (Just 5 == Just 5 → true, Just 5 == Just 6 → false, Nothing == Nothing → true) and cross-variant comparisons (Just x == Nothing → false) now work. Comparing Nothing == Just x (where Nothing has type Maybe Unknown) was also rejected at compile time; the equality type check now uses types_compatible so Maybe Unknown is accepted alongside any Maybe T.
  • DataFrame.readDta — crash on files with large sections — The .dta reader called BufReader::fill_buf() once to peek at the next tag. After a read_exact_bytes call consumed bytes up to near a refill boundary, fill_buf() returned only 1–3 bytes (e.g. just < with no >), causing an immediate "tag boundary not found in buffer" error. Files with many columns and per-column characteristics entries (e.g. the SOEP panel study at 5,931 columns) reliably triggered this. Fixed by introducing PeekReader, a BufRead wrapper that reads byte-by-byte until > and pushes the consumed bytes back into a replay buffer so the existing parse branches see them normally.
  • DataFrame.readJsonl / DataFrame.readJsonlColumns — automatic struct flattening — Both NDJSON readers now automatically expand nested JSON object columns into _-separated top-level columns immediately after reading, matching Jsonl.parseDataFrame behaviour. For example, a column address: {city, zip} becomes @address_city and @address_zip — directly accessible with @col syntax without any manual unnesting. List columns (arrays of objects) are left as-is. readJsonlColumns additionally accepts flattened sub-field names (e.g. @address_city) as selectors: the reader back-maps the name to the ancestor struct, loads only that struct, flattens it, and trims to the requested column — enabling memory-efficient sub-field selection from large files.
  • DataFrame.readJsonl / DataFrame.readJsonlColumns — reduced peak memory on large files — Both NDJSON readers now use Polars' low-memory mode (low_memory(true)), which parses the file in smaller chunks rather than buffering the full input before building Arrow columns. This significantly reduces peak RSS when loading large JSONL files (e.g. a 2.5 GB file that previously OOM-killed the process).
  • DataFrame.readJsonl docs warn about nested-column memory spikes — Added a note to the stdlib docs for readJsonl explaining that files with deeply nested columns (e.g. a column that is a list of structs with variable keys) can cause transient allocations many times the file size during parsing, and recommending readJsonlColumns to skip heavy nested fields.
  • DataFrame.readJsonlColumns — true schema-based column pruning — Previously readJsonlColumns called lf.collect() to read all columns and then selected the requested subset with df.select(). On files with variable-schema List columns (e.g. officers, addresses) this caused OOM even when the user selected only flat scalar columns — the heavy nested fields were fully parsed into Arrow arrays before being discarded. A lf.select().collect() intermediate fix did not help because Polars' NDJSON executor does not implement projection pushdown at the byte-reader level (file_options.with_columns is always None for NDJSON). The real fix: the reader now infers the full schema from the first 100 rows, builds a subset Schema containing only the requested columns, and passes it via with_schema(). This tells simd-json / the Arrow builder to ignore all fields not in the schema during deserialization — no Arrow buffers are ever allocated for the skipped columns. Measured reduction: 68 GB → 2 MB peak RSS on a 2.1 GB JSONL file with deeply nested officers and metadata columns.
  • Env pane preview shows full error message for Err string args — format_value_preview previously dropped the entire error message when the formatted "{variant} {message}" exceeded the 50-char preview limit, rendering e.g. "Err FileError ..." with no file path or error detail. String arguments to enum variants are now always shown in full (no length cap), so the preview shows the complete message such as FileError "No such file or directory (os error 2): /home/user/data/events.jsonl".
  • Jsonl.parseDataFrame automatically flattens nested struct columns — Nested JSON objects in JSONL input (e.g. {"address": {"city": "Berlin"}}) produced opaque Polars Struct-typed columns that rendered as {} in the DataFrame preview pane and could not be sorted, filtered, or selected. parseDataFrame now recursively expands all struct columns into prefixed top-level columns (e.g. address_city, address_zip) before returning the DataFrame. Deep nesting (2+ levels) is handled by repeating the expansion until no struct columns remain.
  • Instruction::Closure captures only referenced variables — Previously, every closure creation captured all of self.variables (the entire module environment) into the closure's env. For programs that bind large values (e.g. let file_contents = Jsonl.readFile …) and then use List.map over them, each call_closure call cloned the full env — O(n × file_size) memory usage — causing OOM and process kill on even modest inputs. The fix scans the closure body for LoadVarReg instructions and captures only the variables whose names are referenced there.
  • Instruction::Call full application no longer clones entire module env — The Instruction::Call handler (direct bytecode calls, e.g. calling a let-bound lambda from keel code) seeded the callee's locals from self.variables.clone() — copying the whole module env on every call. The fix starts locals from env.clone() (the closure's already-filtered captured env) and patches in only the body-referenced variables missing from env (forward references and recursive self-references).
  • Inline record parser now accepts leading-comma multiline style inside braces — A brace-delimited record written with leading commas ({ a = 1\n , b = 2\n }) was silently dropping all fields after the first when the field values were followed by a newline before the comma (e.g. after a ? operator). The inline_record separator now skips any leading newlines before the comma, so this style works correctly in all contexts including newtype constructors.
  • Destructured run bindings now carry the correct field type in the parser scope — let { x } = run "file.kl" previously registered x with the full expose-record type { x: T } instead of T. Each field variable now gets its own field type, so hover and inlay hints show the right type.
  • User module available in all sequential run child scopes — When two or more run files each import the same user module (e.g. import Config exposing ProjectConfig), the second and later imports no longer fail with Module 'Config' not found. Previously the module's exports were only registered in the first child scope; when that scope was torn down the exports disappeared and later child scopes found nothing. The compiler now caches exports in FileContext after first compilation and re-inserts them into each subsequent child scope.
  • Imported record-type constructor no longer triggers BareModuleName error — import Mod exposing RecordType followed by RecordType { field = val } previously failed with 'RecordType' is a module name, not a value. The quick module scanner (scan_enum_definitions) only matched Token::Enum, so type Name = { ... } newtypes were invisible to it. They were registered with zero variants, causing is_newtype_enum() to return false and the parser to fall through to the wrong error. The scanner now also handles Token::Type, dispatching to a new scan_record_fields helper for record bodies.
  • Record newtype constructor: brace allowed on next line — Parsing a record newtype constructor with the opening brace on the next line now works correctly. Previously the parser rejected MyType\n{ field: value } with a parse error. Field access on the result (e.g. MyType { field: value }.field) is also fixed.
  • DataFrame.readJsonl / readJsonlColumns URL path peak memory reduced — When reading from a URL, the raw download buffer (Vec<u8>) was kept alive through the entire LazyJsonLineReader::collect() call, causing peak RAM usage of 3–5× the file size. The buffer is now dropped immediately after std::fs::write completes and before collect() is called, eliminating the overlap.
  • Runtime error values now show useful information in REPL output and env pane — Result(Err(DataFrameError::FileError("..."))) and similar runtime errors were displayed with no message in the REPL (the text/plain MIME entry used an internal Display impl) and only the variant name in the env pane (format_value_preview returned a hard-coded "Err ..."). The REPL now uses to_keel_syntax_pretty() for the text/plain entry (e.g. Err FileError "No such file or directory"), and the env pane now shows the variant name and error message for both Enum arguments and Result(Err(_)) inner values.
  • --features stata build — DataFrameMetadata was missing from the import list in the test module of src/stdlib/dataframe_dta/write.rs, causing all --features stata builds to fail with an E0433 compile error. Added the missing import.
  • list_types no longer includes stdlib enums — Ordering, DurationError, DateTimeError, HttpError, MatrixError, IOError, JoinType, and DataFrameError are stdlib-registered enums that were incorrectly appearing in the user-defined type list exposed by inspect::list_types. They are now excluded. This fixes a crash in the TUI env pane when navigation keys were pressed on an empty user environment.
  • DataFrame stdlib doc examples — readCsvColumns, readJsonColumns, readParquetColumns, and join examples updated from old symbol syntax ([:col]) to column literal syntax ([@col]).
  • Windows cross-compilation — build.rs now skips the ReadStat C build when CARGO_CFG_TARGET_OS is windows. iconv.h is unavailable in the standard mingw-w64 cross-compilation sysroot; Stata .dta file support is not functional on Windows, so the build is safely skipped rather than failing.
  • Curried fn with fewer explicit params than its type signature accepted — fn f : Int -> Int -> Int with body fn f x = |y: Int| x + y previously produced a spurious TypeMismatch (expected: Int, found: Function(Int, Int)). The parser now correctly computes the remaining return type as the right-associative suffix of the type list, so a one-parameter body returning a lambda is fully valid.
  • Three-level nested lambda bodies now execute correctly — fn f x = |y: Int| |z: Int| x + y + z compiled successfully but failed at runtime with NotCallable("Uninitialized"). The jump-rebase pass (adjust_jump_targets) was collecting body ranges from all Closure instructions in a function body, including transitively-nested ones whose ranges were already relative. Those stale relative values caused the jump-over for the outer lambda's body to be skipped, leaving it pointing at FunctionReturn instead of the Closure instruction. Only direct-child Closures are now used for the skip guard.
  • Pipe operators now parse correctly inside list literals — A pipe expression like [@col |> Expr.add 1] previously caused the |> to escape the brackets and bind to the outer context, producing a spurious TypeMismatch. Pipe chains are now correctly confined within list element positions.
  • Multiple consecutive blank lines in block bodies no longer cause parse failures — Two or more consecutive Newline tokens inside let / fn / module / task blocks previously triggered an unexpected token error. The block parser now tolerates runs of blank lines between statements.
  • DataFrame.melt suffix separator strip — The integer suffix parser now strips the separator prefix before parsing, so melt no longer fails when separator is a non-empty string.
  • DataFrame.melt after DataFrame.rename no longer fails at runtime — Polars' DataFrame::rename updated series names in place but left the internal ArrowSchema field names stale. unpivot2 resolved columns through the schema and failed with a spurious column-not-found error. The fix rebuilds the DataFrame from its series before unpivoting.
  • Multiline pipe alignment check no longer bleeds across top-level statements — A pipe chain at one indentation level in a let binding incorrectly caused a PipeIndentationMismatch error for a shallower pipe chain in the next top-level statement. The pipe indent state is now reset after each top-level node is parsed.
  • Type checker now rejects missing ? on Result-returning I/O functions — When a function was declared to return DataFrame T but its body returned Result DataFrame DataFrameError (because ? was omitted on a readParquet, readCsv, or readJson call), the type checker silently accepted the mismatch. Two gaps were closed: the body-type snapshot now syncs the string interner before re-inferring the body type (fixing top-level functions), and the module function compilation path now performs the same body-type check (fixing functions inside module exposing blocks). The LSP now reports the error in both cases.
  • a |> (expr)? desugars correctly — Pipe expressions where the right-hand side is a parenthesized ?-application (e.g. data |> (DataFrame.readParquet path)?) now compile correctly. Previously the nested closure jump-target was adjusted twice, causing a VM jump offset error. The desugar pass now only adjusts the inner closure's jump target once.
  • List.sum / List.product now accept [Decimal] — Previously only [Int] and [Float] lists were accepted; passing a Decimal list produced a type error.
  • DataFrame.Expr.col signature changed from String -> Expr to DataFrameColumn -> Expr — col "name" now produces a compile-time type error. Use @name directly or col @name for the explicit wrapper form.
  • ValueLabel enum variant uninitialized when reimported in a task scope — When a main file imported only a subset of a module's enums (e.g. import Labels exposing Origin), the remaining enums were removed from scope via remove_enum. A task child scope that reimported the same module hit the already-compiled fast path and the Expose::Type branch did nothing, leaving enum_variant_types absent. compile_enum_constructor then found no variant types and emitted an uninitialized register; at runtime extract_value_label threw TypeMismatch. Fixed by persisting all exported enum definitions in a new module_enum_registry on FileContext at module-compile time, and re-registering from the registry whenever Expose::Type runs.
  • Open-record newtype respects SchemaKind::Open in DataFrame validation — type T = { :col : X, .. } used as a DataFrame schema annotation (DataFrame T) now correctly allows extra file columns. Previously the schema validator always used SchemaKind::Closed for named newtype schemas, producing spurious "unexpected column" errors for every column not explicitly listed in T.
  • Open record flag preserved in EnumVariantType::Record — EnumVariantType::Record now carries a boolean is_open field. Previously Type::Record and Type::OpenRecord were both collapsed into the same tuple, silently dropping the open/closed distinction. All match sites destructure the new field; the two sites that reconstruct a Type from a newtype constructor now emit Type::OpenRecord when is_open is true.
  • Record type aliases in file-level module exposing — type Name = { ... } declarations inside file-level module bodies are now correctly tracked in type_names_declared. Previously they fell through to the error path, causing spurious ModuleInvalidExposedType and ModuleInvalidExposedFunction errors even when the type and function were correctly defined.
  • Div type inference — Integer division (/ and //) now correctly infers Float return type when both operands are Int, matching runtime behavior.
  • Boolean operator type safety — && and || now correctly require Bool operands at the type level; passing non-Bool values produces a compile-time TypeMismatch error.
  • TypeMismatch span narrowing for record field access — When a function call argument comes from a record field access (record.field), the type-mismatch error span now points to the field access expression rather than the entire argument expression.
  • User-module functions exposed via import M exposing (fn) now have the correct type — When a function was imported with exposing (fn_name), its type was incorrectly reconstructed as Function(arg, Function(arg, ret)) (one extra curry layer) instead of Function(arg, ret). This caused a spurious InlineFileTypeMismatch when a run file received the return value of such a function as a task parameter. Fixed by using the full function type stored by lookup_fn directly rather than wrapping it again.
  • Blank lines between block statements no longer cause "variable not found" — A blank line between block statements produces two consecutive Newline tokens. The block body loop used .or_not() to consume the trailing newline after each statement, so the second Newline was left in the stream, causing the next iteration to fail and the block to terminate early. Fixed by changing .or_not() to .repeated() in both parse_expr_block and parse_expr_block_optional_dedent.
  • let bindings after import with multiline |> chains now resolve correctly — When a function body used import Module, then bound a let via a multiline |> continuation using Module.* functions, the variable was not in scope for subsequent statements. parse_expr_block was speculatively tried as an argument to the function call; its block_start pushed to indent_stack and created a new scope before block_body failed, but chumsky only rewinds the input position on backtrack — not mutable state. The orphaned indent and spurious scope caused the variable declaration to land in the wrong scope. Fixed by guarding block_body with .or_not() and explicitly undoing block_start's side effects when the body is absent, in both parse_expr_block and parse_expr_block_optional_dedent.
  • let bindings in function bodies with indented |> continuations now resolve correctly — When a let binding's RHS spanned multiple lines via a continuation pipe (|> indented on the next line), the orphan Dedent token left by multiline_op appeared between block statements, causing the block body loop to terminate early. Variables bound before the pipe were no longer in scope for subsequent statements. Fixed by draining orphan dedents between statements in both parse_expr_block and parse_expr_block_optional_dedent.
  • import inside file-level module bodies is now supported — File-level module declarations (a bare module exposing (...) header with the rest of the file as the body) failed to parse import statements that appeared inside the module. The module body parser's parse_declarations choice did not include declaration_import, so any name defined after an import was invisible to the module's exposing list, producing spurious ModuleInvalidExposedType / ModuleInvalidExposedFunction errors. Fixed by adding declaration_import to the choices in parse_declarations. Also fixed a spurious ModuleFileNotAtTop error caused by reading at_file_start too late (after nested parsers had reset it).
  • OpenRecord field access no longer produces a spurious type error — infer_record_access_type now handles OpenRecord correctly: declared fields are looked up and returned, and undeclared fields return Unknown without emitting a TypeMismatch error.
  • Multiline type annotations in let bindings parse correctly — The indented_record parser now makes the trailing Dedent token optional. When a record type annotation is followed by = rhs on the same line as the closing } (e.g. let data :\n { id: Float\n } = rhs), no Dedent is emitted by the lexer, so requiring one caused a parse failure. Making it optional covers both cases: type aliases (where } is on its own line before a Dedent) and let bindings (where } is followed inline by = rhs).
  • inspect::list_variables uses register-based lookup — Variable values are now fetched via vm.registers().get(register.val()) instead of the removed vm.variables() map. Uninitialized registers (lambdas compiled but not yet called) are skipped so they do not appear as live REPL bindings.
  • Multiline record types in type annotations — Record types (e.g. { a: Int, b: Int }) spanning multiple lines now parse correctly inside type annotations, including inside task expecting/exposing clauses, Maybe, DataFrame, and other parameterized types. Both Elm-style leading-comma layout and trailing-comma layout are supported. Previously, indented records triggered "found indent" parse errors.
  • Compile-time validation of custom types in task signatures — Custom types used in task expecting/task exposing clauses (e.g. task expecting (c : Color)) are now validated at compile time. If a type name is undefined, the compiler emits a clear UndeclaredType error with fuzzy-match suggestions drawn from visible enums and type aliases. Built-in types (Int, Bool, String, Float, etc.) are always accepted. The check recurses into List, Tuple, DataFrame, Record, and OpenRecord type arguments.
  • Multiline task params with paren on keyword line — task expecting (\n data : Int\n) now parses correctly. The parameter and exposing list parsers now handle Indent/Dedent tokens inside parentheses, not just Newline. Previously this layout produced "found indent 4, expecting something else".
  • Task expose type validation — The compiler now validates task exposing variables: if a declared output is never assigned in the body, a TaskExposeNotBound error is reported with a hint to add a let binding. If the assigned type doesn't match the declared type, a TaskExposeTypeMismatch error is reported. Previously, unbound or mistyped task outputs were silently ignored. This validation now also runs when a task file is compiled standalone (not via run), so the LSP can report TaskExposeNotBound errors when editing a task file directly. Additionally, DataFrame schema mismatches with enum/contract columns are now fully checked via run "...": a declared column that is missing from the actual result (e.g. :id : Int declared but DataFrame.select dropped it) now correctly reports TaskExposeTypeMismatch instead of being silently bypassed by the contract-type shortcut. Note: when DataFrame.applyExprs is in the pipeline, type inference loses column information (by design), so column-presence errors may only be caught at runtime in those cases.
  • Task parameter type checking uses are_types_compatible — Type checking for run file parameters now uses are_types_compatible instead of direct != comparison, correctly handling compatible types (e.g. Symbol/String coercion).
  • Enums not visible inside function bodies — Enum types imported via exposing or defined at file scope were invisible inside fn and task declaration bodies because the parser's function-scope boundary only allowed Symbol::Function to pass through from parent scopes. Now Symbol::Enum is also allowed, matching the compiler's behavior. This fixes "Enum X not found" errors when using imported enums inside task or function bodies.
  • task declaration params not in scope — task declaration parameters (now task expecting (...)) are registered in both the parser scope (as Symbol::Variable) and the compiler scope (via insert_var). Previously, opening a task file directly in the editor caused "variable not found" errors for declared parameters because only the caller path registered them.
  • Stack overflow on nested if-else — The control_flow_deeply_nested_if test now runs with an 8 MB thread stack to accommodate large debug-mode stack frames from recursive compile_expr/infer_type calls.
  • Project-aware module resolution — find_project_module_root walks up to keel.toml to determine the source root (from the main field), so files in subdirectories (e.g. src/variables/age.kl) can import user modules from src/modules/. Previously the module root was the parent directory of each file, which broke imports from non-sibling directories.
  • User module enums in run task files — Task files loaded via run that import user modules (e.g. import Labels exposing (Cohort)) now parse correctly. Previously compile_run_file used parse_file_lenient with a blank parser state, so enum constructors like Cohort::Young failed with "Enum not found". Added parse_file_lenient_with_state and pre-register user modules before parsing task files.
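
The closure-capture fix listed above ("Instruction::Closure captures only referenced variables") can be illustrated with a minimal Rust sketch. This is not the keel_core implementation: the Instr enum, the capture_env function, and the i64-valued environment are hypothetical stand-ins, and only the LoadVarReg name comes from the entry itself. The sketch scans a closure body for variable loads and copies only those bindings instead of cloning the whole module environment.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical, simplified instruction set; the real keel_core types differ.
#[derive(Debug, Clone)]
enum Instr {
    LoadVarReg(String), // reads a named variable into a register
    AddRegReg,          // stand-in for any instruction that reads no variable
    Return,
}

/// Build a closure environment that holds only the variables the body reads.
/// Names that are not bound yet (e.g. forward references) are skipped here.
fn capture_env(body: &[Instr], module_env: &HashMap<String, i64>) -> HashMap<String, i64> {
    // Collect each referenced name once.
    let referenced: BTreeSet<&str> = body
        .iter()
        .filter_map(|instr| match instr {
            Instr::LoadVarReg(name) => Some(name.as_str()),
            _ => None,
        })
        .collect();

    // Copy only the referenced bindings out of the module environment.
    referenced
        .into_iter()
        .filter_map(|name| module_env.get(name).map(|v| (name.to_string(), *v)))
        .collect()
}

fn main() {
    let mut module_env = HashMap::new();
    module_env.insert("x".to_string(), 1);
    module_env.insert("file_contents".to_string(), 999); // imagine a very large value

    let body = vec![Instr::LoadVarReg("x".into()), Instr::AddRegReg, Instr::Return];
    let captured = capture_env(&body, &module_env);

    // Only "x" is captured; "file_contents" is never cloned into the closure.
    println!("{:?}", captured.keys().collect::<Vec<_>>()); // prints ["x"]
}
```

Under these assumptions, the cost of creating a closure depends on how many variables its body references, not on the size of the module environment.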

Removed

8 items
  • DataFrame.readDta_old, DataFrame.readDtaColumns_old, DataFrame.writeDta_old — The three ReadStat C FFI-backed benchmark functions are removed. The pure-Rust readDta, readDtaColumns, and writeDta are the maintained implementations. Downstream: the stata Cargo feature, the vendor/ReadStat/ C source tree (685 KB, 216 files), flate2, libz-sys, and cc build dependencies are all gone. The pure-Rust DTA functions are now compiled unconditionally on all platforms (no longer gated behind --features stata).
  • :name symbol syntax in DataFrame column contexts — Symbol literals (:name) are no longer accepted where a DataFrameColumn or Expr column reference is expected. The Symbol→Expr auto-coercion rule in the type checker has been removed, as have all Expr::Symbol branches in extract_string_list, column, rename, setValueLabels, setVarLabel, withLag, withRolling, and column validation. Use @name for all DataFrame column references. Symbol literals remain valid for non-DataFrame uses (equality tests, symbol lists, record type annotations). This is a breaking change for any code using :symbol as a column reference.
  • type alias — Transparent type aliases are removed. Use opaque type Name = Base for nominal safety, or inline the type directly.
  • Name.wrap / Name.unwrap — The auto-generated newtype accessor functions are removed. Use constructor syntax (Name val) and pattern matching (case x of Name n -> n) instead.
  • mut in task outputs — Mutable propagation from task outputs removed. All task outputs are now immutable.
  • Task stdlib module — The Task module is removed. run and task are now language keywords handled by the parser/compiler directly.
  • DataFrame.filter (closure-based) — Removed the closure-based DataFrame.filter (|r| ...) function and the entire expr_compiler module (~1,862 lines) that compiled closures to Polars expressions. Use DataFrame.filter with Expr syntax instead: df |> DataFrame.filter (@age > 18).
  • Legacy named filter functions — Removed filterEq, filterNeq, filterGt, filterGte, filterLt, filterLte, filterIn. Use DataFrame.filter with Expr syntax instead (e.g., DataFrame.filter (@col == val), DataFrame.filter (@col |> Expr.in [vals])).
