Changelog
All notable changes to Keel are documented here. This project follows Keep a Changelog and Semantic Versioning.
Unreleased
Added
- `DataFrame.checkSchema` (runtime schema validation): New `DataFrame.checkSchema : Schema -> DataFrame -> Result DataFrame DataFrameError` validates a DataFrame's column types against a declared schema at runtime. `Schema` is a record where each field name is a column name and the value is the expected type tag (e.g. `{ age : Int, name : String }`). Returns `Ok df` when all declared columns exist with matching types; returns `Err (DataFrameError "SchemaError" ...)` listing every mismatch. Columns not mentioned in the schema are ignored. Useful for asserting postconditions on loaded or transformed data.
- `==` and `!=` for Tuple, List, and Record values: Structural equality now works for tuples, lists, and records at both compile time and runtime. The type checker (`is_eq_comparable`) and VM comparison handler (`EqRegRegReg`) both accept these types. Equality is element-wise recursive: `(1, "a") == (1, "a")` → `true`, `[1, 2] == [1, 3]` → `false`, `{ x = 1 } == { x = 1 }` → `true`. Mismatched types (e.g. comparing a Tuple to a DataFrame) are rejected at compile time.
- `vm::estimate_register_size` and `VM::heap()` (heap-aware size estimation): New `pub fn estimate_register_size(rv: &RegisterValue, heap: &[HeapSlot]) -> usize` in `keel_core::vm` walks the VM heap recursively with a cycle guard to compute the true in-memory size of any register value, including nested lists, records, tuples, closures, and Polars DataFrames (via `estimated_size()`). The `VM::heap()` accessor exposes the heap slice to downstream crates. `inspect::list_variables` now uses heap-aware sizes instead of the OutputValue estimate, giving accurate sizes for local sessions.
- `DataFrame.readJsonl` and `DataFrame.readJsonlColumns`: New I/O functions for NDJSON (newline-delimited JSON / JSON Lines) files. `readJsonl` reads a full NDJSON file into a DataFrame; `readJsonlColumns` reads and projects only the specified columns. Both accept local file paths and remote URLs and return `Result DataFrame DataFrameError`. Backed by Polars' `LazyJsonLineReader`.
Compile-time schema inference works for literal paths (`.jsonl`, `.ndjson` extensions). LSP path completion and pre-scan updated. Use `readJsonl` for NDJSON files where each line is an independent JSON object; use `readJson` for JSON array files (`[{...}, ...]`).
- `DataFrame.Expr` regex string functions: Five new expression functions for pattern-based string operations on DataFrame columns: `strMatches` (Bool column, true if the cell matches the regex), `strCapture` (extract nth capture group as String), `strCaptureAll` (all non-overlapping matches as a List column), `strReplaceAll` (replace all regex matches), `strCount` (count non-overlapping matches). All map to Polars regex string methods and use the `regex` crate syntax. Invalid patterns are handled gracefully: `strMatches`/`strCapture`/`strCaptureAll` fill the column with null; `strReplaceAll`/`strCount` propagate an error through `applyExprs`.
- `ServerPerms` permissions snapshot and `effective_perms()`: New serialisable struct `ServerPerms` (with sub-structs `ServerIoPerms`, `ServerHttpPerms`, `ServerDataframePerms`) captures the effective security restrictions at process startup by reading the `KEEL_*` env vars. `effective_perms()` is the public API; `effective_perms_from(closure)` is the testable inner function used in unit tests to avoid process-global env mutation.
- `Regex` stdlib module: New module for compiled regular expression patterns. `Regex.compile` takes a pattern string and returns `Result Regex String`, placing invalid-pattern errors explicitly in the type system. Matching functions: `Regex.test` (Bool), `Regex.find` (Maybe String), `Regex.findAll` ([String]), `Regex.captures` (Maybe [Maybe String]). Transformation functions: `Regex.replace`, `Regex.replaceAll` (first/all match substitution), `Regex.split`. Regex values are opaque `NativeObject` wrappers; they compose through `Result`/`Maybe` chains and work with `List.map` partial application.
Supports the full `regex` crate syntax including Unicode properties (`\p{L}`).
- `Process` stdlib module: New module that collects all process-control operations: `Process.exit` (exit with code), `Process.wait` (sleep N ms), `Process.pid` (process ID), `Process.hostname` (machine hostname), `Process.uptime` (ms since program start), `Process.env` (get env var), `Process.setEnv` (set env var), `Process.args` (command-line args). The four operations previously in `IO` (`IO.exit`, `IO.env`, `IO.setEnv`, `IO.args`) are removed from `IO` entirely; use `Process.*` instead. `IO` now covers only file, directory, console, and path operations.
- `type alias` transparent type synonyms: `type alias Name = Type` declares a transparent structural synonym. The alias name and the underlying type are fully interchangeable: values flow between them without any constructor or pattern match. Re-enables the previously stubbed parser for `Declaration::TypeAlias`; adds `Token::Alias` as a keyword. `DataFrame.fromRecords` accepts `[AliasName]` lists once the alias expands to the matching record type. Parameterized aliases (`type alias Box a = { value : a }`) and module-scoped aliases are supported. 52 integration tests added.
- `Http.responseText`: New function `Response -> String` that extracts the raw response body as a string without any parsing. Companion to `Http.responseJson` for APIs that return plain text, HTML, CSV, or other non-JSON formats. Pipe-friendly: `Http.get url |> Http.send |> Result.map Http.responseText`.
- `Matrix.rangeVec`: Creates a column vector `[start, start+1, …, end]` as `Matrix Float` with no interpreter overhead. Both ends inclusive. Use together with `Matrix.outerAdd` to build index-formula matrices.
- `Matrix.outerAdd`: Computes the outer sum of two column vectors: `M[i,j] = a[i] + b[j]`. Both inputs must be `n×1` column vectors. Runs in pure Rust with no per-element interpreter calls, making it significantly faster than `Matrix.fromFn` for index-formula matrices.
- `Matrix.tridiag`: Creates an n×n tridiagonal matrix with a specified diagonal value and off-diagonal value.
Replaces `fromFn` with a conditional for the common tridiagonal pattern.
- `Ordering` enum: `Ordering::Lt`, `Ordering::Eq`, `Ordering::Gt`. Import with `import Ordering`. Used as the return type of all `compare` functions.
- `DateTime.fromYmd`: Constructs a `DateTime` from year, month, day integers. Returns `Result DateTime DateTimeError`.
- `Json.encode` error propagation: `Json.encode` now returns `Result String JsonError` instead of silently returning `"null"` for un-encodable values.
- `Math.sqrt`, `Math.pow`, `Math.log`, etc. return `Maybe Float`: Functions that can produce non-finite results now return `Nothing` instead of `NaN`/`Infinity`.
- `DataFrame.applyExprs` now takes `[(DataFrameColumn, Expr)]` instead of `[Expr]`: Each entry is a tuple of a column reference and an expression. The old `Expr.named "colname" expr` pattern is no longer required for `applyExprs`; use `(@colname, expr)` instead. `Expr.named` remains available for aggregation contexts. This is a breaking change for any code calling `applyExprs` with a plain `[Expr]` list.
- `DataFrame.sort` and `DataFrame.sortDesc` now accept a list of columns: Signatures changed from `DataFrameColumn -> DataFrame -> Result DataFrame DataFrameError` to `[DataFrameColumn] -> DataFrame -> Result DataFrame DataFrameError`. Passing a bare column reference (e.g. `DataFrame.sort @name`) is now a compile-time type error; wrap it in a list: `DataFrame.sort [@name]`. Multiple columns are sorted left-to-right: `DataFrame.sort [@department, @salary]`. All column names in the list are validated at compile time against typed DataFrames.
- `?` postfix operator: `expr?` unwraps `Ok v` to `v`; short-circuits with `Err e` if the result is an error. Applying `?` to a non-`Result` type is a compile-time `TypeMismatch` error. Works at the top level and inside function bodies, with pipes, and with let bindings.
- `Result.isOk` / `Result.isErr`: Predicate functions that return `Bool`. Useful in `List.filter Result.isOk` patterns.
- `Result.filter`: `(a -> Bool) -> e -> Result a e -> Result a e`.
Keeps `Ok` only if the predicate passes; replaces it with the given `Err` value otherwise.
- `Result.flatten`: `Result (Result a e) e -> Result a e`. Removes one layer of `Result` nesting.
- `Result.zip`: `Result a e -> Result b e -> Result (a, b) e`. Combines two `Result` values into a `Result` of a tuple; the first `Err` wins.
- `Result.mapOr`: `b -> (a -> b) -> Result a e -> b`. Applies the function to `Ok`, or returns the default for `Err`.
- `Result.all`: `[Result a e] -> Result [a] e`. Collects a list of `Result` values into a single `Result` of a list; the first `Err` short-circuits.
- `Maybe.isJust` / `Maybe.isNothing`: Predicate functions that return `Bool`. Useful in `List.filter Maybe.isJust` patterns.
- `Maybe.filter`: `(a -> Bool) -> Maybe a -> Maybe a`. Keeps `Just` only if the predicate passes; replaces it with `Nothing` otherwise.
- `Maybe.flatten`: `Maybe (Maybe a) -> Maybe a`. Removes one layer of `Maybe` nesting.
- `Maybe.zip`: `Maybe a -> Maybe b -> Maybe (a, b)`. Combines two `Maybe` values into a `Maybe` of a tuple; `Nothing` wins.
- `Maybe.mapOr`: `b -> (a -> b) -> Maybe a -> b`. Applies the function to `Just`, or returns the default for `Nothing`.
- `Maybe.all`: `[Maybe a] -> Maybe [a]`. Collects a list of `Maybe` values into a single `Maybe` of a list; the first `Nothing` short-circuits.
- DataFrame functions return `Result DataFrame DataFrameError`: All DataFrame operations that can fail (sort, unique, head, tail, slice, sample, join, concat, pivot, melt, quantile, mean, median, std, var, mode, corr, cov, describe, collect, all window functions, and label operations) now return `Result DataFrame DataFrameError` instead of crashing at runtime. Use `case` expressions or the `?` operator to handle errors. The `DataFrameError` enum is available in the stdlib scope.
- `DataFrame.corr` / `DataFrame.cov`: Compute the Pearson correlation matrix and covariance matrix for all numeric columns. Return a square `DataFrame` indexed by column name.
- `DataFrame.mean` / `DataFrame.median` / `DataFrame.std` / `DataFrame.var`: Column-wise descriptive statistics: mean, median, standard deviation, and variance for all numeric columns. Each returns a single-row `DataFrame`.
- `DataFrame.mode`: Column-wise mode (most frequent value) for all columns. Returns a `DataFrame` with one row per mode value per column.
- `DataFrame.quantile`: Column-wise quantile computation. Accepts a `Float` quantile (0.0–1.0) and returns a single-row `DataFrame`. Validates that the quantile is in range; returns `InvalidArgument` otherwise.
- `DataFrame.summary` extensions: Six additional statistical rows appended to the summary output: median, standard deviation, variance, number of unique values, count of nulls, and count of non-null values.
- `@name` column reference syntax: `@name` (`DataFrameColumn`) is now the only accepted syntax for DataFrame column references in expressions. String literals (`col "name"`) and Symbol literals (`:name`) used as column references are rejected at compile time with a `TypeMismatch` error.
- `DataFrame.melt`: Reshapes a wide DataFrame into long format. Takes id columns to keep, a list of column prefixes (one per output value column), a separator character, and a name for the new index column. All columns matching each prefix are stacked; the suffix after the prefix+separator is parsed as `Int` if all values are numeric, otherwise kept as `String`.
Lineage tracking records a `Melt` operation on each affected column and a `Melted` origin for the new index and value columns.
- Opaque newtypes: `type Name = BaseType` declares a nominally distinct type that cannot be mixed with its base type or other newtypes over the same base. Constructed as `Name value` and destructured via pattern matching. Generic newtypes are supported: `type Validated a = a`.
- `enum` keyword for sum types: Sum-type declarations now use `enum`: `enum Direction = North | South | East | West`. The previous `type` keyword for sum types is removed.
- `VM::check_file`: Type-checks a source file without executing it. Performs the same module root discovery, user-module registration, and compilation as `VM::compile_file` but stops before running the bytecode. Used by `keel check` to support project-aware type checking.
- Let-polymorphism: `let`-bound lambdas without type annotations are automatically generalized to work with any type. `let identity = |x| x` infers `∀a. a -> a`, so `identity 1` and `identity "hello"` can both appear in the same program.
- Recursive let bindings: A lambda bound with `let` can call itself by name. `let fac = |n: Int| if n == 0 then 1 else n * fac (n - 1)` compiles and runs correctly; no `fn` declaration is needed for local recursive helpers.
- Body-constraint lambda type inference: Lambda parameters whose types can be determined from the body no longer require explicit annotations. `|x| x + 1` infers `x: Int` from the arithmetic; only truly unconstrained parameters (like `|x| x`) still require annotation or context.
- `call_closure` now handles `PartialNativeFunction`: `Result.map`, `Maybe.map`, and other higher-order stdlib functions can now accept partially-applied native functions as the mapper argument. Previously this produced a `NotCallable` runtime error.
- Record spread expression: `{ ..base }` copies all fields from a record, and `{ ..base, field = val }` copies and overrides or adds fields. Compiles via `CopyRecord` + `StoreNamedField`. Type inference merges the extra fields into the base record type.
- Type aliases in task signatures: `task expecting MyInputs exposing MyOutputs` now works when `MyInputs`/`MyOutputs` are type aliases that resolve to record types. Fields are extracted at compile time from the resolved alias.
- Bare `task` declaration: `task` with no `expecting` or `exposing` clauses is now valid syntax. Useful as a marker that a file is a task entry point with no declared contract.
- Aliased arguments in `run` expressions: `run "file.kl" { param = expr }` lets callers pass a value under a different name than their local variable. Bare `{ x }` remains shorthand for `{ x = x }`. Mixed forms like `{ a, b = myval }` are also supported. The type checker validates the aliased expression's type against the callee's declared parameter type.
- `VariableInfo::binding_id`: `VariableInfo` now carries a `binding_id: usize` field set to the VM register index for the binding. This uniquely identifies each binding even when two variables share the same name (e.g. shadowed variables in nested scopes). Consumers use it to correlate `VariableInfo` across updates.
- Enum expansion in inspect:
`has_children`, `value_children`, and `navigate` now handle `OutputValue::Enum` variants whose argument is a `Record`. Such enum values are expandable in the variable inspector: children are the record fields (flattened), and field paths are navigable directly. Scalar and tuple enum arguments remain non-expandable.
- `inspect::list_types`: New `list_types(scope, interner) -> Vec` function in `keel_core::inspect`. Returns all user-defined enum types visible in scope (filtering out internal/stdlib enums whose names start with a lowercase letter). Consumers (keel-tui env inspector, keel-jupyter-kernel Positron variables pane) can display the variant list for a column's `ValueLabelSet`.
- `ChildInfo` value-label fields: `ChildInfo` now carries `value_label_strings: Vec` and `value_label_name: Option`. For DataFrame columns that have a `ValueLabelSet`, `value_label_strings` contains the label strings and `value_label_name` contains the enum type name (e.g. `"Origin"`). Non-DataFrame children have empty/`None` values.
- `DataFrame.fromRecords` compile-time value-label embedding: When `DataFrame.fromRecords` is called with a literal list of records that contain `ValueLabel.value SomeEnum::Variant` fields, the compiler now emits a compile-time `HashMap` constant and redirects to an internal `_fromRecordsWithLabels` function. Value label metadata is preserved in the resulting DataFrame without any runtime overhead.
- `inspect` module: Shared runtime inspection utilities (`list_variables`, `get_children`) extracted into `keel_core::inspect`. Returns neutral `VariableInfo`/`ChildInfo` structs consumed by keel-repl (`:env`), keel-jupyter-kernel (Positron variables pane), and keel-tui (environment inspector pane). Removes duplicated variable-enumeration code from each consumer.
- Enum shorthand for ValueLabel DataFrame column types: Enums whose variants all carry `ValueLabel` metadata can now be used directly as a column type: `score: Score` is equivalent to the verbose `score: [(1, Score::Low), (2, Score::Mid), (3, Score::High)]`. The compiler derives the full int+label contract from the enum's `ValueLabel` declarations at compile time. Plain enums (no `ValueLabel`) on integer columns still produce a type-mismatch error.
- Multiline task declarations: `task expecting (...) exposing (...)` now supports Elm-style multiline syntax. Keywords (`expecting`, `exposing`) and their parameter lists can break across lines with indentation. Useful for tasks with many parameters. Single-line syntax remains supported.
- DataFrame passthrough type inference for metadata operations: `setVarLabel`, `removeVarLabel`, `removeValueLabels`, `setMeta`, `setColumnMeta`, and `setDisplayMode` now preserve the input DataFrame schema in type inference (passthrough), so task expose type validation works correctly after these operations.
- `setValueLabels` / `setValueLabelsStrict` retype column to enum in task expose contracts: When the second argument is `ValueLabelSet.fromType SomeEnum`, the type checker now updates the target column's type from `Float` to `SomeEnum` in the resulting DataFrame schema. Previously the column stayed `Float`, causing a spurious type mismatch on task expose declarations like `result : DataFrame { :CO: Origin }`.
- Symbol-syntax field names in type annotations: Record and DataFrame type annotations now accept symbol-syntax field names for columns that don't conform to lowercase identifier syntax: `{ :Name : String }`, `{ :"Total Revenue" : Int }`, or mixed `{ id : Int, :Name : String }`. Bare symbols also support uppercase starts (`:Name`); previously only lowercase (`:name`) was allowed. Type display renders non-lowercase fields with symbol syntax.
- Symbol ↔ String type compatibility: Symbols (`:name`) are now compatible with String in the type system, so DataFrame column-name parameters accept both `:name` and `"name"`. This applies to `select`, `remove`, `rename`, `column`, `sort`, `sortDesc`, `unique`, `groupBy`, `join`, `pivot`, `partitionBy`, `orderBy`, `Expr.col`, `Expr.named`, and all window/cumulative/rolling functions. Mixed lists like `[:name, "age"]` also work. Compile-time column validation and schema propagation recognize Symbol literals alongside String literals. 21 new tests covering type compatibility, column validation, schema propagation, error cases, and String regression.
- Compile-time type checking for `Expr.cond` branch consistency: The type signature changed from `[(Expr, a)] -> b -> Expr` to `[(Expr, a)] -> a -> Expr`, so all branch values and the default must share the same type. Mismatches like `Expr.cond [(cond, "minor")] 42` (String vs Int) are now caught at compile time instead of failing at Polars runtime.
- Broader scalar ↔ Expr coercion: `unify_types`, `match_types`, and `are_types_compatible` now accept all scalar types (Int, Float, String, Boolean, Decimal, Symbol) as compatible with Expr, not just Symbol. This enables mixed scalar/Expr lists like `[(cond, col "x" * 2), (cond, 99)]` to unify correctly.
- Infix operators for `DataFrame.Expr`: Arithmetic (`+`, `-`, `*`, `/`, `%`, `^`), comparison (`==`, `!=`, `<`, `<=`, `>`, `>=`), boolean (`&&`, `||`), negation (`-`), and logical not (`not`) now work directly on `Expr` values. When either operand is an `Expr`, scalars are automatically coerced to literal expressions. `&&` and `||` automatically detect Expr operands and compile to Polars' `.and()`/`.or()` instead of short-circuit evaluation. This enables natural syntax like `col "price" * 1.1`, `col "age" >= 18 && col "active"`, and `col "x" + col "y"` instead of requiring the pipe API. 71 tests covering arithmetic, comparison, boolean logic, chained operations, filters, edge cases, and backward compatibility.
- SPSS and pandas column labels from Parquet: `DataFrame.readParquet` and `DataFrame.readParquetColumns` now also read variable labels from the `spss_meta` Parquet key written by pyreadstat (format: `{"column_labels": {"col": "label"}}`), and from pandas column metadata embedded in the `pandas` Parquet key. 19 tests pass including a live NIS2NL.parquet test.
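Several of the `Result` helpers added in this section have close counterparts in Rust's standard library. As a rough analogy only (this is Rust, not Keel, and the function names `parse_all`, `zip_results`, and `flatten` are made up for illustration), the short-circuiting behaviour described for `Result.all`, `Result.zip`, `Result.flatten`, `Result.mapOr`, and the `?` operator can be sketched as:

```rust
// Illustrative analogy in Rust; none of this is Keel source code.

/// Like `Result.all`: collect a list of results into one result of a list.
/// Collecting stops at the first `Err`.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, String> {
    inputs
        .iter()
        .map(|s| s.parse::<i32>().map_err(|e| format!("{s}: {e}")))
        .collect()
}

/// Like `Result.zip`: combine two results into a tuple; the first `Err` wins.
fn zip_results<A, B, E>(a: Result<A, E>, b: Result<B, E>) -> Result<(A, B), E> {
    // `?` unwraps `Ok(v)` to `v` and returns early with `Err(e)` otherwise.
    Ok((a?, b?))
}

/// Like `Result.flatten`: remove one layer of nesting.
fn flatten<A, E>(r: Result<Result<A, E>, E>) -> Result<A, E> {
    r.and_then(|inner| inner)
}

fn main() {
    println!("{:?}", parse_all(&["1", "2", "3"]));
    println!("{:?}", parse_all(&["1", "x", "y"]));
    println!("{:?}", zip_results::<i32, &str, String>(Ok(1), Ok("a")));
    println!("{:?}", flatten::<i32, String>(Ok(Ok(5))));

    // Like `Result.mapOr`: apply the function to `Ok`, or use the default for `Err`.
    let failed: Result<i32, String> = Err("boom".into());
    println!("{}", failed.map_or(0, |v| v * 2));
}
```

The Keel functions take their arguments in pipe-friendly order (data last), so the call shapes differ from Rust's method syntax even where the semantics line up.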
Changed
- Two-mode `run` path resolution: `run "file.kl"` paths now follow an explicit two-mode convention. A path starting with `./` or `../` resolves relative to the calling file's directory (caller-relative). A bare path (no `./` prefix) resolves relative to the project root, i.e. the directory containing `keel.toml`. Using a bare path in a file with no `keel.toml` ancestor produces a new `RunPathRequiresProjectRoot` compile error. This change makes `run` paths unambiguous: bare paths are portable across the project, and `./` paths are anchored to the local file. All integration test fixtures updated to use `./` where appropriate.
- `Matrix` stdlib module: New `Matrix Float` type with full arithmetic (`+`, `-`, `*`, `.*`, `./`), decompositions (`det`, `inv`, `solve`, `svd`, `qr`, `chol`), reductions (`trace`, `norm`, `rank`), structural ops (`transpose`, `shape`, `row`, `col`, `submatrix`, `hstack`, `vstack`, `map`, `fill`, `zeros`, `ones`, `identity`, `fromList`, `toList`), and DataFrame interop (`fromDataFrame`, `toDataFrame`). Element-wise operators use `.*` and `./` tokens. Operations that can fail (e.g. `inv`, `solve`, `chol`) return `Result Matrix MatrixError`.
- Stdlib renames (breaking): The following functions were renamed for API consistency:
  - `List.skip` → `List.drop`
  - `List.flatMap` → `List.andThen`
  - `List.variance` → `List.var`
  - `Maybe.orElse` → `Maybe.withDefault`
  - `Maybe.mapOr` → `Maybe.mapWithDefault`
  - `Result.orElse` → `Result.withDefault`
  - `Result.mapOr` → `Result.mapWithDefault`
  - `Http.jsonBody` → `Http.responseJson`
  - `Decimal.signum` → `Decimal.sign`
  - `Table.sdOf` → `Table.stdOf`
  - `DateTime.fromDate` (3-int form) → `DateTime.fromYmd`; existing `DateTime.fromDate` now takes a `Date` value and returns `DateTime` directly
  - `DateTime.parseRfc3339` removed; use `DateTime.fromYmd` / `DateTime.fromDate` instead
- `String.split`, `String.contains`, `String.startsWith`, `String.endsWith` argument order flipped: The needle/delimiter is now the *first* argument and the string is the *second*. `String.split "," "a,b,c"` replaces the old `String.split "a,b,c" ","`. This enables partial application: `String.split ","` is now a reusable tokeniser.
- `List.tail` now returns `Maybe [a]`: Returns `Nothing` for an empty list instead of `[]`. Callers must handle `Just`/`Nothing`.
- `compare` functions now return `Ordering`: `Date.compare`, `Time.compare`, `DateTime.compare`, and `Duration.compare` previously returned `Int` (`-1`, `0`, `1`). They now return the `Ordering` enum (`Ordering::Lt`, `Ordering::Eq`, `Ordering::Gt`). Use `case result of Ordering::Lt -> … | Ordering::Eq -> … | Ordering::Gt -> …` with exhaustiveness checking.
- `Duration.negate` now returns `Result Duration DurationError`: Negation of the minimum representable duration would overflow; the function now returns `Err DurationError::Overflow` in that case. `Duration.abs` similarly returns `Result Duration DurationError` for API symmetry (though overflow is impossible in practice).
- Typed error enums replace
`String` errors: `MatrixError`, `IOError`, `HttpError`, `DateTimeError`, `DurationError` enums are now available as first-class types. Functions that previously returned `Result … String` now return `Result …` with the corresponding error enum in the error position. Use `case` on the error variant for exhaustive handling.
- `@name` replaces `:name` and `col "name"` for DataFrame column references: All DataFrame API functions that previously accepted `String` or `Symbol` column arguments now require `DataFrameColumn` (`@name` syntax). This is a breaking change for any code using `:symbol` or string column references in DataFrame expressions.
- Newtype construction uses constructor syntax: `type UserId = Int` values are now constructed as `UserId 42` (applying the type name as a function) and destructured via pattern matching (`case id of UserId n -> n`). The previously auto-generated `.wrap` and `.unwrap` accessor functions are removed. Record newtypes use `Name { field = val }` and tuple newtypes use `Name (a, b)`. This aligns newtypes with `enum` single-variant style and removes the inconsistency with the stdlib.
- `Declaration::Enum` gains `is_newtype: bool`: The separate `Declaration::Newtype` AST variant is merged into `Declaration::Enum`. `is_newtype = true` signals `type Name = Base` (newtype syntax); `is_newtype = false` signals `enum Name = Variant | ...`. Consumers (keel-fmt, keel-lsp) use this flag instead of matching the removed variant.
- `exposing` list no longer requires parentheses: `import List exposing map, filter` replaces `import List exposing (map, filter)`. `module Math exposing add, multiply` replaces `module Math exposing (add, multiply)`. `exposing ..` replaces `exposing (..)`. The empty `exposing ()` form remains valid.
- Task `expecting`/`exposing` clauses use braces: `task expecting { x : Int } exposing { result : Int }` replaces the old parenthesis syntax `task expecting (x : Int) exposing (result : Int)`. Both clauses now accept any type expression (record literal, type alias, or parameterised type). `Declaration::Task` AST changed: `params: Vec<(String, Type)>` + `outputs: Vec` replaced by `expecting: Option` + `exposing: Option`.
- `run` argument is any record expression: The argument to `run "file.kl"` is now parsed as a general expression rather than the `ImportFileVars` enum. Any expression that produces a record is accepted: `run "f.kl" { x, y }`, `run "f.kl" inputs` (variable), `run "f.kl" { ..base }` (spread), or `run "f.kl"` (no argument). `Expr::RunFile.pass_vars: ImportFileVars` is replaced by `arg: Option`.
- `types::format_field_name` is now `pub`: The function that renders record field names with symbol syntax (`:Name`, `:"name with spaces"`) is now publicly accessible so `keel-fmt` can use it when formatting type aliases.
- Stable Rust toolchain: Removed nightly compiler requirement. Moved to stable Rust (1.85+). Deleted vendored
`polars-ops` and `ethnum` patches (they were nightly-only workarounds). Dev shells updated accordingly.
- Task syntax redesign: `Task.run`/`Task.define` replaced with keyword-based syntax. Declarations use `task expecting (...) exposing (...)` instead of `Task.define (...) -> (...)`. Caller uses `run "file.kl" { x, y }` with record-style braces instead of `Task.run "file.kl" (x, y)`. Three keywords added: `task`, `run`, `expecting` (plus existing `exposing`).
- Task/module-first ordering: `task` declarations and file-level `module exposing (...)` must now be the first declaration in the file, before any imports. Imports go inside the task/module scope. Previously, task declarations were allowed after imports.
- DataFrame stdlib examples use symbol syntax: All FunctionDoc examples for column-name parameters updated to use idiomatic `:name` symbol syntax instead of `"name"` strings.
- Module export quick-parser extracts enum variants: `quick_parse_module_exports` now returns `ModuleExport` enum (with `Function` and `Enum` variants) instead of `(String, bool)` tuples. Enum exports include full variant info (`ModuleEnumVariant::Simple`, `Tuple`, `Record`) extracted by scanning the token stream for type definitions matching exposed names. This enables downstream tools (LSP, compiler) to resolve enum constructors from user modules without full compilation.
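The two-mode `run` path rule described in this section is simple enough to sketch. The following is a minimal illustration in Rust under stated assumptions, not the compiler's actual code: `resolve_run_path`, `find_project_root`, and the shape of `RunPathError` are hypothetical names, and the `keel.toml` lookup is injected as a closure so the example runs without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum RunPathError {
    // Stands in for the changelog's `RunPathRequiresProjectRoot`; the payload is an assumption.
    RequiresProjectRoot(String),
}

/// Hypothetical helper: walk up from `start` until a directory holds the manifest.
fn find_project_root(start: &Path, has_manifest: &dyn Fn(&Path) -> bool) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| has_manifest(*dir))
        .map(Path::to_path_buf)
}

/// Resolve the target of `run "<target>"` as seen from `caller_file`.
fn resolve_run_path(
    target: &str,
    caller_file: &Path,
    has_manifest: &dyn Fn(&Path) -> bool,
) -> Result<PathBuf, RunPathError> {
    let caller_dir = caller_file.parent().unwrap_or(Path::new(""));
    if target.starts_with("./") || target.starts_with("../") {
        // Caller-relative mode: anchored to the directory of the calling file.
        // The path is joined as written, with no normalization of `.` or `..`.
        Ok(caller_dir.join(target))
    } else {
        // Project-root mode: a bare path needs an ancestor containing `keel.toml`.
        find_project_root(caller_dir, has_manifest)
            .map(|root| root.join(target))
            .ok_or_else(|| RunPathError::RequiresProjectRoot(target.to_string()))
    }
}

fn main() {
    // Pretend `/proj` is the only directory that contains `keel.toml`.
    let in_project = |dir: &Path| dir == Path::new("/proj");
    let caller = Path::new("/proj/src/tasks/main.kl");

    println!("{:?}", resolve_run_path("./helper.kl", caller, &in_project));
    println!("{:?}", resolve_run_path("lib/shared.kl", caller, &in_project));
    println!("{:?}", resolve_run_path("lib/shared.kl", caller, &|_: &Path| false));
}
```

Passing the manifest check as a closure keeps the sketch testable without creating files; a real resolver would presumably check for `keel.toml` on disk.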
Fixed
- Pipe type errors now point at the failing function call, not the whole pipe expression: When a type mismatch occurred on the rhs of `|>` (e.g. passing a `Result` where a `DataFrame` is expected), the error span covered the entire pipe chain. The span is now narrowed to the rhs function call (`right.span`) so the LSP highlights only the function that received the wrong type.
- Function type mismatch error now says "a function": When a non-function value was passed where a function was expected (e.g. `List.map list lambda` with arguments swapped), the error read `expected _ -> _, found [String]`. It now reads `expected a function (_ -> _), found [String]`, making clear that a callable is required. The change is in `Type::display_as_expected()`, used only in `TypeMismatch` error formatting.
- Internal type variables no longer leak into error messages: Type errors produced inside `TypeInferenceContext` (via `apply_function_type` and other inference helpers) were drained into the compiler's error list without sanitization, causing raw names like `$t1 -> $t2` to appear in user-facing messages instead of `_ -> _`. The drain path now applies the same `sanitize_error` transformation used by `add_type_error`.
- `Maybe` equality (`==` / `!=`) now works correctly: Comparing two `Maybe` values with `==` or `!=` previously crashed at runtime with a cryptic `Enum(470, 581, Some(12)) == Enum(470, 581, Some(13))` error showing raw VM-internal interned indices. Both structural equality (`Just 5 == Just 5` → `true`, `Just 5 == Just 6` → `false`, `Nothing == Nothing` → `true`) and cross-variant comparisons (`Just x == Nothing` → `false`) now work. Comparing `Nothing == Just x` (where `Nothing` has type `Maybe Unknown`) was also rejected at compile time; the equality type check now uses `types_compatible` so `Maybe Unknown` is accepted alongside any `Maybe T`.
- `DataFrame.readDta` (crash on files with large sections): The `.dta` reader called `BufReader::fill_buf()` once to peek at the next tag. After a `read_exact_bytes` call consumed bytes up to near a refill boundary, `fill_buf()` returned only 1–3 bytes (e.g. just `<` with no `>`), causing an immediate "tag boundary not found in buffer" error. Files with many columns and per-column characteristics entries (e.g. the SOEP panel study at 5,931 columns) reliably triggered this. Fixed by introducing `PeekReader`, a `BufRead` wrapper that reads byte-by-byte until `>` and pushes the consumed bytes back into a replay buffer so the existing parse branches see them normally.
- `DataFrame.readJsonl` / `DataFrame.readJsonlColumns` (automatic struct flattening): Both NDJSON readers now automatically expand nested JSON object columns into `_`-separated top-level columns immediately after reading, matching `Jsonl.parseDataFrame` behaviour.
For example, a column `address: {city, zip}` becomes `@address_city` and `@address_zip`, directly accessible with `@col` syntax without any manual unnesting. `List` columns (arrays of objects) are left as-is. `readJsonlColumns` additionally accepts flattened sub-field names (e.g. `@address_city`) as selectors: the reader back-maps the name to the ancestor struct, loads only that struct, flattens it, and trims to the requested column, enabling memory-efficient sub-field selection from large files.
- `DataFrame.readJsonl` / `DataFrame.readJsonlColumns` (reduced peak memory on large files): Both NDJSON readers now use Polars' low-memory mode (`low_memory(true)`), which parses the file in smaller chunks rather than buffering the full input before building Arrow columns. This significantly reduces peak RSS when loading large JSONL files (e.g. a 2.5 GB file that previously OOM-killed the process).
- `DataFrame.readJsonl` docs warn about nested-column memory spikes: Added a note to the stdlib docs for `readJsonl` explaining that files with deeply nested columns (e.g. a column that is a list of structs with variable keys) can cause transient allocations many times the file size during parsing, and recommending `readJsonlColumns` to skip heavy nested fields.
- `DataFrame.readJsonlColumns` (true schema-based column pruning): Previously `readJsonlColumns` called `lf.collect()` to read all columns and then selected the requested subset with `df.select()`. On files with variable-schema `List` columns (e.g. `officers`, `addresses`) this caused OOM even when the user selected only flat scalar columns, because the heavy nested fields were fully parsed into Arrow arrays before being discarded. A `lf.select().collect()` intermediate fix did not help because Polars' NDJSON executor does not implement projection pushdown at the byte-reader level (`file_options.with_columns` is always `None` for NDJSON). The real fix: the reader now infers the full schema from the first 100 rows, builds a subset `Schema` containing only the requested columns, and passes it via `with_schema()`.
This tells `simd-json` / the Arrow builder to ignore all fields not in the schema during deserialization, so no Arrow buffers are ever allocated for the skipped columns. Measured reduction: 68 GB → 2 MB peak RSS on a 2.1 GB JSONL file with deeply nested `officers` and `metadata` columns.
- Env pane preview shows full error message for
Errstring args —format_value_previewpreviously dropped the entire error message when the formatted"{variant} {message}"exceeded the 50-char preview limit, rendering e.g."Err FileError ..."with no file path or error detail. String arguments to enum variants are now always shown in full (no length cap), so the preview shows the complete message such asFileError "No such file or directory (os error 2): /home/user/data/events.jsonl". Jsonl.parseDataFrameautomatically flattens nested struct columns — Nested JSON objects in JSONL input (e.g.{"address": {"city": "Berlin"}}) produced opaque PolarsStruct-typed columns that rendered as{}in the DataFrame preview pane and could not be sorted, filtered, or selected.parseDataFramenow recursively expands all struct columns into prefixed top-level columns (e.g.address_city,address_zip) before returning the DataFrame. Deep nesting (2+ levels) is handled by repeating the expansion until no struct columns remain.Instruction::Closurecaptures only referenced variables — Previously, every closure creation captured all ofself.variables(the entire module environment) into the closure's env. For programs that bind large values (e.g.let file_contents = Jsonl.readFile …) and then useList.mapover them, eachcall_closurecall cloned the full env — O(n × file_size) memory usage — causing OOM and process kill on even modest inputs. The fix scans the closure body forLoadVarReginstructions and captures only the variables whose names are referenced there.Instruction::Callfull application no longer clones entire module env — TheInstruction::Callhandler (direct bytecode calls, e.g. calling alet-bound lambda from keel code) seeded the callee's locals fromself.variables.clone()— copying the whole module env on every call. 
The fix starts locals fromenv.clone()(the closure's already-filtered captured env) and patches in only the body-referenced variables missing from env (forward references and recursive self-references).- Inline record parser now accepts leading-comma multiline style inside braces — A brace-delimited record written with leading commas (
{ a = 1\n , b = 2\n }) was silently dropping all fields after the first when the field values were followed by a newline before the comma (e.g. after a ? operator). The inline_record separator now skips any leading newlines before the comma, so this style works correctly in all contexts including newtype constructors.
- Destructured run bindings now carry the correct field type in the parser scope — let { x } = run "file.kl" previously registered x with the full expose-record type { x: T } instead of T. Each field variable now gets its own field type, so hover and inlay hints show the right type.
- User module available in all sequential run child scopes — When two or more run files each import the same user module (e.g. import Config exposing ProjectConfig), the second and later imports no longer fail with Module 'Config' not found. Previously the module's exports were only registered in the first child scope; when that scope was torn down the exports disappeared and later child scopes found nothing. The compiler now caches exports in FileContext after first compilation and re-inserts them into each subsequent child scope.
- Imported record-type constructor no longer triggers BareModuleName error — import Mod exposing RecordType followed by RecordType { field = val } previously failed with 'RecordType' is a module name, not a value. The quick module scanner (scan_enum_definitions) only matched Token::Enum, so type Name = { ... } newtypes were invisible to it. They were registered with zero variants, causing is_newtype_enum() to return false and the parser to fall through to the wrong error. The scanner now also handles Token::Type, dispatching to a new scan_record_fields helper for record bodies.
- Record newtype constructor: brace allowed on next line — Parsing a record newtype constructor with the opening brace on the next line now works correctly. Previously the parser rejected
MyType\n{ field: value } with a parse error. Field access on the result (e.g. MyType { field: value }.field) is also fixed.
- DataFrame.readJsonl / readJsonlColumns URL path peak memory reduced — When reading from a URL, the raw download buffer (Vec) was kept alive through the entire LazyJsonLineReader::collect() call, causing peak RAM usage of 3–5× the file size. The buffer is now dropped immediately after std::fs::write completes and before collect() is called, eliminating the overlap.
- Runtime error values now show useful information in REPL output and env pane — Result(Err(DataFrameError::FileError("..."))) and similar runtime errors were displayed with no message in the REPL (text/plain MIME entry used an internal Display impl) and only the variant name in the env pane (format_value_preview returned a hard-coded "Err ..."). The REPL now uses to_keel_syntax_pretty() for the text/plain entry (e.g. Err FileError "No such file or directory"), and the env pane now shows the variant name and error message for both Enum arguments and Result(Err(_)) inner values.
- --features stata build — DataFrameMetadata was missing from the import list in src/stdlib/dataframe_dta/write.rs's test module, causing all --features stata builds to fail with an E0433 compile error. Added the missing import.
- list_types no longer includes stdlib enums — Ordering, DurationError, DateTimeError, HttpError, MatrixError, IOError, JoinType, and DataFrameError are stdlib-registered enums that were incorrectly appearing in the user-defined type list exposed by inspect::list_types. They are now excluded. This fixes a crash in the TUI env pane when navigation keys were pressed on an empty user environment.
- DataFrame stdlib doc examples —
readCsvColumns, readJsonColumns, readParquetColumns, and join examples updated from old symbol syntax ([:col]) to column literal syntax ([@col]).
- Windows cross-compilation — build.rs now skips the ReadStat C build when CARGO_CFG_TARGET_OS is windows. iconv.h is unavailable in the standard mingw-w64 cross-compilation sysroot; Stata .dta file support is not functional on Windows, so the build is safely skipped rather than failing.
- Curried fn with fewer explicit params than its type signature accepted — fn f : Int -> Int -> Int with body fn f x = |y: Int| x + y previously produced a spurious TypeMismatch (expected: Int, found: Function(Int, Int)). The parser now correctly computes the remaining return type as the right-associative suffix of the type list, so a one-parameter body returning a lambda is fully valid.
- Three-level nested lambda bodies now execute correctly — fn f x = |y: Int| |z: Int| x + y + z compiled successfully but failed at runtime with NotCallable("Uninitialized"). The jump-rebase pass (adjust_jump_targets) was collecting body ranges from all Closure instructions in a function body, including transitively-nested ones whose ranges were already relative. Those stale relative values caused the jump-over for the outer lambda's body to be skipped, leaving it pointing at FunctionReturn instead of the Closure instruction. Only direct-child Closures are now used for the skip guard.
- Pipe operators now parse correctly inside list literals — A pipe expression like [@col |> Expr.add 1] previously caused the |> to escape the brackets and bind to the outer context, producing a spurious TypeMismatch. Pipe chains are now correctly confined within list element positions.
- Multiple consecutive blank lines in block bodies no longer cause parse failures — Two or more consecutive
Newline tokens inside let/fn/module/task blocks previously triggered an unexpected token error. The block parser now tolerates runs of blank lines between statements.
- DataFrame.melt suffix separator strip — The integer suffix parser now strips the separator prefix before parsing, so melt no longer fails when separator is a non-empty string.
- DataFrame.melt after DataFrame.rename no longer fails at runtime — polars DataFrame::rename updated series names in-place but left the internal ArrowSchema field names stale. unpivot2 resolved columns through the schema and failed with a spurious column-not-found error. The fix rebuilds the DataFrame from its series before unpivoting.
- Multiline pipe alignment check no longer bleeds across top-level statements — A pipe chain at one indentation level in a let binding incorrectly caused a PipeIndentationMismatch error for a shallower pipe chain in the next top-level statement. The pipe indent state is now reset after each top-level node is parsed.
- Type checker now rejects missing ? on Result-returning I/O functions — When a function was declared to return DataFrame T but its body returned Result DataFrame DataFrameError (because ? was omitted on a readParquet, readCsv, or readJson call), the type checker silently accepted the mismatch. Two gaps were closed: the body-type snapshot now syncs the string interner before re-inferring the body type (fixing top-level functions), and the module function compilation path now performs the same body-type check (fixing functions inside module exposing blocks). The LSP now reports the error in both cases.
- a |> (expr)? desugars correctly — Pipe expressions where the right-hand side is a parenthesized ?-application (e.g. data |> (DataFrame.readParquet path)?) now compile correctly. Previously the nested closure jump-target was adjusted twice, causing a VM jump offset error. The desugar pass now only adjusts the inner closure's jump target once.
- List.sum / List.product now accept [Decimal] — Previously only [Int] and [Float] lists were accepted; passing a Decimal list produced a type error.
- DataFrame.Expr.col signature changed from String -> Expr to DataFrameColumn -> Expr — col "name" now produces a compile-time type error. Use @name directly or col @name for the explicit wrapper form.
- ValueLabel enum variant uninitialized when reimported in a task scope — When a main file imported only a subset of a module's enums (e.g.
import Labels exposing Origin), the remaining enums were removed from scope via remove_enum. A task child scope that reimported the same module hit the already-compiled fast path and the Expose::Type branch did nothing, leaving enum_variant_types absent. compile_enum_constructor then found no variant types and emitted an uninitialized register; at runtime extract_value_label threw TypeMismatch. Fixed by persisting all exported enum definitions in a new module_enum_registry on FileContext at module-compile time, and re-registering from the registry whenever Expose::Type runs.
- Open-record newtype respects SchemaKind::Open in DataFrame validation — type T = { :col : X, .. } used as a DataFrame schema annotation (DataFrame T) now correctly allows extra file columns. Previously the schema validator always used SchemaKind::Closed for named newtype schemas, producing spurious "unexpected column" errors for every column not explicitly listed in T.
- Open record flag preserved in EnumVariantType::Record — EnumVariantType::Record now carries a boolean is_open field. Previously Type::Record and Type::OpenRecord were both collapsed into the same tuple, silently dropping the open/closed distinction. All match sites destructure the new field; the two sites that reconstruct a Type from a newtype constructor now emit Type::OpenRecord when is_open is true.
- Record type aliases in file-level module exposing — type Name = { ... } declarations inside file-level module bodies are now correctly tracked in type_names_declared. Previously they fell through to the error path, causing spurious ModuleInvalidExposedType and ModuleInvalidExposedFunction errors even when the type and function were correctly defined.
- Div type inference — Integer division (
/ and //) now correctly infers Float return type when both operands are Int, matching runtime behavior.
- Boolean operator type safety — && and || now correctly require Bool operands at the type level; passing non-Bool values produces a compile-time TypeMismatch error.
- TypeMismatch span narrowing for record field access — When a function call argument comes from a record field access (record.field), the type-mismatch error span now points to the field access expression rather than the entire argument expression.
- User-module functions exposed via import M exposing (fn) now have the correct type — When a function was imported with exposing (fn_name), its type was incorrectly reconstructed as Function(arg, Function(arg, ret)) (one extra curry layer) instead of Function(arg, ret). This caused a spurious InlineFileTypeMismatch when a run file received the return value of such a function as a task parameter. Fixed by using the full function type stored by lookup_fn directly rather than wrapping it again.
- Blank lines between block statements no longer cause "variable not found" — A blank line between block statements produces two consecutive
Newline tokens. The block body loop used .or_not() to consume the trailing newline after each statement, so the second Newline was left in the stream, causing the next iteration to fail and the block to terminate early. Fixed by changing .or_not() to .repeated() in both parse_expr_block and parse_expr_block_optional_dedent.
- let bindings after import with multiline |> chains now resolve correctly — When a function body used import Module, then bound a let via a multiline |> continuation using Module.* functions, the variable was not in scope for subsequent statements. parse_expr_block was speculatively tried as an argument to the function call; its block_start pushed to indent_stack and created a new scope before block_body failed, but chumsky only rewinds the input position on backtrack, not mutable state. The orphaned indent and spurious scope caused the variable declaration to land in the wrong scope. Fixed by guarding block_body with .or_not() and explicitly undoing block_start's side effects when the body is absent, in both parse_expr_block and parse_expr_block_optional_dedent.
- let bindings in function bodies with indented |> continuations now resolve correctly — When a let binding's RHS spanned multiple lines via a continuation pipe (|> indented on the next line), the orphan Dedent token left by multiline_op appeared between block statements, causing the block body loop to terminate early. Variables bound before the pipe were no longer in scope for subsequent statements. Fixed by draining orphan dedents between statements in both parse_expr_block and parse_expr_block_optional_dedent.
- import inside file-level module bodies is now supported — File-level module declarations (a bare module exposing (...) header with the rest of the file as the body) failed to parse import statements that appeared inside the module. The module body parser's parse_declarations choice did not include declaration_import, so any name defined after an import was invisible to the module's exposing list, producing spurious ModuleInvalidExposedType / ModuleInvalidExposedFunction errors. Fixed by adding declaration_import to the choices in parse_declarations. Also fixed a spurious ModuleFileNotAtTop error caused by reading at_file_start too late (after nested parsers had reset it).
- OpenRecord field access no longer produces a spurious type error — infer_record_access_type now handles OpenRecord correctly: declared fields are looked up and returned, and undeclared fields return Unknown without emitting a TypeMismatch error.
- Multiline type annotations in let bindings parse correctly — The indented_record parser now makes the trailing Dedent token optional. When a record type annotation is followed by = rhs on the same line as the closing } (e.g. let data :\n { id: Float\n } = rhs), no Dedent is emitted by the lexer, so requiring one caused a parse failure. Making it optional covers both cases: type aliases (where } is on its own line before a Dedent) and let bindings (where } is followed inline by = rhs).
- inspect::list_variables uses register-based lookup — Variable values are now fetched via vm.registers().get(register.val()) instead of the removed vm.variables() map. Uninitialized registers (lambdas compiled but not yet called) are skipped so they do not appear as live REPL bindings.
- Multiline record types in type annotations — Record types (e.g.
{ a: Int, b: Int }) spanning multiple lines now parse correctly inside type annotations, including inside task expecting/exposing clauses, Maybe, DataFrame, and other parameterized types. Both Elm-style leading-comma layout and trailing-comma layout are supported. Previously, indented records triggered "found indent" parse errors.
- Compile-time validation of custom types in task signatures — Custom types used in task expecting / task exposing clauses (e.g. task expecting (c : Color)) are now validated at compile time. If a type name is undefined, the compiler emits a clear UndeclaredType error with fuzzy-match suggestions drawn from visible enums and type aliases. Built-in types (Int, Bool, String, Float, etc.) are always accepted. The check recurses into List, Tuple, DataFrame, Record, and OpenRecord type arguments.
- Multiline task params with paren on keyword line — task expecting (\n data : Int\n) now parses correctly. The parameter and exposing list parsers now handle Indent/Dedent tokens inside parentheses, not just Newline. Previously this layout produced "found indent 4, expecting something else".
- Task expose type validation — The compiler now validates task exposing variables: if a declared output is never assigned in the body, a TaskExposeNotBound error is reported with a hint to add a let binding. If the assigned type doesn't match the declared type, a TaskExposeTypeMismatch error is reported. Previously, unbound or mistyped task outputs were silently ignored. This validation now also runs when a task file is compiled standalone (not via run), so the LSP can report TaskExposeNotBound errors when editing a task file directly. Additionally, DataFrame schema mismatches with enum/contract columns are now fully checked via run "...": a declared column that is missing from the actual result (e.g. :id : Int declared but DataFrame.select dropped it) now correctly reports TaskExposeTypeMismatch instead of being silently bypassed by the contract-type shortcut. Note: when DataFrame.applyExprs is in the pipeline, type inference loses column information (by design), so column-presence errors may only be caught at runtime in those cases.
- Task parameter type checking uses
are_types_compatible — Type checking for run file parameters now uses are_types_compatible instead of direct != comparison, correctly handling compatible types (e.g. Symbol/String coercion).
- Enums not visible inside function bodies — Enum types imported via exposing or defined at file scope were invisible inside fn and task declaration bodies because the parser's function-scope boundary only allowed Symbol::Function to pass through from parent scopes. Now Symbol::Enum is also allowed, matching the compiler's behavior. This fixes "Enum X not found" errors when using imported enums inside task or function bodies.
- task declaration params not in scope — task declaration parameters (now task expecting (...)) are registered in both the parser scope (as Symbol::Variable) and the compiler scope (via insert_var). Previously, opening a task file directly in the editor caused "variable not found" errors for declared parameters because only the caller path registered them.
- Stack overflow on nested if-else — The control_flow_deeply_nested_if test now runs with an 8 MB thread stack to accommodate large debug-mode stack frames from recursive compile_expr / infer_type calls.
- Project-aware module resolution — find_project_module_root walks up to keel.toml to determine the source root (from the main field), so files in subdirectories (e.g. src/variables/age.kl) can import user modules from src/modules/. Previously the module root was the parent directory of each file, which broke imports from non-sibling directories.
- User module enums in run task files — Task files loaded via run that import user modules (e.g. import Labels exposing (Cohort)) now parse correctly. Previously compile_run_file used parse_file_lenient with a blank parser state, so enum constructors like Cohort::Young failed with "Enum not found". Added parse_file_lenient_with_state and pre-register user modules before parsing task files.
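The closure-capture fix listed above (Instruction::Closure captures only referenced variables) can be illustrated with a minimal Rust sketch. It is not Keel's actual VM code: the Instr enum, referenced_vars, capture_env, and the use of byte vectors as values are all hypothetical, introduced only for illustration. Only the LoadVarReg name and the idea of scanning the closure body come from the entry itself.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified instruction set (an assumption for this sketch;
// the real Keel bytecode is not shown in this changelog).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Instr {
    LoadVarReg { name: String, dst: usize },
    LoadConst { value: i64, dst: usize },
    Add { a: usize, b: usize, dst: usize },
}

/// Collect the variable names read by `LoadVarReg` instructions in a closure body.
fn referenced_vars(body: &[Instr]) -> HashSet<&str> {
    body.iter()
        .filter_map(|instr| match instr {
            Instr::LoadVarReg { name, .. } => Some(name.as_str()),
            _ => None,
        })
        .collect()
}

/// Build a closure env holding only the variables the body references,
/// instead of cloning the whole module environment.
fn capture_env(
    module_env: &HashMap<String, Vec<u8>>,
    body: &[Instr],
) -> HashMap<String, Vec<u8>> {
    let wanted = referenced_vars(body);
    module_env
        .iter()
        .filter(|(name, _)| wanted.contains(name.as_str()))
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}
```

With this shape, a closure whose body only loads a small variable never clones an unrelated large binding, so the per-call cost depends on what the body references rather than on the size of the module environment.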
Removed
- DataFrame.readDta_old, DataFrame.readDtaColumns_old, DataFrame.writeDta_old — The three ReadStat C FFI-backed benchmark functions are removed. The pure-Rust readDta, readDtaColumns, and writeDta are the maintained implementations. Downstream: the stata Cargo feature, the vendor/ReadStat/ C source tree (685 KB, 216 files), flate2, libz-sys, and cc build dependencies are all gone. The pure-Rust DTA functions are now compiled unconditionally on all platforms (no longer gated behind --features stata).
- :name symbol syntax in DataFrame column contexts — Symbol literals (:name) are no longer accepted where a DataFrameColumn or Expr column reference is expected. The Symbol→Expr auto-coercion rule in the type checker has been removed, as have all Expr::Symbol branches in extract_string_list, column, rename, setValueLabels, setVarLabel, withLag, withRolling, and column validation. Use @name for all DataFrame column references. Symbol literals remain valid for non-DataFrame uses (equality tests, symbol lists, record type annotations). This is a breaking change for any code using :symbol as a column reference.
- type alias — Transparent type aliases are removed. Use opaque type Name = Base for nominal safety, or inline the type directly.
- Name.wrap / Name.unwrap — The auto-generated newtype accessor functions are removed. Use constructor syntax (Name val) and pattern matching (case x of Name n -> n) instead.
- mut in task outputs — Mutable propagation from task outputs removed. All task outputs are now immutable.
- Task stdlib module — The Task module is removed. run and task are now language keywords handled by the parser/compiler directly.
- DataFrame.filter (closure-based) — Removed the closure-based DataFrame.filter (|r| ...) function and the entire expr_compiler module (~1,862 lines) that compiled closures to Polars expressions. Use DataFrame.filter with Expr syntax instead: df |> DataFrame.filter (:age > 18).
- Legacy named filter functions — Removed filterEq, filterNeq, filterGt, filterGte, filterLt, filterLte, filterIn. Use DataFrame.filter with Expr syntax instead (e.g., DataFrame.filter (:col == val), DataFrame.filter (:col |> Expr.in [vals])).