Introduction
This document provides an overview of the Elara compiler, its features, and its architecture. It is intended for both users of the compiler and developers interested in understanding its inner workings.
Haddock Documentation
Haddock documentation is available on GitHub Pages and is a useful supplement to this documentation.
Syntax
Elara’s syntax is primarily inspired by Haskell and F#. It aims to be lightweight, concise, and flexible. It is layout-sensitive, meaning that indentation is significant and used to denote code blocks.
This section covers the various syntactic constructs in Elara:
Lexical Structure
Rules for identifiers, literals, keywords, and the layout system.
Comments
How to write single-line and multi-line comments.
Annotations
Attaching metadata to constructs, such as specifying operator precedence.
Lexical Structure
Identifiers
Identifiers in Elara have different semantic meaning based on the capitalisation of the first letter:
- Lowercase Identifiers identify terms, i.e. variables and function names. They must start with a lowercase letter (a-z) or an underscore (_) and can be followed by any combination of letters, digits (0-9), and underscores (_). Examples: myVariable, compute_sum, _temp123
- Uppercase Identifiers identify types and constructors. They must start with an uppercase letter (A-Z) and can be followed by any combination of letters, digits (0-9), and underscores (_). Examples: MyType, Option, TreeNode
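The two identifier rules above can be sketched as a pair of regular expressions. This is only an illustration in Python, not the compiler's lexer; it assumes "letters" means ASCII letters (the text does not say whether Unicode letters are accepted), and the name classify_identifier is hypothetical.

```python
import re

# Assumption: "letters" means ASCII letters only.
LOWER_IDENT = re.compile(r"[a-z_][A-Za-z0-9_]*")
UPPER_IDENT = re.compile(r"[A-Z][A-Za-z0-9_]*")


def classify_identifier(name):
    """Return "term", "type", or None if name is not a valid identifier."""
    if LOWER_IDENT.fullmatch(name):
        return "term"  # variables and function names
    if UPPER_IDENT.fullmatch(name):
        return "type"  # types and constructors
    return None


print(classify_identifier("compute_sum"))  # term
print(classify_identifier("TreeNode"))     # type
```

The sketch does not reject reserved keywords; a real lexer would check those separately.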
Literals
Elara supports the following literal types:
- Integer Literals: A sequence of digits representing whole numbers. Examples: 0, 42, -123456
- Floating-Point Literals: A sequence of digits with a decimal point representing real numbers. Examples: 3.14, 0.001, -2.5
- String Literals: A sequence of characters enclosed in double quotes. Examples: "Hello, World!", "Elara is great!"
- Character Literals: A single character enclosed in single quotes. Examples: 'a', 'Z', '\n'
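As a rough illustration of the literal forms listed above, the following Python sketch classifies a piece of text by literal kind. It rests on assumptions drawn only from the examples: a leading minus sign is treated as part of a numeric literal, and an escape sequence is a backslash followed by one character. The names LITERAL_PATTERNS and classify_literal are hypothetical.

```python
import re

# Order matters: the float pattern is tried before the integer pattern.
LITERAL_PATTERNS = [
    ("float", re.compile(r"-?[0-9]+\.[0-9]+")),
    ("integer", re.compile(r"-?[0-9]+")),
    ("string", re.compile(r'"(?:[^"\\]|\\.)*"')),
    ("char", re.compile(r"'(?:[^'\\]|\\.)'")),
]


def classify_literal(text):
    """Return the literal kind for text, or None if it matches no pattern."""
    for kind, pattern in LITERAL_PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return None


print(classify_literal("-2.5"))   # float
print(classify_literal("'\\n'"))  # char
```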
Keywords
The following are reserved keywords in Elara and cannot be used as identifiers:
def, let, in, type, if, then, else, match, with, module, import, class, alias
Layout
Elara’s syntax uses a layout system, where indentation is used to denote code blocks. This is similar to languages like Haskell, F#, and Python.
The Indentation Rule
The core layout rule is simple: Code that is part of a block must be indented further than the line that started that block.
When you use a keyword that introduces a new block (let, match, if, etc.), the compiler looks at the indentation of the following token to determine the expected indentation level for that block.
Starting a Block
Blocks are typically started by a newline followed by an increase in indentation after a Layout Trigger.
The following tokens trigger a layout check:
- = (Equality / Assignment)
- -> (Function arrows / Match cases)
- with (The body of a match expression)
- then (The body of an if expression)
- else (The second body of an if expression)
Example:
let pythagoras a b =
    let a2 = a * a
    let b2 = b * b
    sqrt (a2 + b2)
In this example, the = after let pythagoras a b triggers a layout expectation. The next line is indented, so a new block starts there.
Lines that are indented to the same level as the new block are considered part of that block, and treated as separate statements, i.e. as if they were separated by semicolons.
-- Treated as 3 separate statements
let a = 1
let b = 2
let c = 3
Line Continuation
To continue a single expression across multiple lines, make sure the subsequent lines are indented further than the initial line.
let sum =
    1 + 2 +
        3 + 4 -- Indented further, so it continues the expression
If you wrote it like this instead:
let sum =
    1 + 2 +
    3 + 4 -- Same indentation, so treated as separate statements
The compiler would treat 3 + 4 as a separate statement, leading to an error.
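The difference between the two cases can be sketched in a few lines of Python. This is a simplified model, not the compiler's layout algorithm: it assumes the input is the lines of a single block, uses spaces only, and ignores nested blocks and delimiters. The function names are hypothetical.

```python
def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def split_statements(block_lines):
    """Group the lines of one block into statements.

    A line at the block's own indentation starts a new statement;
    a line indented further continues the previous statement.
    """
    block_indent = indent_of(block_lines[0])
    statements = []
    for line in block_lines:
        if statements and indent_of(line) > block_indent:
            statements[-1] += " " + line.strip()
        else:
            statements.append(line.strip())
    return statements


# Continuation: the second line is indented further than the first.
print(split_statements(["    1 + 2 +", "        3 + 4"]))
# Same indentation: two separate statements, the first left incomplete.
print(split_statements(["    1 + 2 +", "    3 + 4"]))
```

The first call yields a single statement, while the second yields two, which matches the error case described above.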
Inline Definitions
The layout rule can be ignored if the entire block is defined on the same line. If no newline is present after a layout trigger, no block is created and so the expression continues to the end of the line.
This allows for concise inline definitions:
let add x y = x + y -- No newline after '=', so no block is created
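One way to picture the distinction between a block-opening line and an inline definition is to check whether a layout trigger is the last token on the line. The Python sketch below does only that; it is an illustration rather than the compiler's logic, it strips comments naively (a "--" inside a string literal would confuse it), and the name opens_block is hypothetical.

```python
# The layout triggers listed earlier in this section.
LAYOUT_TRIGGERS = {"=", "->", "with", "then", "else"}


def opens_block(line):
    """True if the line ends with a layout trigger, i.e. a newline follows it."""
    code = line.split("--", 1)[0]  # naive: drop a trailing single-line comment
    tokens = code.split()
    return bool(tokens) and tokens[-1] in LAYOUT_TRIGGERS


print(opens_block("let pythagoras a b ="))  # True
print(opens_block("let add x y = x + y"))   # False
```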
Delimiters
Explicit delimiters (parentheses (), braces {}, and brackets []) can be used to group expressions and override the layout rules.
You can freely indent inside delimiters without affecting the layout, as long as a layout trigger is not encountered.
If you start a layout block inside a pair of delimiters, the block is automatically closed when the closing delimiter is reached.
let x = (
    let y = 10
    y * 2
)
This code is valid because the block started by let y = 10 is closed by the closing parenthesis.
Explicit Layout
While not typically recommended, you can disable the layout system entirely by using explicit braces {} and semicolons ; to denote blocks and separate statements.
When using explicit layout, the indentation rules are ignored.
let pythagoras a b = {
        let a2 = a * a;
  let b2 = b * b;
      sqrt (a2 + b2);
}
This code is valid despite the inconsistent indentation because the braces and semicolons explicitly define the block structure.
Using explicit layout disables layout entirely, including for child blocks. For example, the following code is valid:
let main =
    let x = 10
    let y = { let i = x * 2; i + 1; } -- Explicit block for y
    print y
but the following code is not:
let main = {
    let x = 10;
    let f n =
        sqrt n -- Error: Cannot use indentation inside explicit block!
    f x;
}
Annotations
Annotations provide a way to attach metadata to various syntactic constructs in Elara.
Syntax
Annotations are always specified using the # symbol followed by an annotation name and optional parameters in parentheses. The eventual goal is for annotations to be applicable to any syntactic construct, but this is still a work in progress.
Operator Information
When declaring custom operators, e.g.
def (++) : String -> String -> String
let (++) = ...
we use annotations to specify the operator’s metadata:
#LeftAssociative
#Fixity 6
def (++) : String -> String -> String
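To show how associativity and fixity metadata could influence parsing, here is a small precedence-climbing sketch in Python. It is not the compiler's shunting stage. It assumes that a higher fixity number binds more tightly (the text does not state the direction), it does not handle NonAssociative, and the ** operator and its table entry are hypothetical.

```python
# Hypothetical operator table: name -> (fixity, associativity).
OPERATORS = {
    "++": (6, "left"),   # mirrors #LeftAssociative and #Fixity 6
    "**": (8, "right"),  # hypothetical right-associative operator
}


def parse(tokens):
    """Parse alternating operand/operator tokens into nested tuples."""
    pos = 0

    def parse_expr(min_fixity):
        nonlocal pos
        left = tokens[pos]  # an operand
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            fixity, assoc = OPERATORS[op]
            if fixity < min_fixity:
                break
            pos += 1
            # Left-associative operators refuse an equal-fixity operator on
            # their right; right-associative operators accept one.
            next_min = fixity + 1 if assoc == "left" else fixity
            right = parse_expr(next_min)
            left = (op, left, right)
        return left

    return parse_expr(0)


print(parse(["a", "++", "b", "++", "c"]))
# ('++', ('++', 'a', 'b'), 'c')
```

With the left-associative entry, a ++ b ++ c groups as (a ++ b) ++ c; the right-associative entry groups the other way.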
Annotation Arguments
Annotations can also take arguments. For example, the Fixity annotation above takes a single integer argument specifying the operator’s precedence level.
The syntax for expression arguments is a subset of Elara where every expression must evaluate to a constant value at compile time. Specifically, this permits:
- Literal values (integers, strings, booleans, etc.)
- Constructor application where all arguments are constant values
- Tuples and lists of constant values
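The three permitted forms suggest a simple recursive check. The Python sketch below models expressions as nested tuples; this representation and the tags "lit", "con", "tuple", "list", and "var" are hypothetical and exist only for illustration.

```python
def is_constant(expr):
    """True if expr uses only the forms permitted in annotation arguments."""
    kind = expr[0]
    if kind == "lit":
        # ("lit", value): a literal value
        return True
    if kind == "con":
        # ("con", name, args): constructor application
        return all(is_constant(arg) for arg in expr[2])
    if kind in ("tuple", "list"):
        # ("tuple", items) or ("list", items)
        return all(is_constant(item) for item in expr[1])
    return False  # variables, function calls, and anything else


print(is_constant(("con", "Fixity", [("lit", 6)])))     # True
print(is_constant(("con", "Fixity", [("var", "n")])))   # False
```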
Defining Annotations
Currently, annotations are identical to data types. That is to say, every data type which only accepts constant values can be used as an annotation. For example, we can define an annotation to specify that a function should be memoised:
type Memoise = Memoise
The aforementioned LeftAssociative annotation is simply defined as:
type Associativity =
    LeftAssociative
    | RightAssociative
    | NonAssociative
TODO: I think we can extend this system in a lot of ways: for example, compile-time metaprogramming, allowing annotations to restrict where certain constructs can be used, aspect-oriented programming, etc.
Could we make annotations first-class, or functions? e.g. annotation Memoise : (a -> b) -> (a -> b)
Comments
Single-Line Comments
Single-line comments are initiated using two dashes (--). Everything following the -- on that line is considered part of the comment.
-- This is a single line comment
let x = 42 -- This comment is after code
Multi-Line Comments
Multi-line comments are enclosed between /- and -/. They can span multiple lines and can also be nested.
/- This is a
multi-line comment
which spans several lines -/
/- This is a multi-line comment
/- which contains a nested comment -/
and continues here -/
let y = 100
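Nesting means a lexer cannot simply search for the first -/ after a /-; it has to track depth. The following Python sketch shows one way to do that. It is an illustration only: it ignores string literals and single-line comments, and the name strip_block_comments is hypothetical.

```python
def strip_block_comments(source):
    """Remove /- ... -/ comments, honouring nesting, and return the rest."""
    out = []
    depth = 0  # how many /- are currently open
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/-":
            depth += 1
            i += 2
        elif pair == "-/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(source[i])  # keep text outside comments
            i += 1
    if depth != 0:
        raise ValueError("unterminated multi-line comment")
    return "".join(out)


text = "/- outer\n/- nested -/\nstill outer -/\nlet y = 100"
print(strip_block_comments(text).strip())  # let y = 100
```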
Module System Redesign
The Elara module system is designed to be simple, predictable, and flexible, drawing inspiration from Haskell and Rust.
Core Principles
- File System as Truth: The identity of a module is primarily determined by its location in the file system relative to defined source roots.
- No Scans: The compiler does not scan all files to discover modules. It resolves modules on-demand based on their import paths.
- Optional Declarations: The module Name declaration in the file is optional. If omitted, the module name is inferred from the file path. If present, it must match the inferred name (checked by the compiler).
Source Roots
A project defines a set of source roots (e.g., src, lib, test, stdlib).
When resolving a module name like Data.List, the compiler looks for corresponding files in these roots in order.
Module Resolution
To resolve a module named A.B.C, the compiler searches the source roots for the following files (in order of preference):
- Nested: root/A/B/C.elr
- Rust-style: root/A/B/C/mod.elr
- Flat: root/A.B.C.elr
This hybrid approach allows for both organized nested structures and flat directory layouts where preferred.
Examples
Given source root src/:
- import Data.List looks for:
  - src/Data/List.elr
  - src/Data/List/mod.elr
  - src/Data.List.elr
- import Main looks for:
  - src/Main.elr
  - src/Main/mod.elr
  - src/Main.elr (Flat check, same as nested for top-level)
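The lookup order above can be reproduced with a short Python sketch. It is not the compiler's resolver. It assumes that source roots are tried in order and that all three candidate forms are checked within one root before moving to the next; the text does not say how root order and candidate order interact. The function names and the exists callback are hypothetical.

```python
from pathlib import PurePosixPath


def candidate_paths(root, module_name):
    """Candidate files for a module within one source root, in preference order."""
    parts = module_name.split(".")
    base = PurePosixPath(root)
    nested = base.joinpath(*parts[:-1], parts[-1] + ".elr")
    rust_style = base.joinpath(*parts, "mod.elr")
    flat = base / (module_name + ".elr")
    return [str(nested), str(rust_style), str(flat)]


def resolve(roots, module_name, exists):
    """Return the first candidate for which exists(path) is true, else None."""
    for root in roots:  # assumption: roots take priority over candidate form
        for path in candidate_paths(root, module_name):
            if exists(path):
                return path
    return None


print(candidate_paths("src", "Data.List"))
# ['src/Data/List.elr', 'src/Data/List/mod.elr', 'src/Data.List.elr']
```

Passing a set's membership test as exists makes it easy to try the resolver against an in-memory file list instead of the real file system.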
The mod.elr File
Inspired by Rust’s mod.rs, a mod.elr file represents the directory it resides in.
For example, src/Data/mod.elr defines the module Data. This allows defining a module that also serves as a namespace for submodules (e.g., src/Data/List.elr).
Module Header
The module header is now optional.
-- src/Math/Utils.elr
-- Optional:
-- module Math.Utils
let add x y = x + y
If the module declaration is provided, the compiler will verify that it matches the expected name derived from the file path. A mismatch results in a compile error.
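The check described here amounts to inferring a name from the path and comparing it with the declared one. The Python sketch below illustrates the idea under the file layouts described in Module Resolution; the function names and the error message are hypothetical, not the compiler's.

```python
from pathlib import PurePosixPath


def infer_module_name(root, path):
    """Infer a module name from a file path relative to its source root."""
    relative = PurePosixPath(path).relative_to(root).with_suffix("")
    parts = list(relative.parts)
    if len(parts) > 1 and parts[-1] == "mod":
        parts.pop()  # src/Data/mod.elr names the module Data
    return ".".join(parts)  # a flat file name already contains the dots


def check_header(declared, root, path):
    """Return the module name, raising if a declared header disagrees."""
    inferred = infer_module_name(root, path)
    if declared is not None and declared != inferred:
        raise ValueError(
            f"module header {declared!r} does not match {inferred!r} (from {path})"
        )
    return inferred


print(check_header(None, "src", "src/Math/Utils.elr"))  # Math.Utils
```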
Refactoring
Renaming a module is as simple as renaming the file or directory. Since the module declaration inside the file is optional, you often don’t need to touch the file content at all.
Principal Type Import
When importing a module M qualified (e.g. import M qualified), if the module exports a type with the same name as the module (i.e. type M = ...), that type is imported unqualified. All other members must be accessed with the qualifier (e.g. M.foo).
This is particularly useful for types like Result or Option where the module name matches the main type name.
-- In Result.elr
module Result
type Result e a = Ok a | Err e
let map f r = ...
-- In Main.elr
import Result qualified
let x : Result String Int = Result.Ok 1 -- Result is unqualified, Ok is qualified
let y = Result.map (\x -> x + 1) x
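The effect of a qualified import on the names in scope can be modelled as a small lookup table. This Python sketch is only an illustration: the function name and data shapes are hypothetical, it compares the type name against the full module name, and it assumes the qualified form of the principal type also remains available, which the text does not confirm.

```python
def qualified_import_scope(module_name, exported_types, exported_terms):
    """Map each usable name to the (module, member) it refers to."""
    scope = {}
    for member in list(exported_types) + list(exported_terms):
        scope[f"{module_name}.{member}"] = (module_name, member)
    if module_name in exported_types:
        # Principal type: also usable without the qualifier.
        scope[module_name] = (module_name, module_name)
    return scope


scope = qualified_import_scope("Result", ["Result"], ["Ok", "Err", "map"])
print("Result" in scope)      # True: the principal type is unqualified
print("map" in scope)         # False: other members need the qualifier
print("Result.map" in scope)  # True
```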
CLI
Elara has a command line interface (CLI) to build and run Elara programs. The CLI is invoked using the elara command followed by various options and flags.
Subcommands
- build: Compiles the Elara source code.
- run: Compiles and runs the Elara source code.
By default, this runs in interpreted mode. You can use --target jvm to compile to JVM bytecode instead.
For example, elara run source.elr --target jvm
Dumping
The compiler can dump intermediate representations of the code during compilation for debugging and development purposes.
You can enable this using the --dump flag followed by a comma-separated list of targets to dump. The available dump targets are:
| Dump Target | Description |
|---|---|
| lexed | Dumps a list of tokens after lexing |
| parsed | Dumps the Frontend AST after parsing |
| desugared | Dumps the Desugared AST after desugaring |
| renamed | Dumps the Renamed AST after renaming |
| shunted | Dumps the Shunted AST after shunting |
| typed | Dumps the Typed AST after type checking |
| core | Dumps all stages of the Core language |
| ir | Dumps the JVM bytecode IR representation (only works when running with --target jvm) |
| jvm | Dumps the generated JVM bytecode representation from H2JVM (only works when running with --target jvm) |
All dumps are written to the build/ directory in the current working directory.
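When scripting the compiler, the constraints in the table can be checked before the command is launched. The Python sketch below only builds an argument list and does not run anything. It assumes the dump list is passed as a separate argument after --dump (for example --dump lexed,parsed) and that the flags follow the source file, as in the run example above; neither detail is confirmed here, and the function name is hypothetical.

```python
DUMP_TARGETS = {
    "lexed", "parsed", "desugared", "renamed",
    "shunted", "typed", "core", "ir", "jvm",
}
JVM_ONLY_DUMPS = {"ir", "jvm"}  # only meaningful with --target jvm


def elara_run_command(source, target=None, dump=()):
    """Build an argument list for `elara run`, validating the dump targets."""
    unknown = set(dump) - DUMP_TARGETS
    if unknown:
        raise ValueError(f"unknown dump targets: {sorted(unknown)}")
    if target != "jvm" and JVM_ONLY_DUMPS & set(dump):
        raise ValueError("the ir and jvm dumps require --target jvm")
    argv = ["elara", "run", source]
    if target is not None:
        argv += ["--target", target]
    if dump:
        # Assumption: comma-separated list passed as a separate argument.
        argv += ["--dump", ",".join(dump)]
    return argv


print(elara_run_command("source.elr", target="jvm", dump=["typed", "ir"]))
```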
Compiler Setup
This section provides instructions on how to set up the Elara compiler in your development environment.
The recommended way to build is with Nix, as this ensures you have the correct versions of all dependencies and tools. If you don’t have or don’t want Nix, you should be able to get away with a manually installed GHC 9.12.2 and Cabal.
Building with Nix
- Run nix build to build
- You should be able to access the Elara executable from ./result/bin/elara
Hacking with Nix
- Run nix develop to enter a shell with all dependencies
- Use just run to run in development mode (with an interpreter)
- To run unit tests, run just test
Building without Nix
- Run cabal build to build
Running without Nix
- Run cabal run to run
- Run cabal test to run unit tests
Compiler Architecture
This document provides a high-level overview of the architecture of the Elara compiler. It is intended for those who wish to understand the internal workings of the compiler, contribute to its development, or extend its functionality.
Design Principles
Query-Based: The compiler uses the Rock query system to connect compilation stages, enabling memoisation and incremental compilation (the latter is not implemented yet). Each compilation stage is implemented as a query or group of queries that can fetch results from other stages.
Effectful: The Effectful library is used to manage side effects in a structured way. For more information on how effects are used in the compiler, see the Effects section.
Trees that Grow: The compiler adopts the Trees that Grow pattern to avoid unnecessary AST duplication between compilation stages. Every stage until ToCore uses the same AST type with different extensions to represent the information available at that stage.
Effects
Queries
The compiler uses a query-based architecture to manage the computational dependencies between different stages of the compilation process.
This is implemented using the Rock query system, and practically functions similarly to a Makefile.
The complete list of queries can be found in the Query.hs file. Every query is a constructor of the Query GADT.
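The Makefile analogy can be illustrated with a tiny memoising engine in Python. This is not Rock's API and does not reflect the contents of Query.hs; the class, the query names, and the rules are all hypothetical. Each rule receives a fetch function so it can request the results of other queries, and each query is computed at most once.

```python
class QueryEngine:
    """Minimal memoising query engine: each query runs at most once."""

    def __init__(self, rules):
        self.rules = rules  # query name -> function taking fetch
        self.cache = {}
        self.runs = []      # queries that were actually computed

    def fetch(self, query):
        if query not in self.cache:
            self.runs.append(query)
            self.cache[query] = self.rules[query](self.fetch)
        return self.cache[query]


# Hypothetical stages: each one fetches what it depends on.
rules = {
    "source": lambda fetch: "let x = 42",
    "lexed": lambda fetch: fetch("source").split(),
    "parsed": lambda fetch: ("let", fetch("lexed")[1], fetch("lexed")[3]),
}

engine = QueryEngine(rules)
print(engine.fetch("parsed"))  # ('let', 'x', '42')
print(engine.runs)             # ['parsed', 'lexed', 'source']
```

Although the parsed rule fetches lexed twice, the lexing rule runs only once; that caching is what makes on-demand stages cheap to compose.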
Primitives
Elara has a small set of built-in primitive types. These have no definition in the language and are hardcoded into the compiler. The primitive types are:
- Int_Prim: Represents integer numbers.
- Float_Prim: Represents floating-point numbers.
- Char_Prim: Represents single characters.
- IO_Prim: Represents input/output operations.
- String_Prim: Represents sequences of characters.
- (): The unit type.
Each primitive type has a public-facing alias defined in the Elara.Prim module, e.g. type Int = Int_Prim.
The compiler will always desugar relevant values (e.g. unit literals) to the primitive type.
For example, integer literals are treated as being of type Int_Prim.
This system should probably be redesigned in the future - I like that primitives have normal definitions but we are currently very inconsistent internally about how they’re handled which causes a lot of spaghetti.
Stages
The compiler is divided into a series of stages, each responsible for a specific part of the compilation process. Each stage (usually) takes input from the previous stage and produces output for the next stage. The inputs are fetched using the Query System.
The stages are as follows:
- Lexing: The source code is tokenized into a stream of tokens. This stage is implemented in Lexing.
Lexing
The Lexing stage is the first major stage of the compiler. It takes raw source code as input and produces a stream of tokens as output.