Introduction

This document provides an overview of the Elara compiler, its features, and its architecture. It is intended for both users of the compiler and developers interested in understanding its inner workings.

Haddock Documentation

Haddock documentation for the compiler is published on GitHub Pages and is a useful supplement to this documentation.

Syntax

Elara’s syntax is primarily inspired by Haskell and F#. It aims to be lightweight, concise, and flexible. It is layout-sensitive, meaning that indentation is significant and used to denote code blocks.

This section covers the various syntactic constructs in Elara:

Lexical Structure

Rules for identifiers, literals, keywords, and the layout system.

Comments

How to write single-line and multi-line comments.

Annotations

Attaching metadata to constructs, such as specifying operator precedence.

Lexical Structure

Identifiers

Identifiers in Elara have different semantic meaning based on the capitalisation of the first letter:

Lowercase Identifiers identify terms, i.e. variables and function names. They must start with a lowercase letter (a-z) or an underscore (_) and can be followed by any combination of letters, digits (0-9), and underscores (_). Examples: myVariable, compute_sum, _temp123
Uppercase Identifiers identify types and constructors. They must start with an uppercase letter (A-Z) and can be followed by any combination of letters, digits (0-9), and underscores (_). Examples: MyType, Option, TreeNode

Literals

Elara supports the following literal types:

Integer Literals: A sequence of digits representing whole numbers. Examples: 0, 42, -123456
Floating-Point Literals: A sequence of digits with a decimal point representing real numbers. Examples: 3.14, 0.001, -2.5
String Literals: A sequence of characters enclosed in double quotes. Examples: "Hello, World!", "Elara is great!"
Character Literals: A single character enclosed in single quotes. Examples: 'a', 'Z', '\n'

Keywords

The following are reserved keywords in Elara and cannot be used as identifiers:

def
let
in
type
if
then
else
match
with
module
import
class
alias
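
The identifier rules and keyword list above can be combined into a small classifier. This is only an illustrative sketch in Python, not the compiler's lexer: `classify_identifier` is a hypothetical name, and "letters" is assumed to mean ASCII letters, which the rules above do not state explicitly.

```python
import re

# Reserved keywords, copied from the list above.
KEYWORDS = frozenset({
    "def", "let", "in", "type", "if", "then", "else",
    "match", "with", "module", "import", "class", "alias",
})

# Assumption: "letters" means ASCII letters only.
LOWER_IDENT = re.compile(r"[a-z_][A-Za-z0-9_]*")
UPPER_IDENT = re.compile(r"[A-Z][A-Za-z0-9_]*")


def classify_identifier(name: str) -> str:
    """Return 'keyword', 'term', 'type-or-constructor', or 'invalid'."""
    if name in KEYWORDS:
        return "keyword"  # reserved, so not usable as an identifier
    if LOWER_IDENT.fullmatch(name):
        return "term"  # variables and function names
    if UPPER_IDENT.fullmatch(name):
        return "type-or-constructor"
    return "invalid"
```

Under these assumptions, `compute_sum` and `_temp123` classify as terms, `TreeNode` as a type or constructor, and `match` as a keyword.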

Layout

Elara’s syntax uses a layout system, where indentation is used to denote code blocks. This is similar to languages like Haskell, F#, and Python.

The Indentation Rule

The core layout rule is simple: Code that is part of a block must be indented further than the line that started that block.

When you use a keyword that introduces a new block (let, match, if, etc.), the compiler looks at the indentation of the following token to determine the expected indentation level for that block.

Starting a Block

Blocks are typically started by a newline followed by an increase in indentation after a Layout Trigger.

The following tokens trigger a layout check:

= (Equality / Assignment)
-> (Function arrows / Match cases)
with (The body of a match expression)
then (The body of an if expression)
else (The second body of an if expression)

Example:

let pythagoras a b =
    let a2 = a * a
    let b2 = b * b
    sqrt (a2 + b2)

In this example, the = after let pythagoras a b triggers a layout expectation. The next line is indented, so a new block starts there.

Lines that are indented to the same level as the new block are considered part of that block, and treated as separate statements, i.e. as if they were separated by semicolons.

-- Treated as 3 separate statements
let a = 1
let b = 2
let c = 3

Line Continuation

To continue a single expression across multiple lines, make sure the subsequent lines are indented further than the initial line.

let sum =
    1 + 2 +
      3 + 4 -- Indented further, so it continues the expression

If you wrote it like this instead:

let sum =
    1 + 2 +
    3 + 4 -- Same indentation, so treated as separate statements

The compiler would treat 3 + 4 as a separate statement, leading to an error.
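
The difference between the two versions can be modelled with a few lines of Python. This is a deliberately simplified sketch of the rule described above, not the compiler's layout algorithm: `split_statements` is a hypothetical helper that only counts leading spaces and ignores nested blocks, delimiters, tabs, and comments.

```python
def split_statements(block_lines):
    """Group the lines of one layout block into statements.

    The first non-blank line fixes the block's indentation. A line at that
    indentation starts a new statement; a line indented further continues
    the previous statement.
    """
    statements = []
    block_indent = None
    for line in block_lines:
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        indent = len(line) - len(line.lstrip(" "))
        if block_indent is None:
            block_indent = indent  # first line sets the block's level
        if indent == block_indent:
            statements.append(line.strip())  # same level: new statement
        elif indent > block_indent:
            statements[-1] += " " + line.strip()  # deeper: continuation
        else:
            raise ValueError("line is indented less than its block")
    return statements
```

With the body of the first example, `split_statements(["    1 + 2 +", "      3 + 4"])` yields a single statement, while the second body yields two, the first of which ends in a dangling `+`.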

Inline Definitions

The layout rule can be ignored if the entire block is defined on the same line. If no newline is present after a layout trigger, no block is created and so the expression continues to the end of the line.

This allows for concise inline definitions:

let add x y = x + y  -- No newline after '=', so no block is created

Delimiters

Explicit delimiters (parentheses (), braces {}, and brackets []) can be used to group expressions and override the layout rules.

You can freely indent inside delimiters without affecting the layout, as long as a layout trigger is not encountered.

If you start a layout block inside a pair of delimiters, the block is automatically closed when the closing delimiter is reached.

let x = (
    let y = 10
    y * 2
)

This code is valid because the block started by let y = 10 is closed by the closing parenthesis.

Explicit Layout

While not typically recommended, you can disable the layout system entirely by using explicit braces {} and semicolons ; to denote blocks and separate statements.

When using explicit layout, the indentation rules are ignored.

let pythagoras a b = {
    let a2 = a * a;
 let b2 = b * b;
    sqrt (a2 + b2);
}

This code is valid despite the inconsistent indentation because the braces and semicolons explicitly define the block structure.

Using explicit layout disables layout entirely, including for child blocks. For example, the following code is valid:

let main =
    let x = 10
    let y = { let i = x * 2; i + 1; } -- Explicit block for y
    print y

but the following code is not:

let main = {
    let x = 10;
    let f n =
        sqrt n  -- Error: Cannot use indentation inside explicit block!
    f x;
}

Annotations

Annotations provide a way to attach metadata to various syntactic constructs in Elara.

Syntax

Annotations are always specified using the # symbol followed by an annotation name and optional parameters in parentheses. The eventual goal is for annotations to be applicable to any syntactic construct, but this is still a work in progress.

Operator Information

When declaring custom operators, e.g.

def (++) : String -> String -> String
let (++) = ...

we use annotations to specify the operator’s metadata:

#LeftAssociative
#Fixity 6
def (++) : String -> String -> String

Annotation Arguments

Annotations can also take arguments. For example, the Fixity annotation above takes a single integer argument specifying the operator’s precedence level.

The syntax for annotation arguments is a subset of Elara in which every expression must evaluate to a constant value at compile time. Specifically, this permits:

Literal values (integers, strings, booleans, etc.)
Constructor application where all arguments are constant values
Tuples and lists of constant values

Defining Annotations

Currently, annotations are identical to data types. That is to say, every data type which only accepts constant values can be used as an annotation. For example, we can define an annotation to specify that a function should be memoised:

type Memoise = Memoise

The aforementioned LeftAssociative annotation is simply defined as:

type Associativity =
    LeftAssociative
    | RightAssociative
    | NonAssociative

TODO: I think we can extend this system in a lot of ways: for example, compile-time metaprogramming, allowing annotations to restrict where certain constructs can be used, aspect-oriented programming, etc. Could we make annotations first-class values or functions? E.g. annotation Memoise : (a -> b) -> (a -> b)

Comments

Single-Line Comments

Single-line comments start with two dashes (--). Everything following the -- on that line is part of the comment.

-- This is a single line comment
let x = 42  -- This comment is after code

Multi-Line Comments

Multi-line comments are enclosed between /- and -/. They can span multiple lines and can also be nested.

/- This is a
   multi-line comment
   which spans several lines -/
/- This is a multi-line comment
   /- which contains a nested comment -/
   and continues here -/
let y = 100
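
Nesting means a comment only ends once every opening /- has been matched by a -/. A common way to handle this is a depth counter, sketched below in Python. `strip_block_comments` is a hypothetical helper written for illustration, not the Elara lexer, and it ignores string literals and single-line comments.

```python
def strip_block_comments(source: str) -> str:
    """Remove /- ... -/ comments, honouring nesting, using a depth counter."""
    out = []
    depth = 0  # number of currently open /- markers
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/-":
            depth += 1  # entering a (possibly nested) comment
            i += 2
        elif pair == "-/" and depth > 0:
            depth -= 1  # leaving the innermost open comment
            i += 2
        else:
            if depth == 0:
                out.append(source[i])  # keep text outside all comments
            i += 1
    if depth != 0:
        raise ValueError("unterminated multi-line comment")
    return "".join(out)
```

Applied to the nested example above, only `let y = 100` and the surrounding newlines survive, because the inner -/ merely lowers the depth from 2 to 1.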

Module System Redesign

The Elara module system is designed to be simple, predictable, and flexible, drawing inspiration from Haskell and Rust.

Core Principles

File System as Truth: The identity of a module is primarily determined by its location in the file system relative to defined source roots.
No Scans: The compiler does not scan all files to discover modules. It resolves modules on-demand based on their import paths.
Optional Declarations: The module Name declaration in the file is optional. If omitted, the module name is inferred from the file path. If present, it must match the inferred name (checked by the compiler).

Source Roots

A project defines a set of source roots (e.g., src, lib, test, stdlib). When resolving a module name like Data.List, the compiler looks for corresponding files in these roots in order.

Module Resolution

To resolve a module named A.B.C, the compiler searches the source roots for the following files (in order of preference):

Nested: root/A/B/C.elr
Rust-style: root/A/B/C/mod.elr
Flat: root/A.B.C.elr

This hybrid approach allows for both organized nested structures and flat directory layouts where preferred.
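
The lookup order can be written down as a short Python sketch. This is not the compiler's implementation: `candidate_paths` and `resolve` are hypothetical names, the file system is replaced by a plain set of paths, and trying all three layouts in one root before moving to the next root is an assumption about how the two orderings combine.

```python
from pathlib import PurePosixPath


def candidate_paths(root, module_name):
    """List the files tried for a module within one root, in preference order."""
    parts = module_name.split(".")
    base = PurePosixPath(root)
    nested = base.joinpath(*parts[:-1], parts[-1] + ".elr")
    rust_style = base.joinpath(*parts, "mod.elr")
    flat = base / (module_name + ".elr")
    return [str(nested), str(rust_style), str(flat)]


def resolve(roots, module_name, existing_files):
    """Return the first candidate present in existing_files, or None.

    Assumption: roots are tried in order, and within each root the three
    layouts are tried in order of preference.
    """
    for root in roots:
        for path in candidate_paths(root, module_name):
            if path in existing_files:
                return path
    return None
```

For example, `candidate_paths("src", "A.B.C")` returns `src/A/B/C.elr`, `src/A/B/C/mod.elr`, and `src/A.B.C.elr`, in that order.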

Examples

Given source root src/:

import Data.List looks for:
- src/Data/List.elr
- src/Data/List/mod.elr
- src/Data.List.elr
import Main looks for:
- src/Main.elr
- src/Main/mod.elr
- src/Main.elr (Flat check, same as nested for top-level)

The `mod.elr` File

Inspired by Rust’s mod.rs, a mod.elr file represents the directory it resides in. For example, src/Data/mod.elr defines the module Data. This allows defining a module that also serves as a namespace for submodules (e.g., src/Data/List.elr).

Module Header

The module header is now optional.

-- src/Math/Utils.elr

-- Optional:
-- module Math.Utils 

let add x y = x + y

If the module declaration is provided, the compiler will verify that it matches the expected name derived from the file path. A mismatch results in a compile error.
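
The check amounts to inferring a name from the path and comparing it with the declared one. The Python sketch below illustrates that idea under stated assumptions: `infer_module_name` and `check_header` are hypothetical helpers, paths use forward slashes, and the real compiler reports a compile error rather than raising an exception.

```python
from pathlib import PurePosixPath


def infer_module_name(root, path):
    """Derive a module name from a source file path relative to its root."""
    rel = PurePosixPath(path).relative_to(PurePosixPath(root))
    if rel.suffix != ".elr":
        raise ValueError("not an Elara source file: " + str(path))
    parts = list(rel.with_suffix("").parts)
    # A mod.elr file names the directory it lives in.
    if len(parts) > 1 and parts[-1] == "mod":
        parts.pop()
    # Flat files such as Data.List.elr already contain the dots.
    return ".".join(parts)


def check_header(root, path, declared=None):
    """Return the module name, rejecting a declaration that does not match."""
    inferred = infer_module_name(root, path)
    if declared is not None and declared != inferred:
        raise ValueError(
            "module declared as " + declared + " but path implies " + inferred
        )
    return inferred
```

With these definitions, `check_header("src", "src/Math/Utils.elr")` returns `Math.Utils` whether the header is omitted or declares `Math.Utils`, and a header declaring any other name is rejected.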

Refactoring

Renaming a module is as simple as renaming the file or directory. Since the module declaration inside the file is optional, you often don’t need to touch the file content at all.

Principal Type Import

When importing a module M qualified (e.g. import M qualified), if the module exports a type with the same name as the module (i.e. type M = ...), that type is imported unqualified. All other members must be accessed with the qualifier (e.g. M.foo).

This is particularly useful for types like Result or Option where the module name matches the main type name.

-- In Result.elr
module Result
type Result e a = Ok a | Err e
let map f r = ...

-- In Main.elr
import Result qualified

let x : Result String Int = Result.Ok 1 -- Result is unqualified, Ok is qualified
let y = Result.map (\x -> x + 1) x

CLI

Elara has a command line interface (CLI) to build and run Elara programs. The CLI is invoked using the elara command followed by various options and flags.

With Cabal: When running with Cabal in development mode, use cabal run elara -- [options] to interact with the CLI instead.

Subcommands

build Compiles the Elara source code.

This command currently does nothing.

run Compiles and runs the Elara source code

By default, this runs in interpreted mode. You can use --target jvm to compile to JVM bytecode instead. For example, elara run source.elr --target jvm

Dumping

The compiler can dump intermediate representations of the code during compilation for debugging and development purposes.

You can enable this using the --dump flag followed by a comma-separated list of targets to dump. The available dump targets are:

Dump Target	Description
`lexed`	Dumps a list of tokens after lexing
`parsed`	Dumps the Frontend AST after parsing
`desugared`	Dumps the Desugared AST after desugaring
`renamed`	Dumps the Renamed AST after renaming
`shunted`	Dumps the Shunted AST after shunting
`typed`	Dumps the Typed AST after type checking
`core`	Dumps all stages of the Core language
`ir`	Dumps the JVM bytecode IR representation (only works when running with `--target jvm`)
`jvm`	Dumps the generated JVM bytecode representation from H2JVM (only works when running with `--target jvm`)

All dumps are written to the build/ directory in the current working directory.
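
To make the shape of the --dump value concrete, the Python sketch below parses a comma-separated target list against the table above. It does not reflect how the Elara CLI is implemented: `parse_dump_flag` is a hypothetical helper, and silently skipping the JVM-only targets when the target is not jvm is an assumption, since the table only says they do not work in that case.

```python
# Dump targets, copied from the table above.
DUMP_TARGETS = (
    "lexed", "parsed", "desugared", "renamed", "shunted",
    "typed", "core", "ir", "jvm",
)
# Targets the table marks as requiring --target jvm.
JVM_ONLY = frozenset({"ir", "jvm"})


def parse_dump_flag(value, target=None):
    """Split a --dump value into the list of dump targets that apply."""
    requested = [t.strip() for t in value.split(",") if t.strip()]
    unknown = [t for t in requested if t not in DUMP_TARGETS]
    if unknown:
        raise ValueError("unknown dump target(s): " + ", ".join(unknown))
    # Assumption: JVM-only targets are skipped unless compiling for the JVM.
    return [t for t in requested if t not in JVM_ONLY or target == "jvm"]
```

Here `parse_dump_flag("lexed,typed,ir", target="jvm")` keeps all three targets, whereas without `target="jvm"` the `ir` entry is dropped.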

Compiler Setup

This section provides instructions on how to set up the Elara compiler in your development environment.

The recommended way to build is with Nix, as this ensures you have the correct versions of all dependencies and tools. If you don’t have or don’t want Nix, you should be able to get away with a manually installed GHC 9.12.2 and Cabal.

Building with Nix

Run nix build to build
You should then be able to access the Elara executable at ./result/bin/elara

Hacking with Nix

Run nix develop to enter a shell with all dependencies
Use just run to run in development mode (with an interpreter)
To run unit tests, run just test

Building without Nix

Run cabal build to build

Running without Nix

Run cabal run to run
Run cabal test to run unit tests

Compiler Architecture

This document provides a high-level overview of the architecture of the Elara compiler. It is intended for those who wish to understand the internal workings of the compiler, contribute to its development, or extend its functionality.

Design Principles

Query-Based: The compiler is built around the Rock query system, which connects the compilation stages, enabling memoisation and incremental compilation (not implemented yet). Each compilation stage is implemented as a query or group of queries that can fetch results from other stages.

Effectful: The Effectful library is used to manage side effects in a structured way. For more information on how effects are used in the compiler, see the Effects section.

Trees that Grow: The compiler adopts the Trees that Grow pattern to avoid unnecessary AST duplication between compilation stages. Every Stage until ToCore uses the same AST type with different extensions to represent the information available at that stage.

Effects

Queries

The compiler uses a query-based architecture to manage the computational dependencies between different stages of the compilation process.

This is implemented using the Rock query system, and practically functions similarly to a Makefile.

The complete list of queries can be found in the Query.hs file. Every query is a constructor of the Query GADT.
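
The Makefile analogy can be illustrated with a toy memoising query engine. The Python sketch below is not Rock's API (Rock is a Haskell library, and the compiler's queries are constructors of a GADT); `QueryEngine`, the string query names, and the three rules are all hypothetical, chosen only to show queries fetching other queries on demand and caching their results.

```python
class QueryEngine:
    """Toy demand-driven query engine with memoisation."""

    def __init__(self, rules):
        self.rules = rules  # query name -> function taking `fetch`
        self.cache = {}     # memoised results
        self.runs = []      # order in which rules were actually executed

    def fetch(self, query):
        if query not in self.cache:
            self.runs.append(query)
            # A rule may call fetch() itself to request its dependencies.
            self.cache[query] = self.rules[query](self.fetch)
        return self.cache[query]


# Hypothetical rules: each one only names what it needs, like a Makefile target.
rules = {
    "source": lambda fetch: "let x = 42",
    "tokens": lambda fetch: fetch("source").split(),
    "token_count": lambda fetch: len(fetch("tokens")),
}

engine = QueryEngine(rules)
count = engine.fetch("token_count")  # runs token_count, tokens, source
tokens = engine.fetch("tokens")      # served from the cache, nothing re-runs
```

After both calls, `engine.runs` contains each query exactly once, which is the memoisation behaviour described above in miniature.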

Primitives

Elara has a small set of built-in primitive types. These have no definition in the language and are hardcoded into the compiler. The primitive types are:

Int_Prim: Represents integer numbers.
Float_Prim: Represents floating-point numbers.
Char_Prim: Represents single characters.
IO_Prim: Represents input/output operations.
String_Prim: Represents sequences of characters.
(): The unit type

Each primitive type has a public-facing alias defined in the Elara.Prim module, e.g. type Int = Int_Prim.

The compiler will always desugar relevant values (e.g. unit literals) to the primitive type. For example, integer literals are treated as being of type Int_Prim.

This system should probably be redesigned in the future - I like that primitives have normal definitions but we are currently very inconsistent internally about how they’re handled which causes a lot of spaghetti.

Stages

The compiler is divided into a series of stages, each responsible for a specific part of the compilation process. Each stage (usually) takes input from the previous stage and produces output for the next stage. The inputs are fetched using the query system.

The stages are as follows:

Lexing: The source code is tokenized into a stream of tokens. This stage is implemented in Lexing.

Lexing

The Lexing stage is the first major stage of the compiler. It takes raw source code as input and produces a stream of tokens as output.
