The BELLE assembler - basm
Quickstart
If the build script has not been executed yet, run this.
To assemble source code, execute this.
Or, if the assembler has been installed, run
from any directory.
Flags can be passed to change what the assembler prints, but none of them affect how it assembles code.
Field | CLI | Variable type | Default value | Example |
---|---|---|---|---|
Source code | file.asm | String | "" | main.asm |
Output binary | -o or --output | String | "a.out" | -o main |
Verbose output | -v or --verbose | Boolean | false | -v |
Debug output | -d or --debug | Boolean | false | -d |
Display tips | -t or --tips | Boolean | false | -t |
Display help | -h or --help | Boolean | false | -h |
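As an illustration only, the flag table above can be mirrored with Python's argparse module. This is not how basm itself parses its arguments; the function name build_parser and the help strings are made up for the sketch, while the flag names and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative mirror of the flag table; not basm's own parser.
    parser = argparse.ArgumentParser(prog="basm", description="Sketch of the basm command line")
    parser.add_argument("source", nargs="?", default="", help="source file, e.g. main.asm")
    parser.add_argument("-o", "--output", default="a.out", help="output binary name")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose output")
    parser.add_argument("-d", "--debug", action="store_true", help="debug output")
    parser.add_argument("-t", "--tips", action="store_true", help="display tips")
    # argparse adds -h / --help automatically.
    return parser


args = build_parser().parse_args(["main.asm", "-o", "main", "-v"])
print(args.source, args.output, args.verbose, args.debug, args.tips)
# prints: main.asm main True False False
```

Every Boolean flag defaults to false and only switches on extra output, which matches the statement that flags do not change the assembled binary.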
Syntax
Note
It is recommended to read the ISA documentation before delving into the documentation for the assembler. This document may be overly technical without prior knowledge of the ISA and assembly code.
Instruction syntax
The BELLE assembler is mostly case-insensitive: when data is parsed, it is converted to either upper or lower case for further processing.
All instructions are written in the form instruction destination, source.
Each type of operand is marked by a different prefix symbol.
Symbol | Meaning | Description | Example |
---|---|---|---|
; | Comment | A comment in the code. All following data on the line is ignored by the assembler | ; This is a comment |
# or “ | Literal | A literal value to be used as the source for an operation | 4 ; Literal 4 |
r | Register | A register to be used as the source or destination for an operation | r3 ; Register 3 |
$ or [ ] | Memory address | A memory address to be used as the source or destination for an operation | $400 ; Memory address 400 |
&r | Register pointer | A register that contains a memory address that can be accessed by treating the register as a pointer | &r4 ; Treat the value in register 4 as a memory address and obtain the value at that memory address |
&$ or & | Memory address pointer | A memory address whose value is treated as a pointer | &$10 ; Treat the value in memory address 10 as a pointer and obtain the value at the memory address |
@ | Subroutine call | A symbol used to refer to the memory address of a subroutine later in the program | @foo ; This is replaced with the memory address of the 'foo' subroutine at compile time |
. | CPU directive | A one-time directive given to the CPU when the memory is loaded. Expanded upon later | .ssp $40 ; Set stack pointer to memory address 40 |
'' | ASCII code | Resolves to a numeric literal at compile-time | 'a' |
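To make the prefix rules concrete, here is a minimal Python sketch that classifies a single operand by its prefix. It is not basm's lexer: the function name classify_operand and the returned labels are hypothetical, it does not validate the number that follows a prefix, and it only handles the # form of a literal, because the second literal symbol in the table is not legible here.

```python
from typing import Tuple


def classify_operand(token: str) -> Tuple[str, str]:
    """Return (kind, body) for one operand, based only on its prefix."""
    token = token.strip()
    if token.startswith("&"):
        rest = token[1:]
        if rest[:1] in ("r", "R"):  # register symbols are case-insensitive
            return ("register pointer", rest[1:])
        # "&$10" and "&10" are both listed as memory address pointers
        return ("memory address pointer", rest.lstrip("$"))
    if token.startswith("#"):
        return ("literal", token[1:])
    if token.startswith("$"):
        return ("memory address", token[1:])
    if token.startswith("[") and token.endswith("]"):
        return ("memory address", token[1:-1].strip())
    if token.startswith("@"):
        return ("subroutine call", token[1:])  # case is preserved for subroutines
    if token.startswith("."):
        return ("CPU directive", token[1:])
    if len(token) == 3 and token[0] == "'" and token[-1] == "'":
        return ("ASCII code", str(ord(token[1])))  # resolves to a numeric literal
    if token[:1] in ("r", "R"):
        return ("register", token[1:])
    return ("unknown", token)


print(classify_operand("&r4"))   # ('register pointer', '4')
print(classify_operand("'a'"))   # ('ASCII code', '97')
```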
“Variables”
basm allows the user to declare global, constant values for certain numbers.
These must be prefixed for the assembler to recognize them.
Register symbols, like other operand symbols, are case-insensitive. Subroutine calls are not, so a subroutine called banana is different from a subroutine called BaNaNa.
Subroutines
Subroutines are an abstraction at the assembly language level that lets the programmer name certain locations in the code. A subroutine name must be suffixed with a colon (:). It can contain any lowercase and uppercase letters as well as underscores, can begin with an underscore, and can contain as many underscores as the programmer desires.
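The naming rule can be expressed as a regular expression. The Python sketch below is an assumption about how such a check might look, not basm's actual validation; the names LABEL_DEFINITION and parse_label are hypothetical, and digits are rejected only because the rule above does not mention them.

```python
import re

# Letters and underscores only, followed by the required ':' suffix.
# Digits are left out because the text does not mention them (assumption).
LABEL_DEFINITION = re.compile(r"^([A-Za-z_]+):$")


def parse_label(line: str):
    """Return the subroutine name if the line defines one, otherwise None."""
    match = LABEL_DEFINITION.match(line.strip())
    return match.group(1) if match else None


print(parse_label("__init_loop:"))  # __init_loop
print(parse_label("banana:") == parse_label("BaNaNa:"))  # False, names are case-sensitive
```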
When a subroutine is called with either the jz @subroutine or the jmp @subroutine instruction, the @subroutine operand is replaced with the actual memory address of the subroutine in the code.
When a subroutine is jumped to, the memory address of the current location is pushed onto the call stack. When a ret instruction is received to return from a subroutine, the value on the top of the stack is popped off and placed into the program counter.
Note
The value at the top of the stack may not always be the address pushed by the most recent jump. To keep that address, pop it into a register immediately after the jump. r4 and r5 are typically used to store the value.
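The call stack behaviour described above can be modelled with a toy machine in Python. This is a sketch for intuition and not the BELLE emulator: the class TinyMachine and its method names are hypothetical, and it pushes the program counter exactly as it stands at the time of the jump.

```python
class TinyMachine:
    """Toy model of the jump/ret behaviour; not the BELLE emulator."""

    def __init__(self):
        self.pc = 0          # program counter
        self.stack = []      # call stack
        self.registers = {}  # register name -> value

    def jump_to_subroutine(self, target: int) -> None:
        # The address of the current location is pushed before jumping.
        self.stack.append(self.pc)
        self.pc = target

    def ret(self) -> None:
        # The top of the stack is popped into the program counter.
        self.pc = self.stack.pop()

    def pop_into(self, register: str) -> None:
        # Save the value on top of the stack in a register.
        self.registers[register] = self.stack.pop()


machine = TinyMachine()
machine.pc = 12
machine.jump_to_subroutine(40)
machine.pop_into("r4")          # keep the pushed address safe in r4
print(machine.pc, machine.registers["r4"], machine.stack)
# prints: 40 12 []
```

Saving the address in a register right after the jump means later pushes inside the subroutine cannot bury it.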
Assembler directives
The BELLE assembler has a #include directive, similar to the one in C/C++, with which the user can specify a file to include at the top of the current file. This allows projects to be split across multiple directories and many files.
CPU directives
The BELLE-ISA allows parts of the CPU to be adjusted through directives. These adjustments are applied only once, when the program is loaded into memory; they are not made at runtime.
Directive | Property changed | Description | Example |
---|---|---|---|
.ssp | Stack pointer | .ssp (Set Stack Pointer) changes the stack pointer’s initial value | .ssp $100 |
.sbp | Base pointer | .sbp (Set Base Pointer) changes the base pointer’s initial value | .sbp $100 |
Errors and debugging
Error emission reasons
The assembler is very lenient with the arguments passed to each operation (ADD can take subroutines as arguments, JZ can take register values, etc.). However, it can still emit an error.
If the code passed to the assembler contains an error, it will stop assembling, emit the error, and exit.
The following is a list of possible reasons for the assembler to emit an error.
- A register value is too large
- An invalid syntactic token is found
- A subroutine that is being called is not present in the code
- An invalid instruction is found in the code
- A memory address is too large (it physically cannot be encoded into a 16-bit instruction)
- A literal value is too large
- An instruction has the wrong number of arguments
Debugging source code
The assembler may emit an error depending on whether or not the code’s syntax is valid. Refer to docs/isa to view the ISA and syntax for the assembly code.
If the error happened at the syntax symbol and token validation stage (the lexer), the assembler will also print a red caret (^) pointing to the location of the error in the offending line.
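The caret marker can be pictured with a short Python sketch. The exact layout of basm's error messages is not documented here, so the function format_error, the sample line, and the message text are all hypothetical; only the idea of a red ^ under the offending column comes from the description above.

```python
RED = "\033[31m"   # ANSI escape code for red text
RESET = "\033[0m"  # ANSI escape code to restore the default colour


def format_error(line: str, column: int, message: str, color: bool = True) -> str:
    """Build a three-line report: message, source line, caret under the column."""
    caret = "^"
    if color:
        caret = RED + caret + RESET
    return "\n".join([message, line, " " * column + caret])


# Hypothetical input: the '%' at column 8 stands in for an invalid token.
print(format_error("add r9, %4", 8, "invalid token", color=False))
```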
Passing certain flags to the assembler, such as -d or -v, will emit different output.
The -d flag displays the entire process of assembling the source code and shows every token that the assembler lexes from the input file. The -v flag creates verbose output, which allows the binary output for every line to be examined.
The assembler can also emit tips for any instance of invalid syntax, and a bug report/issue/PR can be opened if an idea for better tip messages comes to mind for certain errors.
Other
Inspecting output
On most operating systems, there is a utility known as xxd that can be used to view the contents of a binary file in binary form. xxd -b <binary> can be executed to view the binary of the code, and xxd -b -c 2 <binary> can be used to view the binary 16 bits per row (as the instruction length is fixed at 16 bits).
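Where xxd is not available, a similar 16-bits-per-row view can be approximated in a few lines of Python. This sketch does not reproduce xxd's exact output format (it omits the offset and ASCII columns), the function name rows_of_16_bits is hypothetical, and the sample bytes are arbitrary rather than a real BELLE program.

```python
def rows_of_16_bits(data: bytes):
    """Return one string per 16-bit instruction, with each byte shown as 8 bits."""
    rows = []
    for offset in range(0, len(data), 2):
        pair = data[offset:offset + 2]
        rows.append(" ".join(format(byte, "08b") for byte in pair))
    return rows


for row in rows_of_16_bits(bytes([0x12, 0x34, 0xAB, 0xCD])):
    print(row)
# prints:
# 00010010 00110100
# 10101011 11001101
```

To inspect an assembled file, read it with open(path, "rb").read() and pass the result to the function.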
Re-assembling binary
There is a utility in the BELLE program set known as bdump, which is the BELLE disassembler. bdump can be called with a binary name to emit the original assembly code. Note that subroutines will not exist in the disassembled code, as all subroutine calls are simply replaced with memory addresses, and there is no way to make basm emit binaries that contain subroutines that can then be disassembled.
Technical details
The pipeline
The assembler follows a pipeline to emit the binary code. First, the code is read for #include directives.
Then, the assembler makes one pass over the code to identify subroutines and append them to a global map of subroutines with their respective memory addresses.
Once this is completed, the assembler makes one last pass over the code to assemble it, and subroutine references are replaced with their respective memory addresses.
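The two passes can be sketched in Python as follows. This is a simplified model and not basm's source: it skips the #include step and the actual binary encoding, the function names first_pass and second_pass are hypothetical, it assumes each instruction occupies one address starting at 0, and it writes resolved addresses in the $ form purely for readability.

```python
def strip_comment(line: str) -> str:
    # Everything after ';' is ignored by the assembler.
    return line.split(";", 1)[0].strip()


def first_pass(lines):
    """Map each subroutine name to the address of the instruction that follows it."""
    labels = {}
    address = 0
    for line in lines:
        code = strip_comment(line)
        if not code:
            continue
        if code.endswith(":"):
            labels[code[:-1]] = address
        else:
            address += 1  # assumption: one address per 16-bit instruction
    return labels


def second_pass(lines, labels):
    """Replace every @name operand with the address recorded in the first pass."""
    resolved = []
    for line in lines:
        code = strip_comment(line)
        if not code or code.endswith(":"):
            continue
        parts = code.split()
        for i, part in enumerate(parts):
            if part.startswith("@"):
                suffix = "," if part.endswith(",") else ""
                name = part[1:].rstrip(",")
                if name not in labels:
                    raise KeyError(f"subroutine '{name}' is not present in the code")
                parts[i] = f"${labels[name]}{suffix}"
        resolved.append(" ".join(parts))
    return resolved


source = [
    "jmp @main      ; forward reference, resolved by the second pass",
    "helper:",
    "ret",
    "main:",
    "jz @helper",
]
labels = first_pass(source)
print(labels)                       # {'helper': 1, 'main': 2}
print(second_pass(source, labels))  # ['jmp $2', 'ret', 'jz $1']
```

Collecting every subroutine address before assembling is what allows a jump to refer to a subroutine defined later in the file.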