PureBasic Survival Guide X - Assembly
PureBasic Survival Guide
a tutorial for using purebasic for windows 4.51


Part X - Assembly
v0.20 11.08.2016

10.1 It's machinecode, Jim!
10.2 Glossary
10.3 Registers and stack
10.4 PureBasic and assembly
 

10.1 It's machinecode, Jim!
 

I'm not an expert on this, and although I have been planning for ages to return to the subject, I will NOT do so now :-) Still, this page should be sufficient to get you started.
 

Okay, let's start by saying that I am NOT an expert on this. In fact, I'm pretty much the absolute beginner, which makes me perfectly suited to write an introduction to assembly in PureBasic... Not!

Here's a little information I gathered over time, but I definitely do not have much experience with this, so feel free to doubt my statements and correct my endless mistakes...


10.2 Glossary.
 

This is not the next Wikipedia, but some terms will show up that you need to understand. I'll add some links if I run into them. Skip if you know and just want to see how PureBasic does things...


Bit.

The smallest part a computer knows, can be either 0 or 1.


Nibble.

A group of four bits, numbered from 0 to 3.

bit3 bit2 bit1 bit0
 It can contain values from 0 to 15.


Byte.

A group of eight bits, numbered from 0 to 7.

bit7 ... bit0
It can contain values from 0 to 255. A byte is eight bits, or two nibbles.


Word.

A group of sixteen bits, numbered from 0 to 15.

bit15 ... bit0
It can contain values from 0 to 2^16-1. A word is two bytes. The lower half (bit0 to bit7) is called the lo-byte, the upper half (bit8 to bit15) is called the hi-byte.


Long.

Also known as 'double word' or 'dword'. A group of 32 bits, numbered from 0 to 31, containing values from 0 to 2^32-1. A long contains four bytes or two words. The lower word (bit0 to bit15) is called the lo-word, the upper word (bit16 to bit31) is called the hi-word.
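
The lo-word / hi-word and lo-byte / hi-byte split can be tried out in plain PureBasic, no assembly needed. This is just a little sketch using shifts and masks; the variable names are made up for the example.

```purebasic
; Sketch: splitting a long into words and bytes with shifts and masks.
; All variable names are only for illustration.
Define value.l  = $12345678

Define loword.l = value & $FFFF            ; bit0 to bit15
Define hiword.l = (value >> 16) & $FFFF    ; bit16 to bit31
Define lobyte.l = loword & $FF             ; bit0 to bit7 of the lo-word
Define hibyte.l = (loword >> 8) & $FF      ; bit8 to bit15 of the lo-word

Debug Hex(loword)   ; 5678
Debug Hex(hiword)   ; 1234
Debug Hex(lobyte)   ; 78
Debug Hex(hibyte)   ; 56
```

Each hexadecimal digit covers exactly four bits, which is why the halves are so easy to spot in the output.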


Quad.

A group of 64 bits, numbered from 0 to 63, containing values from 0 to 2^64-1. A quad contains eight bytes or four words or two longs.


Binary.

Also known as base 2. Your computer on its lowest level uses just two numbers, 0 and 1. We can represent any number in a combination of zeroes and ones. Each little part is called a 'bit'.

%00000001 = 2^0 = 1
%00000010 = 2^1 = 2
%00000011       = 3
%00000100 = 2^2 = 4
%00000101       = 5
%00000110       = 6
%00000111       = 7
%00001000 = 2^3 = 8
...
In PureBasic (and several other Basic dialects) a 'percentage' symbol is placed in front of a binary number to identify it, for example %1001 is equivalent to decimal 9. Other languages use a different notation for that same binary number: some Basic dialects write &B1001, and C compilers that support binary literals write 0b1001.


Decimal

Also known as base 10. It's what we humans use. That's what you get for having ten fingers...


Hexadecimal.

Also known as base 16. Binary numbers are a bit hard to remember and way too long for practical purposes. Four bits together are called a 'nibble' and can be represented by one character in hexadecimal:

%00000001 = 2^0 =  1 = 16^0 = $01
%00000010 = 2^1 =  2        = $02
...
%00001000 = 2^3 =  8        = $08
%00001001       =  9        = $09
%00001010       = 10        = $0A
%00001011       = 11        = $0B
...
%00001111       = 15        = $0F
%00010000 = 2^4 = 16 = 16^1 = $10
...
In PureBasic the '$' character is placed in front of the number to indicate it's a hexadecimal number. Some other Basic dialects use the combination &H, and C uses 0x. For example decimal 255 is $FF in PureBasic, &HFF in those Basic dialects, and 0xFF (or 0XFF) in C.
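
To see the notations side by side, here is a minimal sketch in PureBasic. It only uses the '%' and '$' prefixes plus the built-in Bin() and Hex() functions; the variable names are made up for the example.

```purebasic
; Sketch: the same numbers written in binary, hexadecimal and decimal.
Define frombinary.l = %1001
Define fromhex.l    = $FF

Debug frombinary      ; 9
Debug fromhex         ; 255
Debug Bin(9)          ; 1001
Debug Hex(255)        ; FF

; The notation is only a way of writing, the number itself is the same
If %11111111 = $FF And $FF = 255
  Debug "same number, three notations"
EndIf
```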


Octal.

I still have to see a practical use for this one :-) but what the heck, I mostly listed this variation here as the C implementation has some consequences... Also known as base 8. Just group three bits together and turn each group into a single digit from 0 to 7.

%00000001 = 2^0 =  1 = 8^0 = &O01
%00000010 = 2^1 =  2       = &O02
...
%00000111       =  7       = &O07
%00001000 = 2^3 =  8 = 8^1 = &O10
%00001001       =  9       = &O11
...
%00011111       = 31       = &O37
...
There's no PureBasic equivalent for this one. Some other Basic dialects write decimal 31 as &O37 (that's an ampersand and an 'ooh'). In C any number that starts with a zero is octal, so 037 is decimal 31 and not 37. Mightily confusing, especially when using a font which does not clearly differentiate between zeroes and capital 'o'. Good thing we don't have them in PureBasic.
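
As PureBasic has no octal notation, the only way to play with it is to do the grouping yourself. A minimal sketch, assuming non-negative input; ToOctal() is a made-up helper and not part of PureBasic.

```purebasic
; Sketch: build the octal text of a number by taking three bits at a time.
; ToOctal() is a hypothetical helper, and it assumes value is not negative.
Procedure.s ToOctal(value.l)
  Protected result.s = ""
  If value = 0
    ProcedureReturn "0"
  EndIf
  While value > 0
    result = Str(value & 7) + result   ; lowest three bits become one octal digit
    value = value >> 3                 ; drop those three bits
  Wend
  ProcedureReturn result
EndProcedure

Debug ToOctal(8)    ; 10
Debug ToOctal(9)    ; 11
Debug ToOctal(31)   ; 37
```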


CPU.

The brains of our computer. A little black box that looks for instructions in memory, fetches the appropriate information, and then does something with it :-) A CPU only understands machinecode.


Register.

A smaller part inside the CPU, that can take some information. Depending on the type of CPU, registers can have different sizes.


Operator.

An instruction for the CPU. On older CPUs an instruction would be a single byte. On newer CPUs instructions can take multiple bytes.


Operand.

A parameter for the operator, typically data, numbers, memory addresses etc.


Machinecode.

The actual instruction that the CPU reads, and pretty much unreadable for humans. An example in 'human programming language':

$01         ; open fridge
$02 03      ; take 3 bottles of beer
$03         ; consume
$03         ; consume
$03         ; consume

Mnemonic.

An easier to remember equivalent of the numbers that actually make up machinecode. The code above would then read:

OPNFR       ; open FRidge
TKBEER 03   ; take beer 3 times
CONS        ; consume
CONS        ; consume
CONS        ; consume
Which, with sufficient exercise, may result in faster programming and you enlisting in Alcoholics Anonymous.


Assembly.

Also known as ASM. A collection of mnemonics and related data, making up a program. An assembler then takes the whole package and turns it into a program (well, almost, there's often a linker as well).


Assembler.

Sometimes called a compiler, turns mnemonics and data into a program, ready to be linked.


Linker.

Fills in the blanks. When the assembler is done it may have created a complete set of instructions and data, but it still may miss some information for your program to run. The linker fills in this missing information, and makes sure your program can run on your operating system. Assuming you haven't made any mistakes :-)

(In the old days, there were no linkers.)


x86.

80186, 80286, 80386, 80486, Pentium, Pentium II, Pentium III, Pentium 4, Core 2, Sempron, Celeron, Duron, Athlon, AMD64, Phenom, Opteron...

All those numbers and names refer to CPUs, all based on or related to the 8086 of old, and all these processors have some level of compatibility with each other. For simplicity, x86 mostly refers to this group, although the older models (think everything before the Pentium III) are pretty much ignored and miss many instructions that are considered 'standard' these days.


x64.

When AMD arrived on the scene with a 64 bit extension of the x86 instruction set in its AMD64 processors, a new (64 bit) standard was set. It required certain hardware changes and a full rebuild of the operating systems to make the best use of the larger memory space. To keep it simple: x64 (64 bit) CPUs offer a larger memory space, more and wider registers, and some protection mechanisms.


10.3 Registers and stack.
 

The generic registers.

Start the PureBasic IDE. Go to the menu Help / External Help / ASM.HLP. This will bring up an overview of the mnemonics, and you'll also find a little overview of the x86 family architecture. Of course only after you install the ASM.HLP file...

The basic 4 registers are each 4 bytes wide: EAX, EBX, ECX and EDX. There are multiple ways to refer to them. Take for example register A:

  • AL refers to bit0 to bit7 of register A, 8 bits, or the lo-byte of AX.
  • AH refers to bit8 to bit15 of register A, 8 bits, or the hi-byte of AX.
  • AX refers to bit0 to bit15 of register A, 16 bits, or the lo-word of EAX.
  • EAX refers to bit0 to bit31 of register A, 32 bit.
In a little table:
31..16 15..8 7..0
             -AL-
       -AH-
       ----AX----
-------EAX-------
Please be aware: not all registers are created equal... certain instructions work only on certain registers.
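
To see AL, AH, AX and EAX as different views on the same register, here is a small sketch. It uses the 'direct' syntax (lines starting with '!', explained in 10.4) and assumes the regular assembler (FASM) based compiler; the variable names are made up, and kept in lowercase on purpose.

```purebasic
; Sketch: one register, four ways to look at it.
; Assumes the FASM based PureBasic compiler. Global variables are
; reached from direct assembly through the v_ prefix.
Define whole.l, low16.l, high8.l, low8.l

! MOV eax, 0x12345678        ; fill all 32 bits of register A
! MOV dword [v_whole], eax   ; EAX = bit0 to bit31
! MOVZX ecx, ax              ; AX  = bit0 to bit15
! MOV dword [v_low16], ecx
! MOVZX ecx, ah              ; AH  = bit8 to bit15
! MOV dword [v_high8], ecx
! MOVZX ecx, al              ; AL  = bit0 to bit7
! MOV dword [v_low8], ecx

Debug Hex(whole)   ; 12345678
Debug Hex(low16)   ; 5678
Debug Hex(high8)   ; 56
Debug Hex(low8)    ; 78
```

MOVZX copies the smaller part into ECX and fills the remaining bits with zeroes, so each part can be stored as a regular long. Only EAX and ECX are touched, which are free to use.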
 

The other registers.

There's a lot more to be told about registers, but hey, this is only a survival guide :-) I'll add the stuff when I run into it :-) It's for now enough to know that they hold all other kinds of information, for example a pointer to the next instruction that the CPU will execute, or a place where it stores the results of a calculation etc.


Stack.

The 'stack' is a little table that can hold values or addresses. It's a 'last in first out' table. Think about a stack of papers, we put new sheets on top, and take them out from the top, i.e. the last one on top is the first one to go out.

A typical use of the stack is to store the return address when calling a 'subroutine'. And, as there are only four generic registers, it's also used to store temporary values.

The easiest way to store all registers is using PUSHAD, which would store ALL generic registers on the 'stack' (32 bit code only, PUSHAD and POPAD do not exist in 64 bit mode):

! PUSHAD
...
! POPAD
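Of course that's not the smart thing to do if you just need to push a single register or value, for which we have PUSH and POP. As the stack is 'last in first out', values have to be popped in the reverse order they were pushed.
! PUSH dword EAX
! PUSH word BX
...
! POP word BX
! POP dword EAX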
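
The 'last in first out' behaviour is easy to check for yourself. The following is a minimal sketch, not a recipe: it assumes the FASM based compiler, uses made-up variable names, and picks the 32 or 64 bit register names at compile time through #PB_Compiler_Processor.

```purebasic
; Sketch: push two values, pop them, and see in which order they come back.
Define first.l, second.l

CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  ! PUSH dword 1
  ! PUSH dword 2
  ! POP eax                  ; last in, first out: this is the 2
  ! POP ecx                  ; and this is the 1
CompilerElse
  ! PUSH 1                   ; 64 bit mode always pushes 8 bytes
  ! PUSH 2
  ! POP rax                  ; last in, first out: this is the 2
  ! POP rcx                  ; and this is the 1
CompilerEndIf

! MOV dword [v_first], eax
! MOV dword [v_second], ecx

Debug first    ; 2
Debug second   ; 1
```

Every PUSH is matched by a POP before the next regular PureBasic line, so the stack is back in its original state when PureBasic continues.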
Note! If you plan to use local variables inside a procedure grab them first before messing around with the stack!


10.4 PureBasic and assembly.
 

PureBasic allows you to include assembly directly in your source code. The instructions will either be processed by the PureBasic compiler first, or be passed on directly to the assembler. There are certain differences between the two methods.


Rules.

No matter what method you use, the following rules always apply:

  • variables and pointers have to be declared before you use them
  • labels should be preceded by l_ and be in lowercase
  • use a ProcedureReturn without a parameter to use the contents of EAX as the return value
  • you can freely use EAX, ECX and EDX
  • you have to store / restore the other registers and the stack before continuing your PureBasic source
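
As an illustration of the last two rules, here is a sketch that borrows EBX (which is NOT one of the free registers) and puts it back afterwards. It uses the 'direct' syntax described further down, assumes the FASM based compiler, and the variable name is made up.

```purebasic
; Sketch: save a register that is not free to use, and restore it afterwards.
Define sum.l

CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  ! PUSH ebx                 ; save EBX on the stack
  ! MOV ebx, 40
  ! ADD ebx, 2
  ! MOV dword [v_sum], ebx   ; store the result in the global variable sum
  ! POP ebx                  ; restore EBX, the stack is balanced again
CompilerElse
  ! PUSH rbx                 ; 64 bit mode: save the full 64 bit register
  ! MOV ebx, 40
  ! ADD ebx, 2
  ! MOV dword [v_sum], ebx
  ! POP rbx
CompilerEndIf

Debug sum   ; 42
```

With EAX, ECX or EDX the PUSH and POP would not have been needed.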
You have two options: Inline ASM or Direct to Compiler.


Inline ASM.

For Inline ASM: go in the PureBasic IDE to the menu Compiler / Compiler Options and switch on 'Enable inline ASM support'. By enclosing a section of your code with EnableASM and DisableASM you can now enter mnemonics directly as if they were PureBasic keywords...

EnableASM
a.l
MOV a.l,2
Debug a
DisableASM
As you can see you can use variable names directly.

If you have installed the ASM.HLP file you can move the cursor on top of the MOV instruction and hit F1, and you will see what that instruction does. With inline ASM support enabled you can access any PureBasic variable from your mnemonics, inside procedures as well...

Global b.l
Global c.l
;
Procedure x()
  Protected b.l
  a.l
  EnableASM
  MOV a.l,1   ; moving 1 into local var a
  MOV b.l,2   ; moving 2 into local var b
  MOV c.l,3   ; moving 3 into global var c
  DisableASM
  Debug a     ; display local var a - 1
EndProcedure
;
x()
Debug b       ; display global var b - 0
Debug c       ; display global var c - 3
As you can see, the regular rules apply to variable scope (local vs. global etc.).

We can return the results of our assembly code through a variable, by using MOV or something similar.

Procedure.l x()
  Protected r.l
  EnableASM
  MOV EDX,2
  MOV r.l,EDX
  DisableASM
  ProcedureReturn r
EndProcedure
;
Debug x()
In procedures we can also leave a value behind in EAX, which will be returned by a ProcedureReturn without parameter:
Procedure.l x()
  Protected r.l
  EnableASM
  MOV EAX,2
  DisableASM
  ProcedureReturn
EndProcedure
;
Debug x()

Direct to compiler.

It is also possible to pass the instructions directly to the assembler. In other words, the PureBasic compiler will not process the lines, but just passes them on. The following limitations apply:

  • each line is preceded by an exclamation mark '!'
  • global variables have to be preceded by v_
  • local variables have to be preceded by p.v_
  • global pointers have to be preceded by p_
  • local pointers have to be preceded by p.p_
  • you have to specify the size of the operand
  • you have to add square brackets '[' and ']' when you deal with the contents of a memory location instead of the location itself
  • you do not need the EnableASM and DisableASM keywords
In comparison, first in in-line ASM:
Procedure.l x()
  Protected r.l
  EnableASM
  MOV r,2
  MOV EAX,r
  DisableASM
  ProcedureReturn
EndProcedure
;
Debug x()
And when directly passed to the assembler:
Procedure.l x()
  Protected r.l
  ! MOV dword [p.v_r],2
  ! MOV dword EAX,[p.v_r]
  ProcedureReturn
EndProcedure
;
Debug x()
Here's another example, to show the differences between inline and direct:
; survival guide 10_4_400 assembly
; pb 4.40b3
;
EnableASM
;
a.l = 1
b.l = 0
;
; get the value of variable a using inline asm
;
MOV EAX, a
MOV b, EAX
Debug b
;
; get the value of variable a using direct asm
;
! MOV dword EAX, [v_a]
! MOV dword [v_b], EAX
Debug b.l
;
; equivalent in purebasic
;
b.l = PeekL(@a)
b.l = a
Debug b
;
; get the address of a variable using inline asm
;
MOV EAX, v_a
MOV b, EAX
Debug b
;
; notice that the following will not work!
;
; MOV EAX, @a
; MOV b, EAX
; Debug b
;
; get the address of variable a using direct asm
;
! MOV dword EAX, v_a
! MOV dword [v_b], EAX
Debug b.l
;
; equivalent in purebasic
;
b.l = @a
Debug b
;
DisableASM


That's all for now. One day I may return to this subject...