NightLive

PureBasic Survival Guide X - Assembly

PureBasic Survival Guide
a tutorial for using PureBasic for Windows 5.61


Part X - Assembly
v0.21 07.09.2018

10.1 It's machinecode, Jim!
10.2 Glossary
10.3 Registers and stack
10.4 PureBasic and assembly

10.1 It's machinecode, Jim!

Okay, let's start by saying that I am NOT an expert on this. In fact, I'm pretty much an absolute beginner, which makes me perfectly suited to write an introduction to assembly in PureBasic... Not! I have been planning for ages to return to the subject, but I will NOT do so now :-) Still, this page should be sufficient to get you started.

Here's a little information I gathered over time, but I definitely do not have much experience with this, so feel free to doubt my statements and correct my endless mistakes...

10.2 Glossary

This is not the next Wikipedia, but some terms will show up that you need to understand. I'll add some links if I run into them. Skip this part if you already know these terms and just want to see how PureBasic does things...

Bit

The smallest unit of information a computer knows; it can be either 0 or 1.

Nibble

A group of four bits, numbered from 0 to 3.

bit3 bit2 bit1 bit0

It can contain values from 0 to 15.

Byte

A group of eight bits, numbered from 0 to 7.

bit7 ... bit0

It can contain values from 0 to 255. A byte is eight bits, or two nibbles.

Word

A group of sixteen bits, numbered from 0 to 15.

bit15 ... bit0

It can contain values from 0 to 2^16-1. A word is two bytes. The lower half (bit0 to bit7) is called the lo-byte, the upper half (bit8 to bit15) is called the hi-byte.

Long

Also known as 'double word' or 'dword'. A group of 32 bits, numbered from 0 to 31, containing values from 0 to 2^32-1. A long contains four bytes or two words. The lower word (bit0 to bit15) is called the lo-word, the upper word (bit16 to bit31) is called the hi-word.

Quad

A group of 64 bits, numbered from 0 to 63, containing values from 0 to 2^64-1. A quad contains eight bytes or four words or two longs.
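PureBasic has a native type for each of these sizes: .b (byte), .w (word), .l (long) and .q (quad). Be aware that these are signed in PureBasic, so a .b holds -128 to 127; the unsigned ranges above apply to .a (unsigned byte) and .u (unsigned word). A small sketch, assuming PureBasic 5.61, that shows the sizes with SizeOf() and splits a word into its lo-byte and hi-byte:

```purebasic
; sizes in bytes of the built-in structures matching the glossary terms
Debug SizeOf(Byte)            ; 1
Debug SizeOf(Word)            ; 2
Debug SizeOf(Long)            ; 4
Debug SizeOf(Quad)            ; 8

; split a word into its lo-byte and hi-byte
w.u = $1234                   ; .u = unsigned 16 bit
Debug Hex(w & $FF)            ; 34 (lo-byte, bit0 to bit7)
Debug Hex((w >> 8) & $FF)     ; 12 (hi-byte, bit8 to bit15)
```

These four sizes are the same on Win32 and Win64; only .i (integer) changes, being 4 bytes on Win32 and 8 bytes on Win64.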

Binary

Also known as base 2. Your computer on its lowest level uses just two numbers, 0 and 1. We can represent any number in a combination of zeroes and ones. Each little part is called a 'bit'.

%00000001 = 2^0 = 1
%00000010 = 2^1 = 2
%00000011       = 3
%00000100 = 2^2 = 4
%00000101       = 5
%00000110       = 6
%00000111       = 7
%00001000 = 2^3 = 8
...

In Basic languages a 'percent' symbol is often placed in front of a binary number to identify it, for example %1001 is equivalent to decimal 9. Other languages use a different notation for that same binary number: Visual Basic writes &B1001, while C++14 and C23 use 0b1001 (older standard C has no binary notation at all).
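A quick sketch of binary in PureBasic itself (assuming PureBasic 5.61): the % prefix for literals, Bin() to turn a number into a string of bits, and Val() to go the other way. Bin() does not pad with zeroes, so RSet() is used to get the eight bit form used in the table.

```purebasic
; % marks a binary literal in PureBasic
Debug %1001                   ; 9
Debug Bin(9)                  ; 1001
Debug RSet(Bin(5), 8, "0")    ; 00000101
Debug Val("%1001")            ; 9
```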

Decimal

Also known as base 10. It's what we humans use. That's what you get for having ten fingers...

Hexadecimal

Also known as base 16. Binary numbers are a bit hard to remember and way too long for practical purposes. Four bits together are called a 'nibble' and can be represented by one character in hexadecimal:

%00000001 = 2^0 = 1  = 16^0 = $01
%00000010 = 2^1 = 2         = $02
...
%00001000 = 2^3 = 8         = $08
%00001001       = 9         = $09
%00001010       = 10        = $0A
%00001011       = 11        = $0B
...
%00001111       = 15        = $0F
%00010000 = 2^4 = 16 = 16^1 = $10
...

In PureBasic (and several other Basic dialects) the '$' character is placed in front of a number to indicate it's a hexadecimal number. Other languages use other prefixes: C uses 0x (or 0X), Visual Basic uses &H. For example decimal 255 is written as $FF in PureBasic, as 0xFF in C and as &HFF in Visual Basic.
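The same for hexadecimal, again a sketch assuming PureBasic 5.61: the $ prefix, Hex() and Val(). One nibble always maps to one hex digit, and like Bin(), Hex() does not pad with zeroes, so RSet() is used to get the two digit form from the table.

```purebasic
; $ marks a hexadecimal literal in PureBasic
Debug $FF                     ; 255
Debug Hex(255)                ; FF
Debug Hex(%1010)              ; A (one nibble = one hex digit)
Debug RSet(Hex(10), 2, "0")   ; 0A
Debug Val("$FF")              ; 255
```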

Octal

I still have to see a practical use for this one :-) but what the heck, I mostly listed this variation here as the C notation has some consequences... Also known as base 8. Just group the bits three at a time, starting from the right, and turn each group into its decimal equivalent, a digit from 0 to 7.

%00000001 = 2^0 = 1  = 8^0 = &O01
%00000010 = 2^1 = 2        = &O02
...
%00000111       = 7        = &O07
%00001000 = 2^3 = 8  = 8^1 = &O10
%00001001       = 9        = &O11
...
%00011111       = 31       = &O37
...

There's no PureBasic equivalent for this one. The &O prefix used in the table above is Visual Basic notation. C does it differently: an octal number is simply written with a leading zero, so decimal 31 is 037 in C, and 037 and 37 are two very different numbers. Mightily confusing, as everywhere else a leading zero changes nothing, and &O is easily misread as &0 in a font which does not clearly differentiate between zeroes and capital 'o'. Good thing we don't have them in PureBasic.
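PureBasic has Bin() and Hex() but, as far as I know, no octal function. The sketch below is a hypothetical helper, OctStr(), that does exactly what the text describes: take the lowest three bits, turn them into one digit, shift them out, repeat. It assumes PureBasic 5.61 and only handles zero and positive numbers.

```purebasic
; OctStr() is a hypothetical helper, not part of PureBasic
Procedure.s OctStr(value.q)
  Protected result.s = ""
  If value = 0
    ProcedureReturn "0"
  EndIf
  ; negative numbers are not handled and give an empty string
  While value > 0
    result = Str(value & 7) + result   ; lowest three bits = one octal digit
    value = value >> 3                 ; drop those three bits
  Wend
  ProcedureReturn result
EndProcedure

Debug OctStr(8)    ; 10
Debug OctStr(9)    ; 11
Debug OctStr(31)   ; 37
```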

CPU

The brains of our computer. A little black box that looks for instructions in memory, fetches the appropriate information, and then does something with it :-) A CPU only understands machinecode.

Register

A small storage location inside the CPU that can hold a value. Depending on the type of CPU, registers can have different sizes (for example 32 bits on x86, 64 bits on x64).

Operator

An instruction for the CPU, also called an opcode. On older 8 bit CPUs an opcode was a single byte; on x86 CPUs a complete instruction can take anywhere from one to fifteen bytes.

Operand

A parameter for the operator, typically data, numbers, memory addresses etc.

Machinecode

The actual instructions that the CPU reads, pretty much unreadable for humans. An example in an imaginary 'human programming language':

$01         ; open fridge
$02 03      ; take 3 bottles of beer
$03         ; consume
$03         ; consume
$03         ; consume

Mnemonic

An easier to remember name for the numbers that actually make up machinecode. The code above would then read:

OPNFR       ; open FRidge
TKBEER 03   ; take beer 3 times
CONS        ; consume
CONS        ; consume
CONS        ; consume

Which, with sufficient practice, may result in faster programming and you enlisting in Alcoholics Anonymous.
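To see that a real mnemonic is just a readable name for a handful of bytes: on x86, MOV EAX, 2 is encoded as the bytes B8 02 00 00 00 (opcode B8, then the value 2 as a little endian 32 bit number). The sketch below writes those raw bytes into a procedure with the FAsm db directive instead of using the mnemonic. It assumes PureBasic 5.61 on Win32; Two() is a made-up name. It should also work on Win64, since writing EAX clears the upper half of RAX.

```purebasic
; Two() is a hypothetical example procedure
Procedure.l Two()
  ; raw machine code for "MOV EAX, 2": opcode $B8 followed by 02 00 00 00
  ! db 0xB8, 0x02, 0x00, 0x00, 0x00
  ProcedureReturn   ; no parameter: return the contents of EAX
EndProcedure

Debug Two()         ; 2
```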

Assembly

Also known as ASM. A collection of mnemonics and related data, making up a program. An assembler then takes the whole package and turns it into a program (well, almost, there's often a linker as well).

Assembler

Sometimes called a compiler, turns mnemonics and data into a program, ready to be linked.

Linker

Fills in the blanks. When the assembler is done it may have created a complete set of instructions and data, but it still may miss some information for your program to run. The linker fills in this missing information, and makes sure your program can run on your operating system. Assuming you haven't made any mistakes :-)

(In the old days, simple programs were often assembled straight into a runnable file, without a separate linking step.)

x86 / x32

80186, 80286, 80386, 80486, Pentium, Pentium II, Pentium III, Pentium IV, Core 2, Sempron, Celeron, Duron, Athlon, Amd64, Phenom, Opteron...

All those numbers and names refer to CPUs, all based on or related to the 8086 of old, and all these processors have some level of compatibility with each other. For simplicity, x86 mostly refers to this group, although the older models are pretty much ignored and miss many instructions that are considered 'standard' these days (think everything before the Pentium III).

When mixing assembly with PureBasic on Win32, you may freely use (overwrite) only eax, ecx, edx, xmm0, xmm1, xmm2 and xmm3. All other registers must be preserved: save them before use and restore them afterwards.

x64

When AMD arrived on the scene with AMD64, a 64 bit extension of the x86 instruction set, a new (64 bit) standard was set. It required certain hardware changes and a full rebuild of the operating systems to make the best use of the larger memory space. For the sake of simplicity: x64 CPUs can address far more memory, have 64 bit wide registers, add eight extra general purpose registers (r8 to r15), and offer some extra protection mechanisms.

On Win64 you may freely use only rax, rcx, rdx, r8, r9, xmm0, xmm1, xmm2 and xmm3; all other registers must be preserved.
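Because the usable registers (and the register holding the return value) differ between Win32 and Win64, code that has to run on both can select the right instruction at compile time. A minimal sketch, assuming PureBasic 5.61; ReturnSeven() is a made-up name:

```purebasic
; ReturnSeven() is a hypothetical example procedure
Procedure.i ReturnSeven()
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    ! MOV RAX, 7    ; Win64: the value is returned in RAX
  CompilerElse
    ! MOV EAX, 7    ; Win32: the value is returned in EAX
  CompilerEndIf
  ProcedureReturn   ; no parameter: return the register contents
EndProcedure

Debug ReturnSeven() ; 7
```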

10.3 Registers and stack

The generic registers

The basic 4 registers when using Win32 are each 4 bytes wide: EAX, EBX, ECX and EDX. There are multiple ways to refer to them. Take for example register A:

AL refers to bit0 to bit7 of register A, 8 bits, or the lo-byte of AX.
AH refers to bit8 to bit15 of register A, 8 bits, or the hi-byte of AX.
AX refers to bit0 to bit15 of register A, 16 bits, or the lo-word of EAX.
EAX refers to bit0 to bit31 of register A, 32 bits.

In a little table:

31..16 15..8 7..0
             -AL-
       -AH-
       ----AX----
-------EAX-------

Please be aware: not all registers are created equal... certain instructions work only on certain registers. For example, MUL and DIV implicitly use EAX and EDX, and LOOP counts down in ECX.
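To see the naming scheme at work, this sketch (written for Win32, assuming PureBasic 5.61) loads a known value into EAX and copies its parts into global variables. MOVZX zero-extends the smaller part into ECX so it can be stored in a 32 bit variable. Only EAX and ECX are touched, both freely usable on Win32. The variable names are kept lowercase to avoid any doubt about how they are spelled as v_ names in the direct assembly lines.

```purebasic
; Win32 sketch: the parts of register A
Global full.l, loword.l, hibyte.l, lobyte.l

! MOV EAX, 0x12345678
! MOV [v_full], EAX       ; EAX = bit0 to bit31  = $12345678
! MOVZX ECX, AX           ; AX  = bit0 to bit15  = $5678
! MOV [v_loword], ECX
! MOVZX ECX, AH           ; AH  = bit8 to bit15  = $56
! MOV [v_hibyte], ECX
! MOVZX ECX, AL           ; AL  = bit0 to bit7   = $78
! MOV [v_lobyte], ECX

Debug Hex(full)           ; 12345678
Debug Hex(loword)         ; 5678
Debug Hex(hibyte)         ; 56
Debug Hex(lobyte)         ; 78
```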

The other registers

There's a lot more to be told about registers, but hey, this is only a survival guide :-) I'll add the stuff when I run into it :-) For now it's enough to know that the other registers hold all other kinds of information, for example a pointer to the next instruction the CPU will execute (EIP), or flags describing the result of the last calculation (EFLAGS), etc.

Stack

The 'stack' is an area of memory that can hold values or addresses. It's a 'last in, first out' structure. Think about a stack of papers: we put new sheets on top and take them off the top, i.e. the last one on top is the first one to go out. The CPU keeps track of the top of the stack in the ESP register (RSP on Win64).

A typical use of the stack is to store the return address when calling a 'subroutine'. And, as there are only four generic registers, it's also used to store temporary values.

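The easiest way to save the general purpose registers is PUSHAD, which pushes all eight 32 bit general purpose registers (EAX, ECX, EDX, EBX, ESP, EBP, ESI and EDI) onto the stack; POPAD restores them. PUSHAD and POPAD only exist in 32 bit code, so they are not available on Win64: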

! PUSHAD
...
! POPAD

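Of course that's not the smart thing to do if you just need to push a single register or value, for which we have PUSH and POP. Remember that the stack is last in, first out: pop the values in the reverse order in which you pushed them.

! PUSH dword EAX
! PUSH word BX
...
! POP word BX
! POP dword EAX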

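Note! If you plan to use local variables inside a procedure, grab them first before messing around with the stack! PureBasic addresses local variables relative to the stack pointer, so once you PUSH something the p.v_ names no longer point at the right place.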
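A small sketch of 'last in, first out' in action, assuming PureBasic 5.61 on Win32 (a PUSH or POP of a 32 bit memory operand cannot be encoded in 64 bit code). Two variables are pushed and then popped in the same order, which swaps their values, and the stack ends up balanced again:

```purebasic
; Win32 sketch: swap two variables through the stack
Global x.l = 1, y.l = 2

! PUSH dword [v_x]    ; stack: 1
! PUSH dword [v_y]    ; stack: 1, 2 (2 on top)
! POP dword [v_x]     ; last in, first out: x gets 2
! POP dword [v_y]     ; y gets 1, stack balanced again

Debug x               ; 2
Debug y               ; 1
```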

10.4 PureBasic and assembly

PureBasic allows you to include assembly directly in your source code. The instructions are either processed by the PureBasic compiler (inline ASM), or passed on directly to the assembler (PureBasic 5.61 on Windows uses FAsm). There are certain differences between the two methods.

Rules

No matter what method you use, the following rules always apply:

variables and pointers have to be declared before you use them
labels should be preceded by l_ and be in lowercase
use a ProcedureReturn without a parameter to return the contents of EAX (RAX on Win64)
make sure you only overwrite the registers you are allowed to use on Win32 and Win64
you have to preserve (store and restore) all other registers and the stack before continuing with your PureBasic source
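Putting some of these rules together, here is a sketch (Win32, PureBasic 5.61, with inline ASM enabled as described below) of a procedure that borrows EBX, which is not one of the registers you may freely use on Win32. The local variables are read first, then EBX is saved on the stack, used, and restored before ProcedureReturn hands back EAX. The detour through EBX is deliberate, and Sum() is a made-up name:

```purebasic
; Win32 sketch: borrowing a register that must be preserved
Procedure.l Sum(a.l, b.l)
  EnableASM
  MOV EAX, a        ; read the local variables first...
  MOV ECX, b
  PUSH EBX          ; ...then touch the stack: save EBX
  MOV EBX, ECX
  ADD EAX, EBX      ; EAX = a + b
  POP EBX           ; restore EBX, stack balanced again
  DisableASM
  ProcedureReturn   ; no parameter: returns EAX
EndProcedure

Debug Sum(2, 3)     ; 5
```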

You have two options: Inline ASM or Direct to Compiler.

Inline ASM

For inline ASM, open the menu Compiler / Compiler Options in the PureBasic IDE and switch on 'Enable inline ASM support'. By enclosing a section of your code with EnableASM and DisableASM you can then enter mnemonics directly as if they were PureBasic keywords...

EnableASM
a.l
MOV a.l,2
Debug a
DisableASM

As you can see you can use variable names directly.

If you have installed the ASM.HLP file, you can move the cursor on top of the MOV instruction and hit F1 to see what that instruction does. Inline ASM also gives you access to PureBasic variables, local as well as global:

Global b.l
Global c.l
;
Procedure x()
Protected b.l
a.l
EnableASM
MOV a.l,1   ; moving 1 into local var a
MOV b.l,2   ; moving 2 into local var b
MOV c.l,3   ; moving 3 into global var c
DisableASM
Debug a     ; display local var a - 1
EndProcedure
;
x()
Debug b       ; display global var b - 0
Debug c       ; display global var c - 3

As you can see, the regular rules apply to variable scope (local vs. global etc.).

We can return the results of our assembly code through a variable, by using MOV or something similar.

Procedure.l x()
Protected r.l
EnableASM
MOV EDX,2
MOV r.l,EDX
DisableASM
ProcedureReturn r
EndProcedure
;
Debug x()

In procedures we can also leave a value behind in EAX, which will be returned by a ProcedureReturn without parameter:

Procedure.l x()
Protected r.l
EnableASM
MOV EAX,2
DisableASM
ProcedureReturn
EndProcedure
;
Debug x()

Direct to compiler

It is also possible to pass the instructions directly to the assembler. In other words, the PureBasic compiler will not process the lines, but just passes them on. The following limitations apply:

each line is preceded by an exclamation mark '!'
variables in the main code (including globals) have to be preceded by v_
local variables have to be preceded by p.v_
local pointers have to be preceded by p.p_
you have to specify the size of the operand
you have to add square brackets '[' and ']' when you deal with the contents of a memory location instead of the location itself
you do not need the EnableASM and DisableASM keywords

For comparison, first in inline ASM:

Procedure.l x()
Protected r.l
EnableASM
MOV r,2
MOV EAX,r
DisableASM
ProcedureReturn
EndProcedure
;
Debug x()

And the same when passed directly to the assembler:

Procedure.l x()
Protected r.l
! MOV dword [p.v_r],2
! MOV dword EAX,[p.v_r]
ProcedureReturn
EndProcedure
;
Debug x()

Here's another example, to show the differences between inline and direct:

; survival guide 10_4_400 assembly
; pb 5.61 win32
;
EnableASM
;
a.l = 1
b.l = 0
;
; get the value of variable a using inline asm
;
MOV EAX, a
MOV b, EAX
Debug b
;
; get the value of variable a using direct asm
;
! MOV dword EAX, [v_a]
! MOV dword [v_b], EAX
Debug b.l
;
; equivalent in purebasic
;
b.l = PeekL(@a)
b.l = a
Debug b
;
; get the address of a variable using inline asm
; no longer works with pb 5.61
;
; MOV EAX, v_a
; MOV b, EAX
; Debug b
;
; notice that the following will not work!
;
; MOV EAX, @a
; MOV b, EAX
; Debug b
;
; get the address of variable a using direct asm
; no longer works with pb 5.61
;
; ! MOV dword EAX, v_a
; ! MOV dword [v_b], EAX
; Debug b.l
;
; equivalent in purebasic
;
b.l = @a
Debug b
;
DisableASM
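The 'address of a variable' variants above are commented out because they no longer work. A possible alternative, given as a sketch that assumes PureBasic 5.61 and has not been checked against other versions, is LEA (load effective address), which computes the address of a memory operand instead of reading what's stored there. CompilerIf picks EAX or RAX depending on the target, so the same code covers Win32 and Win64:

```purebasic
; sketch: get the address of a variable with LEA
a.l = 1
addr.i = 0

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
  ! LEA RAX, [v_a]       ; RAX = address of a, not its contents
  ! MOV [v_addr], RAX
CompilerElse
  ! LEA EAX, [v_a]       ; EAX = address of a, not its contents
  ! MOV [v_addr], EAX
CompilerEndIf

Debug addr
Debug @a                 ; should show the same address
```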

When using Win64:

; survival guide 10_4_401 assembly
; pb 5.61 win64
;
EnableASM
;
a.q = 1
b.q = 0
;
; get the value of variable a using inline asm
;
MOV RAX, a
MOV b, RAX
Debug b
;
; get the value of variable a using direct asm
;
! MOV qword RAX, [v_a]
! MOV qword [v_b], RAX
Debug b.q
;
; equivalent in purebasic
;
b.q = PeekI(@a)
b.q = a
Debug b
;
; get the address of a variable using inline asm
; no longer works with pb 5.61
;
; MOV RAX, v_a
; MOV b, RAX
; Debug b
;
; notice that the following will not work!
;
; MOV RAX, @a
; MOV b, RAX
; Debug b
;
; get the address of variable a using direct asm
; no longer works with pb 5.61
;
; ! MOV qword RAX, v_a
; ! MOV qword [v_b], RAX
; Debug b.q
;
; equivalent in purebasic
;
b.q = @a
Debug b
;
DisableASM

That's all for now. One day I may return to this subject...