Lost Era – Microsoft DOS, 16 bit .COM Assembly


In [here], we talked about DOSBox, an x86 emulator with DOS that brings back the memory of DOS programs. We also talked about the .COM binary executable, which is simple and has no header or metadata. All code and data are limited to 64 KB less 256 bytes, which are reserved for the PSP (Program Segment Prefix). Why 64 KB? On the 16-bit 8086 processor, an offset (pointer) within a segment ranges from 0000H to FFFFH, which covers 65536 bytes. A .COM program fits entirely in one such segment.

However, the .COM format was often abused by malicious programs: for a given file name, DOS looks for the .COM extension first, then .EXE, and finally .BAT, so a .COM file placed next to an .EXE of the same name runs in its place. Despite this, the .COM format was popular because it is small and quick to load into memory. The format is simple, and a .COM program can be generated with little knowledge of 16-bit assembly. With tools such as debug.exe or MASM, it is convenient for programmers to write small DOS utilities.

As we know, in 32-bit assembly the memory model is flat, which means the code can address any location in the 4 GB (32-bit) virtual address space. In a 16-bit .COM program the model is tiny: code, data and stack all share a single 64 KB segment.
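
To make the tiny model concrete, here is a minimal sketch of a .COM skeleton. It assumes MASM syntax; the label name start is only illustrative, and the program does nothing except return to DOS.

```asm
; Minimal tiny-model .COM skeleton (MASM syntax assumed).
.model tiny          ; code, data and stack share one 64 KB segment
.code
org 100h             ; offsets 00h-FFh of the segment hold the 256-byte PSP
start:
    int 20h          ; return to DOS (fine in a .COM, where CS points to the PSP)
end start
```

The org 100h line is where the "64 KB less 256 bytes" limit comes from: the program image is loaded right after the PSP.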

In DOS programs, we call system functions through software interrupts, much like calling sub-procedures. For example, int 20h returns to the command shell, but it is only recommended in the .COM format, where the CS (Code Segment) register is guaranteed to point to the PSP. In an .EXE this might not be the case. Therefore, it is recommended to use function 4Ch of int 21h instead, with the return code stored in register AL. [read more here]

mov ah, 4ch   ; int 21h function 4Ch: terminate with a return code
mov al, 2     ; return code handed back to the shell
int 21h

The following image shows a simple process to create the simplest .COM program, one that prints “Hello, World!”.

[Image: creating a “Hello, World!” .COM program]
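
As a text version of such a program, a minimal sketch might look like the following. It assumes MASM syntax and the tiny model; the names start and msg are illustrative and not taken from the original screenshot.

```asm
; Hello, World! as a .COM program (MASM syntax assumed).
.model tiny
.code
org 100h                    ; the program image starts right after the PSP
start:
    mov ah, 9               ; int 21h function 09h: print '$'-terminated string
    mov dx, offset msg      ; DS:DX -> string (DS equals CS in a .COM program)
    int 21h
    mov ax, 4c00h           ; AH = 4Ch (terminate), AL = 0 (return code)
    int 21h
msg db "Hello, World!", 0dh, 0ah, "$"
end start
```

The string lives after the code in the same segment, so no segment register has to be set up before the call.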

There is a famous Linux program, yes, which is described in [here]. We can accomplish the same thing in 16-bit .COM assembly. We can print a whole string at once using function 09h of int 21h, with register DX pointing to the address of a '$'-terminated string. Or we can use function 02h to print a single character at a time (the 8-bit register DL holds the ASCII code of the character to print).

If the program does not check for an exit condition itself (DOS still checks for Ctrl + C), we can simply build an endless loop with the jmp instruction.

The code for printing the whole string would be something like this.

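  mov ah, 9               ; int 21h function 09h: print '$'-terminated string
  mov dx, offset yes      ; DS:DX -> string
again:                    ; "rep" is a reserved prefix in MASM, so use another label
  int 21h
  jmp again               ; loop forever (Ctrl + C still breaks out)
yes db "yes", 0dh, 0ah, "$"   ; CR-LF puts each "yes" on its own line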

Or we can print the letters one by one. 0x0D and 0x0A together form the carriage return and line feed pair (CR-LF).

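  mov ah, 2               ; int 21h function 02h: print the character in DL
again:
  mov dl, 'y'
  int 21h
  mov dl, 'e'
  int 21h
  mov dl, 's'
  int 21h
  mov dl, 0dh             ; carriage return (hex literals need the h suffix)
  int 21h
  mov dl, 0ah             ; line feed
  int 21h
  jmp again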

Function 01h of int 21h reads a character from STDIN with echo (meaning the pressed key is also written to STDOUT). Function 08h is similar but with no echo (the character is not shown). Both block until a key is pressed and return it in register AL. We can then check the key for, say, ‘q’, which exits the program immediately; any other key prints the message yes again. The modified versions are:

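  mov dx, offset yes      ; DS:DX -> string
again:
  mov ah, 9               ; reload AH each pass: it is changed to 08h below
  int 21h
  mov ah, 08h             ; read a key without echo, result in AL
  int 21h
  cmp al, 'q'
  je exit                 ; 'q' pressed: leave the loop
  jmp again
exit:
  int 20h                 ; return to DOS
yes db "yes", 0dh, 0ah, "$"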

and here is another one:

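again:
  mov ah, 2               ; reload AH each pass: it is changed to 08h below
  mov dl, 'y'
  int 21h
  mov dl, 'e'
  int 21h
  mov dl, 's'
  int 21h
  mov dl, 0dh             ; carriage return
  int 21h
  mov dl, 0ah             ; line feed
  int 21h
  mov ah, 08h             ; read a key without echo, result in AL
  int 21h
  cmp al, 'q'
  je exit                 ; 'q' pressed: leave the loop
  jmp again
exit:
  int 20h                 ; return to DOS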

We use cmp to compare register AL (the ASCII code of the pressed key) with ‘q’. If they are equal (je is short for jump if equal), we terminate the program. We also slightly adjust the position of the label to jump to, so that AH is reloaded on every pass: AH now alternates between the print function and 08h, and DOS calls often overwrite the general-purpose registers (used for output and return values) such as AX, BX, CX and DX. These are 16-bit registers, and each can be further divided into upper and lower 8-bit halves: AL, AH, BL, BH, CL, CH, DL and DH.
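
The versions above wait for a key on every pass. To keep printing until ‘q’ is actually typed, one option is to poll the keyboard first. The sketch below uses int 21h function 0Bh (check standard input status) for that. It is a fragment that assumes the usual .COM setup (org 100h) in MASM syntax; the labels again and msg are illustrative.

```asm
again:
    mov ah, 9
    mov dx, offset msg      ; DS:DX -> '$'-terminated string
    int 21h
    mov ah, 0bh             ; function 0Bh: check standard input status
    int 21h                 ; AL = 0FFh if a key is waiting, 00h if not
    or  al, al
    jz  again               ; nothing typed yet: keep printing
    mov ah, 08h             ; a key is waiting: read it without echo
    int 21h
    cmp al, 'q'
    jne again               ; any key other than 'q': keep printing
    int 20h                 ; 'q': return to DOS
msg db "yes", 0dh, 0ah, "$"
```

Function 08h still does the actual read, but it is only called once a key is known to be waiting, so it no longer stalls the loop.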

Note that we can write the same task in many ways. However, code efficiency varies. That is why code optimization matters: we need to figure out which instructions are better in terms of speed (fewer CPU cycles) and storage (smaller size). For example, we can subtract ‘q’ from register AL and check whether the result is zero. In this case we can use jump if zero, which is jz (an alias of je). There are many jump instructions suited to different scenarios. You can imagine how inconvenient this was for DOS programmers in the old days, because they had to look them up (it is not easy to remember all the differences).
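
A minimal sketch of the subtract-and-test variant, again as a MASM-syntax fragment with an illustrative label quit:

```asm
    mov ah, 08h     ; read a key without echo, result in AL
    int 21h
    sub al, 'q'     ; AL = AL - 'q'; ZF is set when the result is zero
    jz  quit        ; jz and je are two names for the same instruction
    ; ... any other key: carry on ...
quit:
    int 20h         ; return to DOS
```

Unlike cmp, sub stores the result, so the original key code in AL is lost after the test.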

For example, setting register AX to zero can be optimized. The straightforward way is to use the data move instruction mov.

mov ax, 0

But this is often optimized to the following (performing the exclusive or operation), which encodes in 2 bytes instead of 3:

xor ax, ax

or (subtracting the register from itself, which results in zero)

sub ax, ax

However, these three statements are not identical in terms of flags. xor and sub set the ZERO flag (and update the other arithmetic flags), while mov leaves the flags unchanged.
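
The difference in flag behaviour can be seen in a small sketch like this one (a MASM-syntax fragment; the label done is illustrative):

```asm
    mov ax, 1
    cmp ax, 2        ; 1 != 2, so ZF = 0
    mov ax, 0        ; AX = 0, but mov does not touch the flags
    jz  done         ; not taken: ZF is still 0 from the cmp
    xor ax, ax       ; AX = 0 and ZF = 1
    jz  done         ; taken
    nop              ; never reached
done:
    int 20h          ; return to DOS
```

So replacing mov ax, 0 with xor ax, ax is only safe when the following code does not depend on flags set earlier.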

Another example is checking whether a register equals zero. We can use:

cmp ax, 0

or we can use the following:

or ax, ax
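
Either form is normally followed by a conditional jump. As a hedged sketch (a MASM-syntax fragment with illustrative labels is_zero and show), the following prints Z when AX is zero and N otherwise:

```asm
    or  ax, ax       ; AX is unchanged; ZF = 1 exactly when AX is zero
    jz  is_zero
    mov dl, 'N'      ; AX was non-zero
    jmp show
is_zero:
    mov dl, 'Z'      ; AX was zero
show:
    mov ah, 2        ; int 21h function 02h: print the character in DL
    int 21h
    int 20h          ; return to DOS
```

test ax, ax is another common choice: it sets the flags the same way but does not write to AX at all.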
