Lost Era, MS-DOS 16-bit Assembly, Generate a DOS .COM Message Print using Python, Write Binary Code


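Binary files are not complex, and the DOS .COM format is about as simple as an executable gets: the file is a raw memory image that DOS loads at offset 100h, right after the 256-byte Program Segment Prefix (PSP). The following demonstrates converting a short message (ASCII text) into a .COM executable. Because a .COM program must fit in a single 64 KB segment, the file can be at most 65536 - 256 = 65280 bytes (slightly less in practice, since the stack shares the segment), so the message length is restricted. This experiment is also helpful in understanding how code is injected into executable files (or into code in memory).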

First, let’s recall that the simplest Hello World program in DOS .COM is:

mov ah, 9h
mov dx, offset msg
int 21h
int 20h
msg db "Hello, World!$"

Function 09h of int 21h prints a string that ends with a dollar sign $. Register AH selects the function to call, and register DX gives the address of the string to print (relative to DS). int 21h is the entry point to the MS-DOS API, and int 20h terminates the program; it requires register CS to point to the program's PSP, which holds for a .COM program.
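To see where the address of msg comes from, the instruction sizes can be added up starting at 100h. The following is a minimal sketch; the names `ORG`, `INSTRUCTIONS` and `layout` are made up for illustration, and the DX operand is left as a placeholder because its value is exactly what we are computing.

```python
ORG = 0x100  # DOS loads a .COM image at offset 100h, right after the PSP

# (mnemonic, machine code) pairs; the DX operand is a placeholder for now
INSTRUCTIONS = [
    ("mov ah, 9h",         b"\xB4\x09"),
    ("mov dx, offset msg", b"\xBA\x00\x00"),
    ("int 21h",            b"\xCD\x21"),
    ("int 20h",            b"\xCD\x20"),
]

def layout(instructions, origin=ORG):
    """Return [(address, mnemonic), ...] and the address right after the code."""
    rows = []
    address = origin
    for mnemonic, code in instructions:
        rows.append((address, mnemonic))
        address += len(code)
    return rows, address

rows, msg_offset = layout(INSTRUCTIONS)
for address, mnemonic in rows:
    print(f"{address:04X}h  {mnemonic}")
print(f"{msg_offset:04X}h  msg")
```

The code is 9 bytes long, so the message starts at 0109h, and that is the value to load into DX.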


However, there are many other ways to achieve the same task. For example, you can rearrange the code and put the message declaration first. It goes like this:

  jmp print
  msg db "Hello, World!$"
print:
  mov ah, 9h
  mov dx, offset msg
  int 21h
  int 20h

If we change the length of the string dynamically (injection), the address of print changes, and so does the machine code of jmp print. The x86 jmp instruction is encoded differently depending on the type of jump (short, near, far, register, etc.). For short and near jumps, the operand that follows the opcode is not an absolute address but a displacement relative to the next instruction; for example, a short jmp located at address 100h that targets 102h is encoded as EB 00. It is still possible to compute this, just not as easily as with the previous approach.
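The displacement calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `encode_jmp_over` and `build_jmp_first_com` are hypothetical, and only the short (EB rel8) and near (E9 rel16) forward jumps are handled.

```python
import struct

ORG = 0x100  # a .COM image is loaded at offset 100h

def encode_jmp_over(skip):
    """Encode a forward jmp that skips `skip` bytes placed right after it."""
    if not 0 <= skip <= 0xFFFF:
        raise ValueError("displacement does not fit in 16 bits")
    if skip <= 0x7F:
        return bytes([0xEB, skip])            # jmp short, 8-bit displacement
    return b"\xE9" + struct.pack("<H", skip)  # jmp near, 16-bit displacement

def build_jmp_first_com(message):
    """Build the 'jmp first' layout: jmp, message + '$', then the code."""
    data = message + b"$"
    jmp = encode_jmp_over(len(data))
    msg_addr = ORG + len(jmp)                 # the message follows the jmp
    code = (b"\xB4\x09"                             # mov ah, 9h
            + b"\xBA" + struct.pack("<H", msg_addr) # mov dx, offset msg
            + b"\xCD\x21"                           # int 21h
            + b"\xCD\x20")                          # int 20h
    return jmp + data + code

image = build_jmp_first_com(b"Hello, World!")
print(image[:2].hex())  # first two bytes: the encoded jmp
```

Note that once the message no longer fits in a short jump, the jmp grows from 2 to 3 bytes and the address of msg moves from 0102h to 0103h as well, which is the complication described above.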

We can use function AH = 02h to print a single character at a time and put it in a loop, since the number of characters is known. We can also use function AH = 40h with BX = 1 to write to STDOUT; the number of bytes to write is stored in register CX and the address of the string is kept in register DX. The equivalent assembly code looks like this:

mov ah, 40h
mov bx, 1
mov cx, 0dh
mov dx, offset msg
int 21h
int 20h
msg db "Hello, World!"
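The single-character function AH = 02h can be used from a generator too. The sketch below is an assumption about how one might do it, not part of the original experiment: instead of emitting a real loop, it unrolls the iteration at generation time and emits one `mov dl, <char>` / `int 21h` pair per character. AH is reloaded before every call as a precaution. The function name `build_putchar_com` is hypothetical.

```python
def build_putchar_com(message):
    """Return a .COM image that prints `message` (bytes) one character at a time."""
    code = bytearray()
    for ch in message:
        code += b"\xB4\x02"          # mov ah, 02h  (character output)
        code += bytes([0xB2, ch])    # mov dl, <character>
        code += b"\xCD\x21"          # int 21h
    code += b"\xCD\x20"              # int 20h
    if len(code) > 0xFFFF - 0x100:
        raise ValueError("message too long for a .COM file")
    return bytes(code)

image = build_putchar_com(b"Hi")
print(image.hex())
```

This costs 6 bytes per character, so it is far less compact than the string functions, but it needs no address calculation at all.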

As before, the AH = 40h version can be rearranged so that the data comes first, which gives something equivalent like this:

  jmp print
  msg db "Hello, World!"
print:
  mov ah, 40h
  mov bx, 1
  mov cx, 0dh
  mov dx, offset msg
  int 21h
  int 20h

This approach has the same problem: the address of the print label changes as the length of msg changes, which makes the injection complicated. So we go back to the first approaches, the ones without jmp, where the code has a fixed size and the message starts at a known address.

#!/usr/bin/env python3
# https://helloacm.com

import struct

def WriteMessageToDosCOM9(filename, message):
    data = message.encode("ascii")
    # 9 bytes of code + 1 byte for the '$' terminator
    if len(data) > 0xFFFF - 0x100 - 10:
        return False
    with open(filename, "wb") as fp:
        fp.write(b"\xB4\x09") # mov ah, 9h
        fp.write(b"\xBA\x09\x01")  # mov dx, 0109h
        fp.write(b"\xCD\x21") # int 21h
        fp.write(b"\xCD\x20") # int 20h
        fp.write(data) # start of message [0109]
        fp.write(b"\x24") # db '$' to end the message
    return True

def WriteMessageToDosCOM40(filename, message):
    data = message.encode("ascii")
    ls = len(data)
    # 15 bytes of code, no terminator needed
    if ls > 0xFFFF - 0x100 - 10 - 5:
        return False
    with open(filename, "wb") as fp:
        fp.write(b"\xB4\x40") # mov ah, 40h
        fp.write(b"\xBB\x01\x00")  # mov bx, 01
        fp.write(b"\xB9" + struct.pack("<H", ls)) # mov cx, length (little-endian)
        fp.write(b"\xBA\x0F\x01") # mov dx, 010fh
        fp.write(b"\xCD\x21") # int 21h
        fp.write(b"\xCD\x20") # int 20h
        fp.write(data) # start of message [010f]
    return True

if __name__ == "__main__":
    WriteMessageToDosCOM9("D:\\Dropbox\\DOS\\msg9.com", "Hello,$$ justyy!")
    WriteMessageToDosCOM40("D:\\Dropbox\\DOS\\msg40.com", "Hello,$$ justyy!")

The Python script above provides two functions that write a binary .COM file which prints a short message. It generates two executables, msg9.com and msg40.com, which use functions 09h and 40h respectively. Apart from the file size (msg40.com is 5 bytes larger than msg9.com for the same message), there is one major difference: msg40.com can print the dollar sign $, whereas msg9.com prints characters only until it meets the first $.
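Both differences can be checked without running DOS. The following is a rough model rather than an emulator: the helper names are hypothetical, the code sizes (9 and 15 bytes) are taken from the two generators, and the "printed" functions only mimic the documented stopping rule of each DOS function.

```python
CODE_SIZE_FN09 = 9    # mov ah / mov dx / int 21h / int 20h
CODE_SIZE_FN40 = 15   # the same, plus mov bx and mov cx (3 bytes each)

def file_size_fn09(message):
    return CODE_SIZE_FN09 + len(message) + 1   # +1 for the '$' terminator

def file_size_fn40(message):
    return CODE_SIZE_FN40 + len(message)       # length lives in CX instead

def printed_by_fn09(message):
    stored = message + b"$"
    return stored[:stored.index(b"$")]         # stops at the first '$'

def printed_by_fn40(message):
    return message                             # exactly CX = len(message) bytes

message = b"Hello,$$ justyy!"
print(file_size_fn40(message) - file_size_fn09(message))
print(printed_by_fn09(message))
print(printed_by_fn40(message))
```

For the test message used above, the model shows msg9.com stopping right after the comma, while msg40.com prints the whole string including both dollar signs.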


How do we make a .COM message printer on the fly? We can use the following:

[Screenshot: dos-msg, appending an ASCII string to msg9.com]

The above appends the ASCII string to msg9.com. Because that file uses function 09h to print, you have to add the $ sign at the end of the string manually. The same trick does not work for msg40.com as it is: the byte count loaded into register CX is stored in the file, so it has to be changed as well, otherwise the program won't print the whole string (or it will print extra bytes if the value stored in the file is larger than the actual string). For generated files, however, msg40.com is generally the better choice, because it can print any ASCII character at the cost of only 5 additional bytes.
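Appending to msg40.com becomes workable if the stored length is patched at the same time. Below is a minimal sketch that assumes the exact 15-byte layout written by WriteMessageToDosCOM40, where the CX operand sits at file offsets 6 and 7; the function name `append_to_msg40` and the constants are made up for illustration.

```python
import struct

HEADER_LEN = 15   # size of the code emitted by WriteMessageToDosCOM40
CX_OPCODE_AT = 5  # file offset of the B9 (mov cx, imm16) opcode
CX_VALUE_AT = 6   # file offset of the 16-bit length operand

def append_to_msg40(image, extra):
    """Return a new image with `extra` appended and the CX length patched."""
    if len(image) < HEADER_LEN or image[CX_OPCODE_AT] != 0xB9:
        raise ValueError("not a file in the expected msg40.com layout")
    new_len = len(image) - HEADER_LEN + len(extra)
    if new_len > 0xFFFF - 0x100 - HEADER_LEN:
        raise ValueError("message too long for a .COM file")
    patched = bytearray(image + extra)
    patched[CX_VALUE_AT:CX_VALUE_AT + 2] = struct.pack("<H", new_len)
    return bytes(patched)

# A hand-built msg40-style image that prints "Hi"
original = (b"\xB4\x40\xBB\x01\x00\xB9\x02\x00"
            b"\xBA\x0F\x01\xCD\x21\xCD\x20" + b"Hi")
patched = append_to_msg40(original, b"!!")
print(patched[CX_VALUE_AT:CX_VALUE_AT + 2].hex())
```

Only two bytes of the header change; the address in DX stays at 010Fh because the code size is fixed.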

The experiment shows how the code injection/generation kinda works.

–EOF (The Ultimate Computing & Technology Blog) —
