Binary files are not complex. The DOS .COM format is simple. The following demonstrates the idea of converting a short message (ASCII text) to a .COM executable. Since a 16-bit DOS .COM file can be at most 65536 - 256 = 65280 bytes, there is a limit on the message length. This experiment is also helpful for understanding code injection into executable files (or into code in memory).
First, let’s recall that the simplest Hello World program in DOS .COM is:
    mov ah, 9h
    mov dx, offset msg
    int 21h
    int 20h
msg db "Hello, World!$"
Function 09h of DOS interrupt 21h prints a string that ends with a dollar sign $. Register AH selects the function to call, and register DX holds the address of the string to print. int 21h calls the MS-DOS API, and int 20h terminates the program, which requires register CS to point to the program’s PSP.
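To see where the message ends up in memory, it can help to lay out the assembled bytes. The following is only a sketch: the helper name `layout_hello_com` is made up for this illustration, and it assumes the usual 8086 encodings (B4 for mov ah, BA for mov dx, CD for int) and the fact that a .COM image is loaded at offset 100h, right after the 256-byte PSP.

```python
import struct

LOAD_OFFSET = 0x100  # a .COM image is loaded right after the 256-byte PSP

def layout_hello_com(message):
    """Hypothetical helper: return (image, msg_offset) for the AH=09h program."""
    code_len = 2 + 3 + 2 + 2              # mov ah / mov dx / int 21h / int 20h
    msg_offset = LOAD_OFFSET + code_len   # the string starts right after the code
    image = (
        b"\xB4\x09"                                  # mov ah, 9h
        + b"\xBA" + struct.pack("<H", msg_offset)    # mov dx, offset msg
        + b"\xCD\x21"                                # int 21h
        + b"\xCD\x20"                                # int 20h
        + message.encode("ascii") + b"$"             # msg db "...$"
    )
    return image, msg_offset

image, msg_offset = layout_hello_com("Hello, World!")
print(hex(msg_offset))   # 0x109
print(len(image))        # 9 code bytes + 13 characters + 1 for '$' = 23
```

Because the code comes first and has a fixed size, the offset of the message does not depend on how long the message is.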
However, there are many other ways to achieve the same task. For example, you can rearrange the code by putting the message declaration first. It goes like this.
    jmp print
msg db "Hello, World!$"
print:
    mov ah, 9h
    mov dx, offset msg
    int 21h
    int 20h
If we dynamically adjust the length of the string (injection), the address of print changes, and so does the machine code of jmp print. The x86 jmp instruction is translated to different machine code depending on the type of jump (short, near, far, register, etc.). The operand that follows the opcode is not the target address itself but its offset from the next instruction (therefore, jmp 102 is translated to EB00 if the jmp itself starts at address 100). It is still possible to compute this, just not as easily as with the previous approach.
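The displacement calculation can be sketched in a few lines of Python. The function name `encode_short_jmp` is invented for this example, and it only handles the short form (opcode EB followed by a signed 8-bit displacement).

```python
import struct

def encode_short_jmp(jmp_address, target_address):
    """Encode `jmp short target` (opcode EBh + signed 8-bit displacement)."""
    next_instruction = jmp_address + 2          # a short jmp is 2 bytes long
    displacement = target_address - next_instruction
    if not -128 <= displacement <= 127:
        raise ValueError("target is out of range for a short jump")
    return b"\xEB" + struct.pack("<b", displacement)

# jmp 102 located at 100h jumps to the very next instruction: EB 00
print(encode_short_jmp(0x100, 0x102).hex())              # eb00

# With the message placed right after the jmp, the displacement equals the
# message length (including the trailing '$'), so it changes with the message.
msg = b"Hello, World!$"
print(encode_short_jmp(0x100, 0x102 + len(msg)).hex())   # eb0e
```

A message longer than 127 bytes no longer fits a short jump, so the generator would have to switch to a near jump (opcode E9 with a 16-bit displacement), which is 3 bytes long and shifts every following address by one. That is the extra bookkeeping the jmp-first layout brings with it.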
We can use function AH = 02h to print a single character (held in register DL) each time, and put this in a loop, since the number of characters is known. We can also use AH = 40h with BX = 1 to write to STDOUT, with the number of bytes stored in register CX and the address of the string in register DX. So a similar assembly program looks like this:
    mov ah, 40h
    mov bx, 1
    mov cx, 0dh
    mov dx, offset msg
    int 21h
    int 20h
msg db "Hello, World!"
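The single-character route (DOS function 02h, selected through AH, with the character in DL) needs no string address at all. As a rough sketch, a generator could simply emit one call per character instead of a real loop. The name `build_com_per_char` is hypothetical, and the unrolled layout is an assumption made to keep the example short, not something the programs above do.

```python
def build_com_per_char(message):
    """Hypothetical generator: one AH=02h call per character (unrolled, no loop)."""
    image = b""
    for ch in message.encode("ascii"):
        image += b"\xB4\x02"             # mov ah, 2h  (print the character in DL)
        image += b"\xB2" + bytes([ch])   # mov dl, <character>
        image += b"\xCD\x21"             # int 21h
    image += b"\xCD\x20"                 # int 20h
    if len(image) > 0xFFFF - 0x100:
        raise ValueError("message too long for a .COM file")
    return image

print(build_com_per_char("Hi").hex())    # b402b248cd21b402b269cd21cd20
```

This costs 6 bytes per character, so it is far less compact than the 09h or 40h versions, but no address has to be computed.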
Similarly, we can rearrange the AH = 40h version so that the data comes first, which gives something equivalent like this:
    jmp print
msg db "Hello, World!"
print:
    mov ah, 40h
    mov bx, 1
    mov cx, 0dh
    mov dx, offset msg
    int 21h
    int 20h
This approach has the same problem: the address of the print label changes as the length of msg changes, which complicates the injection. So we go back to the first approaches, the ones without jmp.
#!/usr/bin/env python
# https://helloacm.com
import struct

def WriteMessageToDosCOM9(filename, message):
    data = message.encode("ascii")       # raw bytes of the message
    if len(data) > 0xFFFF - 0x100 - 10:  # 9 code bytes + trailing '$'
        return False
    with open(filename, "wb") as fp:
        fp.write(b"\xB4\x09")            # mov ah, 9h
        fp.write(b"\xBA\x09\x01")        # mov dx, 0109h
        fp.write(b"\xCD\x21")            # int 21h
        fp.write(b"\xCD\x20")            # int 20h
        fp.write(data)                   # start of message [0109]
        fp.write(b"\x24")                # db '$' to end the message
    return True

def WriteMessageToDosCOM40(filename, message):
    data = message.encode("ascii")       # raw bytes of the message
    ls = len(data)
    if ls > 0xFFFF - 0x100 - 10 - 5:     # 15 code bytes
        return False
    with open(filename, "wb") as fp:
        fp.write(b"\xB4\x40")            # mov ah, 40h
        fp.write(b"\xBB\x01\x00")        # mov bx, 01
        fp.write(b"\xB9" + struct.pack("<H", ls))  # mov cx, length
        fp.write(b"\xBA\x0F\x01")        # mov dx, 010fh
        fp.write(b"\xCD\x21")            # int 21h
        fp.write(b"\xCD\x20")            # int 20h
        fp.write(data)                   # start of message [010f]
    return True

if __name__ == "__main__":
    WriteMessageToDosCOM9("D:\\Dropbox\\DOS\\msg9.com", "Hello,$$ justyy!")
    WriteMessageToDosCOM40("D:\\Dropbox\\DOS\\msg40.com", "Hello,$$ justyy!")
The Python code above provides two functions that write a binary .COM file to print a short message. We generate two .COM executables, msg9.com and msg40.com, which use functions 09h and 40h respectively to print the message. Apart from the file size difference (msg40.com is 5 bytes larger than msg9.com), there is one major difference: msg40.com can print the dollar sign $, whereas msg9.com prints characters only until it meets the first $.
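The difference is easy to model without running DOS. The two helpers below are purely illustrative (their names are made up, and they only imitate what appears on screen): function 09h stops at the first $, while function 40h writes exactly CX bytes.

```python
def printed_by_09h(message):
    """What AH=09h shows: everything before the first '$'."""
    return message.split("$", 1)[0]

def printed_by_40h(message):
    """What AH=40h shows when CX equals the message length: every byte."""
    return message

msg = "Hello,$$ justyy!"
print(repr(printed_by_09h(msg)))   # 'Hello,'
print(repr(printed_by_40h(msg)))   # 'Hello,$$ justyy!'
```

So for the test message used above, msg9.com would show only the first six characters.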
How do we generate a message-printing .COM file on the fly? We can use the following:
The above appends the ASCII string to msg9.com. Note that because it uses function 09h to print, you have to add the $ sign at the end of the string yourself. With msg40.com no $ is needed, and it can print any ASCII character at the cost of only 5 additional bytes, so it is generally the better choice. However, simply appending does not work for msg40.com: the value loaded into register CX must be changed as well, otherwise it won’t print the whole string (or it will print extra bytes if the value stored in the file is larger than the actual string).
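Patching CX is a matter of rewriting two bytes in the file. The sketch below assumes the exact 15-byte layout produced by WriteMessageToDosCOM40, where the operand of mov cx sits at file offsets 6 and 7. The helper name `append_to_msg40` is hypothetical, and it works on bytes in memory, so reading and writing the actual file is left to the caller.

```python
import struct

HEADER_SIZE = 15   # code bytes emitted by WriteMessageToDosCOM40
CX_OFFSET = 6      # file offset of the 16-bit operand of "mov cx, imm16"

def append_to_msg40(image, extra):
    """Hypothetical helper: append `extra` to a msg40-style image and fix CX."""
    if image[5:6] != b"\xB9":
        raise ValueError("not a msg40-style image: expected mov cx at offset 5")
    patched = bytearray(image + extra)
    new_length = len(patched) - HEADER_SIZE
    if new_length > 0xFFFF - 0x100 - HEADER_SIZE:
        raise ValueError("message too long for a .COM file")
    struct.pack_into("<H", patched, CX_OFFSET, new_length)  # rewrite the count
    return bytes(patched)

# A msg40-style image that prints "Hello" (CX = 5)
stub = (b"\xB4\x40" b"\xBB\x01\x00" b"\xB9\x05\x00"
        b"\xBA\x0F\x01" b"\xCD\x21" b"\xCD\x20" b"Hello")
patched = append_to_msg40(stub, b", World!")
print(struct.unpack_from("<H", patched, CX_OFFSET)[0])   # 13
```

The address in DX stays at 010Fh because the code in front of the message keeps its size. Only the count changes.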
The experiment shows roughly how code injection and generation work.
–EOF (The Ultimate Computing & Technology Blog) —