Using self-modifying code under Linux

by Karsten Scheibler
thanks to Stefan Esser and Maciej Hrebien

Remark (a):
Save this page as smc.html and use the following command to extract the source code with a sample Makefile:
sh -c "( mkdir smc && cd smc && awk '/^<!--eof/{d=0}{if(d)print>f}/^<!--sof/{f=\$2;d=1}' ) < smc.html"
Remark (b):
Some browsers try to be super smart and reformat the HTML code, so you may have trouble extracting the files. This is the right time to learn more about wget, a tool to download files via HTTP or FTP ;-)
Remark (c):
Some versions of nasm are known to generate broken code. I used nasm-0.98 for this example, which worked for me.

I know that this is not the cleanest way of programming, but sometimes this programming style is faster. Furthermore, this method is used by JIT (just-in-time) compilers. Transmeta also uses a form of self-modifying code to implement x86 emulation in software; they call it Code Morphing ;-) This example code shows how it can be done under Linux.

The idea behind it: the syscall sys_mprotect allows a program to change the protection flags of (nearly) every page it has mapped. A page is the smallest unit of memory handled by virtual memory management; on the x86 architecture the page size is 4KB. As I noticed, the call was not actually needed to make a page in section .bss executable, because on x86 without the NX flag a readable page is also an executable page, and section .bss is read/write. This behavior changes on processors and kernels that use the NX flag, so there the sys_mprotect call becomes mandatory.
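
A minimal sketch of the sys_mprotect call described above, assuming a 32-bit Linux target. The names buffer and PAGE_SIZE, the file name prot.asm and the build commands in the comments are illustrative assumptions and not part of the sample sources. It rounds an address inside a two-page buffer to a page boundary, requests rwx for that page and reports the result through the exit status.

```nasm
; Sketch only: make one page of section .bss rwx, then exit.
; Assumed build commands (not from the article):
;   nasm -f elf32 prot.asm -o prot.o
;   ld -m elf_i386 -o prot prot.o

%assign SYS_EXIT        1
%assign SYS_MPROTECT    125
%assign PROT_RWX        7               ; PROT_READ | PROT_WRITE | PROT_EXEC
%assign PAGE_SIZE       0x1000          ; 4KB pages on x86

global _start

section .bss
buffer:         resb    2 * PAGE_SIZE   ; two pages, so that one whole
                                        ; page always lies inside

section .text
_start:
        ; first page boundary above 'buffer'
        mov     ebx, buffer + PAGE_SIZE
        and     ebx, 0xfffff000

        mov     eax, SYS_MPROTECT
        mov     ecx, PAGE_SIZE
        mov     edx, PROT_RWX
        int     0x80

        ; eax is 0 on success or a negative errno value on failure
        xor     ebx, ebx                ; exit status 0
        test    eax, eax
        jns     .done
        inc     ebx                     ; exit status 1
.done:
        mov     eax, SYS_EXIT
        int     0x80
```

After running the program, `echo $?` should print 0 if the kernel allowed the change and 1 otherwise (some hardened systems refuse rwx mappings).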

The first two examples copy code snippets to section .bss and execute them there. Because this memory area has read/write/execute permissions, the code is allowed to modify itself. The first example copies a snippet that writes text to the screen (via sys_write), but before calling it we modify some values in the copied code (the start address and the length of the text). The second one does some real self-modification (see code2_start): the rep stosb instruction overwrites the first four inc ebx instructions with nop, so the line put on screen contains 04h instead of the expected 08h. The third example modifies code in section .text. It looks like an endless loop, but it isn't. Find out for yourself why ;-)

Note 1:

The call instruction in the copied code must be indirect. A call with a directly given address is encoded with a relative displacement (signed 32 bit), measured from the original location of the instruction; after the code is copied elsewhere and started, the displacement no longer leads to the intended target and you will get a SEGFAULT. An indirect call takes an absolute address, which works independently of the position of the code.
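
The following sketch illustrates the point of Note 1 in isolation, assuming a 32-bit Linux target and the same build steps as for the sample (nasm with an ELF32 object, then ld). All labels (buffer, snippet_start, helper, helper_ptr, message) are hypothetical. A two-instruction snippet is copied to an rwx page in section .bss and reaches a routine in section .text through an absolute pointer.

```nasm
; Sketch only: a copied snippet calls back into .text via a pointer.

%assign SYS_EXIT        1
%assign SYS_WRITE       4
%assign SYS_MPROTECT    125

global _start

section .bss
buffer:         resb    0x2000

section .data
message:        db      "called from copied code", 10
message_end:
helper_ptr:     dd      helper          ; absolute address of helper

section .text
_start:
        ; page aligned address inside buffer, made rwx
        mov     ebp, buffer + 0x1000
        and     ebp, 0xfffff000
        mov     eax, SYS_MPROTECT
        mov     ebx, ebp
        mov     ecx, 0x1000
        mov     edx, 7                  ; PROT_READ | PROT_WRITE | PROT_EXEC
        int     0x80
        test    eax, eax
        js      fail

        ; copy the snippet and run the copy
        mov     ecx, snippet_end - snippet_start
        mov     esi, snippet_start
        mov     edi, ebp
        cld
        rep movsb
        call    ebp

        xor     ebx, ebx                ; exit status 0
        jmp     quit
fail:
        mov     ebx, 1                  ; exit status 1
quit:
        mov     eax, SYS_EXIT
        int     0x80

snippet_start:
        ; 'call helper' would be encoded as E8 + rel32, a distance measured
        ; from this original location, and would miss helper after the copy
        call    dword [helper_ptr]      ; absolute target, read from memory
        ret
snippet_end:

helper:
        mov     eax, SYS_WRITE
        mov     ebx, 1                  ; stdout
        mov     ecx, message
        mov     edx, message_end - message
        int     0x80
        ret
```

If the mprotect call is permitted, the program should print the message once. Replacing the indirect call with the commented-out direct form is a quick way to reproduce the SEGFAULT.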

Note 2:

If you see 08h instead of 04h in the output, you have met another strange effect of self-modifying code. If I remember correctly, the cause is the prefetch queue. It holds the next bytes after the currently processed instruction inside the processor (on my old 386SX it was 13 bytes long, measured with a DOS assembly program a long time ago ;-). If you overwrite instructions that are already queued, it depends on the processor whether the prefetch queue is reloaded or not. I expected this effect on my K6-2 as well, but as Stefan Esser mentioned: "any Clone of the Pentium or above should behave friendly cause with the Pentium Intel build some Prefetch Queue Modification Detection into the CPU. If you write to the code that normally would be inside the prefetch range the pentium automatically discards its queue and reloads...". If you want to make sure that the reload happens, use a jmp instruction (try a jmp after the rep stosb).
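
The jmp suggested in Note 2 can be tried with a reduced variant of the second example. This is a sketch under assumptions: 32-bit Linux, hypothetical labels (buffer, snippet_start, snippet_mark), and the result is returned as the exit status instead of being printed. The short jump is relative, but it moves together with its target, so it stays valid in the copy.

```nasm
; Sketch only: self-modifying snippet with a jump after the store.

%assign SYS_EXIT        1
%assign SYS_MPROTECT    125

global _start

section .bss
buffer:         resb    0x2000

section .text
_start:
        mov     ebp, buffer + 0x1000
        and     ebp, 0xfffff000
        mov     eax, SYS_MPROTECT
        mov     ebx, ebp
        mov     ecx, 0x1000
        mov     edx, 7                  ; PROT_READ | PROT_WRITE | PROT_EXEC
        int     0x80
        test    eax, eax
        js      fail

        mov     ecx, snippet_end - snippet_start
        mov     esi, snippet_start
        mov     edi, ebp
        cld
        rep movsb

        ; edi must point to the first 'inc ebx' inside the copy
        mov     edi, ebp
        add     edi, snippet_mark - snippet_start
        call    ebp
        jmp     quit                    ; ebx = number of executed incs
fail:
        mov     ebx, 255
quit:
        mov     eax, SYS_EXIT
        int     0x80

snippet_start:
        mov     al, 0x90                ; opcode of nop
        xor     ebx, ebx
        mov     ecx, 4
        rep stosb                       ; overwrite four incs with nops
        jmp     short snippet_mark      ; taken jump, forces a new fetch
snippet_mark:
        times 8 inc ebx                 ; one byte each in 32-bit code
        ret
snippet_end:
```

The exit status (`echo $?`) should be 4 when the modification is seen by the processor; 8 would indicate stale prefetched instructions, and 255 a failed mprotect call.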


;****************************************************************************
;****************************************************************************
;*
;* USING SELF MODIFYING CODE UNDER LINUX
;*
;* written by Karsten Scheibler, 2004-AUG-09
;*
;****************************************************************************
;****************************************************************************





global smc_start



;****************************************************************************
;* some assign's
;****************************************************************************
%assign SYS_WRITE			4
%assign SYS_MPROTECT			125

%assign PROT_READ			1
%assign PROT_WRITE			2
%assign PROT_EXEC			4



;****************************************************************************
;* data
;****************************************************************************
section .bss
					alignb	4
modified_code:				resb	0x2000



;****************************************************************************
;* smc_start
;****************************************************************************
section .text
smc_start:

	;calculate the address in section .bss, it must lie on a page
	;boundary (x86: 4KB = 0x1000)
	;NOTE: The rounding may be unnecessary if modified_code happens
	;      to start on a page boundary already, but segments are
	;      page aligned, not sections, and if you have more than
	;      one section .bss in your code (or link several objects
	;      together) you can't be sure about that

	mov	dword ebp, (modified_code + 0x1000)
	and	dword ebp, 0xfffff000
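
	;worked example with a hypothetical address (the real one is
	;chosen by the linker):
	;      modified_code          = 0x0804a123
	;      modified_code + 0x1000 = 0x0804b123
	;      and 0xfffff000         = 0x0804b000
	;the page 0x0804b000..0x0804bfff lies completely inside the
	;0x2000 bytes reserved at modified_code, whatever the alignment
	;of the label is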

	;change flags of this page to read + write + executable,
	;NOTE: On x86 without the NX-flag this call is unnecessary,
	;      because for section .bss PROT_READ and PROT_WRITE are
	;      already set and PROT_EXEC is implied by PROT_READ, which
	;      results in rwx for this segment. With the NX-flag of
	;      modern processors the call is required

	mov	dword eax, SYS_MPROTECT
	mov	dword ebx, ebp
	mov	dword ecx, 0x1000
	mov	dword edx, (PROT_READ | PROT_WRITE | PROT_EXEC)
	int	byte  0x80
	test	dword eax, eax
	js	near  smc_error

	;execute unmodified code first

code1_start:
	mov	dword eax, SYS_WRITE
	mov	dword ebx, 1
	mov	dword ecx, hello_world_1
code1_mark_1:
	mov	dword edx, (hello_world_2 - hello_world_1)
code1_mark_2:
	int	byte  0x80
code1_end:

	;copy code snippet from above to our page (address is still in ebp)

	mov	dword ecx, (code1_end - code1_start)
	mov	dword esi, code1_start
	mov	dword edi, ebp
	cld
	rep movsb

	;append 'ret' opcode to it, so that we can do a call to it

	mov	byte  al, [return]
	stosb

	;change some values in the copied code: start address of the text
	;and its length

	mov	dword eax, hello_world_2
	mov	dword ebx, (code1_mark_1 - code1_start)
	mov	dword [ebx + ebp - 4], eax
	mov	dword eax, (hello_world_3 - hello_world_2)
	mov	dword ebx, (code1_mark_2 - code1_start)
	mov	dword [ebx + ebp - 4], eax
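
	;layout of the copied snippet, offsets relative to ebp (each
	;'mov r32, imm32' is one opcode byte followed by a 4 byte
	;immediate, which is why the patches above use 'mark - 4'):
	;      +0   b8 imm32   mov eax, SYS_WRITE
	;      +5   bb imm32   mov ebx, 1
	;      +10  b9 imm32   mov ecx, text     imm32 at +11..+14
	;      +15  ba imm32   mov edx, length   imm32 at +16..+19
	;      +20  cd 80      int 0x80
	;      +22  c3         ret (appended by the stosb above)
	;code1_mark_1 is offset +15 and code1_mark_2 is offset +20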

	;finally call it

	call	dword ebp

	;copy second example

	mov	dword ecx, (code2_end - code2_start)
	mov	dword esi, code2_start
	mov	dword edi, ebp
	rep movsb

	;do something real nasty: edi points right after the 'rep stosb'
	;instruction, so this will really modify itself

	mov	dword edi, ebp
	add	dword edi, (code2_mark - code2_start)
	call	dword ebp

	;modify code in section .text itself

endless:
	;allow us to write to section .text

	mov	dword eax, SYS_MPROTECT
	mov	dword ebx, smc_start
	and	dword ebx, 0xfffff000
	mov	dword ecx, 0x2000
	mov	dword edx, (PROT_READ | PROT_WRITE | PROT_EXEC)
	int	byte  0x80
	test	dword eax, eax
	js	near  smc_error

	;write message to screen

	mov	dword eax, SYS_WRITE
	mov	dword ebx, 1
	mov	dword ecx, endless_loop
	mov	dword edx, (hello_world_1 - endless_loop)
	int	byte  0x80

	;here comes the magic, which prevents endless execution

	mov	dword ecx, (smc_end_1 - smc_end)
	mov	dword esi, smc_end
	mov	dword edi, endless
	rep movsb

	;do it again

	jmp	short endless



;****************************************************************************
;* code2
;****************************************************************************

	;this is the ret opcode we copy above
	;and the nop opcode needed by code2

return:
	ret
no_operation:
	nop

	;here is some real self-modifying code: if it is copied to .bss
	;and edi is loaded correctly, ebx should contain 0x4 instead
	;of 0x8

code2_start:
	mov	byte  al, [no_operation]
	xor	dword ebx, ebx
	mov	dword ecx, 0x04
	rep stosb
code2_mark:
	inc	dword ebx
	inc	dword ebx
	inc	dword ebx
	inc	dword ebx
	inc	dword ebx
	inc	dword ebx
	inc	dword ebx
	inc	dword ebx
	call	dword [function_pointer]
	ret
code2_end:
					align 4
function_pointer:			dd	write_hex



;****************************************************************************
;* write_hex
;****************************************************************************
write_hex:
	mov	byte  bh, bl
	shr	byte  bl, 4
	add	byte  bl, 0x30
	cmp	byte  bl, 0x3a
	jb	short .number_1
	add	byte  bl, 0x07
.number_1:
	mov	byte  [hex_number], bl
	and	byte  bh, 0x0f
	add	byte  bh, 0x30
	cmp	byte  bh, 0x3a
	jb	short .number_2
	add	byte  bh, 0x07
.number_2:
	mov	byte  [hex_number + 1], bh
	mov	dword eax, SYS_WRITE
	mov	dword ebx, 1
	mov	dword ecx, hex_text
	mov	dword edx, 9
	int	byte  0x80
	ret

section .data
hex_text:		db	"ebx: "
hex_number:		db	"00h", 10



;****************************************************************************
;* some text
;****************************************************************************
endless_loop:		db	"No endless loop here!", 10
hello_world_1:		db	"Hello World!", 10
hello_world_2:		db	"This code was modified!", 10
hello_world_3:



;****************************************************************************
;* smc_error
;****************************************************************************
section .text
smc_error:
	xor	dword eax, eax
	inc	dword eax
	mov	dword ebx, eax
	int	byte  0x80



;****************************************************************************
;* smc_end
;****************************************************************************
section .text
smc_end:
	xor	dword eax, eax
	xor	dword ebx, ebx
	inc	dword eax
	int	byte  0x80
smc_end_1:
;*********************************************** linuxassembly@unusedino.de *