Government Security
Network Security Resources

How A Program Works....

3 replies to this topic

#1 Thiseas

Posted 12 October 2009 - 08:29 AM

This is not a buffer overflow exploit, but the background required to understand how the CPU and memory cooperate to execute a program. I have read many articles about buffer overflows. Most of them start from a specific point and skip the basic knowledge one must have to deeply understand what is going on behind the scenes. I wrote this article to fill (I hope) this gap. If at the end of this article you feel more comfortable with concepts like CALL, RETN and how a function is executed using memory (buffer, stack, etc.), then I will consider this article a success...
First, I would like to point out that everything we say here is about the x86 processor family. In addition, most memory addresses are written in decimal notation (for the sake of clarity, for beginners) instead of the hexadecimal notation that real-world tools actually use.

Requirements in order to read this article:
1. A basic understanding of assembly language.
2. A basic understanding of C language.

Every process is laid out in computer memory (RAM – Random Access Memory) in three basic segments:
  •    -Code Segment
  •    -Data Segment (which includes the well known BSS)
  •    -Stack Segment
CODE SEGMENT
In this memory segment "live" all the instructions of our program. Nobody... (nobody? well OK, almost nobody) can write to this memory segment, i.e. it is a read-only segment.
For example:
All the assembly instructions produced from the following C code are located in the code segment:

   /* Set the items of the main diagonal to 1 and all others to 0 */
   for (i = 0; i < 100; i++)
         for (j = 0; j < 100; j++)
            if (i != j)
             a[i][j] = 0;
            else
             a[i][j] = 1;
PS: The remarks /*...*/ are not included in the code segment. The compiler does not produce code for remarks.

DATA SEGMENT
All initialized and uninitialized global variables are stored in this writable (non-read-only) segment. Strictly speaking, the uninitialized ones go to its BSS part.
For example:
   int i;
   int j = 0;
   int a[100][100];

STACK SEGMENT
All local variables of functions, function arguments and return addresses are stored in this writable (non-read-only) memory.
This segment is actually a stack data structure (for those who have attended a basic information technology course). This means that we put values on a stack in memory. The last value put (pushed) is on the top of the stack, i.e. it is the first one available. This is the well known LIFO (Last In First Out) data structure.
The processor register ESP (Extended Stack Pointer) keeps the address of the top of the stack, i.e. of the last value pushed.
On the stack we can put (PUSH) and get (POP) values.
There are two important “secrets” here:
  • PUSH and POP instructions work in 4-byte units because of the 32-bit architecture of the x86 processor family.
  • The stack grows downward, that is, if ESP=256 and EAX=34, just after a “PUSH EAX” instruction ESP will become 252 and the value of EAX will be placed at address 252.
For example:

   STACK
   adrs      memory
   ---- ------------------
   256    |   xy          |
   252    |               |
   248    |               |
   244    |               |
   ...    .................
   (ESP=256)
   
   Instruction > PUSH EAX    ; remark: suppose EAX = 34 

   STACK
   256    |   xy          |
   252    |   34          |
   248    |               |
   244    |               |
   ...    .................
   (ESP=252)

   Instruction > POP EAX   ; remark: Get the value from the stack into EAX register
  
   STACK
   256    |   xy          |
   252    |   34          | <- still in memory, but no longer part of the stack
   248    |               |
   244    |               |
   ...    .................
   (ESP=256)

   Instruction > PUSH EAX   ; remark: suppose EAX = 15 
   Instruction > PUSH EBX   ; remark: suppose EBX = 16 

   STACK
   256    |   xy          |
   252    |   15          |
   248    |   16          |
   244    |               |
   ...    .................
   (ESP=248)
  

What is behind a function-call
Before we explain what is behind it, we must say a few words about EIP (Extended Instruction Pointer, or simply 'Instruction Pointer'). This register holds the code segment address of the next instruction that will be executed by the CPU.
Every time the CPU executes an instruction, it stores into EIP the address of the instruction that follows the one currently executed.
But how does the CPU find the address of the next instruction?
Well... we have two cases here...
  • The address is immediately after the instruction currently executed.
  • There is a jump (a 'JMP', or a 'CALL', i.e. a function call), so the instruction that needs to be executed next is at an address which is not next to the current one.
In case 1 the address is calculated by simply adding the length of the currently executed instruction to the current EIP value.
Example:
Suppose we have the following 2 instructions at addresses 100 and 101:

   100 push EDX
   101 mov  ESP, 0

   Suppose that at the starting point of our little program we have: EIP = 100
   CPU executes the instruction at address 100.
   CPU checks the instruction:
   Is it a JUMP? No, so calculate its size. CPU knows that the push instruction is 1 byte long.
   So,... the new value of
   EIP = EIP + size(push EDX) =>
   EIP = 100 + 1 =>
   EIP = 101     
   So,.... CPU executes the instruction at address 101, and so forth...

In case 2 we have a jump... things are a bit different.
Just before we jump to another address to call a function, we must save the address of the next instruction somewhere, and before returning from the function we must write that saved address back to EIP. On x86 the address is saved on the stack.
   The CALL and RETN assembly instructions are used by the CPU to handle the above addresses:
   The CALL does 2 things:
   1. It "remembers" the next instruction that will be executed after the function returns (by pushing its address onto the stack) and
   2. It writes into EIP the address of the called function, i.e. it performs the function call.
   The RETN instruction is executed at the end of the function:
   It pops (gets) the "return address" that CALL pushed onto the stack, so that execution continues after the end of the function.
  

The Base pointer (EBP)
Each function in any program (even the main() function in C) has its own stack frame. A stack frame is a logical group of consecutive items on the stack that keeps the variables and addresses of a function that is currently being executed.
Every address in the stack frame is a relative address. That means we address the locations of data in our stack relative to some reference point. This reference point is EBP, which is the acronym for Extended Base Pointer.
On entry to a function, EBP still holds the base pointer of the caller. We PUSH this old EBP onto the stack and then copy ESP into EBP, so that EBP can be used to reference the arguments and local variables of the callee relative to a fixed point.
I hope the use of the base pointer will become clearer in the following example.


A REAL EXAMPLE C PROGRAM:
Consider the following C program:
   void function1(int , int , int );
   void main()
   {
       function1 (1, 2, 3);
   }
   void function1 (int a, int b, int c)
   {
           char z[4];
   } 
I compiled and linked the above program and used the OllyDbg debugger to check the assembly code created.
Bypassing the startup and operating-system related instructions (which are about 90% of the assembly code), the rest is the code that corresponds to our little program:

   0040123C  /. 55             PUSH EBP
   0040123D  |. 8BEC           MOV EBP,ESP
   0040123F  |. 6A 03          PUSH 3                ; /Arg3 = 00000003
   00401241  |. 6A 02          PUSH 2                ; |Arg2 = 00000002
   00401243  |. 6A 01          PUSH 1                ; |Arg1 = 00000001
   00401245  |. E8 05000000    CALL bo1.0040124F     ; \bo1.0040124F
   0040124A  |. 83C4 0C        ADD ESP,0C
   0040124D  |. 5D             POP EBP
   0040124E  \. C3             RETN
   0040124F  /$ 55             PUSH EBP
   00401250  |. 8BEC           MOV EBP,ESP
   00401252  |. 51             PUSH ECX
   00401253  |. 59             POP ECX
   00401254  |. 5D             POP EBP
   00401255  \. C3             RETN 

ANALYSIS:
   The addresses from 0040123C to 0040124E are the main() function.
   The addresses from 0040124F to 00401255 are the function1() function.

   0040123C  /. 55             PUSH EBP
   Backs up the old base pointer (the EBP of whoever called main) by pushing it onto the stack.

   0040123D  |. 8BEC           MOV EBP,ESP
   Copies the current stack pointer (ESP) to the EBP register.
   From then on, inside the function, we'll reference the function's arguments and local variables relative to EBP.  These two instructions are called the Procedure Prologue.

   The stack now holds the saved EBP value, shown as [ebp]:
   STACK
   256    |   [ebp]       |
   ...    .................
   (ESP=256) 

   0040123F  |. 6A 03          PUSH 3             ; /Arg3 = 00000003
   00401241  |. 6A 02          PUSH 2             ; |Arg2 = 00000002
   00401243  |. 6A 01          PUSH 1             ; |Arg1 = 00000001
   
   Here we push the arguments onto the stack, last argument first.
   The stack is:
   STACK
   256    |   [ebp]       |
   252    |     3         |
   248    |     2         |
   244    |     1         |
   ...    .................
   (ESP=244)

   00401245  |. E8 05000000    CALL bo1.0040124F    ; \bo1.0040124F
   Calls the function at address 0040124F and pushes the return address 0040124A onto the stack. bo1 is the name of my executable.
   The stack becomes:
   STACK
   256    |   [ebp]       |
   252    |     3         |
   248    |     2         |
   244    |     1         |
   240    |  0040124A     | <- the return address when the function1 ends.
   ...    .................
   (ESP=240)

Let’s follow the execution, so go to address 0040124F (the function1):
   0040124F  /$ 55             PUSH EBP
   00401250  |. 8BEC           MOV EBP,ESP
Hmm... this is the Procedure Prologue again (remember, this is executed in every function). It sets up the function's own stack frame. The EBP register is currently pointing at a location in main's stack frame. This value must be preserved, so EBP is pushed onto the stack. Then the contents of ESP are copied to EBP. This allows the arguments to be referenced as offsets from EBP and frees up the stack register ESP to do other things.
The stack now, is:

   STACK
   256    |   [ebp]       |
   252    |     3         |
   248    |     2         |
   244    |     1         |
   240    |  0040124A     | <- the return address when the function1 ends.
   236    |  <main’s EBP> | <- Note that ESP and EBP both point to this address.
   ...    .................
   (ESP=236)

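   00401252  |. 51             PUSH ECX
   Reserves 4 bytes on the stack for the local variable z[4] (ESP=232). The value of ECX does not matter here; the push is just the compiler's compact way to move ESP down by 4.

   00401253  |. 59             POP ECX
   00401254  |. 5D             POP EBP

   The first pop releases the space of z[4] (ESP=236) and the second restores main's EBP (ESP=240). After the two pops the actual stack becomes:
   STACK
   256    |   [ebp]       |
   252    |     3         |
   248    |     2         |
   244    |     1         |
   240    |  0040124A     | <- the return address, now on the top of the stack.
   ...    .................
   (ESP=240)

   00401255  \. C3             RETN 
   The function ends: RETN pops the return address from the stack into EIP, so execution returns to 0040124A and ESP becomes 244 (remember our definition of the RETN instruction).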
   0040124A  |. 83C4 0C        ADD ESP,0C
After the function has RETurned, we add 12 (0C in hex) to the stack pointer, since we pushed 3 arguments onto the stack, each occupying 4 bytes (integers). By increasing ESP we actually shrink the stack (remember that the stack is filled downwards, from high to low memory addresses), i.e. ESP = 244 + 12 = 256.

   STACK
   256    |   [ebp]       |
   ...    .................
   (ESP=256)
Thus, ESP has the value it had just before the arguments of the function call were pushed.
I hope that you now have a basic understanding of the use of the stack and the stack pointer.
In another article I will describe how nasty things can happen here. Hint: how about overwriting the stack item at address 240 in our example above, or how about overwriting the value of the Instruction Pointer (EIP)...

I suggest you try my little program or, better, create your own and test, check, review, test, check, review, test, check, review!!


Happy Programming Guys!!
  

References:
[1] BUFFER OVERFLOWS DEMYSTIFIED by murat@enderunix.org
[2] C Function Call Conventions and the Stack (UMBC CMSC 313, Computer Organization & Assembly Language, Spring 2002, Section 0101)
[3] The Assembly Language Book for IBM PC by Peter Norton (ISBN 960-209-028-6)
[4] Analysis of Buffer Overflow Attacks from http://www.windowsec...ow_Attacks.html
[5] 8088 8086 Programming and Applications for IBM PC/XT & Compatibles by Nikos Nasoufis
when you 've got a hammer everything starts to look like a nail...

#2 Gensou

Posted 29 November 2009 - 10:17 PM

Did you write this? I could have sworn that I've seen this on another forum somewhere before. Anyways, thanks. I suggest everybody to read this, very informative.

#3 Thiseas

Posted 05 December 2009 - 01:38 PM

Gensou wrote: "Did you write this? I could have sworn that I've seen this on another forum somewhere before. Anyways, thanks. I suggest everybody to read this, very informative."

Both are true.


I posted this (in English) on hellbound a couple of years ago...

In addition, I have also posted it (in Greek) on the Greek security forum p0wnbox.com and published it in a Greek magazine about security, using the same nickname...


Btw, thanks for your remarks.

#4 cyb3rl0rd1867

Posted 01 January 2010 - 08:15 AM

Excellent tut. Very well written.




