I found this article, searched here, and didn't find a posting on it. Thought some people would be interested.
Multi-stage loading shellcode
Introduction
Multi-stage loading is based on the idea that if a small snippet of shellcode can be executed on the target, that code can in theory take over the network connection and use it to transfer and execute further shellcode, or even a full-sized executable binary. The key idea is to divide the loading of the final program into stages. That way, the only size-critical piece of code is the first stage loader. This concept has been introduced in great detail in the excellent paper Understanding Windows Shellcode.
Stage 1 - shellcode loader
This is the most critical part in the loading chain. This code must be small, free of NULL bytes, and able to execute on various platforms. Its main function is to locate the socket file descriptor used in the connection and read additional data through it. When finished, the loader jumps to the data and the CPU starts executing it.
Stage 2 - binary loader
If the desired result can be achieved with additional shellcode, the first stage loader might be enough. But there may be a need to run more complex code on the target machine, and a higher-level language such as C is much more comfortable in such situations. Unfortunately, it is not possible to code in C/C++/Java/whatever without losing the size and efficiency of assembly code. One solution to this problem is to use another loader that is executed by the first loader. This 2nd stage loader is then responsible for loading and executing the actual executable binary. This all happens in a single TCP (or UDP) session, and that is one of the most beautiful side effects of the approach: it gets through almost any packet filtering setup. Of course, the executable payload and the loader's shellcode payload might get caught by some content inspection device, but that's another story.
It should be noted that there is conceptually no problem in loading the binary in the 1st stage. It just might not be practical because of the increased payload size. That is why I have chosen this 2-stage approach.
POC: nload
nload is capable of both loading methods described above. It is a C program that is responsible for coordinating the loading stages. The package includes source code for nload and ASM sources for the loaders. It also includes a demonstration server "srv.exe" that does nothing but read code from a TCP socket and execute it. nload works on Unix/Win32, but the loaders are available only for Windows NT/2000/XP.
Without further talk, let us jump into an example session. Here we have a Linux machine targeting a Windows machine with IP 10.10.10.10:
$ ./nload 10.10.10.10 1111 -s example.s
nload v1.0 - load staged shellcode over network connection
Copyright © Jarkko Turkulainen 2004. All rights reserved.
Sending loader... 230 bytes sent
Sending shellcode... 204 bytes sent
Microsoft Windows XP [Version 5.1.2600]
© Copyright 1985-2001 Microsoft Corp.
C:\>
On the target machine, the demonstration server "srv.exe" prints:
C:\>srv 1111
Oops.. I'm 0wned.
Ok now.. Here's what happened:
In the 1st stage, nload sends the 1st stage loader to the target machine and executes it (Sending loader... 230 bytes sent)
In the 2nd stage, nload sends the actual shellcode (example.s) and executes it (Sending shellcode... 204 bytes sent)
After that, nload just waits for a shell to appear (that is all example.s does: execute cmd.exe and redirect the IO to the already established TCP connection)
This example is very simplistic because the server "srv.exe" is not really exploited. All it takes is working shellcode and an alignment header "head.s" that contains a couple of NOPs. In a real-world situation, it is not that easy. It should also be noted that example.s works only if the host process has opened the socket with the WSASocket winsock API call (and that is exactly how it is done in "srv.exe").
Another example:
$ ./nload 10.10.10.10 1111 -b backdoor.exe exp_head.s exp_tail.s
nload v1.0 - load staged shellcode over network connection
Copyright © Jarkko Turkulainen 2004. All rights reserved.
Sending loader... 222 bytes sent
Sending 2nd loader... 328 bytes sent
Sending payload...
This example is just a bit more complicated:
"exp_head.s" and "exp_tail.s" are used for 1st stage shellcode alignment instead of the default "head.s" and "tail.s"
In the 1st stage, nload sends the 1st stage loader to the target machine and executes it (Sending loader... 222 bytes sent)
In the 2nd stage, nload sends the 2nd stage loader and executes it (Sending 2nd loader... 328 bytes sent)
The 2nd stage loader reads in the executable payload "backdoor.exe" and executes it
Some considerations
In the current implementation of nload, the socket file descriptor is searched for by comparing the current TCP connection's source port with the one returned by getpeername() for each valid file descriptor. This search may fail if the client is making the connection from behind a NAT device. To be more precise, the NAT device may alter the source port and IP address of the client. Most NAT implementations do that; it is called source NAT, PAT, hide NAT, or whatever. And because the original source port is hard-coded in the loader shellcode, the correct file descriptor is never found. One solution for this NAT problem may be implementing a simple protocol where the client sends some authentication data to the 1st stage loader, which loops through all file descriptors trying to find the data. If successful, the correct descriptor is found, no matter how much the original network headers have been altered.
Another annoying limitation currently in nload is that in binary mode the executable payload is saved on disk before execution. nload tries to delete the binary afterwards, but it has still been written to disk. Unfortunately, there is no trivial way to execute a PE binary in memory. I would be very pleased if someone proved me wrong in this matter.
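The port comparison described above, and why source NAT defeats it, can be illustrated with ordinary sockets. The following is only a minimal sketch in Python, not nload's loader (which is written in assembly): the names find_by_peer_port and demo are hypothetical, and the "translated" port merely stands in for what a NAT device might substitute. On loopback there is no NAT, so the port the client sees as its own source port is the same one the server gets from getpeername(), and the lookup succeeds; with any other port value the lookup finds nothing.

```python
import socket


def find_by_peer_port(connections, wanted_port):
    """Return the first connected socket whose peer (remote) port matches."""
    for conn in connections:
        try:
            peer_port = conn.getpeername()[1]
        except OSError:
            continue  # socket is not connected or already closed; skip it
        if peer_port == wanted_port:
            return conn
    return None


def demo():
    # A local listener on an ephemeral port; no NAT is involved on loopback.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    server_addr = listener.getsockname()

    clients, accepted = [], []
    for _ in range(3):
        clients.append(socket.create_connection(server_addr))
        conn, _peer = listener.accept()
        accepted.append(conn)

    try:
        # The client knows its own source port from getsockname().
        client_port = clients[1].getsockname()[1]
        found = find_by_peer_port(accepted, client_port)
        matched = (
            found is not None
            and found.getpeername() == clients[1].getsockname()
        )

        # Assumption for illustration: a NAT device rewrote the source port,
        # so the server side sees a port the client never knew about.
        peer_ports = {c.getpeername()[1] for c in accepted}
        translated = next(p for p in range(1024, 65536) if p not in peer_ports)
        nat_result = find_by_peer_port(accepted, translated)
        return matched, nat_result
    finally:
        for s in clients + accepted + [listener]:
            s.close()


if __name__ == "__main__":
    matched, nat_result = demo()
    print("direct connection matched:", matched)
    print("lookup with rewritten port:", nat_result)
```

The second lookup returning nothing is the failure mode described above: the value the client knows no longer matches anything visible on the server side, which is why a marker sent inside the connection itself is suggested as a more robust way to identify it.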
Download
nload 1.0
Credits
Thanks to Matt Miller <mmiller[at]hick.org> for proof-reading the document and giving me valuable feedback.
Feedback
Bug reports, discussion, etc.: Jarkko Turkulainen <jt@klake.org>




