Security under Linux : the Buffer Overflow Problem

Disclaimer

The programs and methods described below are intended only for educational purposes: better programming practice and a better understanding of some bugs. You may use them on your system at your own risk, provided you are responsible for that system and/or have authorization from your administrator. If you are looking for programs to crack sites, go away and get them elsewhere. Under no circumstances can I be held liable for any misuse of what is explained here.

Description of the methods

The problem comes from the fact that many programmers often use parameter parsing code like this:

   void parse(char *arg) {
     char param[1024];
     int  localdata;

     strcpy(param,arg);
     .../...
     return;
   }

   int main(int argc, char **argv) {
     parse(argv[1]);
     .../...
   }

or even

   int main(int argc, char **argv) {
     char param[1024];
     int index=0;

     while ((param[index++]=getchar())!=EOF);
     .../...
   }

As you can see, no check is made on the string length. Lazy programmers think "bah, no one gives a file name which is 1024 chars long, and anyone who does will just get a Segmentation Fault". Good programmers don't think like that, but they sometimes quickly add a little feature to one of their programs to help them debug and then, once it works well, forget to add all the necessary boundary tests.
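A minimal sketch of how the two fragments above could be written with the missing length checks. The names parse_checked(), read_checked() and PARAM_SIZE are illustrative assumptions, not part of the original programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PARAM_SIZE 1024

/* Copy arg into param only if it fits, terminating zero included.
 * Returns 0 on success, -1 if arg is too long (nothing is copied). */
int parse_checked(char param[PARAM_SIZE], const char *arg)
{
    size_t len;

    assert(param != NULL && arg != NULL);

    len = strlen(arg);
    if (len >= PARAM_SIZE)
        return -1;                   /* refuse instead of overflowing */

    memcpy(param, arg, len + 1);     /* len + 1 also copies the '\0' */
    return 0;
}

/* Read from a stream until EOF or until the buffer is full.
 * The character is kept in an int so that EOF can be told apart
 * from a valid byte. Returns the number of bytes stored. */
size_t read_checked(FILE *in, char param[PARAM_SIZE])
{
    size_t index = 0;
    int c;

    assert(in != NULL && param != NULL);

    while (index < PARAM_SIZE - 1 && (c = getc(in)) != EOF)
        param[index++] = (char)c;
    param[index] = '\0';
    return index;
}
```

With these helpers, main() would call parse_checked(param, argv[1]) and treat a return value of -1 as an error instead of letting the copy run past the end of param.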

If you don't understand the problem very well, here's a little explanation:
Under Linux (and nearly all Intel x86-based systems), the stack grows downward. That means that when a program calls a function, the stack pointer (ESP) is decremented by 4 (32 bits) and the return address is stored there. Then, if the function defines local variables (like "param" in the examples), they are allocated below the return address of the function. When strcpy() is executed in the first program, the stack looks approximately like this:

     [-empty-] [-localdata-] [-param-] [-return address-] [-main stack-] [-exit addr-]
     <-------> <-----------> <-------> <----------------> <------------> <----------->
      N bytes     4 bytes    1024 bytes     4 bytes           M bytes       4 bytes
              ^
              |_ Current ESP when calling strcpy()

Now you should understand that if you fill param with *EXACTLY* 1024 bytes containing the code needed to execute a shell, AND you append a 4-byte pointer to the beginning of your code inside param, then when the function returns, it branches directly to your code inside param.
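The downward growth of the stack can be observed with a few lines of C. This is only a sketch: the function names are made up for the illustration, the noinline attribute assumes a GCC-compatible compiler, and the result depends on the platform.

```c
#include <stdint.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical helper: tells whether one of its own local variables
 * lies at a lower address than a local variable of its caller. */
static NOINLINE int callee_is_below(uintptr_t caller_local)
{
    volatile char marker = 0;

    return (uintptr_t)&marker < caller_local;
}

/* Returns 1 when the callee's locals are below the caller's locals
 * (stack growing downward, as on x86), 0 otherwise. */
int stack_grows_down(void)
{
    volatile char marker = 0;

    return callee_is_below((uintptr_t)&marker);
}
```

On an x86 Linux system stack_grows_down() is expected to return 1, which is why a local buffer sits below the return address of the function that owns it.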

So I wrote a very small (20 bytes) "assembler-level execlp() call" and tried to overflow many buffers (especially in suid-root programs).

My program is a shell script which consists of four parts:

garbage filling to reach the end of a buffer
execlp() assembler coding
choose the executable to be run by execlp (typically /bin/sh or /bin/id)
calculation of an approximate stack value

All these parts generate data on their output which can then be concatenated and given as a parameter to an executable.

Garbage filler

The garbage filler fills the beginning of the buffer you want to overflow with nearly any data. In reality, you must know exactly what you put here because, as you will see below, the stack calculation isn't very precise and the program you are executing can branch inside the garbage. The first idea is to use the NOP opcode (0x90), but this is an 8-bit value and it is not printable, which makes tests less convenient. It is better to use another opcode which does something harmless: INC ECX (0x41 = 'A'). ECX is reloaded by the first instruction of the execlp() code so that's not a problem, and it's sometimes useful to select the 'A' string on the screen with the mouse and paste it anywhere else!
The garbage filler I use accepts one parameter, which is the number of bytes you want to fill. This is the most important parameter of the tester because it determines how the code will be aligned relative to the stack pointer. The return pointer is the ONLY part that MUST go out of the buffer. The only output is a long 'AAA...AAA' string whose length is the number you requested.
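As an illustration of what such a filler does, here is a small C sketch. The original tool is part of a shell script, so the function name fill_garbage() and its interface are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write count 'A' characters followed by a terminating zero into buf.
 * bufsize is the total size of buf; the call is refused (-1) when
 * count + 1 bytes would not fit, so the filler itself cannot overflow. */
int fill_garbage(char *buf, size_t bufsize, size_t count)
{
    assert(buf != NULL);

    if (count >= bufsize)
        return -1;

    memset(buf, 'A', count);
    buf[count] = '\0';
    return 0;
}
```

A command-line wrapper would convert its single argument with strtoul(), call fill_garbage() and write the resulting string to its standard output.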

execlp() assembler coding

To detect as many bugs as possible, the execlp() code must be as small as possible so that it fits in smaller buffers. This one is only 20 bytes long! To make it so small, I've used the fact that we are executing in the stack, so we already know where our parameters are (relative to ESP, of course!). To perform the execlp(), you have to call INT 80h with EAX=0bh (the execve system call), EDX and ECX pointing to argv, and EBX pointing to the name of the executable you want to run, terminated by a zero. First, argv[0]=NULL is not a problem: it just means the executable won't have a name. No matter. That means that argv can point to the zero terminating the executable name, provided this zero is 32 bits wide. The executable name must be given just after the code. The stack looks like this:

  before the RET:
      [garbage] [program] [progname] [stack pointer]
      <---N---> <---20--> <---X----> <------4------>
                                    ^_ ESP

  after the RET:
      [garbage] [program] [progname] [stack pointer]
      <---N---> <---20--> <---X----> <------4------>
                                                    ^_ ESP

  before int 80:
     [garbage] [program] [progname] [0000]
     <---N---> <---20--> <---X----> <--4->
                        ^          ^_ ECX=EDX -> NULL
                        |_ EBX -> prog name

The problem is that you must know the program name length (7) to make EBX point to it, because its address is computed relative to ESP. Another way would be to CALL the next instruction (IP+5), POP EBX, and then add the remaining code length to EBX to calculate the program name address, but this is not used yet. The zero is simply pushed onto the stack, so there is no need to explicitly write to an indirect address. The code follows:

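    mov ecx,esp        ; ECX = ESP right after the RET
    xor eax,eax        ; EAX = 0
    push eax           ; write 0 over the return address: ends the name
    lea ebx,[esp-7]    ; EBX -> program name (7 bytes, just below ESP)
    add esp,12         ; move ESP up so the next pushes land at ECX
    push eax           ; [ECX+4] = 0
    push ebx           ; [ECX]   = pointer to the program name
    mov edx,ecx        ; EDX = ECX
    mov al,11          ; EAX = 11 (0bh)
    int 0x80           ; call the kernel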
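For readers less familiar with assembly, the system call issued by this code corresponds roughly to the following C. It is a sketch: run_program() is a hypothetical name, and passing the same vector as arguments and as environment simply mirrors EDX being set equal to ECX.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Replace the current process with the program at path.
 * Returns -1 if the program could not be executed; on success
 * execve() does not return at all. */
int run_program(const char *path)
{
    char *args[2];

    assert(path != NULL);

    args[0] = (char *)path;   /* pointer to the program name */
    args[1] = NULL;           /* end of the vector */

    /* EBX = path, ECX = args, EDX = args, EAX = 11 */
    return execve(path, args, args);
}
```

Calling run_program("/tmp/sh") would have the same effect as the 20 bytes above once they are reached by the RET.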

Get the assembler version and/or the binary version.

Choose the executable to be run

The executable name is simply produced by an 'echo -n "/tmp/sh"'. The reason for using "/tmp/sh" instead of "/bin/sh" is that you can copy any program to /tmp/sh (I usually use /usr/bin/id), which helps when redirecting to a log file and avoids the risk that someone gets a shell on your console while you run the test in a loop. Moreover, sometimes you get a program executed without root privileges. This is a "semi-bug": a buffer is overflowed, but not in the suid parts. In this case, /usr/bin/id is more useful than /bin/sh for finding out what is really happening.

Stack pointer calculation

To calculate the stack pointer that the program will use, I rely on the fact that when you run one program immediately after another, both are called from the same function in the shell, and so with the same ESP value. That makes it possible to have a program which calculates ESP before executing the "victim program". My program also provides an option for subtracting a value from the actual stack pointer, and returns the result in binary form (it outputs the stack value twice because this sometimes helps to find a working value faster). This value doesn't need to be very precise because you just need to make the program branch into the garbage preceding the execlp() code.

How to prevent this from being used on your system

The only way to be secure is to be faster than the crackers. You should follow mailing lists or web sites which are updated often, and quickly test all the newly announced bugs.
Get my old package and GENOVEX, the new one, and test them on every suid-root executable you have. To find them, first make a list and print it:

  (echo "SUID-ROOT LIST" ; find / -user root -type f -perm -4000) | lpr

Also search for root-owned, world-writable files and directories:

  (echo "WORLD WRITABLE LIST" ; find / -user root -perm -002) | lpr

For EACH suid-root executable you find, read its man page and try my scripts with ALL options and parameters. For example, this test will succeed on not-so-old versions of LPR (as in Slackware 3.1).

   ./tryall.generic lpr -C
   ./tryall.generic lpr -J

You may get lots of Segmentation Faults. If a program gives you a core dump or a Segmentation Fault, this means it has a bug anyway and it is risky. Note it on your paper list; you'll try it again later. Once you have noted some risky programs, modify the script to adapt the values so that it starts with normal behaviour and reaches the Segmentation Fault afterwards. In this case, you could get a shell. Sometimes, you get a shell long after the first Seg Faults.
I *STRONGLY* recommend that every program that gives a Segmentation Fault be CHMODed to 755 so that it is no longer suid. Note that even if you can't succeed in making it give you a shell, two things are possible:

someone has more patience than you and makes it work;
someone misuses it so that it makes your system hang.

All the programs that seem buggy should be replaced. You should always get the sources and compile them yourself.
I give a list of addresses to consult at the bottom of this page. You can use them to search for fixes and patches for your buggy programs.

What to do in your programs

Each time you use strcpy(), you should replace it with strncpy() (keeping in mind that strncpy() does not add the terminating zero when the source is too long) or test the length of the string you are about to copy. The easiest way to do this is to define MACROs for strcpy(), fread(), read(), memcpy(), bcopy() ... that test the argument size each time it is possible. When it is difficult to make these tests, it is better to malloc() the buffers instead of leaving them in your local variables. With malloc(), the buffer is no longer next to a return address, so overflowing it cannot redirect the program in the way described here. If you don't want to malloc(), try defining your buffers globally. They will be in the data sections so, once more, they will be far from any return address. After that, you can try my program on yours to verify that there is no risk. A simpler way to do that is to generate a long string with 'rpt' and pass it to your program. If this makes it crash or hang, review it.
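One possible shape for such a macro is sketched below. The names SAFE_STRCPY and checked_strcpy() are assumptions made for this example, and the macro only works when the destination is a real array, since it relies on sizeof to find the buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, terminating zero included.
 * Returns dst on success, NULL if src is too long (dst untouched). */
char *checked_strcpy(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);

    if (strlen(src) >= dstsize)
        return NULL;

    return strcpy(dst, src);
}

/* dst must be an array, not a pointer: sizeof(pointer) would give
 * the size of the pointer instead of the size of the buffer. */
#define SAFE_STRCPY(dst, src) checked_strcpy((dst), sizeof(dst), (src))
```

In parse(), the call strcpy(param,arg) would become SAFE_STRCPY(param, arg), followed by a test of the result against NULL.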

What could be done in Linux

Intel CPUs provide two features which make this bug harder or impossible to exploit:

the direction of the stack can be chosen: ascending or descending. When it is ascending, you can fill a buffer as far as you want: the return address is located before your first character, so you can't overwrite it. At first, I thought it was really impossible to do anything against this method, but Aleph One gave me an example where it was still possible. In general, when a function which contains its own buffer calls another one with a pointer to that buffer, the return pointer of this last function can be overwritten. I admit this is more difficult, but it is possible.
the stack segment can be marked read-write-noexec. That should really be enough, because if the return pointer pointed to a stack area, this would generate a segmentation fault and nothing more. After reading BugTraq postings, I can say that this is very difficult to deploy because some Linux mechanisms rely on an executable stack (signal handling, gcc trampolines). A patch exists for the kernel to deny execution in the stack, but its author explains that it could cause problems with some *very rare* programs such as SuperProbe which, in fact, do not need to run as suid root.

Willy Tarreau - 1997-1998
willy@w.ods.org