Preventing the Exploitation of SEH Overwrites
9/2006
skape
mmiller@hick.org
    

1) Foreword

Abstract: This paper proposes a technique that can be used to prevent
the exploitation of SEH overwrites on 32-bit Windows applications
without requiring any recompilation.  While Microsoft has attempted to
address this attack vector through changes to the exception dispatcher
and through enhanced compiler support, such as with /SAFESEH and /GS,
the majority of benefits they offer are limited to image files that have
been compiled to make use of the compiler enhancements. This limitation
means that without all image files being compiled with these
enhancements, it may still be possible to leverage an SEH overwrite to
gain code execution.  In particular, many third-party applications are
still vulnerable to SEH overwrites even on the latest versions of
Windows because they have not been recompiled to incorporate these
enhancements. To that point, the technique described in this paper does
not rely on any compile time support and instead can be applied at
runtime to existing applications without any noticeable performance
degradation.  This technique is also backward compatible with all
versions of Windows NT+, thus making it a viable and proactive solution
for legacy installations.

Thanks: The author would like to thank all of the people who have helped
with offering feedback and ideas on this technique.  In particular, the
author would like to thank spoonm, H D Moore, Skywing, Richard Johnson,
and Alexander Sotirov.


2) Introduction

Like other operating systems, the Windows operating system finds itself
vulnerable to the same classes of vulnerabilities that affect other
platforms, such as stack-based buffer overflows and heap-based buffer
overflows.  Where the platforms differ is in terms of how these
vulnerabilities can be leveraged to gain code execution.  In the case of
a conventional stack-based buffer overflow, the overwriting of the
return address is the most obvious and universal approach. However,
unlike other platforms, the Windows platform has a unique vector that
can, in many cases, be used to gain code execution through a stack-based
overflow that is more reliable than overwriting the return address.
This vector is known as a Structured Exception Handler (SEH) overwrite.
This attack vector was publicly discussed for the first time, as far as
the author is aware, by David Litchfield in his paper entitled Defeating
the Stack Based Buffer Overflow Prevention Mechanism of Microsoft
Windows 2003 Server.  However, exploits had been using this technique
prior to the publication, so it is unclear who originally found the
technique.

In order to completely understand how to go about protecting against SEH
overwrites, it's prudent to first spend some time describing the
intention of the facility itself and how it can be abused to gain code
execution. To provide this background information, a description of
structured exception handling will be given in section 2.1.  Section 2.2
provides an illustration of how an SEH overwrite can be used to gain
code execution. If the reader already understands how structured
exception handling works and can be exploited, feel free to skip ahead.
The design of the technique that is the focus of this paper will be
described in chapter 3 followed by a description of a proof of concept
implementation in chapter 4. Finally, potential compatibility issues are
noted in chapter 5.


2.1) Structured Exception Handling


Structured Exception Handling (SEH) is a uniform system for dispatching
and handling exceptions that occur during the normal course of a
program's execution.  This system is similar in spirit to the way that
UNIX derivatives use signals to dispatch and handle exceptions, such as
through SIGPIPE and SIGSEGV.  SEH, however, is a more generalized and
powerful system for accomplishing this task, in the author's opinion.
Microsoft's integration of SEH spans both user-mode and kernel-mode and
is a licensed implementation of what is described in a patent owned by
Borland.  In fact, this patent is one of the reasons why open source
operating systems have not chosen to integrate this style of exception
dispatching.

In terms of implementation, structured exception handling works by
defining a uniform way of handling all exceptions that occur during the
normal course of process execution.  In this context, an exception is
defined as an event that occurs during execution that necessitates some
form of extended handling.  There are two primary types of exceptions.
The first type, known as a hardware exception, is used to categorize
exceptions that originate from hardware.  For example, when a program
makes reference to an invalid memory address, the processor will raise
an exception through an interrupt that gives the operating system an
opportunity to handle the error. Other examples of hardware exceptions
include illegal instructions, alignment faults, and other
architecture-specific issues.  The second type of exception is known as
a software exception.  A software exception, as one might expect,
originates from software rather than from the hardware.  For example, in
the event that a process attempts to close an invalid handle, the
operating system may generate an exception.

One of the reasons that the word structured is included in structured
exception handling is that it is used to dispatch both hardware and
software exceptions.  This generalization makes it possible for
applications to handle all types of exceptions using a common system,
thus allowing for greater application flexibility when it comes to error
handling.

The most important detail of SEH, insofar as it pertains to this
document, is the mechanism through which applications can dynamically
register handlers to be called when various types of exceptions occur.
The act of registering an exception handler is most easily described as
inserting a function pointer into a chain of function pointers that are
called whenever an exception occurs.  Each exception handler in the
chain is given the opportunity to either handle the exception or pass it
on to the next exception handler.

At a higher level, the majority of compiler-generated C/C++ functions
will register exception handlers in their prologue and remove them in
their epilogue. In this way, the exception handler chain mirrors the
structure of a thread's stack in that they are both LIFOs
(last-in-first-out).  The exception handler that was registered last
will be the first to be removed from the chain, much the same as the last
function to be called will be the first to be returned from.

To understand how the process of registering an exception handler
actually works in practice, it makes sense to analyze code that makes
use of exception handling. For instance, the code below illustrates what
would be required to catch all exceptions and then display the type of
exception that occurred:


__try
{
    ...
} __except(EXCEPTION_EXECUTE_HANDLER)
{
    printf("Exception code: %.8x\n", GetExceptionCode());
}

In the event that an exception occurs from code inside of the try / except
block, the printf call will be issued and GetExceptionCode will return the
actual exception that occurred.  For instance, if code made reference to an
invalid memory address, the exception code would be 0xc0000005, or
EXCEPTION_ACCESS_VIOLATION. To completely understand how this works, it is
necessary to dive deeper and take a look at the assembly that is generated from
the C code described above.  When disassembled, the code looks something like
what is shown below:


00401000 55               push    ebp
00401001 8bec             mov     ebp,esp
00401003 6aff             push    0xff
00401005 6818714000       push    0x407118
0040100a 68a4114000       push    0x4011a4
0040100f 64a100000000     mov     eax,fs:[00000000]
00401015 50               push    eax
00401016 64892500000000   mov     fs:[00000000],esp
0040101d 83c4f4           add     esp,0xfffffff4
00401020 53               push    ebx
00401021 56               push    esi
00401022 57               push    edi
00401023 8965e8           mov     [ebp-0x18],esp
00401026 c745fc00000000   mov     dword ptr [ebp-0x4],0x0
0040102d c6050000000001   mov     byte ptr [00000000],0x1
00401034 c745fcffffffff   mov     dword ptr [ebp-0x4],0xffffffff
0040103b eb2b             jmp     ex!main+0x68 (00401068)
0040103d 8b45ec           mov     eax,[ebp-0x14]
00401040 8b08             mov     ecx,[eax]
00401042 8b11             mov     edx,[ecx]
00401044 8955e4           mov     [ebp-0x1c],edx
00401047 b801000000       mov     eax,0x1
0040104c c3               ret

0040104d 8b65e8           mov     esp,[ebp-0x18]
00401050 8b45e4           mov     eax,[ebp-0x1c]
00401053 50               push    eax
00401054 6830804000       push    0x408030
00401059 e81b000000       call    ex!printf (00401079)
0040105e 83c408           add     esp,0x8
00401061 c745fcffffffff   mov     dword ptr [ebp-0x4],0xffffffff
00401068 8b4df0           mov     ecx,[ebp-0x10]
0040106b 64890d00000000   mov     fs:[00000000],ecx
00401072 5f               pop     edi
00401073 5e               pop     esi
00401074 5b               pop     ebx
00401075 8be5             mov     esp,ebp
00401077 5d               pop     ebp
00401078 c3               ret


The actual registration of the exception handler all occurs behind the scenes
in the C code.  However, in the assembly code, the registration of the
exception handler starts at 0x0040100a and spans four instructions.  It is
these four instructions that are responsible for registering the exception
handler for the calling thread.  The way that this actually works is by
chaining an EXCEPTION_REGISTRATION_RECORD to the front of the list of exception
handlers.  The head of the list of already registered exception handlers is
found in the ExceptionList attribute of the NT_TIB structure.  If no exception
handlers are registered, this value will be set to 0xffffffff.  The NT_TIB
structure makes up the first part of the TEB, or Thread Environment Block,
which is an undocumented structure used internally by Windows to keep track of
per-thread state in user-mode.  A thread's TEB can be accessed in a
position-independent fashion by referencing addresses relative to the fs
segment register.  For example, the head of the exception list chain can be
obtained through fs:[0].

To make sense of the four assembly instructions that register the custom
exception handler, each of the four instructions will be described
individually.  For reference purposes, the layout of the
EXCEPTION_REGISTRATION_RECORD is described below:


   +0x000 Next             : Ptr32 _EXCEPTION_REGISTRATION_RECORD
   +0x004 Handler          : Ptr32


1. push 0x4011a4

   The first instruction pushes the address of the CRT generated _except_handler3
   symbol.  This routine is responsible for dispatching general exceptions that
   are registered through the __except compiler intrinsic.  The key thing to note
   here is that the virtual address of a function is pushed onto the stack that is
   expected to be referenced in the event that an exception is thrown.  This push
   operation is the first step in dynamically constructing an
   EXCEPTION_REGISTRATION_RECORD on the stack by first setting the Handler
   attribute.

2. mov  eax,fs:[00000000]

   The second instruction takes the current pointer to the first
   EXCEPTION_REGISTRATION_RECORD and stores it in eax.

3. push eax

   The third instruction takes the pointer to the first exception registration
   record in the exception list and pushes it onto the stack.  This, in turn, sets
   the Next attribute of the record that is being dynamically generated on the
   stack.  Once this instruction completes, a populated
   EXCEPTION_REGISTRATION_RECORD will exist on the stack that takes the following
   form:


   +0x000 Next             : 0x0012ffb0
   +0x004 Handler          : 0x004011a4     ex!_except_handler3+0


4. mov  fs:[00000000],esp

   Finally, the dynamically generated exception registration record is stored as
   the first exception registration record in the list for the current thread.
   This completes the process of inserting a new registration record into the
   chain of exception handlers.
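
The four steps above can also be expressed in C.  The following is only a
minimal sketch of the list manipulation, not what the compiler emits: the
reg_record type, the thread_chain_head variable (a stand-in for the value
stored at fs:[0]), and both function names are hypothetical and are used
purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for EXCEPTION_REGISTRATION_RECORD (Next, then Handler). */
typedef struct reg_record {
    struct reg_record *next;
    uintptr_t handler;               /* address of the handler routine */
} reg_record;

/* End-of-chain marker, mirroring the 0xffffffff value described earlier. */
#define CHAIN_END ((reg_record *)UINTPTR_MAX)

/* Hypothetical stand-in for the per-thread list head found at fs:[0]. */
reg_record *thread_chain_head = CHAIN_END;

/* Mirrors the four instructions: build the record, then make it the head. */
void register_handler(reg_record *rec, uintptr_t handler)
{
    rec->handler = handler;          /* 1.    push <handler address>     */
    rec->next = thread_chain_head;   /* 2, 3. mov eax,fs:[0] ; push eax  */
    thread_chain_head = rec;         /* 4.    mov fs:[0],esp             */
}

/* Mirrors the epilogue: the saved Next value becomes the head again. */
void unregister_handler(reg_record *rec)
{
    assert(thread_chain_head == rec);   /* removal is assumed to be LIFO */
    thread_chain_head = rec->next;
}
```

In the real prologue the record lives in the function's own stack frame,
which is why the record and any local buffers of that function end up close
to one another in memory.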


The important things to take away from this description of exception handler
registration are as follows.  First, the registration of exception handlers is
a runtime operation.  This means that whenever a function is entered that makes
use of an exception handler, it must dynamically register the exception
handler.  This has implications as it relates to performance overhead.  Second,
the list of registered exception handlers is stored on a per-thread basis.
This makes sense because threads are considered isolated units of execution and
therefore exception handlers are only relative to a particular thread.  The
final, and perhaps most important, thing to take away from this is that the
assembly generated by the compiler to register an exception handler at runtime
makes use of the current thread's stack.  This fact will be revisited later in
this section.

In the event that an exception occurs during the course of normal execution,
the operating system will step in and take the necessary steps to dispatch the
exception.  In the event that the exception occurred in the context of a thread
that is running in user-mode, the kernel will take the exception information
and generate an EXCEPTION_RECORD that is used to encapsulate all of the
exception information.  Furthermore, a snapshot of the executing state of the
thread is created in the form of a populated CONTEXT structure.  The kernel
then passes this information off to the user-mode thread by transferring
execution from the location that the fault occurred at to the address of
ntdll!KiUserExceptionDispatcher.  The important thing to understand about this
is that execution of the exception dispatcher occurs in the context of the
thread that generated the exception.

The job of ntdll!KiUserExceptionDispatcher is, as the name implies, to dispatch
user-mode exceptions.  As one might guess, the way that it goes about doing
this is by walking the chain of registered exception handlers stored relative
to the current thread.  As the exception dispatcher walks the chain, it calls the
handler associated with each registration record, giving that handler the
opportunity to handle, fail, or pass on the exception.
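
The walk performed by the dispatcher can be modelled with a short loop.  The
sketch below is a simplification written for this discussion and is not the
ntdll implementation: the seh_record type, the two-value disposition enum,
the single-argument handler signature, and the function names are all
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified dispositions; the real set of return values is larger. */
typedef enum { HANDLED, PASS_ON } disposition;

typedef struct seh_record {
    struct seh_record *next;
    disposition (*handler)(uint32_t code);   /* hypothetical signature */
} seh_record;

#define SEH_CHAIN_END ((seh_record *)UINTPTR_MAX)
#define CODE_ACCESS_VIOLATION 0xc0000005u

/* Example handler that only accepts access violations. */
disposition only_access_violations(uint32_t code)
{
    return code == CODE_ACCESS_VIOLATION ? HANDLED : PASS_ON;
}

/* Example handler that never handles anything. */
disposition always_pass(uint32_t code)
{
    (void)code;
    return PASS_ON;
}

/* Calls each handler in turn, newest first.  Returns the position of the
 * record whose handler accepted the exception, or -1 if none did. */
int walk_chain(seh_record *head, uint32_t code)
{
    int position = 0;
    for (seh_record *rec = head; rec != SEH_CHAIN_END; rec = rec->next) {
        assert(rec->handler != NULL);
        if (rec->handler(code) == HANDLED)
            return position;
        position++;
    }
    return -1;
}
```

Note that the loop trusts every Next and Handler value it reads.  That trust
is exactly what an SEH overwrite takes advantage of.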


While there are other things involved in the exception dispatching process,
this description will suffice to set the stage for how it might be abused to
gain code execution.


2.2) Gaining Code Execution

There is one important thing to remember when it comes to trying to gain code
execution through an SEH overwrite. Put simply, the fact that each exception
registration record is stored on the stack lends itself well to abuse when
considered in conjunction with a conventional stack-based buffer overflow.  As
described in section 2.1, each exception registration record is composed of a Next
pointer and a Handler function pointer.  Of most interest in terms of
exploitation is the Handler attribute. Since the exception dispatcher makes use
of this attribute as a function pointer, it makes sense that should this
attribute be overwritten with attacker controlled data, it would be possible to
gain code execution.  In fact, that's exactly what happens, but with an added
catch.

While typical stack-based buffer overflows work by overwriting the return
address, an SEH overwrite works by overwriting the Handler attribute of an
exception registration record that has been stored on the stack.  Unlike
overwriting the return address, where control is gained immediately upon return
from the function, an SEH overwrite does not actually gain code execution until
after an exception has been generated.  The exception is necessary in order to
cause the exception dispatcher to call the overwritten Handler.

While this may seem like something of a nuisance that would make SEH overwrites
harder to exploit, it's not.  Generating an exception that leads to the calling
of the Handler is as simple as overwriting the return address with an invalid
address in most cases.  When the function returns, it attempts to execute code
from an invalid memory address which generates an access violation exception.
This exception is then passed onto the exception dispatcher which calls the
overwritten Handler.

The obvious question to ask at this point is what benefit SEH overwrites have
over the conventional practice of overwriting the return address.  To
understand this, it's important to consider one of the common practices
employed in Windows-based exploits. On Windows, thread stack addresses tend to
change quite frequently between operating system revisions and even across
process instances.  This differs from most UNIX derivatives where stack
addresses are typically predictable across multiple operating system revisions.
Due to this fact, most Windows-based exploits will indirectly transfer control
into the thread's stack by first bouncing off an instruction that exists
somewhere in the address space.  This instruction must typically reside at an
address that is less prone to change, such as within the code section of a
binary.  The purpose of this instruction is to transfer control back to the
stack in a position-independent fashion.  For example, a jmp esp instruction
might be used. While this approach works perfectly fine, it's limited by
whether or not an instruction can be located that is both portable and reliable
in terms of the address that it resides at.  This is where the benefits of SEH
overwrites begin to become clear.

When simply overwriting the return address, an attacker is often limited to a
small set of instructions that are not commonly found at a reliable and
portable location in the address space. On the other hand, SEH overwrites
have the advantage of being able to use another set of instructions that are
far more prevalent in the address space of most every process.  This set of
instructions is commonly referred to as pop/pop/ret.  The reason this class of
instructions can be used with SEH overwrites and not general stack overflows
has to do with the method in which exception handlers are called by the
exception dispatcher.  To understand this, it is first necessary to know what
the specific prototype is for the Handler field in the
EXCEPTION_REGISTRATION_RECORD structure:


typedef EXCEPTION_DISPOSITION (*ExceptionHandler)(
        IN PEXCEPTION_RECORD ExceptionRecord,
        IN PVOID EstablisherFrame,
        IN PCONTEXT ContextRecord,
        IN PVOID DispatcherContext);


The field of most importance is the EstablisherFrame.  This field actually
points to the address of the exception registration record that was pushed onto
the stack.  It is also located at [esp+8] when the Handler is called.
Therefore, if the Handler is overwritten with the address of a pop/pop/ret
sequence, the result will be that the execution path of the current thread will
be transferred to the address of the Next attribute for the current exception
registration record.  While this field would normally hold the address of the
next registration record, it instead can hold four bytes of arbitrary code that
an attacker can supply when triggering the SEH overwrite.  Since there are only
four contiguous bytes of memory to work with before hitting the Handler field,
most attackers will use a simple short jump sequence to jump past the handler
and into the attacker controlled code that comes after it.
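
To make the stack arithmetic concrete, the following sketch simulates the
three stack words that matter when the Handler is entered and shows where a
pop/pop/ret sequence ends up.  It is a model only: the sim_stack type and the
function names are hypothetical, and no real stack is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated 32-bit stack; words[top] plays the role of [esp]. */
typedef struct {
    const uint32_t *words;
    size_t count;
    size_t top;
} sim_stack;

/* Models a single pop: read the word at [esp], then advance esp. */
uint32_t sim_pop(sim_stack *s)
{
    assert(s->top < s->count);
    return s->words[s->top++];
}

/* Models "pop reg; pop reg; ret" and returns the address that the final
 * ret transfers control to. */
uint32_t pop_pop_ret_target(sim_stack *s)
{
    (void)sim_pop(s);    /* [esp]   : return address into the dispatcher */
    (void)sim_pop(s);    /* [esp+4] : ExceptionRecord pointer            */
    return sim_pop(s);   /* [esp+8] : EstablisherFrame, consumed by ret  */
}
```

Given a simulated stack whose third word is 0x0012ffb0 (an arbitrary example
address for the registration record), pop_pop_ret_target returns that same
value, which is the address of the record's Next field.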


3) Design

The one basic requirement of any solution attempting to prevent the leveraging
of SEH overwrites is that it must not be possible for an attacker to be able to
supply a value for the Handler attribute of an exception registration record
that is subsequently used in an unchecked fashion by the exception dispatcher
when an exception occurs.  If a solution can claim to have satisfied this
requirement, then it should be true that the solution is secure.

To that point, Microsoft's solution is secure, but only if all of the images
loaded in the address space have been compiled with /SAFESEH.  Even then, it's
possible that it may not be completely secure.  For example, it should be
possible to overwrite the Handler with the address of some non-image associated
executable region, if one can be found. If there are any images that have not
been compiled with /SAFESEH, it may be possible for an attacker to overwrite
the Handler with an address of an instruction that resides within an
unprotected image.  The reason Microsoft's implementation cannot protect
against this is because SafeSEH works by having the exception dispatcher
validate handlers against a table of image-specific safe exception handlers
prior to calling an exception handler.  Safe exception handlers are stored in a
table that is contained in any executable compiled with /SAFESEH. Given this
limitation, it can also be said that Microsoft's implementation is not secure
given the appropriate conditions.  In fact, for third-party applications, and
even some Microsoft-provided applications, these conditions are considered by
the author to be the norm rather than the exception. In the end, it all boils
down to the fact that Microsoft's solution is a compile-time solution rather
than a runtime solution.  With these limitations in mind, it makes sense to
attempt to approach the problem from the angle of a runtime solution rather
than a compile-time solution.
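
The limitation can be illustrated with a deliberately simplified model of the
check.  This sketch is not Microsoft's implementation, which performs
additional tests; the image_info type, its fields, and handler_permitted are
hypothetical names, and the table entries are assumed here to be stored as
offsets from the image base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded image. */
typedef struct {
    uint32_t base;
    uint32_t size;
    const uint32_t *safe_handlers;   /* NULL if built without /SAFESEH */
    size_t safe_count;
} image_info;

/* Simplified model: decide whether the dispatcher would call handler. */
bool handler_permitted(const image_info *images, size_t image_count,
                       uint32_t handler)
{
    assert(images != NULL || image_count == 0);
    for (size_t i = 0; i < image_count; i++) {
        const image_info *img = &images[i];
        if (handler < img->base || handler - img->base >= img->size)
            continue;                /* handler is not inside this image */
        if (img->safe_handlers == NULL)
            return true;             /* no table, so nothing to check    */
        for (size_t j = 0; j < img->safe_count; j++)
            if (img->safe_handlers[j] == handler - img->base)
                return true;         /* registered safe handler          */
        return false;                /* inside a protected image, unlisted */
    }
    /* Outside every image: this model lets the call through, which is the
     * non-image case that the text above suggests may be possible. */
    return true;
}
```

With one protected and one unprotected image loaded, any address inside the
unprotected image is accepted by this model, which is the gap that a runtime
solution is intended to close.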

When it comes to designing a runtime solution, the important consideration that
has to be made is that it will be necessary to intercept exceptions before they
are passed off to the registered exception handlers by the exception
dispatcher.  The particulars of how this can be accomplished will be discussed
in chapter 4.  Assuming a solution is found to the layering problem, the next
step is to come up with a solution for determining whether or not an exception
handler is valid and has not been tampered with.  While there are many
inefficient solutions to this problem, such as coming up with a solution to
keep a ``secure'' list of registered exception handlers, there is one solution
in particular that the author feels is best suited for the problem.

One of the side effects of an SEH overwrite is that the attacker will typically
clobber the value of the Next attribute associated with the exception
registration record that is overwritten.  This occurs because the Next
attribute precedes the Handler attribute in memory, and therefore must be
overwritten before the Handler in the case of a typical buffer overflow.  This
has a very important side effect that is the key to facilitating the
implementation of a runtime solution.  In particular, the clobbering of the
Next attribute means that all subsequent exception registration records would
not be reachable by the exception dispatcher when walking the chain.

Consider for the moment a solution that, during thread startup, places a custom
exception registration record as the very last exception registration record in
the chain.  This exception registration record will be symbolically referred to
as the validation frame henceforth.  From that point forward, whenever an
exception is about to be dispatched, the solution could walk the chain prior to
allowing the exception dispatcher to handle the exception.  The purpose of
walking the chain beforehand is to ensure that the validation frame can be
reached.  As such, the validation frame's purpose is similar to that of stack
canaries. If the validation frame can be reached, then that is evidence of the
fact that the chain of exception handlers has not been corrupted. As described
above, the act of overwriting the Handler attribute also requires that the Next
pointer be overwritten.  If the Next pointer is not overwritten with an address
that ensures the integrity of the exception handler chain, then this solution
can immediately detect that the integrity of the chain is in question and
prevent the exception dispatcher from calling the overwritten Handler.
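
The idea can be exercised against a small simulated stack.  The sketch below
is a model under stated assumptions rather than the proof of concept itself:
memory is an array of 32-bit words, the base address SIM_BASE is arbitrary,
and every name is hypothetical.  The walk also bounds the number of steps so
that a cyclic chain cannot loop forever, which is an addition made for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_BASE         0x0012f000u   /* hypothetical address of sim_mem[0] */
#define SIM_WORDS        64u
#define SIM_END_OF_CHAIN 0xffffffffu

uint32_t sim_mem[SIM_WORDS];

/* True if addr is aligned and a whole two-word record fits at it. */
bool sim_record_valid(uint32_t addr)
{
    if (addr < SIM_BASE || (addr - SIM_BASE) % 4u != 0u)
        return false;
    return (addr - SIM_BASE) / 4u + 1u < SIM_WORDS;
}

/* Stores one word at a simulated address. */
void sim_write(uint32_t addr, uint32_t value)
{
    assert(addr >= SIM_BASE && (addr - SIM_BASE) % 4u == 0u);
    assert((addr - SIM_BASE) / 4u < SIM_WORDS);
    sim_mem[(addr - SIM_BASE) / 4u] = value;
}

/* Loads one word from a simulated address. */
uint32_t sim_read(uint32_t addr)
{
    assert(addr >= SIM_BASE && (addr - SIM_BASE) % 4u == 0u);
    assert((addr - SIM_BASE) / 4u < SIM_WORDS);
    return sim_mem[(addr - SIM_BASE) / 4u];
}

/* Follows Next pointers from head and reports whether the validation
 * frame is reachable before an invalid pointer is encountered. */
bool chain_reaches_validation_frame(uint32_t head, uint32_t validation_frame)
{
    uint32_t current = head;
    for (uint32_t steps = 0; steps < SIM_WORDS; steps++) {
        if (current == validation_frame)
            return true;
        if (!sim_record_valid(current))
            return false;            /* also covers SIM_END_OF_CHAIN */
        current = sim_read(current); /* Next is the first word of a record */
    }
    return false;
}
```

If a chain is built in sim_mem and the Next and Handler words of the first
record are then overwritten with filler such as 0x41414141, the walk stops at
the clobbered Next value and reports that the validation frame is no longer
reachable.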

Using this technique, the act of ensuring that the integrity of the exception
handler chain is kept intact results in the ability to prevent SEH overwrites.
The important questions to ask at this point center around what limitations
this solution might have.  The most obvious question to ask is what's to stop
an attacker from simply overwriting the Next pointer with the value that was
already there.  There are a few things that stop this.  First of all, it will
be common that the attacker does not know the value of the Next pointer.
Second, and perhaps most important, is that one of the benefits of using an SEH
overwrite is that an attacker can make use of a pop/pop/ret sequence. By
forcing an attacker to retain the value of the Next pointer, the major benefit
of using an SEH overwrite in the first place is gone.  Even conceding this
point, an attacker who is able to retain the value of the Next pointer would
find themselves limited to overwriting the Handler with the address of
instructions that indirectly transfer control back to their code.  However, the
attacker won't simply be able to use an instruction like jmp esp because the
Handler will be called in the context of the exception dispatcher.  It's at
this point that diminishing returns are reached and an attacker is better off
simply overwriting the return address, if possible.

Another important question to ask is what's to stop the attacker from
overwriting the Next pointer with the address of the validation frame itself
or, more easily, with 0xffffffff.  The answer to this is much the same as
described in the above paragraph.  Specifically, by forcing an attacker away
from the pop/pop/ret sequence, the usefulness of the SEH overwrite vector
quickly degrades to the point of it being better to simply overwrite the return
address, if possible.  However, in order to be sure, the author feels that
implementations of this solution would be wise to randomize the location of the
validation frame.

It is the author's opinion that the solution described above satisfies the
requirement outlined in the beginning of this chapter and therefore qualifies
as a secure solution.  However, there's always a chance that something has been
missed.  For that reason, the author is more than happy to be proven wrong on
this point.


4) Implementation

The implementation of the solution described in the previous chapter relies on
intercepting exceptions prior to allowing the native exception dispatcher to
handle them such that the exception handler chain can be validated.  First and
foremost, it is important to identify a way of layering prior to the point that
the exception dispatcher transfers control to the registered exception
handlers.  There are a few different places that this layering could occur at,
but the one that is best suited to catch the majority of user-mode exceptions
is at the location that ntdll!KiUserExceptionDispatcher gains control.
However, by hooking ntdll!KiUserExceptionDispatcher, it is possible that this
implementation may not be able to intercept all cases of an exception being
raised, thus making it potentially feasible to bypass the exception handler
chain validation.

The best location to layer at would be ntdll!RtlDispatchException.  The
reason for this is that exceptions raised through ntdll!RtlRaiseException, such
as software exceptions, may be passed directly to ntdll!RtlDispatchException
rather than going through ntdll!KiUserExceptionDispatcher first. The condition
that controls this is whether or not a debugger is attached to the user-mode
process when ntdll!RtlRaiseException is called.  The reason
ntdll!RtlDispatchException is not hooked in this implementation is because it
is not directly exported.  There are, however, fairly reliable techniques that
could be used to determine its address.  As far as the author is aware, the act
of hooking ntdll!KiUserExceptionDispatcher should mean that it's only possible
to miss software exceptions which are much harder, and in most cases
impossible, for an attacker to generate.

In order to layer at ntdll!KiUserExceptionDispatcher, the first few
instructions of its prologue can be overwritten with an indirect jump to a
function that will be responsible for performing any sanity checks necessary.
Once the function has completed its sanity checks, it can transfer control back
to the original exception dispatcher by executing the overwritten instructions
and then jumping back into ntdll!KiUserExceptionDispatcher at the offset of the
next instruction to be executed.  This is a nice and ``clean'' way of
accomplishing this, where ``clean'' is defined as the best it can get from a
third-party perspective, and the performance overhead is minuscule.

In order to hook ntdll!KiUserExceptionDispatcher, the first n instructions,
where n is the number of instructions that it takes to cover at least 6 bytes,
must be copied to a location that will be used by the hook to execute the
actual ntdll!KiUserExceptionDispatcher.  Following that, the first n
instructions of ntdll!KiUserExceptionDispatcher can then be overwritten with an
indirect jump.  This indirect jump will be used to transfer control to the
function that will validate the exception handler chain prior to allowing the
original exception dispatcher to handle the exception.
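
The byte counting involved can be sketched as follows.  This is an
illustration rather than the proof of concept's code: it assumes that the
instruction lengths have already been obtained from a length disassembler,
it operates on plain buffers instead of live code, and the function names
and the choice to pad with nop bytes are assumptions.  The six bytes are the
size of a 32-bit indirect jump, encoded as FF 25 followed by the address of
the pointer to jump through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INDIRECT_JMP_LEN 6u   /* FF 25 followed by a 32-bit address */

/* Adds up whole instruction lengths until the indirect jump fits.  Returns
 * the number of bytes to copy out of the prologue, or 0 if there are not
 * enough instructions. */
size_t bytes_to_relocate(const uint8_t *insn_lengths, size_t insn_count)
{
    size_t covered = 0;
    for (size_t i = 0; i < insn_count; i++) {
        covered += insn_lengths[i];
        if (covered >= INDIRECT_JMP_LEN)
            return covered;
    }
    return 0;
}

/* Fills patch with "jmp dword ptr [pointer_slot]" and pads the rest of the
 * covered bytes with nop so that no partial instruction is left behind. */
void build_hook_patch(uint8_t *patch, size_t covered, uint32_t pointer_slot)
{
    assert(covered >= INDIRECT_JMP_LEN);
    patch[0] = 0xff;
    patch[1] = 0x25;
    patch[2] = (uint8_t)(pointer_slot & 0xffu);          /* little-endian */
    patch[3] = (uint8_t)((pointer_slot >> 8) & 0xffu);
    patch[4] = (uint8_t)((pointer_slot >> 16) & 0xffu);
    patch[5] = (uint8_t)((pointer_slot >> 24) & 0xffu);
    memset(patch + INDIRECT_JMP_LEN, 0x90, covered - INDIRECT_JMP_LEN);
}
```

For a prologue whose first instructions are 4 and 3 bytes long, n is 2 and
seven bytes must be copied.  A real implementation would also need to fix up
any relative operands in the copied instructions, which this sketch does not
attempt.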

With the hook installed, the next step is to implement the function that will
actually validate the exception handler chain.  The basic steps involved in
this are to first extract the head of the list from fs:[0] and then iterate
over each entry in the list.  For each entry, the function should validate that
the Next attribute points to a valid memory location. If it does not, then the
chain can be assumed to be corrupt.  However, if it does point to valid memory,
then the routine should check to see if the Next pointer is equal to the
address of the validation frame that was previously stored at the end of the
exception handler chain for this thread. If it is equal to the validation
frame, then the integrity of the chain is confirmed and the exception can be
passed to the actual exception dispatcher.

However, if the function reaches an invalid Next pointer, or it reaches
0xffffffff without encountering the validation frame, then it can assume that
the exception handler chain is corrupt.  It's at this point that the function
can take whatever steps are necessary to discard the exception, log that a
potential exploitation attempt occurred, and so on.  The end result should be
the termination of either the thread or the process, depending on
circumstances.  This algorithm is captured by the pseudo-code below:


CurrentRecord = fs:[0];              // head of the thread's handler chain
ChainCorrupt  = TRUE;                // assume corruption until proven otherwise
while (CurrentRecord != 0xffffffff) {
    // A Next pointer that does not reference valid memory means the
    // chain has been damaged.
    if (IsInvalidAddress(CurrentRecord->Next))
        break;
    // Reaching the validation frame confirms that the chain is intact.
    if (CurrentRecord->Next == ValidationFrame) {
        ChainCorrupt = FALSE;
        break;
    }
    CurrentRecord = CurrentRecord->Next;
}
if (ChainCorrupt == TRUE)
    ReportExploitationAttempt();
else
    CallOriginalKiUserExceptionDispatcher();


The above algorithm describes how the exception dispatching path should be
handled.  However, there is one important part remaining in order to implement
this solution.  Specifically, there must be some way of registering the
validation frame with a thread prior to any exceptions being dispatched on that
thread. There are a few ways that this can be accomplished.  In terms of a
proof of concept, the easiest way of doing this is to implement a DLL that,
when loaded into a process' address space, catches the creation notification of
new threads through a mechanism like DllMain or through the use of a TLS
callback in the case of a statically linked library.  Both of these approaches
provide a location for the solution to establish the validation frame with the
thread early on in its execution.  However, if there were ever a case where the
thread were to raise an exception prior to one of these routines being called,
then the solution would improperly detect that the exception handler chain was
corrupt.
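
The registration step itself can be sketched in the same simulated setting.
The code below is a minimal model and not a Windows implementation: the head
pointer stands in for fs:[0], the names install_validation_frame and
last_record are hypothetical, and the NULL handler is a placeholder.  In a
real DLL, a routine like this would be called for each new thread, for
instance when DllMain receives DLL_THREAD_ATTACH or from a TLS callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an exception registration record. */
typedef struct reg_record {
    struct reg_record *next;
    void              *handler;
} reg_record;

/* Stand-in for the 0xffffffff end-of-chain marker. */
#define CHAIN_END ((reg_record *)(uintptr_t)-1)

/* Returns the last record of a well-formed chain, or CHAIN_END if empty. */
static reg_record *last_record(reg_record *head)
{
    reg_record *cur = head;
    if (cur == CHAIN_END)
        return CHAIN_END;
    while (cur->next != CHAIN_END)
        cur = cur->next;
    return cur;
}

/* Appends vf to the end of the chain rooted at *head (the simulated
 * fs:[0]).  Records registered later are prepended at the head, so vf
 * stays the final record. */
static void install_validation_frame(reg_record **head, reg_record *vf)
{
    assert(head != NULL && vf != NULL);
    vf->next    = CHAIN_END;
    vf->handler = NULL;   /* placeholder; a real frame would need a handler
                             that is safe to call */

    reg_record *tail = last_record(*head);
    if (tail == CHAIN_END)
        *head = vf;       /* empty chain: the frame becomes the head */
    else
        tail->next = vf;  /* otherwise link it after the oldest record */
}
```

Because the frame is linked in at the tail, it only needs to be installed
once per thread, provided that this happens before the first exception is
dispatched on that thread.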

One solution to this potential problem is to store state relative to each
thread that keeps track of whether or not the validation frame has been
registered.  There are certain implications to doing this, however.  First,
it could introduce a security problem in that an attacker might be able to
bypass the protection by somehow toggling the flag that tracks whether or not
the validation frame has been registered.  If this flag were cleared and an
exception were generated in the thread, then the solution would have to
assume that it can't validate the chain because no validation frame has been
installed.  Another issue is that this approach requires some location to
store this state on a per-thread basis.  A good example of a place to store
this is in TLS, but again, it has the security implications described above.
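
The trade-off can be made concrete with a small sketch.  It assumes a C11
compiler with _Thread_local as a stand-in for a TLS slot, and the names
verdict, mark_frame_registered, and classify_exception are hypothetical.
The point it illustrates is that once the flag reads as unregistered, the
check has nothing to validate against and must let the exception through.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    VERDICT_DISPATCH,   /* chain verified: pass to the real dispatcher */
    VERDICT_CORRUPT,    /* frame registered but unreachable: report    */
    VERDICT_UNCHECKED   /* no frame registered: nothing to validate    */
} verdict;

/* Per-thread flag; stands in for state kept in a TLS slot. */
static _Thread_local bool frame_registered = false;

/* Called once per thread after the validation frame is installed. */
static void mark_frame_registered(void)
{
    assert(!frame_registered);
    frame_registered = true;
}

/* Decides how to treat an exception given the result of the chain walk. */
static verdict classify_exception(bool chain_reaches_frame)
{
    if (!frame_registered)
        return VERDICT_UNCHECKED;   /* the bypass: no check is possible */
    return chain_reaches_frame ? VERDICT_DISPATCH : VERDICT_CORRUPT;
}
```

An attacker who can write to the flag before triggering the exception turns
a VERDICT_CORRUPT outcome into VERDICT_UNCHECKED, which is exactly the
weakness described above.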

A more invasive solution to the problem of registering the validation frame
would be to somehow hook in very early in the thread's execution -- perhaps
even before it begins executing from its entry point.  The author is aware of a
good way to accomplish this, but working out what it might be is left as an
exercise for the reader.  This more invasive solution is something that would be
an easy and elegant way for Microsoft to include support for this, should they
ever choose to do so.

The final matter of how to go about implementing this solution centers around
how it could be deployed and used with existing applications without requiring
a recompile.  The easiest way to do this in a proof of concept setting would be
to implement these protection mechanisms in the form of a DLL that can be
dynamically loaded into the address space of a process that is to be protected.
Once loaded, the DLL's DllMain can take care of getting everything set up.  A
simple way to cause the DLL to be loaded is through the use of AppInit_DLLs,
although this has some limitations.  Alternatively, there are more invasive
options that can be considered that will accomplish the goal of loading and
initializing the DLL early on in process creation.

One interesting thing about this approach is that while it is targeted at being
used as a runtime solution, it can also be used as a compile-time solution.
This means that applications can use this solution at compile-time to protect
themselves from SEH overwrites.  Unlike Microsoft's solution, this will even
protect them in the presence of third-party images that have not been compiled
with the support. This can be accomplished through the use of a static library
that uses TLS callbacks to receive notifications when threads are created, much
like DllMain is used for DLL implementations of this solution.

All things considered, the author believes that the implementation described
above is, for all intents and purposes, a fairly simple way of providing
runtime protection against SEH overwrites that has minimal overhead. While the
implementation described in this document is considered more suitable for a
proof-of-concept or application-specific solution, there are real-world
examples of more robust implementations, such as in Wehnus's WehnTrust product,
a commercial side-project of the author's.  Apologies for the shameless plug.


5) Compatibility

As with most security solutions, there are compatibility problems that must
be considered.  As it relates to the solution described in this paper, there
are a couple of important things to keep in mind.

The first compatibility issue that might happen in the real world is a scenario
where an application invalidates the exception handler chain in a legitimate
fashion.  The author is not currently aware of situations where an application
would legitimately need to do this, but it has been observed that some
applications, such as cygwin, will do funny things with the exception handler
chain that are not likely to play nice with this form of protection. In the
event that an application invalidates the exception handler chain, the solution
described in this paper may inadvertently detect that an SEH overwrite has
occurred simply because it is no longer able to reach the validation frame.

Another compatibility issue that may occur centers around the fact that the
implementation described in this paper relies on the hooking of functions.  In
almost every situation it is a bad idea to use function hooking, but there are
often situations where there is no alternative, especially in closed source
environments. The use of function hooking can lead to compatibility problems
with other applications that also hook ntdll!KiUserExceptionDispatcher.  There
may also be instances of security products that detect the hooking of
ntdll!KiUserExceptionDispatcher and classify it as malware-like behavior.  In
any case, these compatibility concerns center less around the fundamental
concept and more around the specific implementation that would be required of a
third-party.


6) Conclusion

Software-based vulnerabilities are a common problem that affects a wide array of
operating systems.  In some cases, these vulnerabilities can be exploited with
greater ease depending on operating system specific features.  One particular
case where this is possible is through the use of an SEH overwrite on 32-bit
applications on the Windows platform. An SEH overwrite involves overwriting the
Handler associated with an exception registration record. Once this occurs, an
exception is generated that results in the overwritten Handler being called.
As a result of this, the attacker can more easily gain control of code
execution due to the context that the exception handler is called in.

Microsoft has attempted to address the problem of SEH overwrites with
enhancements to the exception dispatcher itself and with solutions like SafeSEH
and the /GS compiler flag. However, these solutions are limited because they
require a recompilation of code and therefore only protect images that have
been compiled with these flags enabled. This limitation is something that
Microsoft is aware of and it was most likely chosen to reduce the potential for
compatibility issues.

To help solve the problem of not offering complete protection against SEH
overwrites, this paper has suggested a solution that can be used without any
code recompilation and with negligible performance overhead.  The solution
involves appending a custom exception registration record, known as a
validation frame, to the end of the exception list early on in thread startup.
When an exception occurs in the context of a thread, the solution intercepts
the exception and validates the exception handler chain for the thread by
making sure that it can walk the chain until it reaches the validation frame.
If it is able to reach the validation frame, then the exception is dispatched
like normal.  However, if the validation frame cannot be reached, then it is
assumed that the exception handler chain is corrupt and that it's possible that
an exploit attempt may have occurred.  Since exception registration records are
always prepended to the exception handler chain, the validation frame is
guaranteed to always be the last handler.

This solution relies on the fact that when an SEH overwrite occurs, the Next
attribute is overwritten before overwriting the Handler attribute.  Due to the
fact that attackers typically use the Next attribute as the location at which
to store a short jump, it is not possible for them to both retain the integrity
of the list and also use it as a location to store code. This important
consequence is the key to being able to detect and prevent the leveraging of an
SEH overwrite to gain code execution.
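
The ordering argument can be illustrated with a simulated stack layout.
This is a sketch with made-up values rather than an exploit: stack_frame,
overflow, and build_payload are hypothetical names, the 16-byte buffer and
the jump bytes are purely illustrative, and the copy is performed through a
pointer to the whole structure so that it remains well-defined C.  Since
Next sits at a lower address than Handler, any linear overflow that reaches
Handler has already replaced Next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified 32-bit stack layout: a local buffer sits below the record. */
typedef struct {
    char     buf[16];   /* vulnerable local buffer                 */
    uint32_t next;      /* record's Next field (lower address)     */
    uint32_t handler;   /* record's Handler field (higher address) */
} stack_frame;

/* Copies len attacker-controlled bytes starting at buf, the way an
 * unchecked copy would run past the end of the buffer. */
static void overflow(stack_frame *f, const unsigned char *payload, size_t len)
{
    assert(len <= sizeof(*f));
    memcpy((unsigned char *)f, payload, len);
}

/* Builds a payload of filler, jump bytes that land on Next, and a fake
 * handler address.  Returns the payload length. */
static size_t build_payload(unsigned char *out, size_t cap,
                            uint32_t fake_handler)
{
    /* Illustrative bytes: jmp short +6; nop; nop */
    static const unsigned char short_jmp[4] = { 0xeb, 0x06, 0x90, 0x90 };
    size_t len = offsetof(stack_frame, handler) + sizeof(uint32_t);

    assert(cap >= len);
    memset(out, 'A', offsetof(stack_frame, next));
    memcpy(out + offsetof(stack_frame, next), short_jmp, sizeof(short_jmp));
    memcpy(out + offsetof(stack_frame, handler), &fake_handler,
           sizeof(fake_handler));
    return len;
}
```

After such a copy, the Next field holds instruction bytes instead of a
pointer to the next record, so a walk of the chain can no longer reach the
validation frame.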

Looking toward the future, the usefulness of this solution will begin to wane
as 64-bit versions of Windows begin to dominate the desktop environment.  The
reason 64-bit versions do not need this solution is that exception handling
on 64-bit versions of Windows is inherently secure due to the way it's
been implemented.  However, this only applies to 64-bit binaries.  Legacy
32-bit binaries that are capable of running on 64-bit versions of Windows will
continue to use the old style of exception handling, thus potentially leaving
them vulnerable to the same style of attacks depending on what compiler flags
were used. On the other hand, this solution will also become less necessary due
to the fact that modern 32-bit x86 machines support hardware NX and can
therefore help to mitigate the execution of code from the stack.  Regardless of
these facts, there will always be a legacy need to protect against SEH
overwrites, and the solution described in this paper is one method of providing
that protection.

A. References

Borland.  United States Patent: 5628016. 
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=2Fnetahtml2FPTO2Fsrchnum.htm&r=1&f=G&l=50&s1=5,628,016.PN.&OS=PN/5,628,016&RS=PN/5,628,016;
accessed Sep 5, 2006.

Litchfield, David.  Defeating the Stack based Buffer Overflow Prevention
Mechanism of Microsoft Windows 2003 Server.
http://www.blackhat.com/presentations/bh-asia-03/bh-asia-03-litchfield.pdf;
accessed Sep 5, 2006.

Microsoft Corporation.  Structured Exception Handling.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/structured_exception_handling.asp;
accessed Sep 5, 2006.

Microsoft Corporation.  Working with the AppInit_DLLs registry value.
http://support.microsoft.com/default.aspx?scid=kb;en-us;197571;
accessed Sep 5, 2006.

Microsoft Corporation.  /GS (Buffer Security Check).
http://msdn2.microsoft.com/en-us/library/8dbf701c.aspx;
accessed Sep 5, 2006.

Nagy, Ben.  SEH (Structured Exception Handling) Security Changes in XPSP2
and 2003 SP1.
http://www.eeye.com/html/resources/newsletters/vice/VI20060830.html#vexposed;
accessed Sep 8, 2006.

Pietrek, Matt.  A Crash Course on the Depths of Win32 Structured Exception
Handling.
http://www.microsoft.com/msj/0197/exception/exception.aspx;
accessed Sep 8, 2006.

skape.  Improving Automated Analysis of Windows x64 Binaries.
http://www.uninformed.org/?v=4&a=1&t=sumry;
accessed Sep 5, 2006.

Wehnus.  WehnTrust.
http://www.wehnus.com/products.pl;
accessed Sep 5, 2006.

Wikipedia.  Matryoshka Doll.
http://en.wikipedia.org/wiki/Matryoshka_doll;
accessed Sep 18, 2006.

Wine.  CompilerExceptionSupport.
http://wiki.winehq.org/CompilerExceptionSupport;
accessed Sep 5, 2006.