1 Introduction

Securing systems that interact with malicious parties can be a tremendous challenge. Systems written in C are especially difficult to secure, given C's tendency to sacrifice safety for efficiency. One of the more subtle pitfalls facing implementors is the so-called format string vulnerability. Since the discovery of this failure mode in the past year, security experts have identified format string vulnerabilities in dozens of widely deployed, security-critical systems [2,4,5,8,9,10,11,22,23,24,25,27,30,35,43], and attackers have begun exploiting these security holes on a large scale [10,27], gaining root access on vulnerable systems. It seems likely that many legacy applications still contain undiscovered format string vulnerabilities.

Format string bugs arise from design misfeatures in the C standard library combined with a problematic implementation of variable-argument functions. Consider a typical usage of format strings:

`printf("%s", buf);` (correct)

`printf(buf);` (may be incorrect!)

Perhaps unexpectedly, format string bugs can be devastating to security. When a knowledgeable adversary controls the value of the format string s involved in a format string bug, they can use s to write to arbitrary memory locations. For example, including the "%n" specifier in a format string causes printf-like functions to store the number of characters printed so far into the location pointed to by the associated argument. When combined with other tricks, this often leads to a complete compromise of security. Techniques for exploiting format string bugs have been described elsewhere [30]; for the purposes of this paper, the details are unimportant.
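To make the behavior of "%n" concrete, the following is a minimal C sketch; it is not taken from the paper. The helper name `chars_before_n` is hypothetical, and the format string here is a trusted literal, so this shows only the benign use of the directive. The danger described above arises when the adversary, rather than the programmer, supplies the format string. Some C libraries restrict or disable "%n", so behavior may differ by platform.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (an illustration, not code from the paper).
 * Formats `text` into `out`, followed by '!', and uses %n to record
 * how many characters had been produced when the directive was reached.
 */
int chars_before_n(char *out, size_t size, const char *text)
{
    int count = -1;

    assert(out != NULL && text != NULL && size > 0);

    /* The format is a literal chosen by the programmer; `text` is
     * passed only as data and is never interpreted as a format.
     * %n makes snprintf store the running character count in `count`. */
    snprintf(out, size, "%s%n!", text, &count);

    return count;
}
```

With a 32-byte buffer `b`, the call `chars_before_n(b, sizeof b, "hello")` should return 5 and leave `hello!` in `b`. The write to `count` is the same mechanism an attacker redirects when they control the format string.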

The main contribution of this paper is to describe a system for automatically detecting format string bugs at compile-time. Our system applies static, type-theoretic analysis techniques from the programming languages literature to the task of detecting potential security holes. We have implemented our system as a tool built on top of an extensible type qualifier framework [19]. We have tested our tool on a number of real-world software systems, in the process independently re-discovering several format string bugs that were unknown to the authors at the time.

Before describing the ideas behind our tool in more detail, we discuss some of the alternatives to static analysis; more are discussed in Section 6.

One natural alternative to static analysis is testing. The main weakness of testing is coverage: it is extremely difficult to construct a test suite that exercises all possible paths through a program. Unfortunately, a security auditor is most interested in exactly the paths that are never followed in ordinary operation. For example, a major source of format string bugs is error reporting code (e.g., calls to syslog()). Such code is triggered only on rare, exceptional paths, and run-time testing easily overlooks such paths and hence such bugs. With static analysis, on the other hand, vulnerabilities can be proactively identified and fixed before the code is ever run.

Another alternative to automated static analysis is manual code review. Unfortunately, humans are not especially good at finding format string bugs by inspection. Figure 1 shows a representative example, excerpted from a recent version of wuftpd [2,43]. The code in Figure 1 reads a line of text from the network and passes it to lreply(), where it will later be used as a format string specifier to vsnprintf(). The correct syntax would have been lreply(200, "%s", buf), but the programmer omitted the "%s". As before, this introduces a serious security vulnerability.

**Figure 1:** A format string vulnerability found in `wuftpd` 2.6.0, paraphrased for brevity.
[Code listing for Figure 1; only fragments survive extraction: `while (fgets(buf, sizeof bu...` ... `..., ap);` ... `}`]

In real code, the omission of a format string is often located far away from the place where the requirement for a trusted format string specifier becomes apparent. In the case of our wuftpd example, the offending call to lreply() was not even in the same file as the eventual use of vsnprintf(). Figure 1 also shows why naive static analysis (e.g., searching for all occurrences of printf(s) and replacing them with printf("%s", s)) does not work in practice. Very often format string bugs occur within wrapper functions to printf(), and these non-localized bugs require more sophisticated analysis techniques.
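The wrapper pattern can be sketched in C as follows. This is an illustration under stated assumptions, not wuftpd's code: the names `reply` and `reply_untrusted`, their signatures, and the output-buffer interface are all hypothetical. It shows how the format argument is interpreted inside the wrapper, away from the call site where the mistake would be made.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical printf-style wrapper (not wuftpd's lreply()).
 * Writes "<code> " followed by the formatted message into `out`.
 * Returns the length of the full message, or -1 on error.
 */
int reply(char *out, size_t size, int code, const char *fmt, ...)
{
    va_list ap;
    int used, n;

    assert(out != NULL && fmt != NULL && size > 0);

    used = snprintf(out, size, "%d ", code);
    if (used < 0 || (size_t)used >= size)
        return -1;

    va_start(ap, fmt);
    /* `fmt` is interpreted as a format string here, far from the
     * caller that decided what to pass in that position. */
    n = vsnprintf(out + used, size - (size_t)used, fmt, ap);
    va_end(ap);

    return n < 0 ? -1 : used + n;
}

/* Hypothetical caller handling text received from the network. */
int reply_untrusted(char *out, size_t size, int code, const char *untrusted)
{
    /* Correct: the untrusted text is passed only as data for "%s".
     * The buggy form would be reply(out, size, code, untrusted),
     * which lets the sender choose the format string. */
    return reply(out, size, code, "%s", untrusted);
}
```

With the correct call, input such as `%x%x%n` is copied into the reply verbatim. A textual search for printf(s) would not flag the buggy form of this call, since the call site names only the wrapper.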

A third alternative would be to re-implement the application in a safe language (such as Java). However, such an approach is likely to be too costly for most legacy applications.