Re: Formal semantics for C

  >   > Compiler optimization researchers perform non-trivial analyses of C
  >   > programs (e.g., a pointer alias analysis) by ducking the issue of unsafe
  >   > operations: the simply assume that unsafe operations are never executed.
  >   This is not "ducking the issue".  On the contrary, this is an approach
  >   whose correctness can be proven from the semantics of C, as specified
  >   in the C standard or more formally in the works referred to earlier.
  > Nonsense!  These optimizations are incorrect (do not preserve the
  > meaning of programs) because they presume programs never perform unsafe
  > operation, but in fact they do.  

  No, they do preserve the meanings (denotational semantics) of such programs.

A semantic definition for C that maps all unsafe operations to
undefined is misleading for the vast majority of potential
applications of the semantics (e.g. optimizing compilers, general
static debuggers).  In essence, such a "sanitized" semantics converts
C to a safe programming language.  The definition of operational
equivalence for such a language is vastly different than it is for any
particular implementation of the C language.  One would hope that
operational equivalence under the sanitized semantics implies
operational equivalance in specific implementations, but it doesn't.

Real C implementations DO NOT DETECT unsafe operations; they
simply execute the underlying machine instructions implementing those
operations.  Executing those instructions either aborts program
execution (e.g, generates a segmentation fault) or corrupts program
data structures.  Most of the security holes that have been found in
existing operating systems involve the latter behavior; malicious
hackers use unsafe operations embedded in the operating system code to
modify its code and data.  According to the sanitized semantics that
maps all unsafe operations to "undefined", none of these security
breaches can possibly happen but, of course, they do.

Pretending that unsafe operations return "undefined" is not appropriate
for most applications of program analysis becuase such an analysis is
inconsisent with the behavior of programs that contain unsafe
operations (which includes nearly all large production programs
written in C).  Nevertheless, there are exceptions.  I know of one
project, namely the SLAM project at Microsoft Research, that uses such
an analysis appropriately.  The SLAM system relies on a "sanitized"
pointer aliasing analysis to perform its own model-checking analysis
for synchronization errors.  So the correctness of the SLAM
synchronization analysis hinges on the correctness of the pointer
aliasing analysis which assumes freedom from unsafe operations.  In
essence, SLAM reduces synchronization correctness to the absence of
unsafe operations.  Since SLAM is used to analyze device drivers that
have been very carefully scrutinized for unsafe operations by many
expert programmers, SLAM increases my confidence that the drivers do
not have any synchronization errors.  But the SLAM analysis falls
short of a convincing proof unless it is supplemented by a proof that
the analyzed code does not contain an unsafe operations.

-- Corky Cartwright

Follow-Ups: References: