[Prev][Next][Index][Thread]

Re: Formal semantics for C



On 26-Dec-2001, Corky Cartwright <cork@rice.edu> wrote:
> Of course these statements are true, but they are not pertinent to the
> point that Matthias raised.  How are you going to characterize the
> effects of all possible machine and implementation dependencies in any
> mathematically interesting way using non-determinism?

Implementation-defined, implementation-specific and locale-specific
characteristics of the C implementation can be handled by making the
semantics parameterized on those characteristics.

Unspecified behaviour can be handled using non-determinism, by having
the denotation of each operation with unspecified behaviour be a set.

Undefined behaviour can be handled by having the denotation of the program
as a whole be the universal set of all possible behaviours.

> For example,
> what is the result of an unsafe operation such as updating a
> out-of-bounds array element?  It is hard to identify any set of states
> short of all possible program states (including aborting errors) as
> the result such an operation.

Looks like you answered your own question here: use the full set.

However, note that this is the semantics of the program, not the operation.
For the individual operations a distinguished element (called say
"undefined_behaviour") which propagates upwards should be used.

This is needed because in C undefined behaviour is allowed to
effect operations which would normally have occurred previously.
For example, the following program

	int a[10];

	int main() {
		printf("Hello world\n");
		fflush(stdout);
		printf("%d\n", a[42]);
		return 0;
	}

is not required to print out "Hello world"; the compiler is free to for
instance reorder the load of a[42] before the first call to printf().

> All of the "semantics" that I have seen for C cheat on the issue of
> unsafe operations.  Either they codify the behavior of a particular C
> implementation (including a complete machine-level memory model) or
> they punt when an unsafe operation is performed.

Why would the latter approach be "cheating"?
And why do you characterize it as not "mathematically interesting"?

Personally I think mathematically modelling what the C standard specifies
is very interesting.

> Compiler optimization researchers perform non-trivial analyses of C
> programs (e.g., a pointer alias analysis) by ducking the issue of unsafe
> operations: the simply assume that unsafe operations are never executed.

This is not "ducking the issue".  On the contrary, this is an approach
whose correctness can be proven from the semantics of C, as specified
in the C standard or more formally in the works referred to earlier.

(I presume here that by "unsafe operations", you mean operations with
undefined behaviour -- otherwise what you say would be factually incorrect.)

Now you can argue that _the C language_ "ducks the issue", by leaving
the behaviour of programs that execute such operations undefined.
And you can certainly criticize that as being bad language design
(though not all will agree).
However, don't blame the compiler optimization researchers, who would be
foolish if they did not take advantage of this language design.
And don't blame the formal semantics researchers, who just give precise
mathematical models for a language design which has already been specified
(in not-quite-so-formal Standardese) by others.
If there is blame to be apportioned, it should go to the C language
designers and standardizers, and to the programmers and managers who
chose C rather than languages whose semantics had less nondeterminism.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

Follow-Ups: References: