More on Formal Semantics of C
[ I've been batching up messages on this topic to avoid filling
everybody's mailboxes with too many individual contributions. Here's
the latest batch. --B ]
From: Michael Norrish <Michael.Norrish@cl.cam.ac.uk>
Date: Mon, 11 Mar 2002 12:08:21 +0000
Cc: Jim Huggins <firstname.lastname@example.org>, email@example.com
Subject: Re: Formal semantics for C
> there is an immediate objection: of course in a sense full
> abstraction, definability or soundness cannot be given in this
> setting because the semantics of C has never been formally specified
> in the first place. this is true, but it only means that what you
> have defined is a semantics (rather than a semantics of C).
If you're willing to admit that C has a semantics at all (maybe it
doesn't?), then surely the evolving algebra semantics is an
approximation to it. Further it can be compared with other formal
semantics, such as mine, or anyone else's. How can we possibly do any
better? Should we be agitating for the ISO Committee to adopt a
formal semantics against which we could then compare our efforts?
> in addition, and slightly contradicting what i have said, every C
> compiler + libraries implicitly *defines* a formal semantics, albeit
> a rather complicated one, which may or may not approximate what one
> may expect to be *the* semantics of C. you could test your semantics
> against a compiler, although i don't recommend it
A semantics that properly reflected the standard's deliberate
under-determinism would allow more behaviours than a compiler would
display. But, given a semantics for the underlying hardware or
intermediate language level or whatever, you could verify that the
compiler did at least cause appropriate behaviours on legal inputs.
>> "it is not difficult to extend our description of C to include
>> any desired library function or functions."
> i don't believe this. the C libraries contain some fancy operations
> like longjmp or the POSIX thread creation & manipulation mechanisms.
> i don't think it is at all straightforward to model them. POSIX
> threads can have very complicated interactions through distributed
> shared memory with non-trivial memory consistency models that may
> include features like memory access reordering, memory barriers,
> spurious wakeups of condition variables ... i'm not even going to
> start talking about timers and networks which are also integral and
> heavily used part of the C libraries ...
C's standard library (as defined in the ISO standard) is complicated,
and does include longjmp and signal, but it certainly doesn't include
shared memory, POSIX threads (these are "defined" in the POSIX
standard), timers or networks. Many people (witness comp.std.c) would
also claim that the language in the complicated sections of the
standard, dealing with things like signal-handling, is so vague as to
be nearly meaningless. Instead, compiler writers implement what they
believe to be reasonable in the context of operating system
That's the real problem with specifying libraries; if you're not
careful you end up modelling an entire operating system. C's
minimalist requirements of its O/S suggest that it might just be
possible to write a satisfactory model, but I'm not volunteering!
From: "Nikolaos S. Papaspyrou" <firstname.lastname@example.org>
Subject: Re: Formal semantics for C
"S.J.Thompson" <S.J.Thompson@ukc.ac.uk> writes:
> Query about the formal semantics of C:
> Are types readers aware of any work on formalizing the semantics
> of the C programming language?
"Sava Krstic" <email@example.com> writes:
> Another reference:
> Nikolaos S. Papaspirou, Denotational Semantics of ANSI C. Computer
> Standards and Interfaces, 23, 169--185, 2001
Apart from this paper, you can find my humble contribution to the
semantics of C on my thesis' web page:
I have found (and not yet corrected) some mistakes in the semantics,
but I believe the general picture is right and quite different from
the work of Gurevich/Huggins and Norrish. It doesn't cover the
standard library nor C99. The former would be quite hard to add.
Nikolaos S. Papaspyrou | Tel: +30-1-7722486
National Technical University of Athens | Home: +30-1-7524801
Dept. of Electrical & Computer Engineering | Fax: +30-1-7722519
Software Engineering Laboratory |=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
15780 Zografou, Athens, Greece | Happiness is not a state to
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=| arrive at, but a manner of
Email: firstname.lastname@example.org | travelling.
URL: http://www.softlab.ntua.gr/~nickie/ | --- Margaret Lee Runbeck
Cc: email@example.com, Jim Huggins <firstname.lastname@example.org>,
> If you're willing to admit that C has a semantics at all (maybe it
> doesn't?), then surely the evolving algebra semantics is an
> approximation to it. Further it can be compared with other formal
> semantics, such as mine, or anyone else's. How can we possibly do
> any better? Should we be agitating for the ISO Committee to adopt a
> formal semantics against which we could then compare our efforts?
C has an informal semantics. the study of formal semantics is, or at
least seem to me not advanced enough to make a formal specification of
the whole language (as opposed to the trivial sequential part)
feasible right now. Hence it might be wise for standardising bodies
not to even attempt such a formal specification
as semanticists we must try and do better. this is the context of my
critique. it is simply bad slightly sloppy to give a semantics
without specifying the intended notion of observation and
equivalence. its like giving a category through its objects while
omitting the morphisms. it is meaningless. the reason why people get
away with this is that languages like the sequential and untimed core
of C are totally uncontroversial (when was the last time you asked
yourself what if-then-else did?). however, that does not mean that
there are no hidden assumptions. but these hidden assumptions are
shared by everybody who may use the semantics. that's why they do not
usually cause problems. let me give you an example. the following
clause might be part of a translation of C into assembly language. it
deals with sequential composition.
trans( P;Q ) = trans( P ); sleep( 1 year ); trans( Q )
as far as i know it is a perfectly valid translation of C, yet if we
would recompile all the world's C code in this fashion, the entire
planet's information infrastructure would collapse. this
underspecification of C does not cause a problem because we all
implicitly understand that the above program is a really bad idea, we
all share this hidden assumption
with concurrency, memory consistency or more delicate temporal
properties, hidden assumptions are rarely shared and do cause real
problems. the point of semantics is to be very precise about all
assumptions. giving a notion of observation and equivalence seems a
necessary prerequisite for making explicit one's assumptions. for the
sequential core of C it is not even difficult to come up with a notion
of observation and equivalence
> C's standard library (as defined in the ISO standard) is
> complicated, and does include longjmp and signal, but it certainly
> doesn't include shared memory, POSIX threads (these are "defined" in
> the POSIX standard), timers or networks. Many people (witness
> comp.std.c) would also claim that the language in the complicated
> sections of the standard, dealing with things like signal-handling,
> is so vague as to be nearly meaningless.
are you suggesting this is good?
> That's the real problem with specifying libraries; if you're not
> careful you end up modelling an entire operating system.
the distinction between OS and language is sociological not technical.
if you are interested in solid semantics of non-trivial computational
features you need to specify all the way down to the assembly level,
where you have an extremely precise semantics given by the CPU
manufacturer. one of the challenges for semanticists, then, is to
connect the precise low-level semantics of the CPU with higher level
semantics. of course there is not one right way of doing this. for
many applications it is irrelevant to know exactly how many cycles a
command takes, as long as it is below some very large number. so we
not only need to connect high and low-level semantics, we must be able
to do so with a required degree of precision. programmers have
invented many abstractions for layering software and managing
complexity: transactions, memory consistency protocols, functions,
threads, files ... can semanticists do the same?