[Prev][Next][Index][Thread]

More on C semantics / nondeterminism



[ Another crop of messages posted on these topics during the past
  couple of days...  --B ]

-------------------------------------------------------------------------  

Subject: Re: Nondeterminism / formal semantics of C
To: types@cis.upenn.edu
Date: Sat, 29 Dec 2001 00:43:40 -0500 (EST)

> > Matthias Felleisen writes:
> > > The question that I have is what does it mean to have a semantics
> > > of C?  In my mind, a semantics is a mathematical model that
> > > predicts the behavior of a program in some programming language.
> >
> > > Unless you nail down the precise hardware, os specs, compiler
> > > specs, I can't see how one can possibly specify the semantics of
> > > C. One could specify a very small kernel of C, but as soon as
> > > you admit arrays you're not going to produce anything that will
> > > predict the behavior in a reliable manner.
> >
> > I apologise if this sounds as if I'm belabouring the point, but the
> > C Standard explicitly defines the notion of a "strictly conforming
> > program", one whose execution is guaranteed to produce the same
> > behaviour on all platforms.
> >
> > Other programs might be guaranteed to work, but because of
> > non-determinism will give different answers in different
> > circumstances.  Other programs are beyond the pale.
> >
> > A formal semantics for C would classify programs according to the
> > above, allowing you to predict, for example, that program X will
> > always run, producing the answer 42; will run, producing a number
> > in the range 10-15; or that it's illegal, and that its behaviour
> > is thus unpredictable, or at least, completely dependent on the
> > vagaries of the implementation.
> >
> > Michael.
>
> Excuse my ignorance, but is the Standard's notion of "conforming
> program" a decidable notion?  If so, is your semantics accompanied by
> an algorithm?
>
> -- Matthias

It seems pretty clear that the standard's notion of a conforming program
is one that cannot get to a particular "undefined" state that is entered
when, for example, an out-of-bounds array pointer is dereferenced. I see
no reason why such an operational semantics cannot be defined. To do so,
we don't, contrary to Matthias's assertion, need to know about details
of the operating system or the heap layout in order to get a semantics
that is sufficiently detailed to capture the behavior of conforming
programs.

Determinism is overrated.  Nondeterminism in the operational semantics
is a good thing because it makes it easy to compile better code. Java
makes evaluation order deterministic, but this doesn't save programmers
from bugs because normally they don't think about evaluation order
anyway. In the places where Java (rather arbitrarily) refines a
computation to just one of the evaluations that would have been possible
in C, it's just as likely that the programmer didn't expect the result
that happens to pop out.

The problem of whether C programs are conforming is clearly
not decidable in general. But this misses the point. We don't
need a decision process; we just need to be able to show that
particular programs that we are interested in are conforming,
or deterministic. This should be the goal of the C semantics
discussed. That the semantics fails to identify all conforming or
deterministic programs as such is not necessarily troubling as long
as it is precise enough that programs of interest can be shown to be
conforming. Because skilled C programmers can write conforming programs,
I am sure that this goal is attainable.

Cheers,

-- Andrew




Date: Sat, 29 Dec 2001 17:21:10 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: bcpierce@cis.upenn.edu
Cc: types@cis.upenn.edu
Subject: Re: Nondeterminism / formal semantics of C

Matthias Felleisen <matthias@ccs.neu.edu> writes:
> 
> Excuse my ignorance, but is the Standard's notion of "conforming program" a
> decidable notion?  If so, is your semantics accompanied by an algorithm? 

The C standard's notion of "conforming program" is indeed decidable,
but unfortunately it is vacuous: every program is a conforming program.
The rationale for this is apparently to allow people who write non-portable
programs to be able to nevertheless claim that their programs are "conforming".

The C standard also has a notion of "strictly conforming" program.
This one, fortunately, is more useful.  Like all interesting run-time
program properties, it is undecidable in general, but it can be proven
for particular programs.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.





From: Corky Cartwright <cork@rice.edu>
Date: Sat, 29 Dec 2001 01:22:36 -0600 (CST)
To: Fergus Henderson <fjh@cs.mu.oz.au>
Subject: Re: Formal semantics for C
Cc: types@cis.upenn.edu


The "unsafe-operations-yield-undefined semantics" for C is
deterministic; it collapses all unsafe computations into the "bottom"
element of the semantic domain (cpo) of program values.  This is the
well-known "trivial" approach to handling run-time errors in a
denotational semantics: simply equate all errors with divergence.

If you use a non-determinstic semantics where unsafe operations yield
the set of all possible states, then you cannot use such a semantics
justify any interesting static program analysis.  Any potentially
unsafe operation (e.g., an array update or a assignment to field of
heap allocated structure) destroys all of the information that an
analysis tries to propagate.  Consider the case of analyzing which
pointer fields in memory can be aliases of one another.  As soon as an
unsafe operation has been executed, any pointer field can alias any
other pointer field.  Such a semantics is "honest" in modeling the
catastrophic consequences of unsafe operations, but I don't how it
can be used to produce any positive results from the perspective
of program analysis.  

-- Corky Cartwright






Date: Sat, 29 Dec 2001 18:46:51 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Corky Cartwright <cork@rice.edu>
Cc: types@cis.upenn.edu
Subject: Re: Formal semantics for C

On 29-Dec-2001, Corky Cartwright <cork@rice.edu> wrote:
> 
> The "unsafe-operations-yield-undefined semantics" for C is
> deterministic; it collapses all unsafe computations into the "bottom"
> element of the semantic domain (cpo) of program values.

Any deterministic semantics for C does not correctly reflect the C standard.
C has several different kinds of "unsafe" operations.  You can handle
operations which have "undefined behaviour" in the manner you describe
above, but that doesn't work for operations which have "unspecified
behaviour"; you need nondeterminism in the semantics to handle those.

> If you use a non-determinstic semantics where unsafe operations yield
> the set of all possible states, then you cannot use such a semantics
> justify any interesting static program analysis.  [...]
> Such a semantics is "honest" in modeling the
> catastrophic consequences of unsafe operations, but I don't how it
> can be used to produce any positive results from the perspective
> of program analysis.  

It's true that such a semantics is not directly useful for the purpose
of program optimization, but it's not completely uninteresting -- you can
use such a semantics to analyse when programs have undefined behaviour.

Analyses used for program optimization will assume that the program
does not have undefined behaviour, and so will use a simpler semantics.
But the non-deterministic semantics where unsafe operations yield the
set of all possible states is still useful, in a fairly trivial sense,
for justifying the optimizations applied as a result of analyses using the
simpler semantics.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.





From: Michael Norrish <Michael.Norrish@cl.cam.ac.uk>
Date: Sat, 29 Dec 2001 10:51:01 +0000
To: matthias@ccs.neu.edu
Cc: kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk
Subject: Re: Formal semantics for C

Matthias Felleisen writes:
 
> Excuse my ignorance, but is the Standard's notion of "conforming program" a
> decidable notion?  If so, is your semantics accompanied by an algorithm? 
 
No.  A strictly conforming program isn't allowed to divide by zero
(among many other restrictions) because this represents undefined
behaviour. 

There's no way of deciding if a program is or isn't going to divide by
zero, so the class is undecidable.

Michael.




To: types@cis.upenn.edu
Subject: Re: Nondeterminism / formal semantics of C
Message-ID: <1009625124.3c2da8246be6e@webmail.dcs.qmul.ac.uk>
Date: Sat, 29 Dec 2001 11:25:24 +0000 (GMT)
From: martinb@dcs.qmul.ac.uk

> As for whether the semantics should be _mathematical_, well, currently
> structured English ("Standardese") seems in most cases to be more useful
> than mathematics for language definitions and programming language
> standards, because it can be more easily understood by a wider audience.

this may be correct at the moment but there is an important audience which 
cannot understand structured english:  verification tools!
 
> But currently the state-of-the-art in tools for making use of formal
> semantics seems to be well behind that for grammars.  Until we have
> better tools, and these tools become widely used and widely accepted,
> the advantages of mathematical specifications will be limited.

this is incorrect. there is a mathematically totally rigorous, yet also
conceptually simple tool for semantic specifications: ad-hoc operational
semantics, SOS, labelled transition systems and the like. i cannot
think of a single example where the ad-hoc semantics wasn't leaner cleaner
in comparison with alternative semantic specifications, be they game-theoretic, 
pi-calculistic or functional (lambda calculi or domains). 

of course there is a  problem with ad-hoc operational semantics and that 
is that they

* don't allow (or facilitate) technology transfer. by that i mean that
  because of the lack of constraints on, say, rule formats, it is usually
  not possible to reuse technology such as theorem provers
  or control-flow analyses (developed for one ad-hoc semantical framework)
  in another, at least not without having to do conceptually trivial yet 
  nevertheless substantial reworking of proofs
 
* in addition and for the same reasons there isn't a widely applicable
  collections of reasoning methods for ad-hoc semantics.

nevertheless, the benefits (in precision) of ad-hoc semantics over informal 
methods are so substantial that the reasons for refusing to do formal 
specifications of programming languages can no longer be technical, just 
social.

BUT of course it would be nice if it was possible to identify a 
'universal' ad-hoc operational semantics that was

* expressive enough to allow precise and concise modelling of all
  computational phenomena, but was

* small and constrained enough so that we can finally start (and hopefully
  eventually succeed) to develop non-trivial and practically useful
  mathematical theories of computing (comparable to the role of differential
  equations in physics)

now the good news: such a universal ad-hoc semantics may already exist!
it is called the (asynchronous) pi-calculus. 

martin





From: Matthias Felleisen <matthias@ccs.neu.edu>
To: Michael.Norrish@cl.cam.ac.uk
Cc: kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk, cork@cs.rice.edu
Subject: Re: Formal semantics for C
Date: Sat, 29 Dec 2001 11:05:34 -0500 (EST)


Thanks, that's what I thought. Indeed, I thought you'd reference the more
interesting and relevant problem that you can't decide whether an array 
index is in the proper range. I rest my case. -- Matthias

  > From: Michael Norrish <Michael.Norrish@cl.cam.ac.uk>
  > Content-Type: text/plain; charset=us-ascii
  > Date: Sat, 29 Dec 2001 10:51:01 +0000
  > Cc: kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk
  > Reply-To: Michael.Norrish@cl.cam.ac.uk
  > Sender: Michael Norrish <Michael.Norrish@cl.cam.ac.uk>
  > 
  > Matthias Felleisen writes:
  >  
  > > Excuse my ignorance, but is the Standard's notion of "conforming program" a
  > > decidable notion?  If so, is your semantics accompanied by an algorithm? 
  >  
  > No.  A strictly conforming program isn't allowed to divide by zero
  > (among many other restrictions) because this represents undefined
  > behaviour. 
  > 
  > There's no way of deciding if a program is or isn't going to divide by
  > zero, so the class is undecidable.
  > 
  > Michael.
  > 





Date: Sat, 29 Dec 2001 12:49:03 -0500
From: Dan Grossman <danieljg@cs.cornell.edu>
To: types@cis.upenn.edu
CC: matthias@ccs.neu.edu, kfl@it.edu, cork@rice.edu
Subject: Re: Formal semantics for C


Matthias writes:

> It is my believe that C is an "assembly" language for building and
> "scripting" Unix and that the people who designed C didn't think about
> a mathematical, machine-independent semantics and, frankly, they and their
> friends who use it, probably don't care about it. 

Okay, I'll take the bait. :-)

(1) Is C an assembly language?

C's machine model is very close to the machine model of an assembly
language.
In particular, the behavior of illegal C programs can be explained only in
terms usually reserved for assembly language.  There are differences,
however.  C doesn't have any notion of registers (ignoring the ignorable
"register" keyword).

C's term language is quite different than an assembly language.  It's easy
to overlook the obvious here: For example, C has hierarchical scope and
aggregate type definitions.  Now, *if* C were memory safe, features like
these would be great for user-defined abstractions.  Not being memory safe
is not equivalent to being an assembly language, however.

Finally, translating C to an assembly language is "much easier" than
translating one assembly language to another.  One reason is precisely that
C defines many programs to be illegal.

(2) Was C designed for building and "scripting" Unix?

Sure.  And there are a thousand things I would change about C.  More
important than its original use, though, is that C is used for far too many
things today -- it can't possibly be the "right" language for all of them. 
More on that below.

(3) Did the designers care about mathematical, machine-independent
semantics?

No. Many language designers don't, and that's unfortunate.  But it seems
clear from the conversation and the cited work that the real issue here is
safety.
That is, with safety, formalizing C would be a nearly solved problem.

Machine-independence is where I'll make my most controversial point: The
world needs non-assembly languages with "machine-dependent behavior".  Okay,
I don't quite mean that.  What I mean is that you should be able to write a
program that assumes, for example, pointers are 32-bits.  Try to compile it
for a 64-bit machine, and maybe the compiler just says no.  The second
biggest problem with C (safety the first), is that it is extremely hard to
write portable programs and you get very little automated help in doing so.

What we need is a well-defined portable (sub)language, an unportable
(sub)language, and easy interoperability between the two.  I'm surprised I
haven't seen this need articulated a gazillion times.

We also need a well-defined safe (sub)language, an unsafe (sub)language, and
easy interoperability between the two.  I'm working on that one. :-)

--Dan

-- 
| Dan Grossman     www.cs.cornell.edu/home/danieljg H:607 256 0724 |
| 5157 Upson Hall  danieljg@cs.cornell.edu          O:607 255 9834 |





Subject: RE: Nondeterminism / formal semantics of C
Date: Sun, 30 Dec 2001 20:16:11 -0500
From: "Gregory Morrisett" <jgm@cs.cornell.edu>
To: <types@cis.upenn.edu>

In defense of C, what is the meaning of the following ML
program?

  1+1;

Is it 2 or is it "uncaught exception Overflow"?  

Also, as someone else has pointed out, it is possible to
produce a "memory-safe" implementation of C (see for instance
Necula's CCured or the previous Safe-C work out of Wisconsin.)

Now to attack C:  

The language leaves many (important) things "undefined".
We cannot use a denotational semantics that maps such
program transitions to "any possible state" simply because
we cannot construct a useful mathematical model of what
those states might be.  They might be the state in which
the Universe explodes, or your credit card number gets
sent to Microsoft.  All things are possible and indeed,
many fantastic things have been achieved when programs
have stepped into this undefined territory (e.g., buffer
overruns.)

-Greg