
Nondeterminism / formal semantics of C



The previous exchange of messages about nondeterminism and formal
semantics of C seems to have stirred up a lot of excitement.  I've
bundled a bunch of messages from the past couple of days, to avoid
filling everybody's mailboxes with too many separate messages...

     - B

-------------------------------------------------------------------------

From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Viktor Kuncak <vkuncak@mit.edu>
Cc: types@cis.upenn.edu
Subject: Re: Nondeterminism (was: Formal semantics for C)

On 25-Dec-2001, Viktor Kuncak <vkuncak@mit.edu> wrote:
> It also seems to me that nondeterminism allows us to use
> languages with side effects without fixing an evaluation
> model.  I have the impression that functional languages try to
> avoid such nondeterminism.  Is this because it is generally
> hard to reason about such semantics?

Yes.

> Having nondeterminism is nice because it gives us more flexibility in
> program refinement.

Yes.  It's a trade-off.  More nondeterminism gives the compiler more
freedom to optimize the program, but can make it harder for programmers
to reason about their code.

Different languages allow differing amounts of nondeterminism,
depending on how the language designers weighed up this trade-off.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.






Date: Fri, 28 Dec 2001 00:15:34 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Corky Cartwright <cork@rice.edu>
Cc: matthias@ccs.neu.edu, kfl@it.edu, types@cis.upenn.edu
Subject: Re: Formal semantics for C

On 26-Dec-2001, Corky Cartwright <cork@rice.edu> wrote:
> Of course these statements are true, but they are not pertinent to the
> point that Matthias raised.  How are you going to characterize the
> effects of all possible machine and implementation dependencies in any
> mathematically interesting way using non-determinism?

Implementation-defined, implementation-specific and locale-specific
characteristics of the C implementation can be handled by making the
semantics parameterized on those characteristics.

Unspecified behaviour can be handled using non-determinism, by having
the denotation of each operation with unspecified behaviour be a set.

Undefined behaviour can be handled by having the denotation of the program
as a whole be the universal set of all possible behaviours.

> For example,
> what is the result of an unsafe operation such as updating an
> out-of-bounds array element?  It is hard to identify any set of states
> short of all possible program states (including aborting errors) as
> the result of such an operation.

Looks like you answered your own question here: use the full set.

However, note that this is the semantics of the program, not of each
operation.  For individual operations, a distinguished element (call it
"undefined_behaviour") which propagates upwards should be used.

This is needed because in C undefined behaviour is allowed to
affect operations which would normally have occurred previously.
For example, the following program

	#include <stdio.h>

	int a[10];

	int main() {
		printf("Hello world\n");
		fflush(stdout);
		printf("%d\n", a[42]);
		return 0;
	}

is not required to print out "Hello world"; the compiler is free, for
instance, to reorder the load of a[42] so that it occurs before the
first call to printf().

> All of the "semantics" that I have seen for C cheat on the issue of
> unsafe operations.  Either they codify the behavior of a particular C
> implementation (including a complete machine-level memory model) or
> they punt when an unsafe operation is performed.

Why would the latter approach be "cheating"?
And why do you characterize it as not "mathematically interesting"?

Personally I think mathematically modelling what the C standard specifies
is very interesting.

> Compiler optimization researchers perform non-trivial analyses of C
> programs (e.g., a pointer alias analysis) by ducking the issue of unsafe
> operations: they simply assume that unsafe operations are never executed.

This is not "ducking the issue".  On the contrary, this is an approach
whose correctness can be proven from the semantics of C, as specified
in the C standard or more formally in the works referred to earlier.

(I presume here that by "unsafe operations", you mean operations with
undefined behaviour -- otherwise what you say would be factually incorrect.)

Now you can argue that _the C language_ "ducks the issue", by leaving
the behaviour of programs that execute such operations undefined.
And you can certainly criticize that as being bad language design
(though not all will agree).
However, don't blame the compiler optimization researchers, who would be
foolish if they did not take advantage of this language design.
And don't blame the formal semantics researchers, who just give precise
mathematical models for a language design which has already been specified
(in not-quite-so-formal Standardese) by others.
If there is blame to be apportioned, it should go to the C language
designers and standardizers, and to the programmers and managers who
chose C rather than languages whose semantics had less nondeterminism.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.





From: Matthias Felleisen <matthias@ccs.neu.edu>
To: fjh@cs.mu.oz.au
Cc: cork@rice.edu, kfl@it.edu, types@cis.upenn.edu
Subject: Re: Formal semantics for C
Date: Thu, 27 Dec 2001 08:26:42 -0500 (EST)


Fergus Henderson <fjh@cs.mu.oz.au> writes: 

> Undefined behaviour can be handled by having the denotation of the program
> as a whole be the universal set of all possible behaviours.

If you call this mathematically useful, please don't let us stop you from
spending your time on designing a "semantics" for C (or whatever). 

It is my belief that C is an "assembly" language for building and
"scripting" Unix and that the people who designed C didn't think about
a mathematical, machine-independent semantics and, frankly, they and their
friends who use it, probably don't care about it. 

-- Matthias




Date: Fri, 28 Dec 2001 00:54:17 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Matthias Felleisen <matthias@ccs.neu.edu>
Cc: kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk,
   Michael.Norrish@cl.cam.ac.uk
Subject: Re: Formal semantics for C

On 26-Dec-2001, Matthias Felleisen <matthias@ccs.neu.edu> wrote:
> 
> My conjecture is that you probably didn't mean "non-deterministic" but
> indeterminate.

Well, I meant what I said, but maybe you understood it to mean something
different ;-).

I have in mind a specific meaning for the word in this context:
a non-deterministic semantics is one in which the denotation of
a program is a _set_ of possible behaviours. 

Maybe you have in mind some other meaning.  But I'm not sure what
you have in mind.  If you have some particular suggestions about what
terminology I should use, you'd need to explain them in more detail.

> Perhaps you're trying to say that all unsafe operations
> should be bottom (denotationally speaking), since we have no
> information about them.

"unsafe" operations may include those whose behaviour is
unspecified or implementation-defined, as well as those which
have undefined behaviour.  The former two should not be modelled as bottom.

Also, simply modelling the behaviour of individual operations with
undefined behaviour as bottom is not sufficient to capture the semantics
of C; you also need a distinguished semantic element indicating undefined
behaviour, which gets propagated up in the semantics of all program
constructs except the whole program.

And once you have such a distinguished semantic element, then you don't
*need* to model the semantics of individual operations with undefined
behaviour as bottom (the universal set); they can be denoted by any set
containing this distinguished element, and so it could be simpler to
use the singleton set containing just that element.

However, the denotation of a _whole program_ which executes any C
operation which has "undefined behaviour" should be bottom, yes.
That's what "undefined behaviour" means: any behaviour is permitted.

> This, however, would be a bad mistake [...]

The design of C is full of bad mistakes.
The question, for people who want a formal semantics for C,
is not how to fix those mistakes, but how to model them.

> In C's case the real problem is that you don't even know when to
> specify that your semantics is unspecified without going through an
> extreme detour. Just think about how you would specify that an array
> access is out of bounds. When the array is passed to a procedure, you
> typically don't even have access to its bound.

"You" (the programmer) may not, and the C implementation may not,
but in the formal semantics for C, arrays and pointers are certainly
accompanied by their bounds.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.




From: Matthias Felleisen <matthias@ccs.neu.edu>
To: fjh@cs.mu.oz.au
Cc: kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk,
   Michael.Norrish@cl.cam.ac.uk
Subject: Re: Formal semantics for C
Date: Thu, 27 Dec 2001 09:21:38 -0500 (EST)


What's the boundary of a pointer? -- Matthias





Date: Fri, 28 Dec 2001 01:58:33 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Matthias Felleisen <matthias@ccs.neu.edu>
Cc: cork@rice.edu, kfl@it.edu, types@cis.upenn.edu
Subject: Re: Formal semantics for C

On 27-Dec-2001, Matthias Felleisen <matthias@ccs.neu.edu> wrote:
> 
> Fergus Henderson <fjh@cs.mu.oz.au> writes: 
> 
> > Undefined behaviour can be handled by having the denotation of the program
> > as a whole be the universal set of all possible behaviours.
> 
> If you call this mathematically useful, please don't let us stop you from
> spending your time on designing a "semantics" for C (or whatever). 

I am happy that Norrish and others have been working on formal semantics for C.
This is good foundational work which may prove useful in the future.

But that is not what I have been doing.  I myself happen to prefer
to put my efforts into things which are more directly useful.  I have
spent some of my time encouraging the C and C++ communities to reduce
the amount of undefined and unspecified behaviour in those languages.
But most of my efforts have been devoted to taking the other path:
designing and implementing a different language which eliminates almost
all the undefined and unspecified behaviour that you find in C and C++.

However, even for pure functional (or logic) languages,
and even when you are explicitly striving to minimize the amount
of unspecified and undefined behaviour in your language,
in practice there are still a number of areas in which
unspecified or even undefined behaviour is desirable.
So the same issues apply to languages like SML, Haskell,
and Mercury, although of course they apply to a much smaller
subset of the language than is the case for C or C++.

> It is my belief that C is an "assembly" language for building and
> "scripting" Unix and that the people who designed C didn't think about
> a mathematical, machine-independent semantics and, frankly, they and their
> friends who use it, probably don't care about it. 

I think everyone on the C standards committee values precisely specified
semantics, and understands the advantages of machine-independent semantics.
However, those are not the only things they value, and in many cases there
are other things which they value higher.

As for whether the semantics should be _mathematical_, well, currently
structured English ("Standardese") seems in most cases to be more useful
than mathematics for language definitions and programming language
standards, because it can be more easily understood by a wider audience.

The use of formal grammar notations for programming language syntax is
a notable exception.  In that case, the advantages of a formal notation
are clear.  These notations are well-known in the industry, are easy to
understand, and allow the syntax to be specified both concisely and
precisely.  Furthermore, there is a plethora of automated tools for
processing such grammars, and some of these tools are widely used.

But currently the state-of-the-art in tools for making use of formal
semantics seems to be well behind that for grammars.  Until we have
better tools, and these tools become widely used and widely accepted,
the advantages of mathematical specifications will be limited.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.






Date: Fri, 28 Dec 2001 02:14:00 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Matthias Felleisen <matthias@ccs.neu.edu>
Cc: types@cis.upenn.edu
Subject: Re: Formal semantics for C

On 27-Dec-2001, Matthias Felleisen <matthias@ccs.neu.edu> wrote:
> 
> What's the boundary of a pointer?

If the pointer points to an element of an array object,
then the bounds are the bounds of that array (see C99 6.5.6 [#8]).

If the pointer points to an object that is not an element of an array,
then it "behaves the same as a pointer to the first element of an array
of length one" (C99 6.5.6 [#7]).

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.






Date: Fri, 28 Dec 2001 02:49:13 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Types List <types@cis.upenn.edu>
Subject: Component Pascal type system is unsound

While we're on the topic of languages with unsound type
systems, let me mention another one: Component Pascal.
<http://www.oberon.ch/docu/language_report.html>.

According to www.oberon.ch, Component Pascal is a refined version of
Pascal, Modula-2, and Oberon, and moreover it is advertised as being
"type-safe".

However, in July this year, in response to some mail from John Gough,
I constructed an example which demonstrates a hole in Component Pascal's
type checking.

The basic problem arises due to the parameter compatibility rules for
"OUT"-mode parameters (which use pass-by-reference) and "extended types"
(Component Pascal's name for derived classes).
These rules allow, for instance, "OUT"-mode parameters whose type is
a pointer to an extended type to be bound to variables whose type is
a pointer to the base type.  This can lead to problems when such
parameters are aliased.

Here's an example which shows the problem.

	MODULE Foo;
	TYPE
		base = RECORD 
			x : INTEGER;
		END;
		derived1 = RECORD(base)
			y : INTEGER;
		END;
		derived2 = RECORD(base)
			z : INTEGER;
		END;

	VAR
		b : POINTER TO base;

	PROCEDURE bar(OUT a1 : POINTER TO derived1;
	              OUT a2 : POINTER TO derived2);
	BEGIN
		NEW(a2);
		NEW(a1);
		a2^.z := 42;
	END bar;

	BEGIN
		bar(b, b);
	END Foo.

Component Pascal allows this sort of code, but this program ends up
assigning to the `z' member of an object of type `derived1', which
does not have any `z' member.

(I probably have some of the details of this program wrong, since I don't
have a Component Pascal implementation around to check it.  But John Gough
has verified that the problem I describe did occur for his Component
Pascal implementation, and last I heard the authors of the Component
Pascal told him that they are planning to revise the language to correct
this type loophole... in particular removing OUT parameter variance.)

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.






From: Corky Cartwright <cork@rice.edu>
Date: Thu, 27 Dec 2001 11:51:27 -0600 (CST)
To: Fergus Henderson <fjh@cs.mu.oz.au>
Subject: Re: Formal semantics for C
Cc: matthias@ccs.neu.edu, kfl@it.edu, types@cis.upenn.edu

  > Compiler optimization researchers perform non-trivial analyses of C
  > programs (e.g., a pointer alias analysis) by ducking the issue of unsafe
  > operations: they simply assume that unsafe operations are never executed.

  This is not "ducking the issue".  On the contrary, this is an approach
  whose correctness can be proven from the semantics of C, as specified
  in the C standard or more formally in the works referred to earlier.

Nonsense!  These optimizations are incorrect (do not preserve the
meaning of programs) because they presume programs never perform unsafe
operations, but in fact they do.

-- Corky Cartwright





Date: Fri, 28 Dec 2001 06:18:00 +1100
From: Fergus Henderson <fjh@cs.mu.oz.au>
To: Corky Cartwright <cork@rice.edu>
Cc: matthias@ccs.neu.edu, kfl@it.edu, types@cis.upenn.edu
Subject: Re: Formal semantics for C

On 27-Dec-2001, Corky Cartwright <cork@rice.edu> wrote:
> 
>   > Compiler optimization researchers perform non-trivial analyses of C
>   > programs (e.g., a pointer alias analysis) by ducking the issue of unsafe
>   > operations: they simply assume that unsafe operations are never executed.
> 
>   This is not "ducking the issue".  On the contrary, this is an approach
>   whose correctness can be proven from the semantics of C, as specified
>   in the C standard or more formally in the works referred to earlier.
> 
> Nonsense!  These optimizations are incorrect (do not preserve the
> meaning of programs) because they presume programs never perform unsafe
> operations, but in fact they do.

No, they do preserve the meanings (denotational semantics) of such programs.

They may not necessarily preserve the *behaviour* of such programs,
but this only happens in cases where the *meaning* of the program is
the infinite set of all possible behaviours; in such cases, any
behaviour is valid, and substituting one behaviour for another
does not change the meaning of the program.

(I'm presuming here that by "unsafe operation", you mean
"operation whose behaviour is undefined according to the C standard".
If you mean something different, you'll need to explain.)

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.





Date: Thu, 27 Dec 2001 14:18:43 -0500
From: Dan Grossman <danieljg@cs.cornell.edu>
To: matthias@ccs.neu.edu
CC: fjh@cs.mu.oz.au, kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk,
   Michael.Norrish@cl.cam.ac.uk
Subject: Re: Formal semantics for C

I agree with most everything that has been said.  As I usually summarize the
matter, "If you don't have memory safety as a foundation on which to build,
you're not going to get very far without sacrificing soundness."  

But just to defend those "terrible" C-analyzers and C-optimizers that punt
on these important issues, they can be quite good at defining their (false)
assumptions clearly, simply, and concisely, which semanticists should agree
is a "good thing".

For example, you can define a quite low-level model of the heap and then
define your model's allocator such that:
* objects are never deallocated
* all objects are allocated infinitely far apart from each other

My point here is small, just that the issues can be skirted in a precise
way.  The model remains inaccurate.

--Dan

Matthias Felleisen wrote:
> 
> [----- The Types Forum, http://www.cis.upenn.edu/~bcpierce/types -----]
> 
> Corky has made the most important points.
> 
> My conjecture is that you probably didn't mean "non-deterministic" but
> indeterminate. Perhaps you're trying to say that all unsafe operations
> should be bottom (denotationally speaking), since we have no
> information about them. This, however, would be a bad mistake that was
> made in the early history of the full abstraction problem. When your
> computation is erroneous, there is information. (You may not like it,
> but it is not absence of information, as bottom would make you think.)
> [See the papers that Corky and I wrote on full abstraction for
> sequential higher-order languages.]
> 
> In C's case the real problem is that you don't even know when to
> specify that your semantics is unspecified without going through an
> extreme detour. Just think about how you would specify that an array
> access is out of bounds. When the array is passed to a procedure, you
> typically don't even have access to its bound. If you do, what exactly
> are you going to say?
> 
> -- Matthias

-- 
| Dan Grossman     www.cs.cornell.edu/home/danieljg H:607 256 0724 |
| 5157 Upson Hall  danieljg@cs.cornell.edu          O:607 255 9834 |







To: Fergus Henderson <fjh@cs.mu.oz.au>
Subject: Re: Formal semantics for C
Cc: matthias@ccs.neu.edu, kfl@it.edu, types@cis.upenn.edu


  > 
  >   > Compiler optimization researchers perform non-trivial analyses of C
  >   > programs (e.g., a pointer alias analysis) by ducking the issue of unsafe
  >   > operations: they simply assume that unsafe operations are never executed.
  > 
  >   This is not "ducking the issue".  On the contrary, this is an approach
  >   whose correctness can be proven from the semantics of C, as specified
  >   in the C standard or more formally in the works referred to earlier.
  > 
  > Nonsense!  These optimizations are incorrect (do not preserve the
  > meaning of programs) because they presume programs never perform unsafe
  > operations, but in fact they do.

  No, they do preserve the meanings (denotational semantics) of such programs.

A semantic definition for C that maps all unsafe operations to
undefined is misleading for the vast majority of potential
applications of the semantics (e.g. optimizing compilers, general
static debuggers).  In essence, such a "sanitized" semantics converts
C to a safe programming language.  The definition of operational
equivalence for such a language is vastly different from what it is for
any particular implementation of the C language.  One would hope that
operational equivalence under the sanitized semantics implies
operational equivalence in specific implementations, but it doesn't.

Real C implementations DO NOT DETECT unsafe operations; they
simply execute the underlying machine instructions implementing those
operations.  Executing those instructions either aborts program
execution (e.g., generates a segmentation fault) or corrupts program
data structures.  Most of the security holes that have been found in
existing operating systems involve the latter behavior; malicious
hackers use unsafe operations embedded in the operating system code to
modify its code and data.  According to the sanitized semantics that
maps all unsafe operations to "undefined", none of these security
breaches can possibly happen but, of course, they do.

Pretending that unsafe operations return "undefined" is not appropriate
for most applications of program analysis because such an analysis is
inconsistent with the behavior of programs that contain unsafe
operations (which includes nearly all large production programs
written in C).  Nevertheless, there are exceptions.  I know of one
project, namely the SLAM project at Microsoft Research, that uses such
an analysis appropriately.  The SLAM system relies on a "sanitized"
pointer aliasing analysis to perform its own model-checking analysis
for synchronization errors.  So the correctness of the SLAM
synchronization analysis hinges on the correctness of the pointer
aliasing analysis which assumes freedom from unsafe operations.  In
essence, SLAM reduces synchronization correctness to the absence of
unsafe operations.  Since SLAM is used to analyze device drivers that
have been very carefully scrutinized for unsafe operations by many
expert programmers, SLAM increases my confidence that the drivers do
not have any synchronization errors.  But the SLAM analysis falls
short of a convincing proof unless it is supplemented by a proof that
the analyzed code does not contain any unsafe operations.

-- Corky Cartwright







Date: Thu, 27 Dec 2001 20:16:55 -0800
To: Viktor Kuncak <vkuncak@mit.edu>
From: Michal Walicki <mwalicki@csc.calpoly.edu>
Subject: Re: Nondeterminism (was: Formal semantics for C)
Cc: types@cis.upenn.edu

>[----- The Types Forum, http://www.cis.upenn.edu/~bcpierce/types -----]
>
>Fergus Henderson:
>
> > A semantics doesn't need to be deterministic.  It can specify a set of
> > possible behaviours.  Indeed for almost all programming languages the
> > semantics is not deterministic.
> >
> > The semantics for a programming language can also be parameterized
> > (i.e. dependent on certain implementation-defined characteristics).
>
>Hi!
>
>I have a few questions about nondeterminism.
>
>I noticed that nondeterminism can neatly solve the following
>problem with languages that have object allocation.
>
>Consider these two pieces of code:
>
>    x = new C
>    y = new C
>    x.f = y
>    y = x
>
>  and
>
>    y = new C
>    x = new C
>    x.f = y
>    y = x
>
>If we model an allocator in a deterministic way
>(e.g. incrementing a counter), then these two programs will
>not be equivalent even though their states have isomorphic
>heaps.  However, if we let allocator nondeterministically
>return every possible fresh object identifier, we get that
>the two pieces represent two equivalent nondeterministic
>programs.
>
>The POPL02 paper by Banerjee and Naumann mentions that some
>of the problems can be solved by taking the quotient of the
>traces (pp4), but gives no references (and the quotient
>solution is not applied in this paper).  What is the right
>reference for this technique?  It seems to me that it comes
>for free if I use nondeterminism.
>
>@InProceedings{BanerjeeNaumann02RepresentationIndependenceConfinement,
>  author =       {Anindya Banerjee and David A. Naumann},
>  title =        {Representation Independence, Confinement,
>                  and Access Control},
>  booktitle =    POPL02,
>  year =         2002,
>  obtained = {http://guinness.cs.stevens-tech.edu/~naumann/publications.html}
>}
>
>It also seems to me that nondeterminism allows us to use
>languages with side effects without fixing an evaluation
>model.  I have the impression that functional languages try to
>avoid such nondeterminism.  Is this because it is generally
>hard to reason about such semantics?  Having nondeterminism
>is nice because it gives us more flexibility in program
>refinement.
>
>Thanks,
>

Nondeterminism has been used for modeling and solving this kind of
problem (and similar ones) in the context of algebraic specifications
for about 10 years.  I do not have the very first references (which
contained the treatment of the example you mention), but you can check
some more recent items:


@InProceedings{ADT,
  author =       {Michal Walicki and Sigurd Meldal},
  title =        {Multialgebras, power algebras and complete calculi
                  of identities and inclusion},
  booktitle =    {Recent Trends in Data Type Specification},
  series =       {LNCS},
  volume =       906,
  year =         1995,
}

@Article{CS,
  author =       {Michal Walicki and Sigurd Meldal},
  title =        {A complete calculus for the multialgebraic
                  and functional semantics of nondeterminism},
  journal =      {ACM TOPLAS},
  volume =       17,
  number =       2,
  year =         1995,
}



An overview of earlier techniques can be found in:


@Article{Overview,
  author =       {Michal Walicki and Sigurd Meldal},
  title =        {Algebraic Approaches to Nondeterminism - an Overview},
  journal =      {Computing Surveys},
  volume =       21,
  number =       1,
  month =        {March},
  year =         1997,
}


Regards

        Michal



Date: Fri, 28 Dec 2001 12:47:16 +0000
To: matthias@ccs.neu.edu
Cc: fjh@cs.mu.oz.au, kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk
Subject: Re: Formal semantics for C

Matthias Felleisen writes:
 
> What's the boundary of a pointer? -- Matthias

While the pointer value itself probably won't include a boundary, the
state of the machine in terms of its "contents of memory" can record
where things are allocated, so that attempts to look at parts of
memory that a program shouldn't will be classed as undefined.

Naturally, when memory is allocated, you have to specify its actual
placement in an under-determined way; maybe next to other values,
maybe not.  

The question of the level at which to stop describing the combined
system of language implementation, operating system, etc., is an
interesting one, but one that is easy enough to answer; in my case, I
attempted to model the very minimalist assumptions that the ISO
standard itself makes.  (I also made life easier by ignoring the
Library.)

The semantics is fairly careful to specify that doing things like
  * looking at uninitialised memory
  * writing or reading unallocated memory
  * causing signed integers to overflow
  * dividing by zero
  * breaking the arcane sequence point rules (e.g., i + i++)
all cause undefinedness, at which point your semantics can throw up
its hands, and say "illegal program" / "abort".

The fact that many implementations won't do this, and will instead
continue to look as if they're doing useful work, is neither here nor
there.

Michael.




To: matthias@ccs.neu.edu
Cc: kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk
Subject: Re: Formal semantics for C

Matthias Felleisen writes:
 
> The question that I have is what does it mean to have a semantics of C?  In
> my mind, a semantics is a mathematical model that predicts the behavior of
> a program in some programming language.
 
> Unless you nail down the precise hardware, os specs, compiler specs,
> I can't see how one can possibly specify the semantics of C. One
> could specify a very small kernel of C, but as soon as you admit
> arrays you're not going to produce anything that will predict the
> behavior in a reliable manner.
 
I apologise if this sounds as if I'm belabouring the point, but the C
Standard explicitly defines the notion of a "strictly conforming
program", one whose execution is guaranteed to produce the same
behaviour on all platforms.  

Other programs might be guaranteed to work, but because of
non-determinism will give different answers in different
circumstances.  Other programs are beyond the pale. 

A formal semantics for C would classify programs according to the
above, allowing you to predict, for example, that program X will
always run, producing the answer 42; will run, producing a number in
the range 10-15; or that it's illegal, and that its behaviour is thus
unpredictable, or at least, completely dependent on the vagaries of
the implementation.

Michael.





To: Corky Cartwright <cork@rice.edu>
Cc: types@cis.upenn.edu
Subject: Re: Formal semantics for C

On 27-Dec-2001, Corky Cartwright <cork@rice.edu> wrote:
> 
> According to the sanitized semantics that
> maps all unsafe operations to "undefined", none of these security
> breaches can possibly happen

Why do you say that?

If you're talking about the semantics which maps programs that
execute unsafe operations to the universal set of all possible
behaviours, then your claim that this semantics says that these
security breaches can't happen seems obviously false to me.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.





From: Matthias Felleisen <matthias@ccs.neu.edu>
To: Michael.Norrish@cl.cam.ac.uk
Cc: kfl@it.edu, types@cis.upenn.edu, S.J.Thompson@ukc.ac.uk
Date: Fri, 28 Dec 2001 08:50:34 -0500 (EST)


Excuse my ignorance, but is the Standard's notion of "conforming program" a
decidable notion?  If so, is your semantics accompanied by an algorithm? 

-- Matthias