Re: Formal semantics for C

Matthias writes:

> It is my belief that C is an "assembly" language for building and
> "scripting" Unix and that the people who designed C didn't think about
> a mathematical, machine-independent semantics and, frankly, they and their
> friends who use it, probably don't care about it. 

Okay, I'll take the bait. :-)

(1) Is C an assembly language?

C's machine model is very close to the machine model of an assembly
language.
In particular, the behavior of illegal C programs can be explained only in
terms usually reserved for assembly language.  There are differences,
however.  C doesn't have any notion of registers (ignoring the ignorable
"register" keyword).

C's term language is quite different from that of an assembly language.  It's easy
to overlook the obvious here: For example, C has hierarchical scope and
aggregate type definitions.  Now, *if* C were memory safe, features like
these would be great for user-defined abstractions.  Not being memory safe
is not equivalent to being an assembly language, however.
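
To make that concrete, here is a minimal sketch of the kind of user-defined abstraction C's aggregate types and block scope support. The `int_stack` type and its functions are hypothetical names chosen for illustration. The bounds checks live only inside the functions. Because C is not memory safe, nothing stops a client from bypassing them by indexing `items` directly, which is exactly the gap described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-capacity stack: an aggregate type that an
   assembler has no way to declare or check. */
#define INT_STACK_CAP 8

struct int_stack {
    int items[INT_STACK_CAP];
    size_t len;
};

/* Push v; returns false (leaving the stack unchanged) when full. */
bool int_stack_push(struct int_stack *s, int v) {
    if (s->len == INT_STACK_CAP)
        return false;
    s->items[s->len++] = v;
    return true;
}

/* Pop into *out; returns false when the stack is empty. */
bool int_stack_pop(struct int_stack *s, int *out) {
    if (s->len == 0)
        return false;
    *out = s->items[--s->len];
    return true;
}

/* Sum the elements. The loop variable i is visible only inside the
   for statement (hierarchical block scope). */
int int_stack_sum(const struct int_stack *s) {
    int total = 0;
    for (size_t i = 0; i < s->len; i++)
        total += s->items[i];
    return total;
}

/* Nothing above prevents a client from writing s.items[INT_STACK_CAP]
   directly: the abstraction is enforced by convention, not by the
   language, because C is not memory safe. */
```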

Finally, translating C to an assembly language is "much easier" than
translating one assembly language to another.  One reason is precisely that
C defines many programs to be illegal.
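
Here is one illustration of the freedom that declaring programs illegal gives a compiler. This example is our choice, not the author's. Signed integer overflow is undefined behavior in C, so a compiler may assume `x + 1` never wraps. Optimizing compilers such as GCC and Clang commonly compile `next_is_greater` below to a constant `return 1`. By contrast, a translator between two assembly languages would have to reproduce the source machine's wraparound exactly. Both function names are hypothetical.

```c
/* Because signed overflow is undefined behavior in C, the compiler
   may assume x + 1 > x holds for every x and fold this to "return 1".
   Calling it with x == INT_MAX makes the program illegal, so no
   particular result is promised in that case. */
int next_is_greater(int x) {
    return x + 1 > x;
}

/* The defined counterpart: unsigned arithmetic wraps modulo 2^N,
   so the compiler must preserve the wraparound case. */
int next_is_greater_unsigned(unsigned x) {
    return x + 1u > x;
}
```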

(2) Was C designed for building and "scripting" Unix?

Sure.  And there are a thousand things I would change about C.  More
important than its original use, though, is that C is used for far too many
things today -- it can't possibly be the "right" language for all of them. 
More on that below.

(3) Did the designers care about mathematical, machine-independent
semantics?

No. Many language designers don't, and that's unfortunate.  But it seems
clear from the conversation and the cited work that the real issue here is
safety.
That is, with safety, formalizing C would be a nearly solved problem.

Machine-independence is where I'll make my most controversial point: The
world needs non-assembly languages with "machine-dependent behavior".  Okay,
I don't quite mean that.  What I mean is that you should be able to write a
program that assumes, for example, that pointers are 32 bits.  Try to compile it
for a 64-bit machine, and maybe the compiler just says no.  The second-biggest
problem with C (safety being the first) is that it is extremely hard to
write portable programs, and you get very little automated help in doing so.
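
C11's `_Static_assert` provides a small, opt-in version of the "compiler just says no" behavior. The example assumption above is 32-bit pointers, but a check for that would refuse to build on today's common 64-bit hosts. This sketch therefore states an analogous machine-dependent assumption instead: that `unsigned int` is exactly 32 bits. The function name `rotl32` is a hypothetical illustration.

```c
#include <limits.h>

/* Machine-dependent assumption, made explicit: the rotate below is
   only correct when unsigned int has exactly 32 value bits. On a
   target where that is false, compilation stops here instead of the
   program silently computing the wrong answer. */
_Static_assert(UINT_MAX == 0xFFFFFFFFu,
               "this code assumes a 32-bit unsigned int");

/* Rotate x left by n bits. Masking both shift counts with 31 keeps
   them in 0..31, so n == 0 never shifts by 32 (which would be
   undefined). */
unsigned rotl32(unsigned x, unsigned n) {
    return (x << (n & 31u)) | (x >> ((32u - n) & 31u));
}
```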

What we need is a well-defined portable (sub)language, an unportable
(sub)language, and easy interoperability between the two.  I'm surprised I
haven't seen this need articulated a gazillion times.
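
Plain C can only approximate that split by convention. In this sketch, `put_le32_portable` produces the same bytes on every host because it builds them with shifts. By contrast, `put_host32_unportable` copies the in-memory representation, so its output depends on the host's byte order. All function names are hypothetical, and the sketch assumes the implementation provides `uint32_t`, which is an optional type in the C standard. Interoperability is the easy part here, since both functions are ordinary C. What is missing is any compiler help in telling the two kinds apart.

```c
#include <stdint.h>
#include <string.h>

/* Portable: writes v as 4 little-endian bytes using only shifts,
   so the result is the same on every host that provides uint32_t. */
void put_le32_portable(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Unportable: copies v's in-memory representation, so the byte order
   in out depends on the host (little- vs. big-endian). The compiler
   accepts it without comment. */
void put_host32_unportable(unsigned char out[4], uint32_t v) {
    memcpy(out, &v, sizeof v);
}

/* Inverse of put_host32_unportable on the same host. */
uint32_t get_host32_unportable(const unsigned char in[4]) {
    uint32_t v;
    memcpy(&v, in, sizeof v);
    return v;
}
```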

We also need a well-defined safe (sub)language, an unsafe (sub)language, and
easy interoperability between the two.  I'm working on that one. :-)
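
Below is a minimal sketch of one way a safe layer and an unsafe layer can interoperate, assuming a bounds-carrying "fat pointer" representation. This is a common technique, not a description of the author's design, and all names are hypothetical. The safe layer checks every access. The boundary to raw C pointers is a single, visible function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: a base address plus the number of
   elements that may be accessed through it. */
struct int_span {
    int *base;
    size_t len;
};

/* Safe layer: every access is bounds-checked, so an out-of-range
   index reports failure instead of touching unrelated memory. */
bool span_get(struct int_span s, size_t i, int *out) {
    if (i >= s.len)
        return false;
    *out = s.base[i];
    return true;
}

bool span_set(struct int_span s, size_t i, int v) {
    if (i >= s.len)
        return false;
    s.base[i] = v;
    return true;
}

/* Boundary into the unsafe layer: the caller is trusted to supply a
   len that really matches the array behind p. This is the one place
   where safety depends on the programmer rather than on the checks
   above. */
struct int_span span_from_raw(int *p, size_t len) {
    struct int_span s = { p, len };
    return s;
}
```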

--Dan

-- 
| Dan Grossman     www.cs.cornell.edu/home/danieljg H:607 256 0724 |
| 5157 Upson Hall  danieljg@cs.cornell.edu          O:607 255 9834 |