A System for Proving Optimizing Transformations Sound


Contents

Introduction

A compiler optimizer analyzes and transforms programs to improve their performance. This allows programmers to focus on functionality of programs without having to bother about efficiency of the generated code. Optimizers have therefore become an integral part of the modern compilers. However, a mistake in the design or the implementation of an optimizer can proliferate in the form of bugs in the softwares compiled through it.

The issue of soundness of optimizers is addressed at two levels: (1) One time guarantees are obtained at the design level by verifying optimization specifications and (2) run-time guarantees are obtained at the implementation level by validating optimizations performed. We have developed an approach which is uniformly applicable at both the levels.

Optimizations with similar objectives comprise of similar program transformations. Consequently, their proofs of semantic equivalence also consist of similar patterns. To simplify specification, we structure optimizing transformations as sequential compositions of primitive program transformations. For each primitive, we identify soundness conditions which guarantee that the transformation is semantics preserving. Proving soundness of an optimization therefore reduces to merely showing that the soundness conditions of the underlying primitives are satisfied. This avoids repeated proofs of semantic equivalence.

We use a temporal logic to specify program analyses and soundness conditions. Since programs are transformed step-by-step, we need to correlate temporal properties across transformations of programs. We therefore develop a logic called Temporal Transformation Logic and use it to verify optimization specifications.

We also propose a novel validation scheme. An optimizer can be instrumented to generate a trace of its execution in terms of predefined transformation primitives. If the trace conforms with the execution and soundness conditions of each of the primitive in the trace are satisfied then the optimizer preserves semantics of the particular input program.

We have developed a system for proving optimizations sound (SPOTS). It consists of a PVS based framework for specification and verification of optimizations and a framework to validate optimizers of GCC v4.1.0. We have applied both the frameworks to several optimizations.

Keywords: Compiler optimizations, Verification, Translation validation, Temporal logic, Transformations of Kripke structures, GCC

Overview

The SPOTS system consists of the following two components:
  1. SPOTS/PVS is a framework for specification and verification of optimizations.
  2. SPOTS/GCC is a framework for validation of GCC optimizers operating over the Register Transfer Language intermediate representation.

SPOTS/PVS

Figure 1: SPOTS/PVS Framework

SPOTS/PVS is a Prototype Verification System (PVS) based design of a specification language and supporting libraries. An optimization specification consists of definitions of local data flow properties, global program analyses, and an optimizing transformation as a sequence of applications of primitive program transformations. In Figure 1, opt.pvs denotes a PVS file containing specification of an optimization opt.

A program is seen as a Kripke structure whose states are program points and transition relation is the control flow structure of the program. The local data flow properties determine the atomic propositions and the labeling of the states of the Kripke structure with the atomic propositions. A transformation of program statements determines how the local data flow properties of the input and the transformed programs relate. This in turn defines the relation between the atomic propositions of the corresponding Kripke structures.

The TTL library consists of definitions of temporal logic operators and primitive graph transformations. These formulations are novel in their use of boolean matrix algebra. Program analyses in optimization specifications and soundness conditions are defined in temporal logic. Transformations of control flow structure of programs are defined in terms of the primitive graph transformations.

The OVerification library consists of definitions of primitive program transformations. For each primitive, we define soundness conditions which guarantee that the transformation is semantics preserving. We show separately that the transformation primitives preserve semantics if their soundness conditions are satisfied. These are one-time proofs. Since the primitives are small-step transformations, these proofs are much easier than similar proofs for optimizations.

We have designed a utility VCGEN for generation of verification conditions. It analyzes the PVS internal representation of optimization specifications and generates verification conditions. The verification conditions are instantiations of the soundness conditions of the transformation primitives in the context in which the primitives are used in the specification. It generates an auxiliary file opt_lemmas.pvs for introducing optimization specific lemmas. User can add lemmas to this file and use them as intermediate steps for proving the verification conditions stored in the opt_soundness.pvs file.

Proving soundness of the optimization involves discharging the verification conditions. Program analyses and soundness conditions are both expressed in a temporal logic. The program analysis required for an optimization is defined on the input program. Since the program is transformed step-by-step, the soundness conditions of a transformation primitive need to be shown on the version of the program on which it is applied. We therefore develop the Temporal Transformation Logic (TTL) for correlating temporal properties across transformations of programs. The correlations of atomic propositions are given along with the definitions of primitive program transformations in the OVerification library. The TTL inference rules are given in the TTL library. We separately prove soundness of the TTL inference rules entirely in the boolean matrix algebra.

We have specified and verified several bit-vector analysis based intraprocedural optimizations viz.
  • Common subexpression elimination,
  • Dead code elimination,
  • Partial dead code elimination,
  • Lazy code motion,
  • Loop invariant code motion,
  • Optimal code motion.

SPOTS/GCC

Figure 2: SPOTS/GCC Framework

Transformation primitives and temporal logic operators in SPOTS/PVS are defined in such a way that they can be directly evaluated in the PVS ground evaluator. This provides a means to validate the specifications. Apart from verification, we also perform validation of optimization specifications by specifying sample programs as PVS theories. An optimization specification is simulated on sample programs and its verification conditions are checked using the PVS ground evaluator.

This evaluatable nature of specifications in SPOTS/PVS facilitates a novel validation scheme which we call ``validation against trace''. An optimizer implementation can be instrumented to generate a trace of its execution in terms of predefined transformation primitives. A trace is a sequence of applications of transformation primitives on the intermediate program being optimized. The generated trace is not required to be trusted as an accurate description of the execution of the optimizer.

The validation of a particular run of the optimizer is done in two steps:
  1. The trace is simulated on the intermediate program being optimized. If the output program thus obtained matches with the optimized program generated by the optimizer then the trace conforms with the execution of the optimizer.
  2. The soundness conditions of the transformation primitives used in the trace are checked on the appropriate versions of the intermediate program being optimized. If the soundness conditions are satisfied then the trace preserves semantics. Consequently, the optimizer also preserves semantics of the particular intermediate program.

Similar to verification, this validation scheme also takes advantage of common patterns of transformations and of semantic equivalence proofs. This scheme does not directly check semantic equivalence of the input and the optimized programs. It involves only simulation of transformation primitives and checking satisfaction of soundness conditions and is therefore amenable to automation.

We apply this validation scheme to GCC v4.1.0. The schematic of the SPOTS/GCC framework is shown in Figure 2. We instrument several optimizers of GCC v4.1.0 to generate traces of their executions in terms of the transformation primitives defined in SPOTS/PVS which is a compiler independent framework. The input and the optimized intermediate programs of GCC are converted to PVS theories and subsequently validated in SPOTS/PVS.

We consider optimizations of Register Transfer Language (RTL) intermediate code. We first convert an RTL program into a simplified (spots) format. The spots representation of the input and the optimized programs together with the generated trace (test.c.spots) is then converted into a PVS theory (test.c.pvs). The spots representation and the actual parameters for the transformation primitives for the primitives used in the trace are entirely expressible as ground terms.

We also automatically generate verification conditions for the trace. Apart from the soundness conditions of the primitives this also includes type checking of the program representation. The type checking verifies whether the content and the control flow of a program are consistent. The verification conditions are checked in the PVS ground evaluator. The definitions of the transformation primitives and their soundness conditions are those from the SPOTS/PVS framework.

A GCC generated trace may consist of steps which do not satisfy soundness conditions of the respective primitives. We therefore apply certain heuristics to convert a trace into an equivalent one s.t. the new trace satisfies the soundness conditions of the underlying primitives. Note that since we check conformance of a trace with the actual optimization, our scheme is sound even though potentially incomplete.

We additionally generate dot representation of all the programs generated by the trace (test.c.dot). It is then used to generate a PS file test.c.ps. This aids in visualizing the actual transformations being performed on the intermediate programs.

We have validated optimizations of test programs written in a subset of C for all optimization flags. In particular, we have validated the following optimization routines in GCC v4.1.0: loop invariant code motion (loop.c/move_movables), partial redundancy elimination (PRE) or global common subexpression elimination (GCSE) through lazy code motion (gcse.c/pre_gcse), PRE/GCSE through code hoisting (gcse.c/hoist_code), copy and constant propagation (gcse.c/cprop).

Our intuitive formulation of optimizing transformations as sequences of primitive transformations is reflected in the implementation of a real compiler like GCC. In fact, we can find transformation routines in the GCC code which directly correspond to our predefined transformation primitives. The instrumentation therefore is quiet straightforward. We instrument a GCC transformation routine to dump a signature of its execution in terms the corresponding SPOTS/PVS primitive(s) based on the actual arguments passed to it during execution. The trace is generated only when appropriate flags are set. The flags are set and reset appropriately from the optimizations which we are validating.

Dowloading and Installation

Installation and User Guide

SPOTS/PVS Files spotspvs.tgz

SPOTS/GCC Files spotsgcc.tgz

SPOTS is distributed under the GNU General Public License. For more details see the COPYING file in the distribution.

Documentation and Publications