![]() | ![]() | ![]() | Analysis Scope Construction |
A pre-requisite to analyzing a Java program using any program analysis framework, including Chord, is to compute the analysis scope: which parts of the program to analyze. Several scope construction algorithms (so-called call-graph algorithms) exist in the literature that differ in scalability (i.e., how large a program they can handle with the available resources) and precision (i.e., how much of the program they deem is reachable).
Chord implements several standard scope construction algorithms. Besides scalability and precision, an additional metric of these algorithms in Chord that can be controlled by users is usability, which concerns aspects such as excluding certain code from being analyzed even if it is reachable, and modeling Java features such as reflection, dynamic class loading, and native methods. These features affect which code are reachable but, in general, they cannot be modeled soundly by any program analysis framework. The best a framework can do is provide stubs for commonly-used native methods in the standard JDK library (e.g., the arraycopy method of class java.lang.System), offer users a range of options on how to resolve reflection (e.g., an option might be running the program and observing how reflection is resolved), etc.
Chord computes the analysis scope of the given program either if property chord.build.scope is set to true or if some other task (e.g., a program analysis specified via property chord.run.analyses) demands it. The following sections describe Chord's analysis scope computation in detail. Section * describes how to reuse the analysis scope computed in a previous run of Chord for a given program. Section * describes Chord's analysis scope construction algorithms. Finally, Section * describes how users can exclude certain classes from the analysis scope.
If property chord.reuse.scope has value true and both files specified by properties chord.methods.file and chord.reflect.file exist, then Chord regards those files as specifying which methods to consider reachable and how to resolve reflection, respectively.
The format of the file specified by property chord.methods.file is a list of zero or more lines, where each line is of the form mname:mdesc@cname specifying the method's name mname, the method's descriptor mdesc, and the method's declaring class cname (e.g., main:([Ljava/lang/String;)V@foo.bar.Main).
The format of the file specified by property chord.reflect.file is of the form:
# resolvedClsForNameSites ... # resolvedObjNewInstSites ... # resolvedConNewInstSites ... # resolvedAryNewInstSites ... |
where each of the above "..." is a list of zero or more lines, where each line is of the form bci!mname:mdesc@cname->type1,type2,...typeN meaning the call site at bytecode offset bci in the method denoted by mname:mdesc@cname may resolve to any of reference types type1, type2, ..., typeN. The meaning of the above four sections is as follows.
If property chord.reuse.scope has value false or the files specified by properties chord.methods.file or chord.reflect.file do not exist, then Chord computes analysis scope using the algorithm specified by property chord.scope.kind and then writes the list of methods deemed reachable and the reflection resolved by that algorithm to the files specified by properties chord.methods.file and chord.reflect.file, respectively.
The possible values of property chord.scope.kind are [rta|cha|dynamic] (the default value is rta). The following subsections describe the scope construction algorithm that Chord runs in each of these three cases. In each case, Chord at least expects properties chord.main.class and chord.class.path to be set to the fully-qualified name of the program's main class (e.g., com.example.Main) and the program's application classpath, respectively.
If property chord.scope.kind has value rta, then Chord computes analysis scope statically using Rapid Type Analysis (RTA). RTA is an iterative fixed-point algorithm. It maintains a set of reachable methods M. The initial iteration starts by assuming that only the main method in the main class is reachable (Chord also handles class initializer methods but we ignore them here for brevity; we also ignore the set of reachable classes maintained besides the set of reachable methods). All object allocation sites H contained in methods in M are deemed reachable (i.e., control-flow within method bodies is ignored). Whenever a dynamically-dispatching method call site (i.e., an invokevirtual or invokeinterface site) with receiver of static type t is encountered in a method in M, only subtypes of t whose objects are allocated at some site in H are considered to determine the possible target methods, and each such target method is added to M. The process terminates when no more methods can be added.
If property chord.scope.kind has value cha, then Chord computes analysis scope statically using Class Hierarchy Analysis (CHA). The key difference between CHA and RTA is that for invokevirtual and invokeinterface sites with receiver of static type t, CHA considers all subtypes of t in the class hierarchy to determine the possible target methods, whereas RTA restricts them to types of objects allocated in methods deemed reachable so far. As a result, CHA is highly imprecise in practice, and also expensive since it grossly overestimates the set of reachable classes and methods. Nevertheless, Chord allows users to control which classes are included in the class hierarchy, and thereby control the precision and cost of CHA, by setting property chord.ch.kind, whose possible values are [static|dynamic] (the default value is static).
Chord first constructs the entire classpath of the given program by concatenating in order the following classpaths:
All classes in the entire classpath (resulting from items 1-3 above) are included in the class hierarchy with the following exceptions:
If property chord.scope.kind has value dynamic, then Chord computes analysis scope dynamically, by running the program and observing the classes that are loaded at run-time. The number of times the program is run and the command-line arguments to be supplied to the program in each run is specified by properties chord.run.ids and chord.args.<id> for each run ID <id>. By default, the program is run only once, using run ID 0, and without any command-line arguments. Only classes loaded in some run are regarded as reachable but all methods of each loaded class are regarded as reachable regardless of whether they were invoked in the run. The rationale behind this decision is to both reduce the run-time instrumentation overhead and increase the predictive power of program analyses performed using the computed analysis scope.
Chord can be instructed to exclude certain classes in a given program from being analyzed. This functionality might be desirable, for instance, if the given program contains a larger framework (e.g., Hadoop or Android) which must not be analyzed. Chord provides three properties for this purpose. The value of each of these properties is a comma-separated list of prefixes of names of classes. Chord treats the body of each method defined in each such class as a no-op.
Note: The value of each of the above properties is a list of prefixes, not regular expressions. A valid value is "java.,com.sun.", but not "java.*,com.sun.*".
![]() | ![]() | ![]() | Analysis Scope Construction |