CHET: A System for Checking Dynamic Specifications

Steven P. Reiss
Department of Computer Science
Brown University
Providence, RI 02912
spr@cs.brown.edu

Abstract

Software specifications describe how code is supposed to behave. Software model checking and related activities statically investigate software behavior to ensure that it meets a particular specification. We have developed a tool, CHET, that uses model checking techniques to do large-scale checking of dynamic specifications in real systems. The tool uses a finite state specification of the properties to check in terms of abstract events. It first finds all instances in the system where this specification is applicable. For each such instance, it creates an abstract model of the software with respect to the events and then checks this model against the specification. Key aspects of CHET include a full interprocedural flow analysis to identify instances of the specifications and restrict the resultant models, and greatly simplified abstract programs that are easily checked. The system has been used to check a variety of specifications in moderate-sized Java programs.

1. Introduction

Many of the problems with software systems derive from the code not behaving as it is specified to do. In today's world of components and libraries, one of the most pressing problems is ensuring that libraries and components are used correctly. This is true both for system libraries and outside components as well as for the packages and internal components that comprise the system.

Our goal was to develop a framework where the behavior or usage of a class or package can be specified and then checked against a program. We wanted this framework to be practical and useful on real systems. It had to be able to take specifications from a variety of sources. It had to be fully automatic, taking a specification, finding all instances of that specification in the program, and then checking each instance. It had to be efficient so that it could work with large programs. Moreover, to be useful, it had to be accurate, identifying all violations of the specifications with a minimum of false positives. The framework we have developed meets most of these criteria.

In the next section we discuss some of the related work in this area. The following sections describe our approach, first by providing the overall architecture and then going into detail on the various components. We conclude by discussing our experiences and future directions.

2. Related Work

Static checking of software systems has a long history that includes original attempts at proving software correct, extended compiler checking such as Lint [27], static condition checking as in CCEL [10], and verification-based static checking such as in LCLint [14]. More recently, there has been a significant body of work on software model checking [20].

Software model checking typically starts with a software system and a property to check. The software system is then abstracted into a representation that is more amenable to model checking by abstracting the original program into a much smaller program and then converting that program into a finite state representation.
The various systems that have been developed differ in what they consider the software system to be checked, in the way they define the property to be checked, in the way they do abstraction, in how they map the program into a finite state representation, and in how they actually do the checking. Our system combines aspects of a number of other systems to produce a practical alternative that works on real programs.

Most of the software model checking systems start with source code. For example, the original Java Pathfinder [15,16] translated Java programs into Promela, the input language for the SPIN model checker [19,20]. Java Pathfinder 2 [4,25,29] instead starts from Java byte codes, essentially a binary representation, while the Bandera system [7,8,11] uses both the source and the byte code representation. The advantages of using the binary representation are that it is often simpler than the source, one does not have to worry about the vagaries of the language, and, most importantly, one can easily check not only the user's system but also libraries and the interactions between the user's system and libraries. Thus we also start with byte code, using IBM's JikesBT package [23].

Finite state automata are the principal representations used for specifying properties. These are defined either directly or using a language that can be mapped into a finite state representation. The automata are triggered by program events, and it is the characterization of these program events that differentiates the systems. Most of the systems, including Bandera and Flavers [6], require that the user explicitly define events or predicates in terms of the code for each item being checked. Other systems such as Metal [13] and ESP [9] use simple parameterized source code patterns which let the programmers specify all events of a given type with a single specification. SLIC [1] achieves the same effect using an event-oriented language. The MaC framework [24] takes a similar approach for specifying dynamic instrumentation using an event definition language. Patterns have also been used to simplify the definition of commonly occurring idioms in the specifications [12]. Our approach provides the functionality of Metal or ESP using an event-based specification similar to MaC or SLIC.

The key to successful software model checking is the generation of small abstractions that reflect the property being checked without irrelevant details. The different approaches do this in different ways. The C2BP package within SLAM [2,3] and the DEOS system [28] convert the user's code into a Boolean program using predicate abstraction, where each predicate relevant to the specification being checked is replaced with a Boolean variable. Bandera uses data abstraction to map the program types into abstract predicates that can be finitely modeled. Trailblazer looks only at control flow events and actions and eliminates all data [21]. ESP does a combination of control and data flow analysis to build a simplified version of the original program. Flavers constructs a trace flow graph by inlining control flow graphs of the various methods and adding arcs to represent synchronization events. Java Pathfinder 2 uses static analysis to reduce the state space by finding concurrent transitions [5]. Bandera, ESP, Java Pathfinder, and Flavers all use some type of slicing technology to restrict the abstraction to those portions of the program that are relevant to the conditions being checked.
BLAST takes an additional step, using the verification process to identify what needs to be refined in the abstraction and building a new model based on this information [17]. Later work on BLAST uses Craig interpolation and proof techniques to improve the abstraction [18]. Our approach to date is probably closest to that of ESP in that we use both control and data flow analysis. However, we do not attempt to find all relevant variables, which greatly simplifies the abstraction in exchange for a loss of accuracy, and we achieve the effect of path-sensitive analysis using automata simplification techniques.

The various systems also differ in their representation of an abstract program for checking. Some of the systems actually generate an automaton. For example, Flavers inlines routines and adds synchronization arcs to build a single large automaton that can be checked. Bandera and the first Java Pathfinder map the program into Promela, the input language for the SPIN model checker. Our approach is different. We generate an abstract program with calls, synchronized blocks, and events. This lets us handle complex and recursive programs easily and compactly. In addition, we use a Flavers-like automaton (still with calls, synchronized blocks and events) to represent the behavior of each program thread other than the primary one. We do not attempt to generate a program that includes all the possibly relevant state variables, on the assumption that this is too much in a large system. Instead, we include a minimal set of variables and let the specification indicate additional ones as needed.

Checking in Bandera is done using the SPIN model checker. SLAM and Java Pathfinder 2 use their own model checkers; SLAM's is based on Boolean programs, and Pathfinder's is based on a modified JVM [4,25]. Our approach has been to develop our own checker to match our program-like abstraction representation. The checker is unique in the way it handles routines and synchronization, and extends from a detailed single-threaded analysis to an approximate multithreaded analysis quite naturally.

3. Architecture

Our framework uses multiple passes to first find all occurrences of the given specifications in the user program and then to check each of those occurrences. The checking itself is done in two phases: the first builds a model of the program with respect to the particular instance of the specification, and the second checks whether that model meets the specification.

We start with the code base of the program and a set of specifications. The code base is given as the set of Java class files that comprise the system, and involves not only the user's code but also all system and other libraries that are used by the program. The specifications are defined as extended finite state machines over program events. These are described in detail in the next section.

The first phase of the system does a full interprocedural flow analysis to find all the instances of each specified check and to identify the potential set of events for each instance. This is done by tracking sources in the code, where a source is the trigger or variable used in a specification event. This is described in Section 5.

The result of the data flow is used to create a set of specification instances, each of which is the combination of a single specification automaton and a single source of the appropriate type.
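The specification format itself (Figure 1 of the paper) is not included in this preview. Purely as an illustration, and not as CHET's actual specification language, the Java sketch below encodes an iterator-usage check as a finite state machine over abstract events. The state names echo the states S1 through S4 referred to in Section 7 (S1 before allocation, S2 allocated, S3 after a successful hasNext, S4 the error state), but the event names and exact transitions are assumptions.

    // Illustrative only: an iterator-usage specification written as a finite
    // state machine over abstract events. The states and events are
    // hypothetical stand-ins for the extended FSM of Figure 1.
    public class IteratorSpec {
        enum State { S1, S2, S3, S4 }            // S4 is the (absorbing) error state
        enum Event { ALLOC, HAS_NEXT, NEXT }

        static State step(State s, Event e) {
            switch (s) {
                case S1: return e == Event.ALLOC ? State.S2 : State.S1;
                case S2: return e == Event.HAS_NEXT ? State.S3
                              : e == Event.NEXT ? State.S4 : State.S2;
                case S3: return e == Event.NEXT ? State.S2 : State.S3;
                default: return State.S4;        // once in the error state, stay there
            }
        }

        public static void main(String[] args) {
            // alloc, hasNext, next is legal; alloc, next reaches the error state.
            State ok = step(step(step(State.S1, Event.ALLOC), Event.HAS_NEXT), Event.NEXT);
            State bad = step(step(State.S1, Event.ALLOC), Event.NEXT);
            System.out.println(ok + " " + bad);  // prints "S2 S4"
        }
    }

Such an automaton is paired with each source found by the flow analysis to form the specification instances described above.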
For each of these instances, the system creates an abstract program consisting of a set of finite state machines, each representing a particular method. The nodes of the machine represent actions that either generate the […]

Fixed sources are normally used for those instances where values originate outside the analyzable object code. An additional optimization in our flow analysis allows the algorithm to use a fixed source in place of local sources for particular types. This is used, for example, for sources that represent exceptions in library methods, since we rarely need to track any detailed information associated with these.

In addition to these various optimizations, the algorithm is designed to work with complete Java programs. This entails dealing with all the complications of such programs, including native code, exceptions, threads, synchronization, callbacks, dynamic loading and binding using reflection, and large numbers of library routines. It also means tracking the implicit execution semantics of Java such as calls to static initializers and implicit field initializations. We handle the latter by encoding the implicit semantics into the flow analysis algorithms, for example ensuring that the static initializer for a class is called before we evaluate any methods of that class (unless the methods are called from within the static initializer). For the other issues, we use a common solution that lets the programmer specify routines that are to receive special handling.

Method special handling can take a variety of forms. Standard library and native methods can be flagged so that their internals are ignored and the value they return is a fixed source of the appropriate type. Methods that return values other than their declared type (e.g. are declared to return an interface type) can be declared to return a fixed source that is mutable. Such a source will be automatically converted to another fixed source upon an implicit or explicit cast. Other methods can be flagged with a substitute method. This is used for some internal Java security calls, for methods that dynamically bind to implementation classes, and for methods such as Thread.start which actually invokes Thread.run asynchronously. Other methods, such as System.arraycopy, need to be treated as special cases and are flagged as such. Finally, methods that register callbacks can be flagged so that the parameters for the callbacks will be computed correctly and the callback will be invoked as part of the dataflow.

The result is a package that does source-based data flow analysis of real Java programs, including all the various libraries, and does it relatively efficiently. It handles small systems (5000 methods, 200,000 byte codes, 5000 of them in the system itself) in under 1 minute. On CHET itself (6400 methods, 340,000 byte codes, 47,000 in the system itself), it takes under 3 minutes. On a larger project that includes 12 different executables (10158 methods, 575,000 byte codes, 100,000 in the system itself), it takes 12 minutes.

Once this data flow analysis is complete, the framework identifies all instances of the specifications. It does this by finding, for each given specification, all instances of a source created by a trigger event for that specification. The instance is stored as the combination of the specification and this source.
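The example program of Figure 2 is not part of this preview. The following is a hypothetical Java class, loosely modeled on the method names visible in Figure 3 (main, init, printList), showing the kind of code that yields two instances of an iterator check: each call to iterator() is a trigger event creating a distinct source, so the flow analysis would report two specification instances for this program.

    import java.util.Iterator;
    import java.util.LinkedList;
    import java.util.List;

    // Hypothetical example, not the paper's Figure 2: two iterator allocation
    // sites, hence two instances of the iterator specification.
    public class Simple {
        private final List<String> items = new LinkedList<>();

        void init() {
            items.add("a");
            items.add("b");
        }

        void printList() {
            // First instance: every next() is guarded by hasNext().
            for (Iterator<String> it = items.iterator(); it.hasNext(); )
                System.out.println(it.next());

            // Second instance: on one path the iterator is never used, on the
            // other next() is called without a hasNext() guard.
            Iterator<String> it2 = items.iterator();
            if (!items.isEmpty())
                System.out.println(it2.next());
        }

        public static void main(String[] args) {
            Simple s = new Simple();
            s.init();
            s.printList();
        }
    }

The checker's results for two such instances are discussed at the end of Section 7.1.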
6. Building Program Abstractions

Once we have identified an instance of a specification, we need to check that instance. We do this in two steps, first creating an abstraction of the application that only includes those portions that are relevant to the particular instance, and then checking whether this abstraction meets the specification. The generated abstraction here is actually an abstract program that generates events for the specification.

6.1 The Abstract Program

This abstract program is generated to ensure that if there is an execution of the actual program which exhibits a certain sequence of specification events, then there is an execution of the abstract program that generates the same sequence. This is again conservative in that the abstract program may generate sequences that can never be exhibited in the actual program.

The abstract program consists of a set of routines. Each routine is composed of nodes and arcs similar to an automaton. There are actions associated with each node, but the arcs are uninterpreted. The associated actions control the behavior of the program and the generation of events. The current actions include:

• Enter a routine.
• Exit a routine (return or end of program).
• Call a routine.
• Generate an event.
• Set a variable to a given value.
• Set the return value for the current routine.
• Test a variable or return value.

In addition, to facilitate checking of multithreaded applications, we have added the following actions:

• Asynchronous call of a routine.
• Begin synchronization for a set of sources.
• End synchronization for a set of sources.
• Wait or timed wait on a set of sources.
• Notify or notify all on a set of sources.

Execution of this abstract program is nondeterministic. Consider the single threaded case. At any point in time there is a current node. This node is executed to determine the current node at the next point in time. If the node is a call, then the current node is pushed onto the call stack and the next current node is the enter node of the called routine. If the node is a return, then the calling node is popped off the call stack. If there was no calling node, the program exits normally. If there was, then one of its successor nodes is chosen nondeterministically as the next node. If the node is an event node, a variable set, or a return set, the next node is chosen nondeterministically from this node's successors after an event is output or the program state is changed accordingly. If the node is a test node and the test fails, the program fails; if the test succeeds, then the next node is determined nondeterministically from the successors.

The threaded case assumes that there are multiple such programs with a common program state. A new thread is created by an asynchronous call node. Synchronization can be applied by keeping track of the set of sources that are currently being synchronized on and having a synchronize node block (i.e. use the current node as the next node) if synchronization would fail. This is not accurate in general since we cannot guarantee from the flow analysis that the same source from the analysis implies the same object at execution time; a truly conservative approach requires us to ignore the synchronization statements. Wait and notify can also be modeled conservatively. A true wait blocks until there is a notify or notify all that shares a common source. Then it nondeterministically chooses to either continue blocking or to proceed to one of its successor nodes.
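The following Java sketch illustrates this single-threaded execution model. It is illustrative only; the node and action representation is an assumption made for the example, not CHET's internal data structures. Each node carries an action kind and an uninterpreted successor list, and a run picks among successors at random to approximate nondeterminism.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;
    import java.util.Random;

    // Illustrative sketch: an abstract-program node with an action and
    // uninterpreted successors, plus one nondeterministic single-threaded run.
    public class AbstractProgram {
        enum Kind { ENTER, EXIT, CALL, EVENT }

        static final class Node {
            final Kind kind;
            final String label;            // routine name or event name
            final List<Node> succs = new ArrayList<>();
            Node callee;                   // enter node of the called routine (CALL only)
            Node(Kind kind, String label) { this.kind = kind; this.label = label; }
        }

        // Run one nondeterministic execution from the enter node of main.
        static void run(Node mainEnter, Random rnd) {
            Deque<Node> callStack = new ArrayDeque<>();
            Node cur = mainEnter;
            while (cur != null) {
                switch (cur.kind) {
                    case CALL:
                        callStack.push(cur);              // remember the call site
                        cur = cur.callee;                 // continue in the called routine
                        continue;
                    case EXIT:
                        if (callStack.isEmpty()) return;  // no caller: normal program exit
                        cur = pick(callStack.pop().succs, rnd);
                        continue;
                    case EVENT:
                        System.out.println("event " + cur.label);
                        // fall through: choose a successor like any other node
                    case ENTER:
                        cur = pick(cur.succs, rnd);
                }
            }
        }

        static Node pick(List<Node> succs, Random rnd) {
            return succs.isEmpty() ? null : succs.get(rnd.nextInt(succs.size()));
        }

        public static void main(String[] args) {
            // Tiny hand-built abstraction: main calls printList, which either
            // generates a hasNext/next event pair or returns immediately.
            Node plEnter = new Node(Kind.ENTER, "printList");
            Node hasNext = new Node(Kind.EVENT, "hasNext");
            Node next = new Node(Kind.EVENT, "next");
            Node plExit = new Node(Kind.EXIT, "printList");
            plEnter.succs.add(hasNext); plEnter.succs.add(plExit);  // nondeterministic choice
            hasNext.succs.add(next); next.succs.add(plExit);

            Node mEnter = new Node(Kind.ENTER, "main");
            Node call = new Node(Kind.CALL, "printList"); call.callee = plEnter;
            Node mExit = new Node(Kind.EXIT, "main");
            mEnter.succs.add(call); call.succs.add(mExit);

            run(mEnter, new Random());
        }
    }

Test nodes, variable-set nodes, and the multithreaded actions would be handled analogously; they are omitted here to keep the sketch small.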
As an example of an abstract program, the code in Figure 2 yields the automata shown in Figure 3.

[FIGURE 3. Generated automata for Simple.]

Here entry nodes are rounded boxes containing the name of the routine, exit nodes are empty circles, and event generation nodes are boxes containing the event name. The automaton for main is the same for both instances of the iterator check. The top automaton is the one generated for the first instance of the iterator in printList, while the bottom automaton is the one generated for the second instance.

6.2 Building the Abstraction

We build a program abstraction by mapping each method of the system being analyzed into an abstract program routine. The methods here are those that were used in the flow analysis phase, so that a method that was inlined in multiple versions actually appears as multiple methods in the abstraction. Moreover, we add additional methods to represent complex virtual calls, with the new method simply doing a parallel call of all possible alternatives as determined by the data flow analysis.

The program abstraction is generated in three phases. First, a prepass checks all the methods in the system to determine which ones are definitely not relevant to the given specification and instance. These methods and any calls to them can be ignored.

Next, we construct an automaton for each potentially relevant method. This is done by making a symbolic execution pass over the code for the method. A new node is created whenever there is a synchronization entry or exit, when an event for the specification would be generated, when a method that is not ignored is called, when a monitored field is accessed, when a method returns, and at the start of each basic block. When a conditional occurs and one of the items tested is a monitored field or a return value, then nodes testing the value of the field are generated for the different resultant branches. This generation is done using a path-sensitive analysis.

While doing the symbolic execution, we keep track of the values on the stack and in local variables in a minimal way. For objects, we track whether the value is null or non-null. For numbers, we note if the value is constant and, if so, what constant. When we have a branch in the program to the start of a basic block, we create a new abstract program node each time we have a different value set. This lets us construct finite programs that reflect local variable values. This value-based generation can be turned on or off for each particular method. We currently do it for all methods that are less than a certain size (currently 400 byte codes). This provides accuracy for most items while avoiding the relatively small number of cases where the procedure generates an initial excessively large graph.

The third phase of program abstraction is to simplify the resultant automata. This first involves local simplification, where we eliminate nodes without actions (e.g. all the extra nodes we inserted for basic blocks), eliminate empty or unneeded synchronized regions, eliminate unneeded tests, and eliminate meaningless returns. Second, it involves finding methods where the graph becomes trivial and eliminating these automata and any calls to them. These two steps are done concurrently using a worklist algorithm. A final step involves applying automata minimization to each remaining automaton.
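As a small illustration of the minimal value tracking used during the symbolic execution above (again an assumed representation, not CHET's code), the sketch below keys abstract-program nodes on the pair of a basic block and the abstract values of the tracked locals, so that distinct value sets at a block entry yield distinct nodes while the finite value domain keeps the generated program finite.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of the symbolic values of Section 6.2: objects are
    // null / non-null / unknown, numbers are a known constant or unknown.
    public class ValueTracking {
        enum ObjVal { NULL, NON_NULL, UNKNOWN }

        // One tracked slot (a stack entry or local variable).
        record Value(ObjVal obj, Long constant) {
            static Value object(ObjVal v) { return new Value(v, null); }
            static Value constant(long c) { return new Value(null, c); }
            static Value unknownNumber()  { return new Value(null, null); }
        }

        private final Map<String, Integer> nodeForState = new HashMap<>();
        private int nextNode = 0;

        // Same block + same value set -> same node; a new value set -> new node.
        int nodeFor(int blockId, Value... locals) {
            String key = blockId + ":" + Arrays.toString(locals);
            return nodeForState.computeIfAbsent(key, k -> nextNode++);
        }

        public static void main(String[] args) {
            ValueTracking vt = new ValueTracking();
            int a = vt.nodeFor(3, Value.constant(0), Value.object(ObjVal.NON_NULL));
            int b = vt.nodeFor(3, Value.constant(1), Value.object(ObjVal.NON_NULL));
            int c = vt.nodeFor(3, Value.constant(0), Value.object(ObjVal.NON_NULL));
            System.out.println(a + " " + b + " " + c);   // prints "0 1 0"
        }
    }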
7. Checking Abstractions

Once we have generated the abstract program, the next step is to check whether all sequences of events that can be generated by that program are consistent with the given specification. Many of the types of specifications we want to check, for example design patterns, UML interaction diagrams, and class usage, are generally not thread-related. Thus, we first developed a means for efficiently and accurately checking specifications for the single threaded case and then extended this to the multithreaded case.

7.1 The Single Threaded Case

The overall approach to testing specifications is to determine the set of checking states that are reachable at each node of the program. We first define what we mean by a checking state. This has to reflect the program state and the state in the specification being checked. A checking state thus consists of a state from the specification (e.g. S2 from Figure 1) along with values for each of the monitored variables and the latest return value. Value settings are currently limited to {Null, NonNull, Unknown} for objects and either a specific value or Unknown for numerics.

Next we determine, for each routine and each possible checking state on entry to that routine, the set of checking states that are possible on exit. This is done using a worklist algorithm that takes a node and a set of states that can apply at the start of the node and then computes the set of states that apply at the start of any successors to the node. Each node is handled based on its associated action:

• Start nodes just propagate the current state to their successors.
• End nodes add the set of states that are generated to the states that apply after each corresponding call node, queuing up call nodes that might have changed.
• Call nodes check if the called method has been checked for each of the current states. If it has, they propagate the result states of the call to the successors; if not, they queue the starting node of the called method for later checking.
• Event nodes modify the state by applying the transition given in the specification for the given event.
• Field set and return nodes modify the state by changing the value of the field that is being set.
• Test nodes check the value of the field and either propagate the current state or nothing to their successors.

We note that this process handles recursion correctly. A recursive routine will be checked once for each achievable entry state. At least one of these states should represent the bottom of the recursion and thus should yield an output state. Propagating this state back, even through recursive calls, produces the correct set of output states in the light of recursion.

The final stage is to look at the possible exit states of each main program. These represent the final states that can be reached in any execution and thus indicate whether the specification succeeds by reaching an accepting state or fails by reaching an error state. To handle programs that don't return, or don't return if a particular state is reached, we distinguish specification states for which all transitions go to the state itself. These states typically represent either error conditions or a desired target state. Whenever the simulation gets into one of these states, we simulate an immediate return from the current method. This ensures that if the program can reach one of these "final" states, the algorithm will detect it.
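A compressed sketch of how such checking states can be represented and transformed per node is shown below. It is illustrative only: the specification transitions reuse the hypothetical iterator events from earlier, and the worklist, call handling, and return values are omitted.

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative sketch only: a checking state pairs a specification state
    // with the abstract value of a monitored variable; each node kind maps a
    // set of incoming checking states to the set holding after the node.
    public class CheckingStates {
        enum Spec { S1, S2, S3, S4 }            // specification states
        enum Val  { NULL, NON_NULL, UNKNOWN }   // monitored object values

        record State(Spec spec, Val field) { }

        // Transition of a tiny iterator-like specification (hypothetical).
        static Spec step(Spec s, String event) {
            switch (s) {
                case S1: return event.equals("alloc") ? Spec.S2 : Spec.S1;
                case S2: return event.equals("hasNext") ? Spec.S3
                              : event.equals("next") ? Spec.S4 : Spec.S2;
                case S3: return event.equals("next") ? Spec.S2 : Spec.S3;
                default: return Spec.S4;        // error state is absorbing
            }
        }

        // Event node: advance the specification for every incoming state.
        static Set<State> eventNode(Set<State> in, String event) {
            Set<State> out = new HashSet<>();
            for (State s : in) out.add(new State(step(s.spec(), event), s.field()));
            return out;
        }

        // Test node (field != null): propagate only the states where the test
        // can succeed; if no state survives, nothing reaches the successors.
        static Set<State> testNonNull(Set<State> in) {
            Set<State> out = new HashSet<>();
            for (State s : in) if (s.field() != Val.NULL) out.add(s);
            return out;
        }

        public static void main(String[] args) {
            Set<State> start = Set.of(new State(Spec.S1, Val.NON_NULL));
            Set<State> afterAlloc = eventNode(start, "alloc");
            Set<State> afterNext  = eventNode(afterAlloc, "next");
            System.out.println(afterNext);      // reaches the error state S4
        }
    }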
For the example of Figure 3, the algorithm finds one final state for the first instance and two for the second. For the first instance, it notes that it is always the case that starting in main in state S1, one will end up in state S3. For the second instance, it finds that starting in state S1, one can actually end up either in state S2, where the iterator has been allocated but never used, or in state S4, the error state.

7.2 The Multithreaded Case

We had several choices in extending this approach to handle real multithreaded programs. One approach would be to model each thread as a separate program as above and track the cross product of the states at each point. This would require, however, that the state include the call stack, which would make it potentially infinite. The alternative we use is to find all instances of threads (based on asynchronous call nodes) and convert each into an automaton based on the method graphs. This eliminates the call stack as part of the state while still preserving much of the information in the abstract program.

We build an initial thread automaton using an inlining process, handling recursive calls by only having one copy of each method in the resultant graph. Then we simplify the resultant automaton, first by removing unnecessary nodes and then doing automata minimization. The result is again a conservative approximation to the original program, ensuring that any execution of the thread in the original program will be reflected by some execution of the automaton, but allowing executions of the automaton that do not correspond to program executions.

Once we have constructed an automaton for each possible thread in the program, we can extend the single-threaded checking approach to handle multiple threads. We start by extending the notion of a checking state to include thread information. We first add synchronization information to the checking state in the form of the set of sources that are currently synchronized for each active thread. Second, the checking state is extended to include the automaton node of each active thread. We allow a finite number of instances of each thread to be created, where the number is determined by the specification and defaults to three.
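To make the extended state concrete, a minimal sketch of the multithreaded checking state is shown below; the representation is assumed for illustration, not taken from CHET, and the single-threaded components (monitored variable values and the latest return value) would be carried alongside but are omitted here.

    import java.util.List;
    import java.util.Set;

    // Illustrative sketch of the extended checking state: a specification
    // state plus, for each active thread, the node of that thread's automaton
    // and the set of sources the thread currently holds synchronization on.
    // The number of instances of each thread is bounded (three by default),
    // which keeps the overall state space finite.
    public class ThreadedCheckState {
        record ThreadInfo(int automatonNode, Set<Integer> synchronizedSources) { }

        record CheckState(String specState, List<ThreadInfo> threads) { }

        public static void main(String[] args) {
            CheckState s = new CheckState("S2", List.of(
                    new ThreadInfo(7, Set.of(3)),   // one thread at node 7, holding source #3
                    new ThreadInfo(0, Set.of())));  // a second thread at its entry node
            System.out.println(s);
        }
    }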