KATHOLIEKE UNIVERSITEIT LEUVEN
FACULTY OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE
INFORMATICS SECTION
Celestijnenlaan 200 A, B-3001 Leuven

Static and Dynamic Verification of Indirect Data Sharing in Component-based Applications

Supervisors: Prof. Dr. ir. F. PIESSENS, Prof. Dr. ir. P. VERBAETEN

Dissertation presented in partial fulfilment of the requirements for the degree of Doctor in Engineering by Lieven DESMET

January 2007
Jury:
Prof. Dr. ir. W. Sansen, chair
Prof. Dr. ir. F. Piessens, supervisor
Prof. Dr. ir. P. Verbaeten, supervisor
Prof. Dr. ir. E. Steegmans
Prof. Dr. ir. W. Joosen
Prof. Dr. T. Holvoet
Prof. Dr. D. Gurov (KTH Royal Institute of Technology)

U.D.C. D1, D2, D46
© Katholieke Universiteit Leuven, Faculty of Engineering
Arenbergkasteel, B-3001 Heverlee (Belgium)

All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.

D/2007/7515/3
ISBN
Abstract

Modern software systems evolve towards modularly composed applications, in which existing software components are reused in new software compositions. Current component-based software systems often promote simple syntactic interfaces to make the wiring of components easy. In practice, however, these loosely-coupled components sometimes have semantic dependencies, for example when they pass data to each other via a shared data repository, as is the case in data-centered applications. Typically, these hidden semantic dependencies are not checked at compile time, and in current software systems they lead to run-time errors.

The main contribution of this thesis is an approach to reduce the number of run-time errors due to broken data dependencies in data-centered, component-based applications. We have built up extensive hands-on experience with such applications in two application domains: networking software and web applications. This experience shows that errors caused by broken data dependencies are important in practice.

Our approach is based on the formal verification of a composition property, the no broken data dependencies property. This formal verification is achieved in two steps. In a first step, we statically verify the desired composition property in deterministic software compositions. To do so, each component of the composition is extended with a component contract, stating in a problem-specific contract language what the shared data interactions of the component are. These contracts are then verified in two phases. First, the compliance of the component's implementation with its contract is checked. Next, the different component contracts are used to verify the composition property in a given composition.

In a second step, we extend the verification of the desired composition property towards reactive software systems by combining the static verification approach with run-time checking.
Reactive systems are characterised by their non-terminating behaviour and perpetual interactions with their environment, as is for example the case in graphical user interfaces. In addition to the approach for deterministic compositions, the expected reactive protocol between the client and the server is expressed in a labelled state machine. Next, we apply the solution proposed earlier to verify statically whether or not the desired property is violated in a given composition, assuming that the actual interaction protocol complies with the expressed state machine. Finally, we use the run-time checking capabilities of a Web Application Firewall to guarantee that the incoming requests of a user's session adhere to the verified labelled state machine, and thus that the no broken data dependencies composition property also holds in the given composition.

We validated the formal verification of the no broken data dependencies composition property in both a deterministic and a reactive software composition. The static verification approach is successfully applied to the medium-sized open-source webmail application GatorMail, in which more than 1350 shared data interactions are present. In addition, the combination of static verification and run-time checking by means of a Web Application Firewall is validated in the Duke's BookStore application, a reactive e-commerce application. Both case studies showed a limited annotation and verification overhead, and both illustrated that, thanks to the modular specification and verification process, the presented solution scales to larger, real-life software applications.
Preface

In today's society, computer systems and the Internet are omnipresent. At the same time, in our daily lives we depend more and more on the correct functioning of this infrastructure and its services, from an individual as well as from a social and economic perspective. Offering reliable and secure services is therefore essential, but far from self-evident. In this context, this doctoral thesis investigates how to increase the reliability and security of component-based software applications in which data is shared indirectly.

Many people have contributed directly or indirectly to this doctoral research, and I would like to thank everyone who contributed to this work, on a professional or on a personal level. In addition, I would like to put a number of people in the spotlight.

First of all, I would like to wholeheartedly thank my supervisors, professor Frank Piessens and professor Pierre Verbaeten. They made the realisation of this work possible and supported me with advice and assistance along the way. Special thanks go to Frank, who was responsible for the day-to-day supervision. Time and again he made time for me, and I will long remember the many inspiring discussions and critical reflections. Together with his inspiring enthusiasm and drive, his talent for motivating people and his well-founded research vision, this makes him a supervisor in a thousand.

Besides my supervisors, I would also like to sincerely thank the other members of the supervisory committee, professor Eric Steegmans and professor Wouter Joosen, for critically reading this thesis. I also want to thank Wouter for helping, together with my supervisors, to frame my research within a broader research vision.
Together, they created a broad range of opportunities within DistriNet, while leaving me enough freedom to further explore my own research track.

I would also like to thank professor Tom Holvoet and professor Dilian Gurov (from the KTH Royal Institute of Technology in Stockholm) for agreeing to be members of the jury, and professor Willy Sansen for chairing the jury. Special thanks to Dilian for travelling all the way to Leuven.
I would also like to thank my colleagues of the DistriNet research group for the interesting discussions and the pleasant collaboration. I am thinking in particular of the members of the security taskforce and the networking taskforce. Listing them all would be an impossible task, but still an explicit thank you to Tine Verhanneman, Yves Younan, Bart De Win, Riccardo Scandariato, Bart Jacobs, Jan Smans, Dries Vanoverberghe, Koen Yskout and Thomas Heyman of the security taskforce; and to Nico Janssens, Sam Michiels and Tom Mahieu of the networking taskforce. I also want to explicitly thank some colleagues outside these two taskforces, including André Mariën, Adriaan Moors, Eddy Truyen, Bert Lagaisse and Marko van Dooren.

I also want to thank my many friends outside the research group. You made sure that the computer was regularly swapped for some much-needed relaxation. From a game of squash or a mountain-bike ride to a long night out, you were always up for it.

I also want to thank my parents extensively. They made my studies possible and thus laid the foundations for my doctorate. Throughout that whole period, they have continuously supported and motivated me. A word of thanks also goes to my sister Greet, my brother-in-law Pieter and my parents-in-law.

Finally, I want to thank my wife Kathleen. For seven years now you have been my support and mainstay, and day after day you let the sun shine in my life. Throughout my whole PhD you stood by my side. Time and again you gave me a push when I needed it, even though finishing this thesis was sometimes quite an ordeal for you too. Thank you wholeheartedly!

Lieven Desmet
January 2007
Contents

1 Introduction
  Background
  Problem Statement
  Main Contributions
  Overview of this Thesis

2 Indirect Data Sharing in Loosely-coupled Component Systems
  Introduction
  Indirect Data Sharing in Data-centered Applications
  Case Study 1: Component-based Web Applications
  Web Applications
  A Servlet-based E-commerce Application
  Case Study 2: Component-based Protocol Stacks
  Protocol Stacks
  DiPS+ Rapid Prototyping Infrastructure
  A Simplified DiPS+ IPv4 Layer
  Development of the DiPS+ Radius Layer
  Goal and Scope of this Work
  Related Work
  Conclusion

3 Dependency Analysis of the GatorMail Webmail Application
  Introduction
  GatorMail
  The Struts Framework
  Composition Example
  Dependency Analysis
  Exploring Dependencies
  Heuristic Identification of Dependencies
  Abstract Application Model
  Results
  Overview of Dependencies
  Key Characteristics
  Conclusion

4 Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems
  Introduction
  Background
  Component Contracts and Static Verification
  Problem Statement
  Simplified Application Model
  Composition Example
  Desired Composition Properties
  Solution
  No Dangling Forwards Property
  No Broken Data Dependencies Property
  Unsoundness with ESC/Java
  Validation
  Verifying Struts Applications: an Example
  Results of the GatorMail Experiment
  Discussion
  Related Work
  Conclusion

5 Bridging the Gap Between Web Application Firewalls and Web Applications
  Introduction
  Background
  Web Vulnerabilities and Web Application Firewalls (WAFs)
  Problem Statement
  Solution
  Prototype Implementation
  Server-side Specification and Verification
  Application-specific Protocol Verification
  Run-time Protocol Enforcement
  Discussion
  Results of the BookStore Experiment
  Limitations
  Related Work
  Conclusion

6 Conclusion
  Summary
  Main Contributions
  Validation
  Future Work
  Developer-centric Verification
  Concluding Thoughts
List of Figures

2.1 Functional dataflow dependencies
A small e-commerce web application
Additional constraints on the data flow dependencies
OSI reference model versus the TCP/IP protocol suite
A protocol stack in DiPS
Shared repositories in DiPS
A simplified IPv4 protocol layer
Dataflow dependencies in the IPv4 protocol layer
The IPSec protocol layer
Design of RADIUS authentication and accounting in DiPS
Data dependencies in the DiPS+ Radius layer
Request processing in Struts
Composition example in Struts
Internal dataflow dependencies in Struts
Interaction between client and server
Protocol transitions for /preferences.do
Data sharing between client and server
Structure of preferences.jsp using Struts tiles
Abstract application model
Composition processing /modifymessage.do
Crosscuttingness of dependencies in GatorMail
Composition processing /saveaddresses.do
Visualisation of the dependencies for /saveaddresses.do
The simplified application model
Composition example: scheduling a meeting
Overview of the solution
/createfolder.do composition in GatorMail
Web Application Firewall infrastructure
5.2 Solution overview
Client/server interaction protocol
Class diagram of the run-time enforcement engine
Listings

3.1 Struts configuration file: struts-config.xml
Implementation of an ActionForm: PreferencesForm.java
Implementation of an Action: PreferencesAction.java
Implementation of a JSP view: preferences.jsp
Conditional repository interaction
Use of static methods: ActionsUtil.java
Use of the JSTL Expression Language: loginform.jsp
Extract from the navigation tile: navbar.jsp
Framework-specific contract of AddMeetingAction
EBNF notation of the framework-specific contract language
Contract for declarative forwarding (AddMeetingAction.spec)
Contract for indirect data sharing (NotificationAction.spec)
JML contract for indirect data sharing (AddMeetingAction.spec)
JML contract of the shared data repository (Request.spec)
Composition-specific check method to be verified by ESC/Java
Frame condition of NotificationAction
Declarative forwarding in Struts
JML specification of ActionMapping
Declarative forward specification of FolderManageAction
JML contract of FolderAction
Struts-specific contract of FolderAction
Example of a defensive read/write operation in BookDetailsServlet
Problem-specific specification of ShowCartServlet
EBNF notation of the problem-specific contract language
EBNF notation of the client-server protocol
Contract for shared session repository interactions (ShowCartServlet.spec)
JML contract of the session repository (HttpSession.spec)
Protocol-simulating check method to be verified by ESC/Java
Component-specific specification of the repository (HttpSession.spec for ShowCartServlet)
List of Tables

3.1 Overview of control flow transitions and shared data interactions in the internal viewpoint
Overview of control flow transitions and data passing in the external viewpoint
Indirect data dependencies in /createfolder.do
JML notation overhead in GatorMail
Verification performance
Interactions with the shared session repository in the BookStore application
Verification performance
Chapter 1

Introduction

Developing a large software system completely from scratch is a very time- and resource-consuming activity, which makes custom-made software less cost-effective for many companies and also results in a long time-to-market. On the other hand, buying customisable standard software sometimes reduces the flexibility needed in a particular setup, or lowers the competitive advantage over competitors who can buy the same software. Component-based software development combines several advantages of both worlds. Components are independent building blocks, possibly provided by different vendors. New software solutions are built by combining commercial off-the-shelf components and custom-made components in new compositions. Szyperski et al. define software components as "a unit of composition with contractually specified interfaces and explicit context dependencies only". In practice, however, software developers and software vendors tend to have a less rigid interpretation of software components and component-based software development. Even in cases where software components also specify their semantic behaviour (as opposed to just their syntactic programming interface), these contracts are often incomplete or expressed informally. In addition, context dependencies are often neglected.

Software systems also become more and more mission critical. In software systems such as computer networks, application servers and avionics systems, software failures can cause considerable loss and damage, as business processes strongly depend on the availability and the correct functioning of software systems. The explosion of the Ariane 5 in 1996, for instance, is probably the best-known and costliest software failure ever, and was caused by a reuse error due to incomplete specification. In addition, many modern software systems have an increasing demand for security.
More and more companies, for example, incorporate e-commerce in their business model to increase their revenues, but at the same time their web applications tend to be error-prone, and these bugs are a welcome target for attackers due to their high accessibility and the potential for profit. Similarly, NIST's National Vulnerability Database clearly shows an increasing number of vulnerabilities located in the application layer.

Quality characteristics of a software system, such as reliability and security, can be expressed in terms of software quality attributes or metrics. Several standards and models exist, such as ISO/IEC, the Common Criteria and the Capability Maturity Model (CMM), but they often provide a qualitative rather than a quantitative measurement of software quality. Furthermore, in component-based software systems it is quite difficult to guarantee that certain quality characteristics are achieved by composing the individual building blocks, each with their own characteristics. This is especially the case in software systems composing commercial off-the-shelf components, and in run-time reconfigurable or multi-stakeholder distributed systems, for which in-depth quality measurement or integration testing is almost infeasible.

1.1 Background

Within the DistriNet research group, we worked on different techniques to achieve certain security- and reliability-related quality characteristics in modularly composed software systems. I contributed to three main research tracks, in different research domains.

Firstly, we focussed on achieving graceful degradation behaviour in overload situations in component-based software applications. To do so, the concurrency behaviour of an application was separated from the functional building blocks by introducing dedicated concurrency components. In this way, the concurrency model could easily be adapted without the need to modify the existing functional components within a given composition. Two adaptations of the concurrency model were investigated in the context of graceful degradation in component-based protocol stacks.
In applications with blocking behaviour, a multi-threaded concurrency model was combined with a self-healing scheduling algorithm in order to achieve optimal performance in cases of overload [26, 82]. Similarly, to reflect the business policy, the concurrency model was customised to handle important requests in a prioritised way during overload situations [83, 84]. This first track continued the research of my MSc thesis and was joint work with Sam Michiels as part of the SCAN project.

Secondly, we focussed on providing additional support to perform threat analyses on web applications. To do so, a generic architecture for web applications was defined, integrating a set of web technologies that are often combined in current web applications, such as web services and databases. Based on this architecture, a technology-specific threat analysis was performed on the web service technology using the STRIDE threat enumeration methodology, as
reported in . Next, the CORAS methodology was used to perform risk analysis and to visualise the threats, risks and treatments typically involved in web services-based web applications. The abstract risk analysis model was then used as an input artefact to automate risk management and treatment selection in a particular web services-based web application. This second track was joint work with several colleagues of the DistriNet Security Taskforce and was part of the DeSecA project. Other partners in this project performed threat analyses on web application containers, security tokens, directory services and databases.

Thirdly, we focussed on achieving reliability- and security-relevant properties in software compositions by applying formal verification. To deal with different types of bugs in Java-like programs, we explored the combination of program annotations and formal verification techniques to verify the absence of these bugs in a given software system. In our approach, components are first annotated with a problem-specific component contract. Next, the component contracts of a composition are translated to a general-purpose specification formalism, which is then fed to an automatic prover. Various researchers in DistriNet have applied this approach to three types of bugs.

(1) The author of this thesis verified that data dependencies are not broken in software compositions with a shared data repository. The problem-specific contract annotations specify the indirect data sharing interactions between the components and the shared data repository. Static verification is then used to verify that dependencies are not broken in deterministic software compositions. Moreover, the static verification was combined with run-time checking in reactive software systems.
(2) Jacobs proposes a programming model for concurrent programming in Java-like languages, including the design of a set of problem-specific program annotations and automated verification of compliance. This programming model ensures the absence of data races and deadlocks in Java-like programs.

(3) Smans proposes formal component contracts for stack inspection-based sandboxing, and verifies that at run time the component implementations will not throw security exceptions in C#.

For each of these systems, a prototype verifier was built, and some experience in verifying medium-sized software systems was gained. The case studies include a chat server and an open-source webmail application. This third track was part of the SoBeNet project.

The approach of formally achieving reliability- and security-relevant composition properties is also strongly related to other research topics within the DistriNet research group. For instance, with the NeCoMan middleware platform, Janssens provides support for safe, distributed service adaptations in run-time reconfigurable software systems. The main focus of his research is to achieve a customised, safe adaptation process, i.e. one in which the distributed software system remains in a consistent state during the run-time reconfiguration. This safe adaptation process
is complementary to the verification process of this research track. The latter verifies, among other things, that no dependencies are broken in a given composition, whereas the NeCoMan platform guarantees that the run-time transition from an old (verified) composition to a new (verified) composition is performed in a safe way.

This thesis presents in more detail my contributions within this third research track, i.e. the formal verification of the absence of composition problems related to indirect data sharing.

1.2 Problem Statement

Data-centered software applications consist of a set of components and a shared data repository. Each component can interact with the shared repository to store data on the repository, or to retrieve data from it. By using the data repository, the different components of an application can indirectly share data without actually interacting with each other. This loose coupling adds extra flexibility to software systems and is widely used in various applications and frameworks.

A data-centered application is correctly composed if, at run time, each component is able to retrieve the data from the repository that it expects to find. Thus, the correct functioning of a component depends on the run-time state of the shared repository during its execution. In other words, in applications with a shared data repository, implicit semantic dependencies exist between components that share a common data item on the repository. Typically, these dependencies are hidden within a software system. Because of this, it is hard for a software composer to reuse existing components in new compositions without breaking any of the hidden data dependencies between the components [32, 34]. Especially in run-time reconfigurable systems, where reconfigurations can easily break dependencies, proper dependency management is crucial to achieve a reliable software system [31, 65].
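As an illustration only, the Java sketch below models a data-centered composition in its simplest form. Everything in it is hypothetical: the `Repository` class stands in for a shared data repository (loosely comparable to the attributes of a servlet request or session), the two components and the key `"user"` are invented, and a composition is simply an ordered list of components. The components never reference each other; the only thing linking them is the shared key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndirectSharingDemo {

    /** Hypothetical shared data repository: untyped key/value pairs. */
    static final class Repository {
        private final Map<String, Object> data = new HashMap<>();

        void setAttribute(String key, Object value) { data.put(key, value); }

        Object getAttribute(String key) { return data.get(key); } // null if the key is absent
    }

    /** A component only knows the repository, never the other components. */
    interface Component {
        void execute(Repository repository);
    }

    /** Producer: stores a data item on the repository. */
    static final class LoginComponent implements Component {
        public void execute(Repository repository) {
            repository.setAttribute("user", "alice");
        }
    }

    /** Consumer: silently expects that some earlier component stored a String under "user". */
    static final class GreetingComponent implements Component {
        String greeting;

        public void execute(Repository repository) {
            String user = (String) repository.getAttribute("user");
            greeting = "Welcome, " + user.toUpperCase();
        }
    }

    /** A deterministic composition: run the components in a fixed order on one repository. */
    static void runComposition(List<Component> composition, Repository repository) {
        for (Component component : composition) {
            component.execute(repository);
        }
    }

    static String correctComposition() {
        GreetingComponent greeter = new GreetingComponent();
        List<Component> composition = List.of(new LoginComponent(), greeter);
        runComposition(composition, new Repository());
        return greeter.greeting;
    }

    public static void main(String[] args) {
        System.out.println(correctComposition()); // prints "Welcome, ALICE"
    }
}
```

Nothing in the interface of `GreetingComponent` reveals that it depends on `LoginComponent` having run first; that dependency only exists in this particular composition.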
Moreover, the complexity of shared data dependencies in real-life applications should not be underestimated, as was illustrated by the in-depth dependency analysis of the GatorMail webmail application. In this medium-sized software system alone, more than 1350 interactions with the shared data repository were identified, none of which were documented in any form.

Breaking data dependencies in data-centered applications is a relevant composition problem and typically leads to run-time errors. This results in less reliable and less secure software systems. Hence, the main goal of this thesis is to reduce the number of run-time errors due to broken data dependencies in data-centered applications. This thesis presents solutions to detect and prevent such composition problems at compile time or at composition time instead of at run time.
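The following self-contained Java sketch shows, under simplified assumptions, how a broken data dependency typically surfaces at run time. The map-based repository, the `"cart"` key and the consumer component are hypothetical. Leaving out the producer makes the consumer fail with a `NullPointerException`, while a producer that stores a value of the wrong type causes a `ClassCastException`; in both cases the composition compiles without any complaint.

```java
import java.util.HashMap;
import java.util.Map;

class BrokenDependencyDemo {

    /** Hypothetical untyped shared repository. */
    static final Map<String, Object> repository = new HashMap<>();

    /** Consumer component: reads "cart" and assumes a producer stored an Integer item count. */
    static int consumer() {
        Integer itemCount = (Integer) repository.get("cart"); // ClassCastException on a wrong type
        return itemCount + 1; // auto-unboxing: NullPointerException if "cart" is absent
    }

    /** Runs the consumer after an optional producer; returns "ok" or the name of the failure. */
    static String compose(boolean withProducer, Object producedValue) {
        repository.clear();
        if (withProducer) {
            repository.put("cart", producedValue); // the producer's write
        }
        try {
            consumer();
            return "ok";
        } catch (NullPointerException | ClassCastException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(compose(true, 3));         // producer present, correct type: ok
        System.out.println(compose(false, null));     // producer missing: NullPointerException
        System.out.println(compose(true, "3 items")); // wrong type stored: ClassCastException
    }
}
```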
1.3 Main Contributions

At this moment, the software engineering research community lacks experience reports on the typical composition problems in today's software technologies and on the complexity of dependency management in large-scale software systems. In this thesis, we develop insight into the dependency management of component-based software systems with a shared data repository. We focus in particular on dependencies that occur when composing different components that share a common data item on the shared repository. In addition, we perform an in-depth dependency analysis of the medium-sized webmail application GatorMail to investigate the complexity of managing dependencies in such data-centered applications.

Based on this extensive hands-on experience, we define the no broken data dependencies composition property for applications with a shared data repository. Achieving this property in data-centered applications eliminates a large number of composition problems, and the typical errors they cause at run time, such as NullPointerExceptions and ClassCastExceptions. The goal of this thesis is then to reduce the number of run-time errors by formally verifying that a given composition does not violate the no broken data dependencies composition property.

We present two solutions to formally verify this property, in deterministic and reactive software compositions respectively. Deterministic software compositions are software systems in which the control flow through the application is independent of the environment. In contrast, reactive systems are characterised by their non-terminating behaviour and perpetual interactions with their environment, as is for example the case in graphical user interfaces. The formal verification of the composition property is based on the use of component contracts.
First, the component contracts specify, in a problem-specific language, the interactions between the component and the shared repository. Next, static verification is used to verify the composition property in deterministic software systems, relying on the different component contracts. In reactive software compositions, this static verification is combined with run-time checking of the system's interactions with the environment.

The formal verification of the no broken data dependencies composition property is validated in both a deterministic and a reactive software system. Two existing, medium-sized applications are chosen for the validation experiments. These applications are representative enough to measure the overhead and to test the applicability of the solutions in larger, real-life applications.
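To give an intuition for how contracts enable a composition-level check, the sketch below encodes a drastically simplified contract as data: the keys a component reads and writes, with their types. It then walks a deterministic composition in order and reports every read that is not preceded by a compatible write. This is an assumed, illustrative model only: it is neither the problem-specific contract language of this thesis nor its ESC/Java-based verification, and it does not check that an implementation complies with its contract. The `Contract` record and the component names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompositionCheckDemo {

    /** Hypothetical simplified contract: keys a component reads and writes, with their types. */
    record Contract(String component, Map<String, Class<?>> reads, Map<String, Class<?>> writes) {}

    /**
     * Checks a deterministic composition (a fixed sequence of components) for broken data
     * dependencies: every read must be preceded by a write of the same key with a compatible type.
     */
    static List<String> check(List<Contract> composition) {
        Map<String, Class<?>> available = new HashMap<>(); // repository content after each step
        List<String> violations = new ArrayList<>();
        for (Contract contract : composition) {
            for (Map.Entry<String, Class<?>> read : contract.reads().entrySet()) {
                Class<?> stored = available.get(read.getKey());
                if (stored == null) {
                    violations.add(contract.component() + " reads missing key '" + read.getKey() + "'");
                } else if (!read.getValue().isAssignableFrom(stored)) {
                    violations.add(contract.component() + " reads '" + read.getKey() + "' as "
                            + read.getValue().getSimpleName() + " but finds " + stored.getSimpleName());
                }
            }
            available.putAll(contract.writes()); // the component's writes become visible to later ones
        }
        return violations;
    }

    /** Hypothetical example composition, with or without the component that writes "cart". */
    static List<Contract> example(boolean complete) {
        Contract login = new Contract("LoginAction", Map.of(), Map.of("user", String.class));
        Contract cart = new Contract("CartAction", Map.of("user", String.class), Map.of("cart", Integer.class));
        Contract show = new Contract("ShowCart", Map.of("cart", Integer.class), Map.of());
        return complete ? List.of(login, cart, show) : List.of(login, show);
    }

    public static void main(String[] args) {
        System.out.println(check(example(true)));  // []
        System.out.println(check(example(false))); // [ShowCart reads missing key 'cart']
    }
}
```

Because only the contracts are inspected, such a check can run at composition time, before any component executes.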
1.4 Overview of this Thesis

The thesis is structured as follows. Chapter 2 starts by introducing indirect data sharing in loosely-coupled applications with a shared data repository. Next, two case studies from different application domains are presented, illustrating the typical use of a shared data repository. The first case study investigates component-based web applications built upon the Java Servlet technology, which is part of the J2EE web tier specification. The second case study explores component-based protocol stacks in DiPS+, an in-house developed protocol stack framework. Based on extensive hands-on experience in both case studies, the precise goals of this thesis are determined and related to existing research.

Chapter 3 illustrates in more detail the complexity of hidden data dependencies in loosely-coupled component systems by investigating the servlet-based webmail application GatorMail, developed at the University of Florida. Analysis of the different components of GatorMail and their interactions with the shared repository reveals that even in this medium-sized open-source application, more than 1350 interactions with the shared data repository occur without being specified or documented. Extending the GatorMail application or reusing some of its components without breaking one of the hidden data dependencies is really hard, and an oversight by the composer can easily lead to run-time errors.

In chapter 4, parts of a component's contract are made formal, and automated tool support is used to verify some level of semantic compatibility at composition time. In particular, a given composition is formally verified to ensure that no data dependencies are broken between components that share a common data item on the shared repository. To do so, a component's contract specifies its interactions with the shared data repository in a problem-specific specification language.
Static verification then checks whether the implementation of a component adheres to its contract, and whether, in a given composition, the dependencies between components that share a common data item are not broken. Finally, the presented approach is validated on the GatorMail webmail application.

Whereas the static verification approach of chapter 4 is limited to deterministic systems, chapter 5 combines static and dynamic verification of the indirect data sharing dependencies to cope with non-deterministic, reactive systems. A well-known example of such a reactive system is a web application, in which a web user can typically choose the next processing step by clicking the link of his choice. In such reactive web applications, the strict request flow enforcement of a Web Application Firewall (WAF) at run time is combined with static verification of the WAF policy in a given software composition. In this way, the combination of static and dynamic checking assures a software composer that no shared data dependencies are broken in the reactive system. To conclude, the presented approach is validated in the Duke's BookStore e-commerce application.

Finally, chapter 6 summarises the contributions of this thesis and discusses some opportunities for future research.
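The run-time half of the reactive approach can be pictured as a per-session state machine that only lets through requests that follow a verified protocol. The Java sketch below is a hypothetical, minimal model of such request flow enforcement, not the Web Application Firewall used in this thesis; the states and the request paths (`/catalog`, `/addtocart`, `/showcart`, `/checkout`) are invented for illustration. A request that would, for example, show a cart before one has been stored in the session is rejected instead of reaching the component.

```java
import java.util.HashMap;
import java.util.Map;

class ProtocolEnforcementDemo {

    /** Hypothetical labelled state machine: state -> (request label -> next state). */
    static final class Protocol {
        private final String initialState;
        private final Map<String, Map<String, String>> transitions = new HashMap<>();

        Protocol(String initialState) { this.initialState = initialState; }

        Protocol allow(String from, String request, String to) {
            transitions.computeIfAbsent(from, s -> new HashMap<>()).put(request, to);
            return this;
        }

        /** Returns the next state, or null if the request is not allowed in this state. */
        String next(String state, String request) {
            return transitions.getOrDefault(state, Map.of()).get(request);
        }
    }

    /** Per-session enforcer, in the spirit of a firewall filtering incoming requests. */
    static final class SessionEnforcer {
        private final Protocol protocol;
        private String state;

        SessionEnforcer(Protocol protocol) {
            this.protocol = protocol;
            this.state = protocol.initialState;
        }

        /** Accepts the request and advances the session state, or rejects it and keeps the state. */
        boolean accept(String request) {
            String nextState = protocol.next(state, request);
            if (nextState == null) {
                return false; // request would leave the verified protocol: block it
            }
            state = nextState;
            return true;
        }
    }

    static Protocol bookstoreProtocol() {
        return new Protocol("start")
                .allow("start", "/catalog", "browsing")
                .allow("browsing", "/catalog", "browsing")
                .allow("browsing", "/addtocart", "cartFilled")
                .allow("cartFilled", "/showcart", "cartFilled")
                .allow("cartFilled", "/checkout", "start");
    }

    public static void main(String[] args) {
        SessionEnforcer session = new SessionEnforcer(bookstoreProtocol());
        System.out.println(session.accept("/catalog"));   // true
        System.out.println(session.accept("/showcart"));  // false: no cart in the session yet
        System.out.println(session.accept("/addtocart")); // true
        System.out.println(session.accept("/showcart"));  // true
    }
}
```

With such enforcement in place, the static verification only needs to consider the request sequences that the state machine allows.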
Chapter 2

Indirect Data Sharing in Loosely-coupled Component Systems

2.1 Introduction

Modern software systems evolve towards modularly composed services, in which existing software components are reused within new compositions. The different services are interconnected into distributed software systems, using service-oriented software architectures. Both component orientation and service orientation promote a shift towards highly decoupled and reusable software units, which can be implemented independently from each other. Applications are then instantiated by composing the components and connectors of the system in a particular software composition. Recent software systems even enable dynamic reconfiguration of the running application [45, 46]. Such a system can adapt its current composition towards a new one by adding, removing and replacing loosely-coupled software units at run time [8, 67]. In this way, highly flexible software systems are achieved, both at composition time and at run time.

Software systems are also becoming increasingly mission critical. In mission critical systems, such as computer networks, application servers and avionics systems, software failures can cause considerable loss and damage, as business processes strongly depend on the availability and the correct functioning of software systems. Because of the stringent dependability requirements inherent in such critical systems, the composition-time and run-time flexibility should not jeopardise, among other things, the correct functioning of the system.
Building manageable, large-scale software systems introduces several development constraints and invariants, such as architectural constraints and data typing. Specifying and enforcing compatibility requirements, constraints and invariants has already been shown to significantly enhance the safety and reliability of a software system (e.g. syntactic interface compatibility, type systems [87, 6] and ADLs [59, 89]). When composing the individual components, the software composer makes implicit assumptions about the usage and interaction of the components in a particular composition instance. In particular, during the composition of the components into a working application, composition-specific dependencies can arise between the different components.

Different types of composition-time dependencies can be identified. In event-based systems, for example, the implicit invocations in the control flow are specific to a particular composition and impose dependencies between the publishers and subscribers of that composition. The control flow of an application describes how computational control moves through a running instance of the application or, in other words, which executing method invokes another method, thereby passing control to it. Similarly, indirect data sharing in data-centered repositories imposes dataflow dependencies between the software components at composition time. The dataflow describes how data moves between different components. Dataflow may follow the control flow (e.g. when arguments are passed to methods), but dataflow and control flow can also follow separate paths within an application.
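For a deterministic, sequential composition, such dataflow dependencies can be made explicit and checked before the application is deployed. The sketch below assumes that each component declares a hypothetical contract listing the shared data items it reads and writes. The `Contract` type, `check_composition` and the item names are illustrative only; they are not the contract language developed later in this thesis. The check reports every read that no earlier component in the composition provides.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Contract:
    """Hypothetical component contract: shared data items read and written."""
    name: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def check_composition(composition: List[Contract]) -> List[Tuple[str, str]]:
    """Return (component, item) pairs for every item a component reads
    that no earlier component in the sequential composition has written."""
    provided: Set[str] = set()
    broken: List[Tuple[str, str]] = []
    for contract in composition:
        # Reads are checked before the component's own writes are added.
        for item in sorted(contract.reads - provided):
            broken.append((contract.name, item))
        provided |= contract.writes
    return broken


retrieve = Contract("retrieve_item", writes=frozenset({"item"}))
add = Contract("add_to_basket", reads=frozenset({"item"}), writes=frozenset({"basket"}))
display = Contract("display_basket", reads=frozenset({"basket"}))

print(check_composition([retrieve, add, display]))  # prints: []
print(check_composition([add, display]))  # prints: [('add_to_basket', 'item')]
```

The check is conservative by design: an item that a component writes is not considered available to that same component's reads. Leaving out `retrieve_item` is therefore reported at composition time instead of failing on the first request.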
In our opinion, keeping such composition-time dependencies implicit within a software system (as is often done in state-of-the-art systems today) increases the risk of software failure due to broken dependencies, especially in a dynamically reconfigurable system. Therefore, in this thesis, we investigate the dependency problems caused by indirect data sharing in data-centered applications in order to reduce the risk of run-time software failures.

The rest of this chapter is structured as follows. First, section 2.2 further explains the concept of indirect data sharing. Next, two case studies are presented in sections 2.3 and 2.4 to demonstrate the importance of dataflow dependencies and the risk of software failure due to violated constraints or broken dependencies. Based on hands-on experience from both case studies, the precise goals of this thesis are defined in section 2.5. Finally, section 2.6 discusses related research and section 2.7 summarises the contributions of this chapter.

This chapter is partially based on the following publications:
Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Infrastructural support for data dependencies in data-centered software systems. In Proceedings of the Third AOSD Workshop on Aspects, Components, and Patterns for Infrastructure Software, pages 79-80, Lancaster, U.K., March 2004
Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Improving software reliability in data-centered software systems by enforcing composition time constraints. In Proceedings of Third Workshop on Architecting Dependable Systems (WADS2004), pages 32-36, Edinburgh, Scotland, May 2004
Sam Michiels, Lieven Desmet, and Pierre Verbaeten. A DiPS+ Case Study: A Self-healing RADIUS Server. Report CW 378, Department of Computer Science, K.U.Leuven, Leuven, Belgium, February 2004
Sam Michiels, Lieven Desmet, Wouter Joosen, and Pierre Verbaeten. The DiPS+ software architecture for self-healing protocol stacks. In Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture (WICSA-4), pages , Oslo, Norway, June 2004

2.2 Indirect Data Sharing In Data-centered Applications

In the repository architectural style, a system consists of a central data structure (representing the state of the system) and a set of separate components interacting with the central data store. This architectural style is commonly used in component models and APIs such as Java Servlet containers, the Pluggable Authentication Modules (PAM) framework and JavaSpaces in Sun's Jini [44, 69]. Sometimes further refinements are introduced to this style. In practice, however, most of these refinements or additional constraints are only modelled implicitly and are not explicitly checked.

A data-centered application is a collection of separate components and a shared repository. The application is correctly composed if, among other conditions, every data item required by a component is provided by another component. Figure 2.1 explicitly shows different dataflows within a data-centered application, while the actual component interactions (i.e. the control flow) are abstracted away. According to Shaw and Garlan, two major subcategories of repositories exist, depending on the control flow within the application.
If the central data structure is the main trigger for selecting the processes to execute, the repository is called a blackboard. Otherwise, the repository can be a traditional database.

Indirect data sharing strongly contributes to achieving loosely-coupled components, which can easily be developed by different parties¹. Nevertheless, in our opinion, the very loose nature of such software compositions can also easily lead to reliability and correctness problems.

Figure 2.1: Functional dataflow dependencies

¹ In this thesis, we use the term loosely-coupled to reflect the loose coupling of components at the syntactical level, irrespective of any tight binding between them at the semantic level, as will be illustrated later in this chapter.

In the next sections, two case studies in different domains are presented to illustrate the use of a shared data repository as the glue between loosely-coupled components. Both case studies investigate some recurring problems with indirect data sharing in data-centered applications, based on extensive hands-on experience with the presented technologies.

2.3 Case Study 1: Component-based Web Applications

In this section, the typical use of a shared data repository in a servlet-based e-commerce application is analysed. The case study is structured in two parts. The first subsection introduces web applications and the web technologies used in this case study. The second subsection then examines the functionality of the e-commerce application, with special attention to the interactions with the shared data repository.

Web Applications

Web applications are server-side applications that are invoked by thin web clients (browsers), typically using the HyperText Transfer Protocol (HTTP). A user can navigate through a web application by clicking links or URLs in his browser, and can also supply input parameters by completing web forms. A URL maps to a server-resident program that is executed with the user-supplied input parameters. The result of the program execution (often expressed in the HyperText Markup Language (HTML)) is then sent back to the browser, where it is rendered for further user interaction.

HTTP is a stateless, application-level request/response protocol and has been in use on the World Wide Web since 1990. Since the protocol is stateless,
fact, five instances of shared repositories are provided to the servlet, each with a different access scope: a data repository associated with the dynamic web page (1), with the web request (2), with the user session (3), with the web context (4) and with the application (5). Hence, servlet-based applications are data-centered compositions, and the application composer must pay special attention to the dataflow dependencies.

JavaServer Pages

The JavaServer Pages (JSP) technology is also part of the J2EE specification and is built upon Java Servlets. JSP enables the separation of content and presentation in developing dynamic websites. In JSP, Java code can be embedded into the markup language, similar to other technologies such as ASP and PHP. Besides embedding plain Java code, JSP tags can also be used to encapsulate simple programming logic (such as iterators and boolean tests) and to provide the website designer with an abstract interface to the model of the application. JSP files are also deployed in a servlet-based web container and are compiled into Java Servlets the first time they are requested. JavaServer Pages therefore inherit the strengths of Java Servlets, while providing a better separation between logic and markup. In general, JSP files are used to develop the user interface (or view) of a web application. They are loosely coupled, and can communicate anonymously with other JSP files or servlets using shared data repositories.

A Servlet-based E-commerce Application

In this subsection, a canonical e-commerce site is analysed in order to illustrate the typical use of shared data repositories in a servlet-based web application. Figure 2.2 illustrates a subset of this simple e-commerce application.
Three different services are identified within the application: adding a product item to the personal shopping basket, paying for the shopping order, and searching through the website. Each box represents a functional task implemented as a servlet, and the services are pipe-and-filter compositions of several independent tasks. Adding a product item to the shopping basket starts by retrieving the necessary item information from the data back-end (retrieve item information). Next, the item is added to the shopping basket (process add to basket) and, depending on whether adding the product to the basket succeeds or fails, either the new basket (display new basket) or an error page (display error page) is displayed to the end user. Similarly, the payment service is decomposed into a first servlet that constructs an order from the shopping basket entries (prepare basket order) and a second servlet (process order payment) that takes care of the payment (e.g. the submission of credit card information). Next, the order is logically processed at the server,