Relevance of conformance analysis information. A.C.N. Martens

Transcriptie

1 Relevance of conformance analysis information A.C.N. Martens December 4, 2009

2 Preface This thesis is part of my master project at the Industrial Engineering & Innovation Sciences department at the TU/e. The master project concludes the Business Information Systems program I attended. The research described in this document has been conducted at DSV Solutions in Venray. This report has been written for the assessment committee, employees of DSV Solutions in Venray and everyone else who is interested in the field of conformance analysis. I would like to thank Infrastructure & Operations Manager Marcel Brouwer (DSV Solutions) and Adjunct Business Unit Manager Remco Brosky (Valid) for offering me the opportunity to perform this project at the DSV Solutions site in Venray. Within DSV Solutions in Venray I would also like to thank Kenny Moonen, Erik Cornelissen and Matthijs Peeters who helped me understand the ins and outs of the operational business process I investigated. Next I would like to thank the managers, engineers and supervisors at DSV Solutions in Venray who have taken the effort to complete the survey for this research. Without their input I could not have conducted the analytical analysis of this research. Special thanks goes out to Business Intelligence Specialist Hans Westheim who was a great help to me during my search for the correct data of the operational business process. Within the TU/e I would like to thank my supervisor, assistent professor Remco Dijkman, who guided me along this project and provided insightful feedback on my work. I would also like to thank associate professor Ton Weijters for providing feedback about this thesis. At last (but not at least) I would like to thank my friends who reviewed my survey and my girlfriend who supported me during the hard parts of this project. i

3 Abstract To analyze whether a business process is executed according to the prescribed behavior in a process model, conformance analysis techniques can be used. In this research we analyzed which conformance analysis information is most relevant for businesses. Conformance analysis techniques that have been found in literature have been been classified into seven feedback aspects to distinguish between the different techniques. The identified feedback aspects are fitness, precision, generalization, structure, frequency, violation and location. We have performed a case study to execute the conformance analysis techniques. A survey containing the case study results has been distributed among managers, engineers and supervisors of a business processes in a logistics firm who rated the importance of the techniques and answered questions regarding the results of the techniques to measure their understandability of these techniques. The results of the analysis showed that all conformance analysis techniques are regarded to be equally important. Differences in the understandability of conformance analysis techniques have been found. Techniques that report about fitness and violation aspects are significantly easier to understand on average. Techniques that report about the generalization aspect are significantly less easy to understand than average. Additionally managers understand feedback aspects that report about the structure aspect significantly better than supervisors. ii

4 Contents Preface Abstract i ii 1 Introduction Research motivation Research question Research strategy Structure of thesis Techniques and Feedback aspects Process mining fields Conformance analysis techniques Conformance checking techniques Parsing measure Conformance checking Completeness and soundness Delta analysis techniques Difference analysis Causal footprint similarity Precision and recall Model comparison based on typical behavior LTL technique Linear temporal logic Feedback aspects Fitness Precision Generalization Structure Frequency Violation Location Summary of techniques and feedback aspects Case study Context of case study Data preparation Result of conformance analysis techniques Parsing Measure Conformance Checking Fitness PFcomplete Behavioral Precision and Recall Structural Precision and Recall Footprint similarity iii

5 3.3.7 Linear Temporal Logic Difference Analysis Survey Survey approach Hypotheses Variables Population Questionnaire design Questionnaire data Reliability & Validity Results Analysis results Data reduction Response Data analysis & hypotheses testing Difference of feedback aspect importance (not grouped by function) Difference of feedback aspect importance (grouped by function) Understandability of techniques (not grouped by function) Understandability of techniques (grouped by function) Summary of analysis Conclusion Summary Limitations Future work Bibliography 50 Appendix A: A priori model 53 Appendix B: Mapping of process model elements 54 Appendix C: Questionnaire 55 Appendix D: Correct answers 83 Appendix E: Descriptive statistics importance feedback aspects 84 Appendix F: Normality test importance feedback aspects 88 Appendix G: Normality test importance feedback aspects (groups) 89 Appendix H: Descriptive statistics understandability techniques 90 Appendix I: Normality test understandability techniques 94 iv

6 List of Figures 1 Research strategy Research fields The fitness aspect The precision aspect The generalization aspect The structure aspect The frequency aspect The violation aspect The location aspect Business processes at the warehouse of DSV Solutions in Venray Business processes at the warehouse of DSV Solutions in Venray Results of Token based Fitness Results of Conformance Checker (diagnostic result) Results of PFcomplete Results of Behavioral Precision and Recall Results of Structural Precision and Recall Results of Footprint similarity Results LTL checker Results Differences analysis Example of feedback aspect question Format of feedback aspect question An example of a technique question The format of technique questions Box plot importance of feedback aspects (not grouped by function) Kruskal-Wallis to test Hypothesis Box plot importance of feedback aspects (grouped by function) Kruskal-Wallis to test Hypothesis Box plot understandability of techniques (not grouped by function) Kruskal-Wallis to test Hypothesis Median-test to test Hypothesis Box plot understandability of techniques (grouped by function) Kruskal-Wallis to test Hypothesis pair wise Mann-Whitney analysis for groups managers and engineers pair wise Mann-Whitney analysis for groups managers and supervisors pair wise Mann-Whitney analysis for groups engineers and supervisors v

7 List of Tables 1 Overview of techniques and feedback aspects Original record of WMS database Transformed record Overview of analysis approach Overview of response and response rate Identified groups with pair wise comparison Summary of hypothesis testing vi

8 1. Introduction This chapter provides a general description of the conducted research. After the motivation for this research has been presented in section 1.1, the research question will be discussed in section 1.2. The chosen strategy to answer this question has been elaborated next in section 1.3. The last part of this chapter provides an overview of the structure of this thesis (section 1.4). 1.1 Research motivation A business process is a collection of activities that leads to a business goal. Business processes need to be managed carefully. Often policies and regulations are documented to describe the activities and conditions that must hold to perform the activities in a business process. The level at which business processes are being performed according to the described or prescribed ways of working is called the business alignment [24]. A correct business alignment is important for several reasons. Long term goals are easier to achieve with a well aligned business process. It enables organizations to perform its business process as profitable as possible by making efficiently and effectively use of its resources. It also enables organizations to guarantee the manageability of the business process and the quality of the provided product or service. The business alignment needs to be measured in order to decide whether corrective actions are needed to prevent escalation of potential problems. Conformance analysis techniques can be used to provide detailed feedback about the status of the business alignment. Many business processes are being controlled and supported by information systems. As a result, these information systems leave their footprints in transactional systems [31]. These footprints can be seen as a representation of the business process as it is executed in reality. Conformance analysis techniques use the footprints from the transactional systems and analyze whether the business process has been performed according to the policies and regulations describing the business process. Conformance analysis techniques are powerful because they use recorded data from the operational business processes itself. These techniques can easily be used to analyze a business process over a long period in time as long as the transactional data is available. Since computational methods are used in these techniques even small deviations of the intended process model are captured with conformance analysis techniques. Although the conformance analysis techniques are very powerful, its application in business has not been researched thoroughly. The reason for this is that the field of conformance analysis is relatively new. More research about the use of conformance analysis techniques as a business alignment tool will help to narrow the gap between businesses and the research area of business process management. Both fields will benefit as this will help express their needs to each other. In the future this might lead to a better alignment of these two fields. 1

9 1.2 Research question In order to achieve this goal we have chosen to conduct explorative research. The goal of this research is to obtain information about the current match between the need of conformance information by businesses and the information that can be provided by available conformance analysis techniques. Based on the objective of this research we formulated the next research question: Which conformance analysis information is most relevant for businesses? In order to answer the research question, the following sub questions need to be answered first: 1. Which conformance analysis techniques exist? 2. What are the different aspects about which the conformance analysis techniques report? 3. Which feedback aspects of conformance analysis techniques are regarded to be the most important by businesses? 4. Which feedback aspect has the most understandable conformance analysis technique results according to businesses? The first sub question has been formulated to obtain an overview of the available conformance analysis techniques. Searching for the answer will also clarify how the conformance analysis techniques are related to each other. The second sub question has been posed to be able to group similar conformance analysis techniques based on specific characteristics. We have chosen to group similar conformance analysis techniques in order to enable us to make statements about these groups of techniques. The third sub question evaluates which of these groups of conformance analysis techniques are regarded to be the most important by businesses. With the last sub question we investigate the understandability of each group of conformance analysis techniques. 1.3 Research strategy The strategy we used for this research consists of four phases. In phase 1, the literature phase, we have identified conformance analysis techniques and derived feedback aspects from these techniques. In phase 2, the case study phase, we executed the conformance analysis techniques in the context of a case study. In phase 3, the survey phase, we have constructed a survey to measure the relevance of the conformance analysis techniques. In phase 4, the analysis & interpretation phase, we analyzed the survey data that we gathered in phase 3 and drew conclusions from the analysis. The phases and their related activities have been depicted in figure 1. In phase 1 the first two sub questions are being answered. A literature study will be conducted to search for relevant conformance analysis techniques (activity 1.1 of figure 1). For 2

10 Phase: Literature Phase: Case Study Phase: Survey Phase: Analysis & Interpretation 1.1 Literature study 1.2 (Deliverable 1) Derive techniques 2.1 Collect a priori model information 3.1 (Deliverable 4) Build online survey 4.1 Analyze results 2.3 (Deliverable 3) Execute techniques 3.2 Distribute survey 4.2 (Deliverable 5) Draw conclusions 2.2 Preprocess DSV data 3.3 Collect survey results 1.3 (Deliverable 2) Derive aspects Figure 1: Research strategy each of the techniques we found in literature, we indicate what the required input is, how the technique processes this input and what the resulting output is (activity 1.2). Once activity 1.2 has been finished we obtained deliverable 1. This first deliverable is a list with identified techniques and their corresponding output. We will then use this deliverable to categorize the techniques in different feedback aspects (activity 1.3). This will be done to distinguish between the various techniques. The techniques are categorized based on the type of information that is presented in the output to its practitioner. Deliverable 2 is the result from activity 1.3. This deliverable is a table with techniques mapped to the various feedback aspects. Sub question 2 will be answered after activity 1.3 has been finished. In phase 2 a case study is being performed. A case study is an empirical inquiry that investigates a phenomenon within its real life context [7]. A case study can be used in explorative research where the empirical data can be used to investigate phenomena in areas where existing knowledge is limited [7, 3]. The conformance analysis techniques are being performed with data from a business process of the logistics firm DSV Solutions. Information regarding the desired process will be obtained to construct a model that represents the outbound business process as it is being perceived by management of DSV Solutions in Venray (activity 2.1). This perceived model is also known as an a priori model. Warehouse management data will be collected and preprocessed to construct an event log containing information regarding the outbound business process as it is being executed in reality. The activities required to construct an event log are captured in activity 2.2. With this information the conformance analysis techniques found in literature will be executed (activity 2.3). The deliverable from the case study consists of the results from the execution of the conformance analysis techniques (deliverable 3). 3

11 In phase 3 a survey will be conducted. Deliverable 2 and 3 will be used to construct an online survey at activity 3.1 (deliverable 4). The online survey is distributed among the respondents (activity 3.2) and the results of the survey are collected once the survey has been filled out by the respondents (activity 3.3). In phase 4 the data collected from the survey will be analyzed (activity 4.1) and used to test various hypotheses to answer sub questions 3 and 4. Conclusions from the analysis will be drawn in activity 4.2. A summary of the conclusions is deliverable Structure of thesis The remainder of this thesis has been structured as follows. In chapter 2 relevant conformance analysis techniques, feedback aspects and hypotheses will be discussed. In chapter 3 the research method will be presented. The results of the data analysis are described in chapter 4. Chapter 5 discusses the results of this research. Chapter 6 provides the conclusions of this research, followed by a discussion and directions for future research. 4

12 2. Techniques and Feedback aspects In this chapter we start by explaining the context of conformance analysis techniques in the field of process mining in section 2.1. Subsequently we will discuss the conformance checking, delta analysis and LTL checking techniques in sections 2.2, 2.3 and 2.4. In section 2.5 we introduce seven different feedback aspects that have been identified based on the feedback from the conformance analysis techniques. Section 2.6 provides a mapping between the techniques and feedback aspects. 2.1 Process mining fields Increasingly, business processes become process aware [31]. This means that information systems control and/or monitor business processes. As a result, these business processes leave their footprints in transactional information systems, i.e. event logs [24, 22]. Event logs are records of audit trails of a business process that can be used to discover models describing processes, organizations, and products. A lot of research to effectively discover process models from event logs has been conducted. This field of research is called process mining [34, 17, 31]. Closely related to the field of process mining are the fields of conformance checking [22], delta analysis [24] and linear temporal logic [26] which all contain techniques that can be used to compare a prescribed situation with reality. These fields of research are derived from the field of process mining. The field of conformance checking [22] involves all techniques that focus on differences between a process model and an event log. A prescribed situation of the business process is contained in the process model and the reality of how this business process is executed is captured in the event log. With conformance checking the event log is being replayed in a prescribed process model and focuses on the deviations. The field of delta analysis [24] contains techniques that assume two process models that will be compared to each other and focuses on the differences. Delta analysis can be used to assist with the merger of two organizations. In this context we used conformance analysis techniques from the field of delta analysis to detect differences between a prescribed process model and a process model which describes the reality. The process model that describes the reality can be obtained by applying process mining techniques to the event log. The field of linear temporal logic (LTL) [26] involves techniques that can be used to test for specific behavior. With LTL a formula, describing specific behavior, can be created. This formula can then be used to verify how often this behavior actually occurred in an event log. These fields of research contain techniques that all require an input that describe the prescribed situation and the actual situation. There are four different types of input possible. These are an a-priori model, an a-posteriori model, an LTL formula and an event log. An a-priori model is a prescriptive model and a model which has been discovered using process mining techniques is called an a-posteriori model [36]. The notations used in this thesis are 5

13 Petri Net [19], Heuristic Net [9] and Event-driven Process Chain (EPC) notations [28]. Prescribed situation Actual situation A-priori model Delta Analysis Conformance checking A-posteriori model Process Mining LTL-formulas LTL Event log Figure 2: Research fields. Figure 2 is a graphical overview of the relations between the discussed research fields and the different types of input that are used by these research fields. In figure 2 the four types of input have been denoted. At the left side of the figure the types of input that represent the prescribed situation are depicted. The right side of the figure depicts the actual situation. The conformance of either input of the prescribed situation (the a-priori model or the LTL-formulas) will be tested against an input of the actual situation (the a-posteriori model or the event log). The possible combinations of conformance analysis are (indicated with a bidirectional arrow): delta analysis, conformance checking and LTL-checking. The only unidirectional arrow (process mining) has been indicated to illustrate the process extracting an a-posteriori model from the event log using process mining techniques. The discussed research areas can provide managerial insights in three perspectives [36, 4, 25, 24, 36]. 1. process perspective 2. organizational perspective 3. data perspective The process perspective focusses on the control-flow i.e., the ordering of tasks. The goal of mining this perspective is to find a representation of all the possible paths. The organizational perspective focuses on the performer of the identified tasks. The goal here is to identify organizational relations between workers i.e., construct a social network. The case perspective focuses on properties of cases. Besides the path or performer of tasks, a case can be characterized by the values of the corresponding data elements. For example, if a case represents a customer order, it might be interesting to know the supplier or the number of 6

14 products ordered [36, 4, 25, 24, 36]. Cook and Wolf have applied process mining techniques to the the field of software engineering [6]. Agrawal and Gunopulos introduced process mining to the context of workflow management [1]. Process mining techniques have also been discussed in the context of security [27], business alignment [24], web services [13, 29], software development [23] and genetic mining [9]. All researchers have focussed on the process perspective. Because most research conducted in this field has been dedicated to provide insight to the process perspective [31] and because this perspective represents the differences between process in reality and the process on paper best, we have chosen to focus on the process perspective. 2.2 Conformance analysis techniques In this section we will discuss the conformance analysis techniques we found in literature. Techniques from the field of conformance checking, delta analysis and linear temporal logic will be discussed Conformance checking techniques In this subsection we discuss the techniques from the field of conformance checking. Process validation Cook et al. introduced Process Validation as a technique to detect deviations from intended and actual behavior [6, 4, 5]. The authors introduced two metrics to determine the correspondence between a formal model and executions of the process. These metrics are Simple String Distance (SSD) and Nonlinear String Distance (NSD). Two streams are created. One for the model and another one for the event log. The distance between these two strings is calculated by measuring the number of insertions, deletions and substitutions to transform one string into the other. An insertion represents missed activities (the model predicted them, but they were not performed in the log trace), deletions represents extra activities (the model did not predict them, but they were performed in the log trace) and the substitution represents replaced activity (the model expected a different activity then performed in the log trace). To strengthen the metrics, weights can be assigned to each of the activities that might be performed. The NSD calculates the quantitative impact of the insertions, deletions or substitutions. Consecutive actions that are needed for the transformation are regarded as blocks. A sequence of missed activities, for example, might be seen as a series of deviations but might potentially be a more serious breach of desired execution than can be represented by counting those missed activities. The lengths of these blocks are being used to calculate the NSD [6]. These techniques give feedback about the fitness and violation aspect Parsing measure Weijters et al. have introduced a process discovery technique to mine a process model from an event log that contains noise. To determine the quality of the mined log compared to the event log the authors introduced the metrics, Parsing Measure (PM) and Continuous Parsing 7

15 Measure (CPM). The CPM is similar to the SSD metric used in process validation by Cook et al [35]. Input to this technique is the mined process model and the corresponding event log. To calculate the PM, the number of correctly parsed traces is divided by the total number of traces found in the log. The CPM metric is similar to the PM metric but continues parsing after an error has been recorded where the PM metric stops. The feedback of the PM and CPM both report about the fitness and violation aspects Conformance checking Conformance checking has been introduced by Rozinat et al. [22]. The technique quantifies how well a process model conforms to a corresponding event log. Conformance checking takes an event log and some a-priori model, in the petri net notation, as input. The technique consists of three metrics and diagnostic information. The metrics are token-based fitness, behavior appropriateness and structural appropriateness. To calculate the metrics, the event log is being replayed in the a-priori model. Token-based fitness represents how well the event log can be replayed in the given model. The behavioral appropriateness indicates how much extra behavior the model allows for in comparison to the given event log. The structural appropriateness is aimed at the detection of redundant and duplicate tasks in the model. To calculate the token-based fitness metric, the number of artificially created and removed tokens are related to the number of produced and consumed tokens during event log replay. Then these relations are compared to the events found in the event log. Rozinat et al. claim that a good fitness does not imply conformance. Therefore the behavioral and structural appropriateness metrics are part of the feedback of conformance checking as well. The appropriateness capture the idea of Occam s razor, i.e. one should not increase, beyond what is necessary, the number of entities required to explain everything [22]. In order to calculate the behavioral appropriateness metric, the number of choices that could be made in the model at each point during event log replay are related to the number of these choices actually taken in the event log. Calculating the structural appropriateness metric involves determining the number of redundant and duplicate tasks in the model. Redundant tasks can be removed without altering the behavior of the model and duplicate tasks list alternative ways of working without expressing it in a meaningful way. Apart from the metrics, diagnostic information is returned. The diagnostic information is presented in a petri net. In the petri net, which has been loaded as input, is indicated at which locations in the model the event log could not be parsed successfully during event log replay. The feedback aspects of the token-based fitness metric are fitness and violation. The violation is applicable as the research in researched [27] has shown. The authors have used conformance checking to detect security violations on business processes. First a process mining techniques has been applied to an event log that contains all acceptable behavior to construct a model that contains allowed behavior. The conformance can be verified for each new audit trail by playing the token game. The point at which the audit trail differs from allowed behavior can be detected. In this way the location of the difference can be detected, even realtime. The behavioral appropriateness metric provides feedback about the precision aspect [22, 21]. The structural appropriateness metric gives feedback about the structure aspect. The diagnostic information provides feedback on the location aspect. 8

16 2.2.4 Completeness and soundness Greco et al. have created a technique to formulate a suitable process model from an event log [14]. The authors developed an algorithm that stepwise refines a starting schema by guaranteeing that each refinement leads to an increasingly sound schema. To verify whether each successive scheme is a refinement, the metrics completeness and soundness have been used [14]. The completeness value, which calculates the fitness, is similar to the token-based fitness metric of conformance checking [22]. The soundness metric, measuring the minimality of the process model, is similar to the behavioral appropriateness [14]. 2.3 Delta analysis techniques In this subsection we discuss the techniques from the field of delta analysis Difference analysis Difference analysis is a conformance analysis technique from delta analysis field and has been developed to detect differences between two process models [20]. Dijkman proposed difference analysis as a technique to detect the deviations between two process models [12, 11]. Difference analysis points out the position of the difference and diagnoses the type of difference. The notion of similarity used by Dijkman is completed trace equivalence. According to Dijkman the problem with other process difference analysis techniques is that they return simple true or false statements or statements in terms of formal semantics. These types of feedback are not useful for business analysts [12, 11]. The difference analysis gives feedback about the location aspect. Dijkman identifies two reasons for the fact that equivalence checking techniques provide limited feedback [11]. Firstly, equivalence checking techniques are defined on the execution traces or the state-space of processes, while business processes are defined in terms of activities and relations between those activities. Hence, it is hard to pinpoint exactly where those processes are different in terms of the activities and relations. Secondly, when an equivalence checking technique determines that two processes are different, it provides little information as to why this may be. If it provides any diagnostic information at all (many state-based techniques return a simple true or false answer), it again returns an answer in terms of states or traces. Van der Aalst identified two drawbacks to delta analysis. First, he argues that there might not be enough events to actually discover the process model, which results in a model that may be flawed or not representative. The second drawback he identified is that delta analysis does not provide quantitative measures for the fit between prescriptive/descriptive model and log [24]. Van der Aalst raises the interesting question of how much events are required to successfully discover the process model. He shows that there are simple techniques to assess the completeness of the result, e.g., K-fold cross validation which splits the event log into K parts and for each part it is verified whether adding a part changes the result. However the first drawback of delta analysis mentioned by Van der Aalst is valid, it is also a drawback of conformance checking. When an event log would be used to perform conformance checking that does not reflect all activities performed in a business process, the results of conformance checking might not be representative. 9

17 2.3.2 Causal footprint similarity Van Dongen et al. have developed Causal Footprint analysis to measure the behavioral similarity of two process models [33, 32]. A causality footprint can be derived from a process model. A causal footprint can be seen as an abstraction of process behavior. The process models are transforming graphs that contain vectors in a multidimensional space. The cosine distance between these two vectors is then calculated [21]. The bigger the distance, the greater the difference between the two models. The authors have identified three problems with other delta analysis techniques that use bi-simulation as a notion of equivalence. The first problem that has been identified is that these techniques result yes or no as an answer to the question whether two process models are similar. The second problem is the state explosion problem which occurs when models with concurrency are being compared to each other. The last problem is that deadlocks in the process model are not captured in the behavioral comparison. The causal footprint analysis is different in a sense that causal footprints capture constraints instead of the state space. Weights can be assigned to terms to indicate their relevance to the process model [32]. The feedback of the analysis technique results in a value for similarity between 0 and 1. The feedback aspects that are reported about are precision and generalization Precision and recall Pinter et al. have developed interval based algorithms to create well structured graphs from an event log [18]. Precision and recall are metrics that have been used by the authors to determine the quality of their algorithms. The precision metric is calculated by determining the ratio of correctly identified links between activities over the total number of generated links between activities. Recall is the ratio of correctly identified links between activities in relation to the number of links between activities in the original workflow graph [18]. The precision and recall metrics report about the structure aspect [21] Model comparison based on typical behavior More recent work in the field of process mining has been devoted to the use of genetic algorithms [10]. Medeiros et al. developed a technique to compare two process models based on typical behavior. The inputs to this technique are an a-priori model, an a-posteriori model and an event log [10]. The authors have presented naive approaches to compare with their approach. According to Medeiros et al. the problems with other metrics are that the full firing sequences/state space needs to be finite, the models need to be terminating, i.e., it should be possible to end in a dead marking representing the completion of the processes, there is no distinction between important or unimportant paths and last the behavioral precision and behavioral recall metrics are too rigid, i.e., one difference in the full firing sequence invalidates the entire sequence [10]. For these reasons Medeiros et al. have presented the completeness, behavioral precision and recall and duplicates precision and recall metrics. The completeness metric is very similar to the token-based fitness, the difference is that the completeness metric also takes the frequency of event traces into account by assigning weights. In this way, problems in more often occurring parts of the model are punished more severely. The metric behavioral precision measures how much extra behavior the mined 10

18 model allows for in comparison to the allowed behavior from the original model with respect to typical behavior in the event log. The metric is calculated by measuring the intersection between the set of enabled tasks that the mined and reference models have at every moment of the event log replay. The intersection is further weighed by the frequency of traces in the event log. The metric behavioral recall measures how much extra behavior the original model allows for in comparison to the mined model with respect to the typical behavior in the event log. Structural precision and recall are adapted from the precision and recall metric by Pinter et al. [9]. Structural precision is calculated by establishing how much structural elements are contained in the a-priori model that are not contained in the a-posteriori model. The structural recall takes the a-posteriori model as a starting point and verifies how much structural elements are not contained in the a-priori model. The duplicates precision and recall focus on the duplicate elements that can be removed without altering the behavior found in the other model. The completeness metric reports about the fitness aspect. The behavioral precision reports about precision, behavioral recall about generalization. The structural and duplicates precision and recall metrics all report about the structure aspect. 2.4 LTL technique In this subsection we discuss the technique from the field of linear temporal logic Linear temporal logic Linear Temporal Logic (LTL) [26] is a language that has been developed to verify (un)desired behavior. In [23, 30, 8] the LTL syntax has been discussed. The syntax of the LTL language can be used to create formulas that describe this (un)desired behavior. These formulas can then be tested for a specified event log. Dependent on the tested formulae, each trace in the event log evaluates to true or false. The total number of traces that have been evaluated to true and false are being summed up separately. The problem with LTL checking according to Medeiros et al. in [8] is that these techniques require an exact match between the elements (or strings) in the event log and the corresponding elements in the model. The advantage of this technique is that specific behavior can be tested accurately. The LTL checker is a plugin that has been implemented in the ProM (process mining) framework [26], gives feedback on the frequency aspect. 2.5 Feedback aspects We have used the output from the conformance analysis techniques and grouped techniques that provided similar feedback. In this process seven categories have been found; The feedback aspects. Practical examples of techniques reporting about the different feedback aspects has been provided in chapter 3. The feedback aspects will be explained here. To illustrate the interpretation of each feedback aspect, an event log and two process models will be provided. One of these models illustrates a poor representation of the feedback aspect while the other model indicates a good representation of the feedback aspect. 11

19 2.5.1 Fitness Fitness quantifies how much behavior in an event log is captured by a process model [22, 21]. It indicates up to which level a business process has been conducted as described in some a priori model. The model of figure 3(b) has poor fitness since only one of the three traces in the event log figure 3(a) can be replayed successfully. The log trace that can be replayed successfully is trace ABCD. The fitness of the model of figure 3(c) on the other hand has good fitness since all three traces in the event log can be replayed successfully. ABCD ABD ABECD A B C D A B C D E a) Event log b) Poor fitness c) Good fitness Figure 3: The fitness aspect Precision Precision indicates how much extra behavior has been contained in the model but no such behavior can be found in the provided event log. A completely precise model is a model where all behavior has been performed in reality. The model of figure 4(b) is an example of a model with poor precision because the model allows for much more behavior than in captured in the event log. It is for example possible in model 4(b) to perform activities A, B, C, E and F several times each. The precision of model figure 4(c) is good because no more behavior is possible in the model than the event traces in the event log. E F E F AEFD ABCD A B C D A B C D a) Event log b) Poor precision c) Good precision Figure 4: The precision aspect Generalization Generalization indicates if a model is not too precise (hence too static). A model which is too precise leaves no room for alternative ways of working. The generalization aspect is complementary to the precision aspect. It reports about overly precise models. When no generalization has been applied to a model it is too rigid. The generalization in model figure 5(b) is poor because the model is very rigid. Apart from the traces depicted in the event log in figure 5(a), no other behavior is possible in the model of figure 5(b). In model of figure 5(c) on the other hand is more generalizable. This model allows for repetitions of activity F for example. It is also possible to replay trace AECD while this was not possible in the rigid model of figure 5(b). 12

20 E F E F ABCD AEFD AGHD A B C D A B C D G H G H a) Event log b) Poor generalization c) Good generalization Figure 5: The generalization aspect Structure Structure indicates if the a priori model is well structured and such that it is readable. Often, the same behavior can be expressed in more than one way but there are preferred ways of expressing the same behavior. The way to verify the structure is by establishing if the model contains duplicate and redundant tasks by taking the supplied event log into account [21]. The structure of the model in figure 6(b) is poor because there are several duplicate tasks and there are preferred ways of expressing the same behavior. The duplicate elements in model 6(b) are activities B and C. The model of figure 6(c) for example has a good structure since there are no duplicate tasks and there is no way of expressing the same behavior differently. ABCD ABCED B C E A B C D A B C D E a) Event log b) Poor structure Figure 6: The structure aspect. c) Good structure Frequency Frequency indicates how often specific behavior has been performed in reality. Information regarding the number of instances in the event log is an example of the frequency of specific behavior in a business process. Based on this information one can distinguish important from less important traces. The model of figure 7(b) is an example of poor frequency. In this model it is impossible to distinguish frequent from infrequent event traces. Frequency information has been contained in the model of figure 7(c). From the frequency information of figure 7(c), it can be seen that activities E and F have been performed much more often than activities B and C. ABCD AEFD a) Event log E F A B C D b) Poor frequency E 680 A B C D F c) Good frequency Figure 7: The frequency aspect. 13

21 2.5.6 Violation Violation is the complement of the fitness aspect. It assumes that the provided process model only contains behavior that is allowed. With this aspect, the practitioner is informed on violation of the supplied behavior in the process model. To illustrate the violation aspect we have assumed in this example that the only allowed behavior is captured in the event log of figure 8(a). The process model in figure 8(b) violates the behavior in the log since it also allows for traces AD and ABD. The figure of model 8(c) does not violate the behavior in the log. ABCD A B C D A B C D a) Event log b) Poor violation Figure 8: The violation aspect. c) Good violoation Location Location indicates where the observed behavior has deviated from the a priori model. The location aspect informs about the location of a conformance problem in a model. With this information it is possible to pinpoint exactly where the problem occurs. The exact location of the problem has not been pinpointed in the model of figure 9(b), therefore it has poor location. The model of figure 9(c) on the other hand indicates a problem at location C in the process model. Here information regarding the location of the problem has been provided. ABD A B C D A B C D a) Event log b) Poor location Figure 9: The location aspect. c) Good location 2.6 Summary of techniques and feedback aspects The aspects that have been identified have been mapped to the different conformance analysis techniques. Table 1 gives a summary of the techniques and the feedback aspects that they can report on. Some techniques report about more than one aspect, therefore the writers are listed several times in table. The information in this table will also be used further on in this thesis. 14

22 Author(s) Technique(s) Fitness Precision Generalization Structure Frequency Violation Location Cook and Wolf Simple String Distance x x Cook and Wolf Nonlinear String Distance x x Weijters et al. Parsing Measure x x Weijters et al. Continuous Parsing Measure x x Greco et al. Completeness x x Greco et al. Soundness x Rozinat et al. Token-based fitness x x Rozinat et al. Token-based fitness diagnostics x x Rozinat et al. Token-based fitness logview x x Rozinat et al. Behavioral Appropriateness x Rozinat et al. Structural Appropriateness x Van Dongen et al. Footprint similarity x x Van der Aalst LTL number correct traces x Van der Aalst LTL logview x Dijkman Difference analysis x Pinter et al. Precision x Pinter et al. Recall x Medeiros et al. Fitness Pfcomplete x x Medeiros et al. Behavioral Precision x Medeiros et al. Behavioral Recall x Medeiros et al. Structural Precison x Medeiros et al. Structural Recall x Medeiros et al. Duplicates Precision x Medeiros et al. Duplicates Recall x Table 1: Overview of techniques and feedback aspects. 15

23 3. Case study In this chapter we discuss the conducted case study. In section 3.1 we will discuss the context of the case study. In section 3.2 we will discuss which activities will be performed to prepare the data from this context so that it can be used in conformance analysis techniques. The results from the conformance analysis techniques will be discussed in section Context of case study The application of conformance analysis techniques has been researched with a case study that has been conducted at DSV Solutions in Venray. DSV Solutions in Venray is a subdivision of DSV 1. DSV offers transport and logistic services all over the world. DSV has three subdivisions; DSV Road, DSV Air & Sea and DSV Solutions. The DSV Solutions division is one of the leading suppliers of logistics services in Europe. At the solutions office in Venray, a warehouse is located. Within this warehouse there are four operational business processes. These business processes are called inbound, warehousing, Value Added Logistics (VAL) and outbound. Most stock in the warehouse are printers or printer related products. Figure 10 is a graphical representation of the warehouse. The arrows in figure 10 show the flow of the products through the warehouse. The products that will be stored temporarily in the warehouse arrive at the inbound process. Dependent on the requests from the customers, VAL activities are applied to the products. Eventually the products will be shipped to customers once an order for these products has been filed. Warehouse Inbound process Warehouse process Value Added Logistics process Outbound process Figure 10: Business processes at the warehouse of DSV Solutions in Venray

24 DSV Solutions in Venray uses a Warehouse Management System (WMS) to support its business processes. In this WMS information regarding the locations and destinations of products is stored. Moves of these products are also stored in the WMS. Workers in the warehouse use hand held scanners to register the actions they perform. The inbound process involves activities regarding the storage of products delivered to the warehouse. The warehouse process involves activities regarding the replenishment of fast moving products, cycle counts and quality control. The activities in the VAL process can be applied to products when customers have demanded special modifications to these products. This ranges from adding a Russian operation manual for products that have to be shipped to Russia until upgrading the firmware of printers. The outbound process involves activities that process customer orders. This process consists of release, pick, pack, load and ship activities. In this case study we have focused on the activities of the outbound business process. We have chosen to investigate the outbound process because this business is the most crucial and complex for DSV Solutions in Venray. Most resources are allocated to the outbound process as well. Figure 11 shows a graphical representation of the outbound business process. The elements in figure 11 are subprocesses of the outbound process. The label attached to each subprocess represents the performer of the specific subprocess. planner Start Release order picker Pick order packer Pack order QC Load order planner Ship order End Figure 11: Business processes at the warehouse of DSV Solutions in Venray. In the release subprocess orders are released to be picked in the pick subprocess. In the pick subprocess the products from a customer are picked from warehouse locations and dropped at destinations indicated by the WMS. In the pack subprocess, products are prepared for shipment. This can either be done by packaging on the pack line or by hand at the pack 17

25 area. The pack line is a packaging machine that automatically measures, weighs, wraps and labels the product. For some products packaging is not required. A load check to verify if all products of an order are picked and packed is performed in the load subprocess. The ship subprocess has been finished when all required documents for the physical transportation have been completed and all products have passed the load check in the load subprocess. 3.2 Data preparation To perform the various conformance analysis techniques we needed to prepare the required types of input for these techniques. The required types of input are an a-priori model, LTL formulas, an a-posteriori model and an event log. The a-priori model and LTL formulas have been constructed together with process engineers and managers from the outbound business process. To construct event logs, transactional data from the WMS has been captured, preprocessed and transformed into an interpretable format so that it can be analyzed. The a-posteriori model could be extracted from the event log using process mining techniques. The first input, a-priori model, has been constructed together with domain experts Matthijs Peeters (Logistics engineer), Max Beterams (Outbound Planning & Control manager) and Emil Reys (Manager Outbound). We have also used work instruction and flowcharts which were available at the intranet of DSV Solutions in Venray. Since some work instructions and processes described in the organizational handbook were somewhat outdated, the process overview could not directly be used as a-priori model. The a-priori model we constructed contains all behavior that is accounted for. The a-priori model has been depicted in Appendix A. The second input, LTL formulas, has been obtained while discussing with managers, supervisors and process engineers, which behavior should or should not be contained in the a-priori model. Especially the behavior that is not contained in the a-priori model was used in the formulation of the LTL formulas. The third input, a-posteriori model, has been obtained by applying process mining techniques to the event log. The process mining technique we used is the heuristic miner [35] to derive a heuristic net from the event log. Some conformance analysis techniques require the a-posteriori model to have a petri net notation. For these techniques, the heuristic net has been transformed using the heuristic net to petri net conversion plugin available in ProM (process mining framework) 2. The fourth input, event log, has been obtained by transforming the data from the Warehouse Management System (WMS) into a format that can be used by ProM. The WMS data is collected when the outbound business process is operational. This data is collected to keep track of the location of products, to keep track of stock levels and to analyze potential problems. The following actions that have been performed to turn the data from the WMS into an event log will be explained here: 1. Capture WMS data

26 2. Preprocess WMS data 3. Transform WMS data into MXML event log In step 1 we captured the WMS data. We had to copy WMS data to a database that would not be purged. For our analysis we wanted to use three months of data (January, February and March of 2009). The WMS data is being purged each month for maintenance reasons. Hans Westheim (System Engineer) copied the data to another database and appended new data to it. In step 2 we preprocessed the WMS data in the database. The records in the database represent operations applied to pickdetails. A pickdetail is a product that is being collected for a customer. Each record in the database represents an action performed to a specific pickdetail. These actions are for example collecting a pickdetail from a warehouse location, dropping the pickdetail on a specific location, packing a pickdetail on the packing machine, shipping the pickdetail and so on. A pickdetail runs through the outbound business process and is called a case in terms of business process modeling. Table 2 represents a snapshot of data that represents a record of the database with WMS data. Pickdetail id Activity Timestamp To location Dropped :05 Packarea Dropped :32 Lane Table 2: Original record of WMS database. Table 2 shows that pickdetail with id has been moved to the packarea location in the warehouse at :05. At :32, the same pickdetail has been moved to the lane in the warehouse. In an event log the columns pickdetail, activity and timestamp will be used. This means that the column To location will not be used in the event log. When this data would be used to construct the event log, it would seem that two times the same activity (Dropped) has been performed sequentially to pickdetail with identification To bypass this problem a visual basic script has been altered to incorporate the information from the To location column as a postfix to the description in the activity column. In case the pickdetail is moved to the packarea, the postfix To Pack Area (TPA) has been added. The postfix To Lane (TLA) has been added to the description whenever a pickdetail is moved to a lane location in the warehouse. The application of the altered visual basic script to the snapshot of table 2 is shown in table 3. In this way it is possible to make a distinction between the two performed activities where this would otherwise not be possible. A mapping of the various postfixes and their explanations has been included in Appendix B. In step 3 the preprocessed WMS data has been transformed into an event log. To do this 19

27 Pickdetail id Activity Timestamp To location Dropped-TPA :05 Packarea Dropped-TLA :32 Lane Table 3: Transformed record we have used the ProMImport manual 3. The ProMImport program allows practitioners to covert data to a MXML (Meta XML) file that can be used for analysis in ProM. To do this, we used the visual basic script in the example database that has been included in the archive file provided with the manual. Hans Westheim and the author had to alter the script because when the original script would be applied to the WMS data. ProMImport has been used to generate the event log from the three months. The resulting event log is relatively large (822 MB). To reduce the size, the event log has been loaded into ProM and exported as a grouped MXML file. This function groups similar log traces and saves in the MXML file how often these occur. This reduces the size of the event log significantly. For the purpose of analyzing the working days of the months January, February and March of 2009 separately, an event log has been created for each day. Event log filters have been used to generate these. 3.3 Result of conformance analysis techniques Now the required types of input have been constructed, the various conformance analysis techniques have been executed. The settings and results of these techniques are discussed here Parsing Measure To perform the Parsing Measure (Weijters et al.), we have loaded the a-priori model and an event log of each working day between January 7 th and March 26 th of The a-priori model has to be loaded after the event log has been loaded because the mapping of log events and elements found in the model has to be made. Once the inputs have been loaded, the a-priori model has to be transformed into a heuristic net. This can be done by using the conversion heuristic net to petri net plugin available in the ProM framework. The Parsing Measure can now be calculated. A graphical representation of the Parsing Measure has been depicted in figure 12. We have chosen to adjust the range of the y-axis to emphasize the variance in the Parsing Measures and because the values lie in the range of 0,7 and 1. A can be seen in figure 12, the Parsing Measure of January 9 th had the value 1. This means that all behavior that has been captured in the event log of that day could be successfully replayed in the a-priori model. Another day which might be interesting to focus on is March 13 th since that day closely approached a Parsing Measure value of 1. The average Parsing Measure value for January is 0,904. The average Parsing Measure value for February was 0,890 and the average Parsing Measure value for March was 0,938. The average of February was lower than the averages of January and March

28 1.000 Parsing Measure PM Parsing Measure PM /7/2009 1/9/2009 1/11/2009 1/13/2009 1/15/2009 1/17/2009 1/19/2009 1/21/2009 1/23/2009 1/25/2009 1/27/2009 1/29/2009 1/31/2009 2/2/2009 2/4/2009 2/6/2009 2/8/2009 2/10/2009 2/12/2009 2/14/2009 2/16/2009 2/18/2009 2/20/2009 2/22/2009 2/24/2009 2/26/2009 2/28/2009 3/2/2009 3/4/2009 3/6/2009 3/8/2009 3/10/2009 3/12/2009 3/14/2009 3/16/2009 3/18/2009 3/20/2009 3/22/2009 3/24/2009 3/26/2009 Figure 12: Results of Token based Fitness Conformance Checking To perform Conformance Checking (Rozinat et al.) we loaded the a-priori model and one event log. The event log consists of transactional data from the months January, February and March. When the conformance checker in ProM is being started, it is possible to indicate which metric you want to be calculated. For this research we have used the conformance checker to calculate the token-based fitness, the behavioral and structural appropriateness. Figure 13: Results of Conformance Checker (diagnostic result). Figure 13 shows a problem found with conformance checking. In the months January, February and March of 2009, a conformance problem at the pack area has occurred 4074 times. For 1358 pallets have been dropped at the pack area and moved twice to a different location on the pack area. The desired behavior is that a pallet is dropped on the pack area only once Fitness PFcomplete To calculate the fitness PFcomplete metric [10], an event log of each working day and an a-priori model have been loaded into ProM. The a-priori model has been transformed into a heuristic net notation. The PFcomplete metric has been calculated for each workday of the three months. As with the graph of the Parsing Measures, we have adjusted the range of the y-axis here as well. All values lie within a range from 0,9 till 1. Figure 14 shows the values for the PFcomplete metric for the observed months. The PFcomplete results are in line with other fitness techniques report which is not remarkable since they are all calculated based on the same principle. However there are differences. The PFcomplete value of February 18 th for example is higher than the PFcomplete value calculated 21

29 3/30/2009 3/28/2009 3/26/2009 3/24/2009 3/22/2009 3/20/2009 3/18/2009 3/16/2009 3/14/2009 3/12/2009 3/10/2009 3/8/2009 3/6/2009 3/4/2009 3/2/2009 2/28/2009 2/26/2009 2/24/2009 2/22/2009 2/20/2009 2/18/2009 2/16/2009 2/14/2009 2/12/2009 2/10/2009 2/8/2009 2/6/2009 2/4/2009 2/2/2009 1/31/2009 1/29/2009 1/27/2009 1/25/2009 1/23/2009 1/21/2009 1/19/2009 1/17/2009 1/15/2009 1/13/2009 1/11/2009 1/9/2009 1/7/ Fitness PFcomplete Fitness PFcomplete 1/7/2009 1/9/2009 1/11/2009 1/13/2009 1/15/2009 1/17/2009 1/19/2009 1/21/2009 1/23/2009 1/25/2009 1/27/2009 1/29/2009 1/31/2009 2/2/2009 2/4/2009 2/6/2009 2/8/2009 2/10/2009 2/12/2009 2/14/2009 2/16/2009 2/18/2009 2/20/2009 2/22/2009 2/24/2009 2/26/2009 2/28/2009 3/2/2009 3/4/2009 3/6/2009 3/8/2009 3/10/2009 3/12/2009 3/14/2009 3/16/2009 3/18/2009 3/20/2009 3/22/2009 3/24/2009 3/26/2009 Figure 14: Results of PFcomplete. over January 19 th. The Parsing Measures for these days indicate the contrary. Based on the definition of these two techniques we can conclude that the severe deviations from the process model on January 19 th, reported by the Parsing Measure, partially occurred in less important parts of the process model Behavioral Precision and Recall For the calculation of the Behavioral Precision and Recall metrics by Medeiros et al. [9] we have loaded an a-posteriori model, an a-priori model and a reference log containing the typical behavior. The a-priori model remained the same. The reference log and the a-posteriori model changed for each working day that we used to calculate the metrics. For each working day, the corresponding event log has been used and loaded to the heuristic miner to derive the a-posteriori model Behavioral Precision Behavioral Recall Figure 15: Results of Behavioral Precision and Recall. The results from the Behavioral Precision and Recall metrics have been displayed in figure 15. As can been seen in figure 15, the Behavioral Recall value is higher than the Behavioral 22

30 3/30/2009 3/28/2009 3/26/2009 3/24/2009 3/22/2009 3/20/2009 3/18/2009 3/16/2009 3/14/2009 3/12/2009 3/10/2009 3/8/2009 3/6/2009 3/4/2009 3/2/2009 2/28/2009 2/26/2009 2/24/2009 2/22/2009 2/20/2009 2/18/2009 2/16/2009 2/14/2009 2/12/2009 2/10/2009 2/8/2009 2/6/2009 2/4/2009 2/2/2009 1/31/2009 1/29/2009 1/27/2009 1/25/2009 1/23/2009 1/21/2009 1/19/2009 1/17/2009 1/15/2009 1/13/2009 1/11/2009 1/9/2009 1/7/2009 Precision values for almost all working days. This means that the a-posteriori models allow more behavior than the a-priori model. This indicates that there is behavior that has been performed structurally in reality which is not contained in the a-priori model Structural Precision and Recall For the Structural Precision and Recall metrics [10] we used the a-priori model and a different a-posteriori model for each working day. The calculation of these metrics has been performed for each working day of the three months Structural Precision SP Structural Recall SR Figure 16: Results of Structural Precision and Recall. The results from the Structural Precision and Recall metrics have been displayed in figure 16. For all measurement the value for the Structural Precision is higher than the value for Structural Recall. This means that for each working day there are structural elements contained in the a-priori model that are not contained in the a-posteriori model. An explanation for this phenomenon might be that none of the orders could be shipped one the same day the order has been received Footprint similarity The a-priori and an a-posteriori model have been loaded in ProM to perform a Footprint similarity [33]. The a-priori model needed to be converted to a heuristic model first. Next a causal footprint could be derived from the heuristic model. For each working day the heuristic miner has been used to derive a heuristic net from the event log of that day. For most days the calculation of a causal footprint from the heuristic a-posteriori net was too complex because the a-priori model contained too many connections. We were only able to calculate a causal footprint from the heuristic a-posteriori net of January 9 th. It is not remarkable that we could calculate the footprint similarity for this day because we have identified a good fitness value for this day from the fitness techniques. Figure 17 shows the result of the footprint similarity. The a-priori model has been shown at the right hand side of the figure. The process model a-posteriori model from January 9 th has been shown at the left hand side of the figure. The similarity of the causal footprints from the process models is 68,83%. 23

31 Figure 17: Results of Footprint similarity Linear Temporal Logic A Linear Temporal Logic formula [26] has been entered after a combined event log of the three months has been loaded into ProM. The LTL checker plugin in ProM has been used to test a formula which verifies if a pallet has been moved on the pack area once it has been dropped at the pack area. Figure 18: Results LTL checker. Figure 18 shows the results. The number of process instances that meet the tested formula is The number of process instances that do not have two consecutive drops on the pack area in their process path is Difference Analysis The a-priori model has been used in the difference analysis [11] plugin in ProM to serve as required behavior. We have chosen to use the a-posteriori model of January 9 th to serve as provided behavior to the difference analysis plugin. Once the models have been loaded, a 24

32 mapping between the two models had to be provided to indicate which elements of the models are similar. Figure 19: Results Differences analysis. Figure 19 shows the results of the difference analysis. The differences found are activities that are performed more than once in the provided behavior where this is not possible in the required behavior. In the provided behavior model it is possible to drop a pallet several times on the pack area. In the required behavior this is not possible. 25

33 4. Survey The results from the literature study and case study have been used for the construction of a survey. Section 4.1 discusses the approach for this survey. In section 4.2 we will discuss the dependent and independent variables that will be collected with the survey. Section 4.3 introduces the hypotheses that will be tested. Section 4.4 the targeted population will be discussed. Section 4.5 provides the design of this research. The way the data from the questionnaire will be processed is discussed in section 4.6 and in section 4.7 the reliability and validity of this research will be discussed. 4.1 Survey approach Using the identified feedback aspects and results from the conformance analysis techniques discussed in chapters 2 and 3, we were able to construct a survey. Decisions regarding the survey will be presented here. To determine the type of research, we first needed to decide on conducting a qualitative or quantitative research. In qualitative research, flexible methods of data collection are used to anticipate on unexpected turn of events during data collection. During an interview for example, the researcher can decide to sidetrack from the script to find out why specific behavior occurred. In quantitative research, structured methods of data collection are being used. For example questionnaires are used with predetermined questions. In quantitative research is exactly determined which behavior is being measured and how this is recorded. Arguments for conclusions will be based on statistical data analysis where in qualitative research a large amount of text is being produced that needs to be ordered and structured before conclusions can be drawn [2]. In this research the focus is on the relevance of conformance analysis techniques rather than as to exactly why specific techniques are relevant. This relevance has been operationalized into two dimensions i.e. importance and understandability of conformance analysis techniques. Structured questions regarding these dimensions can be posed to the applied population. For these reasons we have chosen to conduct a quantitative research. Next, we have to decide on the type of data collection. In general there are two data inquiry types for quantitative research; experiment and survey [2]. An experiment is a controlled way of observing where one of the independent variables is manipulated intentionally in order to determine the effect on one or more dependent variables. A survey is a way of describing opinions or facts captured in an interview or questionnaire [2]. Because in this case study, we are only focusing on the situation as it exists at DSV Solutions in Venray, we have chosen to conduct a survey research. Survey data can be collected by means of interviews, by phone or through an online questionnaire. Conducting the survey by phone has been ruled out first because we had to present graphs and models to the respondents. The main advantages of an interview over an online questionnaire are that the quality of the response will be higher since it is likely that the respondent will consider his or her answer more carefully and the mortality of the response will be lower (abandoning the survey before it is completed)[2]. The main advantages of 26

34 on online questionnaire over an interview are that it costs less time to collect the results, automatic checks for completely filled out questionnaires are possible, the research can be duplicated easier, it is better suited for large populations and the quality of sensitive subjects is likely to be answered more honestly [2]. Because of the limited time available in this project and the number of respondent we would like to question we have chosen to build an online questionnaire. 4.2 Hypotheses In table 1 we can see that several feedback aspects occur more often than other (i.e. fitness and violation) while others occur less often than others (i.e. frequency and generalization). This indicates that more research has been devoted to the feedback aspects than are listed more often in table 1. For this reason we expect that researchers labeled these feedback aspects as more important. By formulating the first hypothesis we test if the research population rated some feedback aspects significantly more important than others. Hypothesis 1: There is a difference between the importance of feedback aspects. Employees with different functions in organizations are faced with different challenges. We expect that, based on their education and work description, they might have a different opinion on the importance of the various feedback aspects. With hypothesis 2 we test if the importance of feedback aspects is rated significantly different for employees of different functions. Hypothesis 2: There is a difference between the importance of feedback aspects based on the function of an employee. Because more techniques are categorized into feedback aspects that occur more than others (table 1), research for those feedback aspects is more mature than research to the feedback aspects that occur less often. Previous work could have been used in the development of new techniques of a feedback aspect. To test whether techniques of a specific feedback aspects are understood significantly better (or worse) than techniques of other feedback aspects, we have formulated hypothesis 3. Hypothesis 3: There is a difference in the understandability of conformance analysis techniques that report about a specific feedback aspect and conformance analysis techniques that report about other feedback aspects. Employees within technical functions have hands-on experience with extracting data from information systems. This experience might influence their understandability of the feedback of conformance analysis techniques. For this reason we have formulated hypothesis 4 which can be used to test whether there is a significant relation between the understandability of conformance analysis techniques reporting about a feedback aspect and the function of employees. Hypothesis 4: There is a relation between the function of employees and the understandability of the conformance analysis techniques reporting about a feedback aspect. 27

35 4.3 Variables In this research we investigated the relevance of conformance information in businesses. To determine this relevance we defined relevance as a combination of importance and understandability of conformance analysis information. Therefore importance of feedback aspects and understandability of conformance analysis techniques have been operationalized as the dependent variables in this study. In order to be able to draw conclusions from the values of these dependent variables in relation to different types of employees, we also added an independent variable. This is the function of the employee. The data that is used for the testing of hypothesis 1: 1. importance of feedback (dependent variable) 2. type of feedback (factor variable) The data that is used for the testing of hypothesis 3: 1. understandability score of techniques (dependent variable) 2. type of feedback (factor variable) The data that is used for the testing of hypotheses 2 and 4: 1. importance score of fitness (dependent variable) 2. importance score of precision (dependent variable) 3. importance score of generalization (dependent variable) 4. importance score of structure (dependent variable) 5. importance score of frequency (dependent variable) 6. importance score of location (dependent variable) 7. importance score of frequency (dependent variable) 8. understandability score of fitness techniques (dependent variable) 9. understandability score of precision techniques (dependent variable) 10. understandability score of generalization techniques (dependent variable) 11. understandability score of structure techniques (dependent variable) 12. understandability score of frequency techniques (dependent variable) 13. understandability score of location techniques (dependent variable) 14. understandability score of frequency techniques (dependent variable) 15. function of employees (independent variable) 28

36 To use these variables in hypotheses testing we should be able to quantify these variables. The first dependent variable is an ordinal variable which can have one of the five different levels ranging from much less important until much more important. The second dependent (scale) variable is a percentage of correctly answered questions regarding the understandability of technique questions about a specific aspect. When all questions regarding a feedback aspect are answered correctly we assume that the respondent fully understands the feedback provided by these feedback aspects. We have decided not to distinguish between individual techniques reporting about a single feedback aspect because the differences are too subtle. The information in table 1 has been used to determine which technique feedback reports about which feedback aspect. The only independent variable used in this study is function of the DSV employee. This is a nominal variable which can have either the value outbound process manager, process engineer or supervisor. 4.4 Population The entities of the research population consists of DSV employees that have improving the quality of the outbound business process as part of their job description. The employees of DSV Solutions in Venray that meet this specification are outbound process managers, process engineers and supervisors. Employees that have no such clause in their job description are order pickers and employees of the financial department for example. Because the size of this population is relatively small we decided to question the whole population instead of using a probability sample of the population. Non-response of the populations occurs when we do not receive data from entities in the population [2]. Two types of non-response can occur; unit non-response and item non-response. Unit non-response is when a respondent does not participate in the questionnaire. Item non-response is when specific questions in the questionnaire are structurally avoided by the respondents [2]. To reduce the unit non-response, we have raffled an incentive among the respondents that have completed the questionnaire. We also sent two reminders to the respondents that had not participated yet. The first reminder was sent one week after the invitation was sent. The second reminder was sent the week after the first reminder was sent. To reduce the item-non response, we made each question in the questionnaire obligatory. 4.5 Questionnaire design The questionnaire consists of four parts and has been modeled using a free web-based surveys builder 1. The first part is an introduction where the required information which is required to answer the questionnaire has been explained. No questions are asked in the introduction. The second part contains questions regarding the personalia of the respondent. In the third part questions regarding the importance of the feedback aspects are being asked followed by the fourth and last part where all conformance analysis technique related questions have been asked. The questionnaire as it has been sent to the respondents is included in Appendix C

37 In the introduction part of the questionnaire the activities of the business process the event log and the essence of conformance analysis techniques are explained. In the personalia part the first questions have been asked. The most important question in the personalia part is the question which function the respondent fulfills within DSV. At the feedback aspects part all feedback aspects have been explained first. Next the feedback related questions have been asked. All feedback questions have the same format. This format has been depicted in figure 21. Figure 20 illustrates an example of a feedback aspect question. 8) Aspect 1: Fitness How important do you rate this feedback aspect in relation to other feedback aspect? Question number Question The feedback aspect for this question has been given. Next the respondent is asked to rate the feedback aspect in relation to the other feedback aspects. 1. Much less important 2. Less important 3. Neutral 4. More important 5. Much more important Possible answers A likert scale containing 5 options has been given to indicate the relative importance. Figure 20: Example of feedback aspect question. Figure 21: Format of feedback aspect question. In the conformance analysis technique questions part of the questionnaire we asked questions regarding the understandability of the presented conformance analysis technique. The questions in this part all have the same format. We used a format where we first explained what the goal of the technique is and about which feedback aspect the technique reports. Next, we explained briefly how the technique works. Eventually we displayed the results of the technique and presented two statements about each result. The statements could either be true or false. Additionally, there is an I don t know option. By using the provided information, the respondents are asked which of the presented statement are correct. 30

38 18) Technique 2: Ratio of traces that can be replayed in the process model versus traces that can not be replayed in the process model. Feedback aspect: Fitness (Fitness quantifies how much behavior in an event log is captured by a process model.) Calculation of the result: To calculate the result, the traces in the event log is replayed in the predefined process model. When modifications to the process model are required, the result of the technique lowers. A modification is for example skipping or adding an activity in the process model. The replay of a trace can require multiple modifications. A trace score 100% when it can be replayed without any modifications and 0% when each step requires a modification. Result: from 0 till 1. Graph: result January, February and March ,000 0,990 0,980 0,970 0,960 0,950 0,940 0,930 0,920 0,910 0,900 Question number Question number Question The feedback aspect for this question has been given. Next the respondent is asked to rate the Question The feedback aspect for this question has been given. Next the respondent is asked to rate the feedback aspect in relation to the other feedback aspects. Result It depends on the output of the technique which information is shown here Which of the next statements is correct? Statement 1: At January 9 th and March 28 th 2009 there was a big deviation from the theoretical model. Satement 2: In the period from Janary 9 th until January 19 th the deviation from the intended behavior grew stronger. Question Here the actual question has been posed. Two statements regarding the result have been given. 1. Both statements are correct. 2. Statement1 is correct, statement 2 is incorrect. 3. Statement 1 is incorrect, statement 2 is correct. 4. Both statements are incorrect. 5. I do not know the answer. Possible answers The possible combinations are presented. Additionally, the option I don t know has been added. Figure 22: An example of a technique question. Figure 23: The format of technique questions. The design of the techniques related question has been shown in figure 22. An example of a 31

39 technique related question has been shown in figure 23. A draft of the questionnaire has been tested before the final version has been realized. The reason to do this is to solve inconsistencies, remove difficult to understand words, alter questions that were experienced to be too difficult and check for typographical errors. The draft has been tested on twelve friends of mine. They provided useful feedback that has been used to correct the draft version. Once the survey has been corrected addresses have be acquired with help from workers at the IT department. A distribution list has been created at the survey builder website. By sending the invitations through the website it was possible to keep track of the respondents that had and had not yet filled out the questionnaire. This made it easy to send reminders to the correct persons in the population. 4.6 Questionnaire data Once the respondents have participated in the survey the data from the questionnaire will be collected. The ratings that have been assigned to the feedback aspects have been used in two ways. First the answers from the respondents have not been grouped by function. This information is used for the testing of hypothesis 1. Next the answers from the respondents are grouped by function. That information is used for the testing of hypothesis 2. The correct answers provided by the respondents to the conformance analysis technique questions have first been summed up and divided by the number of conformance analysis technique question regarding each feedback aspect. This results in an understandability percentage score for each feedback aspect. This score has also been used in two ways. First the understandability scores have not been grouped by function. The correct answers to the technique questions have been added in the table of Appendix D. This information has been used for the testing of hypothesis 3. Next the scores are grouped by function. That information is used for the testing of hypothesis Reliability & Validity Reliability is the consistency of the measurement, or the degree to which an instrument measures the same way each time it is used under the same condition with the same subjects [15]. Boeije et al. operationalized reliability as the presence of coincidental mistakes made during data entry [2]. To support the reliability of this research used the same likert scale for each of the feedback aspects in the questionnaire. In this way the respondents could anticipate the answers without having to read them time and again. The reliability of the technique related questions is supported by adding the option to indicate that the respondent does not know the answer to the question. This reduces the chances of respondents guessing the correct answer. Another important factor of reliability is the internal consistency. This estimates reliability by grouping questions in a questionnaire that measure the same concept [2]. We have applied this to the techniques questions. The approach I used was posing two different statements regarding the same presented technique. We decided not to ask two separate questions because this would have a significant effect on the length of the questionnaire where online questionnaires should be kept short to reduce mortality [2]. 32

40 Validity is the strength of conclusions, inferences or propositions [2]. According to Joppe (2000), validity determines whether the research truly measures that which it was intended to measure or how truthful the research results are [15]. Problems with the validity of the research are serious because completely wrong conclusions might be drawn in such situations [2]. Respondents that provide socially accepted answers in questionnaires are structural problems that might spoil the validity of the research. In our questionnaire questions have been posed regarding the importance of the feedback aspects. If we would simply ask if each of the feedback aspects are important, the socially accept answer would be to rate each aspect important. This might not be the realistic opinion of the respondent. For this reason we slightly altered to question where the respondent has to rate the importance of the aspect in relation to the other aspects. To further minimize the effect of socially accepted answers we have guaranteed the respondents that their questionnaire results will be processed anonymously. 33

41 5. Results In this chapter we discuss the results from the statistical analysis. Section 5.1 discusses the analysis results from the questionnaire data. Section 5.2 discusses the reduced data set. The response of the questionnaire has been discussed in section 5.3. In section 5.4 each hypothesis test has been elaborated. Section 5.5 provides a summary of the results. 5.1 Analysis results In this analysis we have chosen to first test for significant differences in importance and understandability of conformance analysis techniques without and next with the group effect of the various functions of DSV employees. Hypotheses 1 and 2 have been used to test for differences of the importance while hypotheses 3 and 4 test for differences of the understandability. Hypotheses 1 and 3 do not take the group effect of functions into account, hypotheses 2 and 4 do. A statistical data analysis has been performed with SPSS version 17 on the questionnaire data gathered in chapter 4. The four hypotheses have been tested. The data that has been used for each hypothesis test has first been inspected using an exploratory data analysis. In the exploratory data analysis we have performed Kolmogorov-Smirnov and Shapiro-Wilk normality tests. Because all the data that has been used was not normally distributed we decided to perform a non-parametric approach. To detect the differences between the groups we have used a Kruskall-Walis and Median test to determine if the group differences are significant at the 95% interval level. This was the case for hypotheses 3 and 4. In these cases a post hoc analysis has been performed. The non-parametric approach used for the post hoc analysis consist of pair wise Mann-Whitney tests. Because various pairs of groups will be tested in that situation we have corrected the α levels using the Bonferroni correction to reduce the chance of obtaining a significant result after performing subsequent tests. Hypotheses 1 and 2 Hypotheses 3 and 4 Group differences Group differences Kolmogorov-Smirnov and Shapiro-Wilk Normality tests Kolmogorov-Smirnov and Shapiro-Wilk Normality tests Kruskall-Walis and Median test Kruskall Walis and Median test Differences identification Bonferroni correction Pairwise Mann-Whitney tests Table 4: Overview of analysis approach. Table 4 shows an overview of the tests that have been performed to test for group differences. As can be seen from table 4, no post hoc tests have been performed for hypotheses 1 and 2. This means that no significant differences between the importances have been found. For hypothesis 3 we detected three differences from the post hoc test. These indicated that fitness and violation techniques are understood significantly better than average and Generalization techniques are understood worse than average. The difference detected by the post hoc analysis of hypothesis 4 indicated that managers understood the structure techniques significantly 34

42 better than supervisors. 5.2 Data reduction While inspecting the data from the questionnaires we noticed that three respondents have conducted the questionnaire twice. The first time they entered the questionnaire they abandoned it within the first three questions. The second time, they finished the questionnaire. We assumed that these respondents started the questionnaire the first time around and something urgent came along. For this reason we have excluded the results from the first time these respondents entered the questionnaire to remove duplicate results from each respondent. 5.3 Response In total we have sent 48 invitations to fill out the questionnaire. Based on the answer given at question 6 we have classified the respondent as a manager, (process)engineer or supervisor. 39 persons have in fact participated in the research. A significant group (47,62%) has dropped the questionnaire before completing it. From the respondents that started but did not finish the questionnaire, 40% has abandoned the questionnaire at the start of the conformance analysis technique questions. The reason to abandon the questionnaire might be that it took too much time to complete or that the respondents experienced the questions as too difficult and therefore abandoned it. N Hypothesis 1 and 2 Hypotheses 3 and 4 Managers 10 8 (80%) 4 (40%) Engineers (94%) 13 (76%) Supervisors (92%) 5 (42%) Totals (90%) 22 (56%) Table 5: Overview of response and response rate 5.4 Data analysis & hypotheses testing In this section the tests of the different hypotheses have been discussed Difference of feedback aspect importance (not grouped by function) In this part we analyze the data collected to test the hypothesis 1. With hypothesis 1 we test if there is a difference between the importance of the feedback aspects. At hypothesis 1 we do not take the function of the respondent into account. For this analysis we only used the importance of feedback aspects data and have arranged this data in two columns. The first column contains the importance score which the respondents assigned to the feedback aspect, which has been depicted in the second column. We start by performing the exploratory data analysis. 35

43 The figure in Appendix E provides the descriptive statistics. The distribution of the data has been displayed graphically in figure 24. Figure 24: Box plot importance of feedback aspects (not grouped by function). The Kolmogorov-Smirnov and Shapiro-Wilk normality tests in Appendix F have been performed for each feedback aspect. Since the significance for each feedback aspect has a p-value that is lower than 0,05 (p 0, 05) we reject the null-hypothesis on the assumption of a normally distributed data set. This means that the data is not normally distributed or we can not proof that it is normally distributed. The most probable reason, for the fact that the feedback aspect data is not normally distributed, is that we have too little data available for the data set to be normally distributed. To verify if the differences between the importance of the feedback aspects are significant, we have to test if hypothesis 1 holds. We have formalized hypothesis 1 into a null hypothesis (H 0 ) and an alternative hypothesis (H 1 ). H 0 states that the distribution of each feedback aspect is identical. The alternative hypothesis (H 1 ) states that for at least one population in the distribution differs in location [16]. We can test this by establishing whether the independent populations in our data have identical distributions. With independent populations we mean the different feedback aspects. Since the data that we want to use for the testing of hypothesis 1 is not normally distributed, we will use a non-parametric data analysis approach. The non-parametric approach to test differences between three or more independent samples is a Kruskal-Wallis test [16]. 36

44 Ranks Importance Type Fitness Precision Generalization Structure Frequency Violation Location Total N Mean Rank 111,40 120,01 106,87 124,86 121,20 143,83 132,83 Figure 25: Kruskal-Wallis to test Hypothesis 1. Figure 25 shows the results of the Kruskal-Wallis test. In the upper part of the figure, the ranked averages for the data points have been calculated which have been used to determine if the differences in the ranked values are significantly different [16]. The test is based on the joined ranking of the observations from all samples. Since the asymptotic significance is greater than 0,05 (p = 0, 290) we decided not to reject the null hypothesis. To support the results found with the Kruskal-Wallis test, we have performed a median test as well. With a median test the medians from all groups are used to detect significant differences [16]. Like with the Kruskal-Wallis test, the asymptotic significance calculated here is also greater than 0,05 (p = 0, 495). This supports our decision not to reject the null hypothesis. From the two tests we can conclude, with a confidence of 95%, that there is no difference between the rating for importance of feedback aspects Difference of feedback aspect importance (grouped by function) Here we will test hypothesis 2. For this part of the analysis we use the data regarding the importance of the feedback aspects like with the testing of hypothesis 1. Additionally we will make use of the information regarding the function of the respondents. With hypothesis 2 we will test if the importance of feedback aspect score depends on the function of the respondent. Because the importance of feedback aspects data is grouped by function, we have split the original data set into three parts (one part for each function). Because of this modification, we have to perform the exploratory data analysis again. Because the value for importance of fitness is constant for supervisors it has been omitted from the descriptive statistics table since there is no information regarding the dispersion. It has been included in the boxplot. The boxplot for the importance of feedback aspects grouped by function has been shown in figure 26 The normality for the feedback aspects has been assessed for each function with the Kolmogorov- Smirnov and Shapiro-Wilk normality tests in Appendix G. Since the value for the importance of fitness is constant for supervisors, it is automatically not from the normal distribution. Next to the fitness aspect, the other aspect also seems to have problems with normality. This can be concluded from the significance values which are lower than 0,05 (p 0, 05). Either 37

45 Figure 26: Box plot importance of feedback aspects (grouped by function). Kolmogorov-Smirnov, Shapiro-Wilk or both test statistics report this problem for all aspects. For that reason we reject the null hypothesis, for each feedback aspect grouped by function, which states that the data points are normally distributed. Because of the problems with normality we also test hypothesis 2 using a non-parametric approach. Since we have exactly three independent samples (the function types) we again use the Kruskal-Wallis test [16]. Hypothesis 2 has been formalized in a null hypothesis (H 0 ) which states that there is no difference between the importance of a feedback aspect dependent on the function of the respondent. The alternative hypothesis (H 1 ) states that there is at least one function group that rated the importance of a feedback aspect significantly different than the rest of the function groups. Because we had to split the importance of feedback aspects data into separate groups, we ended up with smaller data sets. For that reason we have decided to perform an exact test instead of the asymptotic test alone. The exact significance results of the Kruskal-Wallis reported p-values for all feedback aspects which are greater than 0,05 (p 0, 05) which means that the chances of making a type I error (wrongly rejecting H 0 ) is greater than 5%. Therefore we do not reject H 0 for all feedback aspects. We have tried to support the results by performing the median test. However the assumption, that each quadrant in the median table contains at least five values, does not hold for each of the feedback aspects. For this reason, the median test cannot be used to support (nor 38

46 Ranks ImportanceFitness ImportancePrecision ImportanceGeneralization ImportanceStructure ImportanceFrequency ImportanceViolation ImportanceLocation Function Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total N Mean Rank 17,31 20,06 15,50 16,13 21,81 13,82 12,06 19,78 19,73 17,88 16,28 20,59 18,38 17,66 18,23 14,88 18,41 19,68 14,13 19,16 19,14 Figure 27: Kruskal-Wallis to test Hypothesis 2. contradict) the result from the Kruskal-Wallis test. From the results of the Kruskal-Wallis test we can conclude, with a confidence of 95%, that there is no difference between the rating for importance of feedback aspects based on the function of the respondents Understandability of techniques (not grouped by function) In the second part of the questionnaire we have presented the results from the conformance analysis techniques. The respondents were asked to indicate if each of the two statements in the questions were true or false. This data has been used to test hypothesis 3. With hypothesis 3, we want to test if conformance analysis techniques that report about a specific feedback aspect are easier to understand than conformance analysis techniques that report about another feedback aspect. To test this hypothesis we will use the scores for the conformance analysis technique question and the feedback aspect about which the conformance analysis technique reports. As with the testing of hypothesis 1 we will arrange this data into two columns. The first column 39

47 contains the score obtained by a respondent for conformance analysis techniques reporting about a feedback aspect. The corresponding feedback aspects is placed in the column next to the score. The figure in Appendix H provides the descriptive statistics. The differences in average for the conformance analysis scores per aspect are relatively large. This has be seen in the boxplot in figure 28. Figure 28: Box plot understandability of techniques (not grouped by function). The Kolmogorov-Smirnov and Shapiro-Wilk normality tests in Appendix I have been performed for each score per aspect. Since the significance for each score per aspect has a p-value that is lower than 0,05 (p 0, 05) we reject the null-hypothesis on the assumption of a normally distributed data set for each score-per-aspect population in this data. To test hypothesis 3, we will test if the differences between the scores per aspects are significant. Formalizing hypothesis 3 results in a null hypothesis (H 0 ), which states that the distributions of the scores per aspects are identical, and an alternative hypothesis (H 1 ), which states that for at least one population in the distribution differs in location [16]. We can test this by establishing whether the independent populations in our data have identical distributions. For this hypothesis test, the independent populations are the different scores per aspect. Again the data we use for hypothesis testing is not normally distributed. Therefore, we will use a Kruskal-Wallis test [16]. Figure 29 shows the results of the Kruskal-Wallis test. In the upper part of the figure, the 40

48 Ranks Score Type Fitness Precision Generalization Structure Frequency Violation Location Total N Mean Rank 105,70 69,23 43,68 65,23 82,20 105,70 70,75 Figure 29: Kruskal-Wallis to test Hypothesis 3. ranked averages for the data points have been calculated which have been used in the lower part of the figure to determine if the differences in the ranked values are significant. Since the asymptotic significance is smaller than 0,05 (p = 0, 000) we decided to reject the null hypothesis. Frequencies Type Score > Median <= Median Fitness 13 9 Precision 5 17 Generalizatio n 4 18 Structure 3 19 Frequency Frequencies Score > Median <= Median Type Violation Location Figure 30: Median-test to test Hypothesis 3. Additionally we have performed a median test (figure 30). Like with the Kruskal-Wallis test, the asymptotic significance calculated is smaller than 0,05 (p = 0, 001). This supports our results from the Kruskal-Wallis test. The Kruskal-Wallis and Median test both support the decision to reject H 0. Post-hoc test Now that we have established that there is at least one significance difference between the scores per aspect we are interested in the aspect(s) that caused the difference [16]. For this reason, we will perform a post-hoc analysis [16]. A suitable method to do this is to perform 41

49 a Mann-Whithney U test with corrected levels of significance [16]. We have to adjust the value for α because getting a significant result based on chance after subsequently testing grows. By applying a Bonferroni correction we make up for this [16]. To obtain the corrected significance value we have to divide α by the number of tests (α/n). The original level used is α = 0, 05. If we compare each dependent variable with each other, we make 21 comparisons. So the corrected significance level is 0, 05/21 = 0, Now we have determined the corrected p-value we can perform the Mann-Whithney U tests for pair wise comparison. Since we have not given any direction for the test, we will focus on the two-tailed exact significance value. This value is tested against the correct p-value (p c = 0, 00238). If the p-value that we obtained in the Mann-Whitney test is smaller than p c we will reject the null hypothesis H 0 which states that the distributions of the two tested populations are identical [16]. When we collect the significant differences from the pair wise tests we identify four different groups. These groups have been depicted in table 6. Group Fitness Precision Generalization Structure Frequency Violation Location A x x x x x B x x x x x x C x x x x x D x x x Table 6: Identified groups with pair wise comparison. We have constructed this table after conducting all pair wise comparisons. For each understandability score, we identified which understandability score was significantly different. This approach led to the four groups depicted in table 6. The groups in table 6 have been ranked from highest ranked average (group A) to lowest ranked average (group D). In order to draw conclusions from the results in table 6 we distinguish over-average, average and under-average understandability scores. The over-average category strictly contains elements in the A or A and B groups. The under-average category strictly contains elements in the D or C and D groups. The understandability scores that are neither in the over-average nor under-average category are part of the average category. The scores from the fitness and violation techniques are the only two feedback aspect that can be categorized in the over-average understandability group. The techniques from the precision, structure, frequency and location aspects are all categorized in the average understandability group. Techniques from the generalization aspect are the only techniques that fall in the under-average understandability group Understandability of techniques (grouped by function) The last part of the analysis deals with hypothesis 4. This hypothesis states that respondent understand conformance analysis technique that report about some feedback aspect better than conformance analysis techniques that report about another feedback aspect based on their function. The data from the testing of hypothesis 3 has been used. Additionally we use the function description to categorize the techniques scores for each aspect. 42

50 As with the testing of hypothesis 2, the data has been split into three categories (function). Therefore, the exploratory data analysis has to be performed again. The exploratory data analysis reported that the scores for fitness, generalization and violation are consistent for managers. As well as the score for structure from the supervisors. Therefore these values have been omitted from the descriptive statistics table. It has been included in the boxplots. Figure 31: Box plot understandability of techniques (grouped by function). The normality tests showed that each population contains at least one of the three groups experience problems with normality. This is either supported by the Kolmogorov-Smirnov or the Shapiro-Wilk (or both) tests. By testing hypothesis 4, we will establish whether the scores for the techniques that report about one feedback aspect differ between the function groups. Formalizing hypothesis 4 results in a null hypothesis (H 0 ), which states that the distributions of the scores per aspects are identical between the different groups, and an alternative hypothesis (H 1 ), which states that for at least one population in the distribution differs in location [16]. We can test this by establishing whether the independent populations in our data have identical distributions. For this hypothesis test, the independent populations are the different scores per aspect. Again the data we use for hypothesis testing is not normally distributed. Therefore, we will use a the Kruskal-Wallis test [16]. 43

51 Ranks ScoreFitness ScorePrecision ScoreGeneralization ScoreStructure ScoreFrequency ScoreViolation ScoreLocation Function Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total Manager Engineer Supervisor Total N Mean Rank 14,50 12,23 7,20 9,75 11,92 11,80 9,50 12,04 11,70 18,00 11,81 5,50 15,13 11,81 7,80 14,50 12,23 7,20 13,75 12,62 6,80 Figure 32: Kruskal-Wallis to test Hypothesis 4. Figure 32 shows the results of the Kruskal-Wallis test. As with the testing of hypothesis 2, we have chosen to compute the exact significance instead of the asymptotic significance alone since the data set has been divided into three smaller groups. The result of the Kruskal- Wallis test show that all exact significance values are greater than 0,05 (p 0, 05) except for the structure score (p = 0, 002). For the aspects with technique scores, that have p values greater than 0,05 we decide not to reject the null hypothesis. We have rejected the null hypothesis for the score of the structure aspect. To support this decision we have tried performed a median test. However the conditions to make conclusions based on the results from the median test did not hold. Therefore we could use the median test to support our decision. Based on the result from the Kruskal-Wallis test, we can conclude that there is at least one difference in distribution among the groups for the technique scores from the structure score. We will use a post-hoc analysis to detect which group has caused this difference. Post-hoc test We have found a difference in between groups regarding the scores for the structure aspect related techniques from the Kruskal-Wallis test. As with the post-hoc analysis for hypothesis 44

52 3, we will use Mann-Whitney U tests to perform pair wise comparisons to track down the difference(s) that caused the significance in the Kruskal-Wallis test. Since we require three pair wise tests to construct each possible combination we have to rule out the chance factor by using the Bonferroni correction. The variables for the Bonferroni correction in this case are the same α (α = 0, 05) but a number of pair wise tests of 3 (n = 3). Therefore the corrected p-value is p c = 0, 05/3. This makes a corrected p-value of p c = 0, Now we have determined the corrected p-value we can perform the Mann-Whithney U tests for pair wise comparison. Again, there is no direction to the test. Therefore we have to focus on the exact two-tailed significance value. This value is tested against the correct p- value (p c = 0, 0167). The test in figure 33 tested the difference in structure aspects score between the groups managers and engineers. The test in figure 34 tested for a significant difference for the structure aspects score between the managers and supervisors group. The last of the Mann-Whitney test is the test for significance difference between the engineers and supervisors group. Ranks ScoreStructure Function Manager Supervisor Total N Mean Rank 7,50 3,00 Sum of Ranks 30,00 15,00 Figure 33: pair wise Mann-Whitney analysis for groups managers and engineers. Ranks ScoreStructure Function Manager Supervisor Total N Mean Rank 7,50 3,00 Sum of Ranks 30,00 15,00 Figure 34: pair wise Mann-Whitney analysis for groups managers and supervisors. Ranks ScoreStructure Function Engineer Supervisor Total N Mean Rank 11,04 5,50 Sum of Ranks 143,50 27,50 Figure 35: pair wise Mann-Whitney analysis for groups engineers and supervisors. The exact 2-tailed significance values between the managers and engineers is 0,089 (p = 0, 089) figure 33. This values is greater than the corrected p-value (p c ). For this reason we do not reject the null hypothesis for the Mann-Whitney test between managers and engineers which states that there is no difference between the two groups regarding the techniques scores for the structure aspect. 45

53 The pair wise test between managers and supervisors resulted in a p-value of 0,008 (p = 0, 008) figure 34. This values is small than the corrected p-value (p c ). For this reason will reject the null hypothesis which states that there is no significance between managers and supervisors regarding the techniques scores for the structure aspect. The last pair wise test assessed the difference between engineers and supervisors. The p- value here is 0,036 (p = 0, 036) figure 35. This values is greater than the corrected p-value (p c ). For this reason we do not reject the null hypothesis for the Mann-Whitney test between engineers and supervisors which states that there is no difference between the two groups regarding the techniques scores for the structure aspect. From these three Mann-Whitney tests we can conclude, with 95% confidence, that the scores for the structure aspects related technique questions are easier to understand by managers than by supervisor. 5.5 Summary of analysis As can be seen in table 7, differences in the understandability of conformance analysis techniques have been found. Post hoc analysis have identified the types of these differences. Hypothesis Description Result 1 There is a difference between the importance of Rejected feedback aspects. 2 There is a difference between the importance of aspects Rejected based on the function. 3 There is a difference between the understandability of Accepted conformance analysis techniques that report about a specific feedback aspect. 4 There is a difference between the understandability of conformance analysis techniques that report about a specific feedback aspect based on the function. Accepted Table 7: Summary of hypothesis testing. The differences found with hypothesis 3 are characterized by: The fitness and violation techniques are understood significantly better than average. Generalization techniques are understood worse than average. The differences found with hypothesis 4 are characterized by: Managers understood the structure techniques significantly better than supervisors. 46

54 6. Conclusion In this final chapter we summarize this research in section 6.1. Limitations of this work are discussed in section 6.3 and directions for further work have been presented in section Summary A business process model can be created to prescribe how an operational business process should function. Conformance analysis techniques can be used to determine whether the business process is in fact being performed in the prescribed way. Conformance information is of interest to domain experts who are concerned with the quality of a business process. The type of conformance information they require has been researched here. In this study we have searched for an answer to our research question: Which conformance analysis information is most relevant for businesses? In this research we have defined relevance of conformance information as a combination of importance and understandability of conformance information. To answer the research question we first identified 24 existing conformance analysis techniques in a literature study. Next we categorized the feedback of these techniques into 7 feedback aspects. These feedback aspects are fitness, precision, generalization, structure, frequency, location and violation. To establish the importance of each of these feedback aspects and the understandability of the results from the conformance analysis techniques we have conducted a quantitative case study at DSV Solutions in Venray. Conformance analysis techniques have been executed using transactional data regarding the operational outbound business process. The results from these conformance analysis techniques have been used in an online questionnaire which has been distributed among managers, engineers and supervisors of DSV Solutions in Venray. In this questionnaire the importance of feedback aspects and understandability of conformance analysis techniques have been researched. For each of the feedback aspects we have asked the respondents to rank their importance in relation to the other feedback aspects. To measure the understandability of the conformance analysis techniques, we have given two statements after the result of a conformance analysis technique has been presented and asked whether each of these statements were true or false. About the importance of conformance information we can report that no significant difference between the importance of feedback aspects has been found. Neither did we find a difference in importance of feedback aspects based on the function of the respondents. Regarding the understandability of the conformance analysis techniques we can report that techniques regarding the fitness and violation aspects are understood significantly better on average. Conformance analysis techniques regarding the generalization feedback aspect are understood significantly worse on average. We also identified a difference in understandability 47

55 between employees with different functions. This difference indicated that managers of DSV Solutions in Venray understand conformance analysis feedback from structure techniques significantly better than supervisors of DSV Solutions in Venray. This concludes that conformance analysis techniques that report about the fitness and violation aspects are the most relevant for managers, engineers and supervisors of DSV Solutions in Venray. Additionally, techniques that report about the structure aspect are relevant for managers. The corresponding techniques are: Parsing Measure [35] (fitness), Token-based Fitness [22] (fitness), Fitness PFcomplete [10] (fitness), Footprint Similarity [33] (generalization), Behavioral Recall [10] (generalization), Structural Appropriateness [22] (structure), Structural Precision and Recall [22] (structure) and Duplicates Precision and Recall [22] (structure). The extent to which these results can be generalized to other organizations in additions to DSV Solutions in Venray involve logistic organizations that record transitional data in a warehouse management system and want to acquire conformance information regarding their business process. These results can support their decision on which of the available conformance analysis technique(s) to use. 6.2 Limitations The chosen research strategy in this research is a case study. However a case study can be used to perform an in depth analysis, conclusions from observations in a case study are not very generalizable. To draw more general applicable conclusions about the relevance of conformance analysis techniques research to a more heterogeneous research population containing respondents from different organizations within different sectors is required. The data that has been collected in this research exists of the importance of feedback aspects and the understandability of the conformance analysis techniques. Besides this information regarding the relevance, the practical applicability of the results from the conformance analysis techniques could be tested. Therefore the definition of relevance of conformance information could be extended with the usability of conformance analysis results. A panel research would be the appropriate strategy to test the usability of results of conformance analysis techniques where a panel of respondents repeatedly receives results of conformance analysis techniques and its usability is measured afterwards. The data collection method for this research is an online questionnaire which has been used to enquire information about the relevance of the conformance analysis techniques. A different data collection method could be considered when more time and finances would be available. A face to face data collection method would be ideally because it results in a high response rate and yields a good quality of the results. It is also suitable for long and complex questionnaires. Researchers should consider if these advantages outweigh high research costs and long project duration. The difficulty level, hence the level of devotion required to the questionnaire, was considerably high. This resulted in high mortality of the survey. The mortality at a face-to-face survey research would be significantly lower. The quality of the results obtained in the online questionnaire might have been influenced. 48

56 The quality of the importance of feedback aspects might have been influenced because the author has derived the feedback aspects from literature. To construct an evidence based feedback aspect classification, a panel of conformance analysis technique experts could be used to ask for such a classification. This expert panel could also be asked to translate the conformance analysis techniques from the technical and scientific language to a comprehensive text that is both comprehensible and practical applicable in business. In this research we have translated the techniques and aspects and subjected them to a test panel which had no prior knowledge of conformance analysis techniques and methods. The feedback from this test panel has been used to optimize the comprehension of the techniques and aspects. The use of an expert panel would probably lead to a better comprehension of the techniques and is hence preferable to the applied approach. The quality of the understandability of the conformance analysis techniques measured in the online questionnaire might have also been influenced. The order in which the questions regarding the techniques have been posed might have influenced the respondents understandability. It is likely that the respondent has a better understanding of fitness results for example once he has answered several questions regarding the fitness aspect earlier in the questionnaire. It might also be that respondents have influenced their understandability of the conformance analysis techniques by discussing the questions and answers. 6.3 Future work This exploratory research focused on detecting differences in relevance of the conformance analysis techniques. The relevance of conformance analysis techniques has been investigated by measuring the importance and understandability of conformance analysis techniques. We measured that techniques that report about the generalization aspect were more difficult to understand but are regarded equally important in relation to other feedback aspects. Therefore future work could be devoted to develop generalization techniques that are easier to understand than currently available generalization techniques. To improve the understandability of the generalization techniques, the presentation of fitness and violation technique results might be used since techniques reporting about these aspects were easier to understand. Another direction for future work is to investigate the reason(s) why the techniques that report about the fitness and violation are easier to understand. Similarly future work might be devoted to investigate why the techniques that report about generalization were more difficult to understand. 49

57 Bibliography [1] R. Agrawal, D. Gunopulos, and F. Leymann. Mining process models from workflow logs. Springer, [2] H. Boeije. Onderzoeksmethoden. Uitgeverij Boom, [3] A. L. M. Cavaye. Case study research: a multi-faceted research approach for is. Information Systems Journal, 6(3): , [4] J. E. Cook and A. L. Wolf. Toward metrics for process validation. In Software Process, Applying the Software Process, Proceedings., Third International Conference on the, pages 33 44, [5] J. E. Cook and A. L. Wolf. Automating process discovery through event-data analysis. In Proceedings of the 17th international conference on Software engineering, pages ACM New York, NY, USA, [6] J. E. Cook and A. L. Wolf. Software process validation: quantitatively measuring the correspondence of a process to a model. ACM Transactions on Software Engineering and Methodology, 8(2): , [7] P. Darke, G. Shanks, and M. Broadbent. Successfully completing case study research: combining rigour, relevance and pragmatism. Information Systems Journal, 40: , [8] A. de Medeiros, C. Pedrinaci, W. van der Aalst, J. Domingue, M. Song, A. Rozinat, B. Norton, and L. Cabral. An outlook on semantic business process mining and monitoring. pages Springer, [9] A. de Medeiros, A. Weijters, and W. van der Aalst. Genetic process mining: an experimental evaluation. Data Mining and Knowledge Discovery, 14(2): , [10] A. K. A. de Medeiros, W. M. P. van der Aalst, and A. J. M. M. Weijters. Quantifying process equivalence based on observed behavior. Data & Knowledge Engineering, 64(1):55 74, [11] R. Dijkman. Feedback on Differences between Business Processes. Beta, Research School for Operations Management and Logistics, [12] R. Dijkman. Diagnosing differences between business process models. In Proceedings of the International Conference on Business Process Models, volume Springer, [13] W. Gaaloul, K. Bana, and C. Godart. Log-based mining techniques applied to web service composition reengineering. Service Oriented Computing and Applications, 2(2):93 110, [14] G. Greco, A. Guzzo, L. Pontieri, and D. Sacc. Discovering expressive process models by clustering log traces. IEEE Transactions on Knowledge and Data Engineering, pages ,

58 [15] M. Joppe. The research process. Internet Communication, [16] S. Landau and B. Everitt. A handbook of statistical analyses using SPSS. CRC Press, [17] L. Maruster, A. Weijters, W. M. P. van der Aalst, and A. van den Bosch. Process mining: discovering direct successors in process logs. Lecture Notes in Computer Science, pages , [18] S. S. Pinter and M. Golani. Discovering workflow models from activities lifespans. Computers in Industry, 53(3): , [19] W. Reisig and G. Rozenberg. Lectures on Petri Nets: Advances in Petri Nets. Springer, [20] A. Rozinat, A. de Medeiros, C. W. Gunther, A. Weijters, and W. M. P. van der Aalst. The need for a process mining evaluation framework in research and practice. pages Springer, [21] A. Rozinat, A. K. A. de Medeiros, C. W. Gunther, A. Weijters, and W. M. P. van der Aalst. Towards an evaluation framework for process mining algorithms. BPM Center Report BPM-07-06, BPMcenter. org, [22] A. Rozinat and W. M. P. van der Aalst. Conformance checking of processes based on monitoring real behavior. Information Systems, 33(1):64 95, [23] V. Rubin, C. W. Gunther, W. van der Aalst, E. Kindler, B. F. van Dongen, and W. Schfer. Process mining framework for software processes. pages Springer, [24] W. M. P. van der Aalst. Business alignment: Using process mining as a tool for delta analysis and conformance testing. Requirements Engineering Journal, 10(3): , [25] W. M. P. van der Aalst. Exploring the cscw spectrum using process mining. Advanced Engineering Informatics, 21(2): , [26] W. M. P. van der Aalst, H. T. de Beer, and B. F. van Dongen. Process mining and verification of properties: An approach based on temporal logic. pages Springer, [27] W. M. P. van der Aalst and A. K. A. de Medeiros. Process mining and security: Detecting anomalous process executions and checking process conformance. Electronic Notes in Theoretical Computer Science, 121:3 21, [28] W. M. P. van der Aalst, J. Desel, and E. Kindler. On the semantics of epcs: A vicious circle. In Proceedings of the EPK, pages Citeseer, [29] W. M. P. van der Aalst, M. Dumas, C. Ouyang, A. Rozinat, and E. Verbeek. Conformance checking of service behavior. ACM Transactions on Internet Technology, [30] W. M. P. van der Aalst and M. Pesic. Specifying and monitoring service flows: Making web services process-aware. pages Springer,

59 [31] W. M. P. van der Aalst and A. Weijters. Process mining: a research agenda. Computers in Industry, 53(3): , [32] B. F. van Dongen, R. Dijkman, and J. Mendling. Measuring similarity between business process models. pages Springer, [33] B. F. van Dongen, J. Mendling, and W. M. P. van der Aalst. Structural patterns for soundness of business process models. In EDOC, volume 6, pages , [34] A. Weijters and W. M. P. van der Aalst. Process mining: discovering workflow models from event-based data. pages [35] A. Weijters, W. M. P. van der Aalst, and A. K. A. de Medeiros. Process mining with the heuristicsminer algorithm. Department of Technology, Eindhoven University of Technology, Eindhoven, The Netherlands, [36] L. Wen, W. van der Aalst, J. Wang, and J. Sun. Mining process models with non-freechoice constructs. Data Mining and Knowledge Discovery, 15(2): ,

60 Appendix A: A priori model 53

61 Appendix B: Mapping of process model elements 54

62 Appendix C: Questionnaire 55

63 Onderzoek naar feedback van verschilanalysetechnieken Beste medewerker van DSV, Hartelijk dank voor het deelnemen aan deze enquete. Deze enquete maakt deel uit van het onderzoek naar verschilanalysetechnieken. De enquete wordt verwerkt door ondergetekende in opdracht van DSV Solutions in Venray en de Technische Universiteit Eindhoven. De enquete bestaat uit 31 vragen en vraagt ongeveer 20 minuten van uw tijd. De enquete begint met een korte inleiding. Lees deze s.v.p. goed door. Allereerst wordt u een paar vragen gesteld die u als persoon beschrijven. Daarna worden 7 aspecten genoemd waarover verschilanalysetechnieken rapporteren. Er wordt u van elk van deze aspecten gevraagd in welke mate u hierin geïnteresseerd bent. Daarna worden de analyseresultaten gepresenteerd. Daar worden steeds 2 stellingen voorgelegd. Aan u de vraag of deze stellingen juist zijn. Graag wil ik u vragen eerlijk te antwoorden, daar waar u het antwoord niet weet is het voor het onderzoek waardevoller wanneer u de optie "ik weet het niet" selecteert dan wanneer u het juiste antwoord gokt. De resultaten blijven anoniem. Wanneer de grafieken/modellen in een vraag buiten beeld vallen is er een mogelijkheid om te scrollen. De scrolbalk bevindt zich dan onderaan de betreffende vraag. Bij voorbaat dank, Alain Martens

64 Inleiding Het doel van dit onderzoek is ontdekken welke technieken, die verschillen in het outbound proces analyseren, als meest relevant wordt beschouwd door domeindeskundigen uit het bedrijfsleven. Het outbound proces van DSV Solutions Venray bestaat uit de volgende activiteiten: 1) Pick (Verzamelen van producten uit het warehouse) 2) Pack (Verpakken van deze producten) 3) Load (Controleren en laden van producten in de trailer) 4) Ship (Administratief afhandelen en vervoeren van producten) De verschillen die geanalyseeerd kunnen worden met deze technieken zijn verschillen die kunnen bestaan tussen een bedrijfsproces, zoals het beoogd is, en het bedrijfsproces zoals het werkelijk uitgevoerd wordt. Hiermee wordt de theorie dus aan de realiteit getoetst. Veel van deze technieken maken gebruik van een eventlog en een procesmodel. Een eventlog is een lijst met uitgevoerde activiteiten, gespecificeerd per pickdetail. Een pickdetail is een product (of doos met producten) die uit een warehouse stelling gehaald moet worden om naar een klant te kunnen verschepen. Een activiteit is bijvoorbeeld het verpakken van een pickdetail over de verpakkingslijn. Een eventlog bevat dus informatie over hoe een bedrijfsproces werkelijk is uitgevoerd. Deze gegevens zijn dus tijdens een operationeel bedrijfsproces verzameld. Eventlog voorbeeld:

65 De reeks activiteiten die uit worden gevoerd voor 1 bepaald pickdetail wordt een procespad genoemd. Uit bovenstaand eventlog kunnen 3 verschillende procespaden geïdentificeerd worden: Pickdetail X) Release pickdetail -> Pick pickdetail -> Drop op Lane -> Lane validation -> Ship pickdetail Pickdetail Y) Release pickdetail -> Pick pickdetail -> Drop op Packarea -> Pack pickdetail -> Lane validation -> Ship pickdetail Pickdetail Z) Release pickdetail -> Pick pickdetail -> Drop op Packarea ->Pack on Packline -> Lane validation -> Ship pickdetail Een procesmodel geeft de theoretische werking van een bedrijfsproces grafisch weer. Hieronder is een voorbeeld van een procesmodel gegeven. Het proces begint links en eindigt rechts. Bovenstaande procespaden zijn in het onderstaande procesmodel terug te vinden.

66 Legenda: Paarse 6 hoek = Een status Groene rechthoek = Een activiteit

67 1) * Wat is uw adres? (Deze informatie wordt alleen voor dit onderzoek gebruikt.) 2) * Wat is uw geslacht? Man Vrouw 3) * Wat is uw geboortedatum? (dd-mm-jjjj) 4) * Hoeveel jaar heeft u voor DSV gewerkt? korter dan 1 jaar 1 tot 3 jaar 3 tot 5 jaar langer dan 5 jaar 5) * Welke beschrijving past het beste bij uw huidige functie? Manager

68 (Proces) Engineer Supervisor 6) * Hoeveel jaar vervult u deze functie? korter dan 1 jaar 1 tot 3 jaar 3 tot 5 jaar langer dan 5 jaar Er zijn 7 aspecten waarover verschilanalysetechnieken kunnen rapporteren. Op deze pagina wordt naar uw mening over deze aspecten gevraagd. Deze aspecten zijn: 1. Fitness Fitness geeft aan in welke mate er gebeurt wat er beschreven is. Met andere woorden geeft het aan hoeveel werkelijk uitgevoerd gedrag beschreven is in het procesmodel. 2. Precisie Precisie geeft aan hoeveel gedrag is gedocumenteerd (in het procesmodel) dat niet wordt uitgevoerd. 3. Generalisatie Generalisatie geeft aan of een proces model niet te precies is. Hiermee wordt aangegeven of het proces model voldoende abstract is. Wanneer er naast het gedrag in de event log totaal geen extra gedrag mogelijk is, dan is er geen generalisatie toegepast? 4. Structuur Structuur geeft aan of een bedacht procesmodel duidelijk leesbaar en goed gestructureerd is. 5. Frequentie Met frequentie wordt aangegeven hoe vaak een specifiek procespad is doorlopen. Hiermee wordt aangegeven hoe significant een bepaald procespad is in relatie met andere doorlopen procespaden in het bedrijfsproces. 6. Overtreding Met overtreding wordt aangegeven of er afspraken zijn die niet nagekomen worden. Dit aspect is

69 tegenovergesteld aan het fitness aspect. 7. Locatie Locatie geeft aan op welke plek in het bedrijfsproces het geobserveerde gedrag afweek van het procesmodel. 10) * Aspect 3: Generalisatie Hoe belangrijk vindt u het om informatie over hoe generaliserend uw bedrijfsproces is te krijgen ten opzichte van de andere aspecten? 11) * Veel onbelangrijker Onbelangrijker Even belangrijk Belangrijker Veel belangrijker Aspect 4: Structuur Hoe belangrijk vindt u het om informatie over hoe gestructuureerd uw procesmodel is te krijgen ten opzichte van de andere aspecten? 12) * Veel onbelangrijker Onbelangrijker Even belangrijk Belangrijker Veel belangrijker Aspect 5: Frequentie Hoe belangrijk vindt u het om informatie over de frequentie van procespaden in uw bedrijfsproces te krijgen ten opzichte van de andere aspecten? Veel onbelangrijker

70 13) * Onbelangrijker Even belangrijk Belangrijker Veel belangrijker Aspect 6: Overtreding Hoe belangrijk vindt u het om informatie over overtredingen in uw bedrijfsproces te krijgen ten opzichte van de andere aspecten? 14) * Veel onbelangrijker Onbelangrijker Even belangrijk Belangrijker Veel belangrijker Aspect 7: Locatie Hoe belangrijk vindt u het om informatie over de locatie van verschillen in uw bedrijfsproces te krijgen ten opzichte van de andere aspecten? Veel onbelangrijker Onbelangrijker Even belangrijk Belangrijker Veel belangrijker In dit laatste deel van de enquete worden analyseresultaten gepresenteerd. Per feedback wordt u steeds 2 stellingen voorgelegd. 16) *

71 Feedback 1: In theoretisch model af te spelen versus niet af te spelen procespaden Aspecten: Fitness & Overtreding (Fitness geeft aan in welke mate er gebeurt wat er beschreven is. Met andere woorden geeft het aan hoeveel werkelijk uitgevoerd gedrag beschreven is in het procesmodel. & Overtreding geeft aan of er afspraken zijn die niet nagekomen worden. Dit aspect is tegenovergesteld aan het fitness aspect.) Deze techniek meet hoeveel procent van alle procespaden in een eventlog (de realiteit) ook afgelegd kan worden volgens het theoretisch model. Een procespad scoort dus 100% wanneer het af te spelen is of 0% wanneer het niet af te spelen is. Dit wordt gedaan voor alle procespaden in de eventlog. Resultaat = Van 0 tot 1 Grafiek = resultaat januari, februari en maart van 2009 Zijn de volgende stellingen waar? Stelling 1: Op 9 januari zijn er geen procespaden doorlopen die niet in het theoretische model staan. Stelling 2: Februari 2009 is de maand waarin gemiddeld het meest volgens het theoretische model is gewerkt (in vergelijking met januari en maart van 2009). Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar

72 17) * Ik weet het niet Feedback 2: In theoretisch model af te spelen versus niet af te spelen procespaden + nuancering Aspecten: Fitness & Overtreding (Fitness geeft aan in welke mate er gebeurt wat er beschreven is. Met andere woorden geeft het aan hoeveel werkelijk uitgevoerd gedrag beschreven is in het procesmodel. & Overtreding geeft aan of er afspraken zijn die niet nagekomen worden. Dit aspect is tegenovergesteld aan het fitness aspect.) Om het resultaat te berekenen worden de procespaden van de eventlog (realiteit) opnieuw in het theoretische model afgespeeld. Als het nodig is aanpassingen te maken in het theoretische model, om de procespaden uit de eventlog correct af te laten spelen, heeft dit een negatieve invloed op het resultaat Een aanpassing is bijvoorbeeld het overslaan of toevoegen van een activiteit in het procesmodel. Het afspelen van 1 procespad kan meerdere aanpassingen vereisen. Een procespad scoort dus 100% als het volledig zonder aanpassingen af te spelen is en 0% als voor iedere stap een aanpassing nodig is. Scores tussen 0% en 100% zijn dus mogelijk (dit is dus de nuancering). Resultaat = Van 0 tot 1 Grafiek = resultaat januari, februari en maart van 2009 Om de variatie in het resultaat duidelijk weer te geven is aan de y-as gekozen voor een schaal van 0,9 tot 1. (In werkelijkheid loopt de schaal van 0 tot 1)

73 Zijn de volgende stellingen waar? Stelling 1: Op 9 januari en 28 maart 2009 is er compleet afgeweken van het theoretische model. Stelling 2: In de periode van 9 januari tot en met 19 januari is er steeds verder afgeweken van het theoretische model. 18) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 2.1: Grafische weergave van "In theoretisch model af te spelen versus niet af te spelen procespaden + nuancering" Aspecten: Locatie & Frequentie (Locatie geeft aan op welke plek in het bedrijfsproces het geobserveerde gedrag afweek van het procesmodel. & Met frequentie wordt aangegeven hoe vaak een specifiek procespad is doorlopen. Hiermee wordt aangegeven hoe significant een bepaald procespad is in relatie met andere doorlopen procespaden in het bedrijfsproces.) Het is mogelijk om dit resultaat grafisch weer te geven. Onderstaand figuur laat hier 1 resultaat zien. De eventlog met gegevens van januari, februari en maart van 2009 is afgespeeld in het procesmodel. Daar waar aanpassingen nodig waren is de locatie in het bedrijfsproces rood gekleurd. De getallen in de cirkels geven aan hoe vaak er aanpassingen moesten worden gedaan. Een positief getal geeft aan hoe vaak de volgende logische stap in het theoretische model in werkelijkheid niet is uitgevoerd terwijl dit wel verwacht was. Een negatief getal geeft aan hoe vaak de volgende stap in het procesmodel in werkelijkheid is uitgevoerd terwijl dit volgens het theoretische niet mogelijk was. Onderstaand analyseresultaat laat zien dat in de maanden januari, februari en maart van 2009 er 4074 maal een probleem ontstond bij de activiteit "drop pickdetail op Packarea". (Met deze activiteit wordt een pickdetail op de Packarea neergezet). Door met de muis over de rode status met te gaan verschijnt er extra informatie. Hierin staat dat er voor 1358 instanties (lees pickdetails) er 3 maal een andere activiteit vewacht was dan uitgevoerd is. De groen omlijnde

74 activiteiten zijn activiteiten die tot het uitgevoerde procespad behoren. Zijn de volgende stellingen waar? Stelling 1: In bovenstaand resultaat ontstond er een probleem bij 4074 pickdetails. Stelling 2: In bovenstaand resultaat is elk pickdetail 3 maal op de packarea neergezet. 19) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 2.2: Log weergave van "In theoretisch model af te spelen versus niet af te spelen procespaden + nuancering" Aspecten: Locatie & Frequentie (Locatie geeft aan op welke plek in het bedrijfsproces het geobserveerde gedrag afweek van het procesmodel. & Met frequentie wordt aangegeven hoe vaak een specifiek procespad is doorlopen. Hiermee wordt aangegeven hoe significant een bepaald procespad is in relatie met andere doorlopen procespaden in het bedrijfsproces.) Tenslotte is het mogelijk om het resultaat weer te geven in een procespad. Onderstaand analyseresultaat laat zien dat in de maanden januari, februari en maart van 2009 er 5918 maal een pickdetail over de verpakkingslijn is verpakt terwijl deze reeds op de outbound lane gezet is.

75 Zijn de volgende stellingen waar? Stelling 1: In bovenstaand resultaat is de activiteit "Pack pickdetail op Packline" in werkelijkheid uitgevoerd waar dit niet verwacht is door het theoretische model. Stelling 2: In de maanden januari, februari en maart van 2009 is bovenstaand procespad 5918 maal doorlopen.

76 20) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 3: In theoretisch model af te spelen versus niet af te spelen procespaden + nuancering (gewogen) Aspecten: Fitness & Overtreding (Fitness geeft aan in welke mate er gebeurt wat er beschreven is. Met andere woorden geeft het aan hoeveel werkelijk uitgevoerd gedrag beschreven is in het procesmodel. & Overtreding geeft aan of er afspraken zijn die niet nagekomen worden. Dit aspect is tegenovergesteld aan het fitness aspect.) Dit resultaat lijkt erg op de "In theoretisch model af te spelen versus niet af te spelen procespaden + nuancering" maar neemt ook de frequentie van procespaden in acht. Zo worden problemen in procespaden die vaak voorkomen belangrijker geacht dan problemen in procespaden die niet vaak voorkomen in de eventlog (realiteit). Wanneer er zich een probleem voordoet in een belangrijk procespad heeft dit een grotere negatieve impact op de score dan een probleem in een zelden voorkomend procespad. Resultaat = Van 0 tot 1 Grafiek = resultaat januari, februari en maart van 2009

77 Om de variatie in de resultaten duidelijk weer te geven is aan de y-as gekozen voor een schaal van 0,9 tot 1. In werkelijkheid loopt de schaal van 0 tot 1. Ter vergelijking zijn hieronder de resultaten van "In theoretisch model af te spelen versus niet af te spelen procespaden + nuancering" en "In theoretisch model af te spelen versus niet af te spelen procespaden + nuancering (gewogen)" beide in 1 grafiek uitgezet. Zijn de volgende stellingen waar? Stelling 1: Er kan met zekerheid worden gezegd dat er op 19 januari 2009 van belangrijker geachtte procespaden is afgeweken dan op 12 maart 2009 aangezien het verschil tussen het gewogen resultaat en het niet gewogen resultaat op 12 maart kleiner is dan het verschil op 19 januari. Stelling 2: Op 9 januari 2009 had het gewogen resultaat de waarde 1. Dit komt doordat er die dag niet is afgeweken van het theoretische model. Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet

78 22) * Feedback 4: Gedrag in model versus geobserveerd gedrag Aspect: Precisie (Precisie geeft aan hoeveel gedrag is gedocumenteerd (in het procesmodel) dat niet wordt uitgevoerd.) Het resultaat geeft aan hoeveel extra gedrag in het theoretische model is opgenomen dat niet is geobserveerd en dus niet in de eventlog staat. Bij een score van 1 is al het gedrag dat mogelijk is in het theoretische model dus ook uitgevoerd in werkelijkheid. Resultaat = Van 0 tot 1 Grafiek = resultaat over januari, februari en maart van 2009 Zijn de volgende stellingen waar? Stelling 1: Er kan met zekerheid gesteld worden dat in heel februari 2009 in werkelijkheid niet meer gedrag heeft plaatsgevonden dan omvat is in het theoretische model. Stelling 2: Op 28 maart 2009 is niet al het gedrag, dat omvat is in het theoretische model, in werkelijkheid uitgevoerd. Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet 23) *

79 Feedback 5: Model van geobserveerd gedrag vergeleken met procesmodel (op basis van geobserveerd gedrag in eventlog) Aspect: Precisie (Precisie geeft aan hoeveel gedrag is gedocumenteerd (in het procesmodel) dat niet wordt uitgevoerd.) Om het resultaat te berekenen wordt er gebruik gemaakt van een theoretisch model, een eventlog en een procesmodel dat afgeleid is van de eventlog. Dat laatste procesmodel is dus een procesmodel van de werkelijkheid. De eventlog wordt afgespeeld in zowel het afgeleide procesmodel als het theoretische model. Voor het resultaat wordt bepaald hoeveel extra gedrag het afgeleide procesmodel toestaat in vergelijking met het theoretische model en de eventlog. Een zwaarder wegende factor wordt toegekend aan een procespad dat vaker voorkomt. Resultaat = Van 0 tot 1 Grafiek = resultaat over januari, februari en maart van 2009 Zijn de volgende stellingen waar? Stelling 1: Het is waarschijnlijk dat er op 4 februari 2009 veel gedrag uitgevoerd is dat niet beschreven is in het theoretische model. Stelling 2: De procesmodellen, die afgeleid werden van de eventlogs van 2 en 4 februari, kunnen gezien deze resultaten en vanuit theoretisch aspect niet gelijk zijn aan elkaar. Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar

80 24) * Ik weet het niet Feedback 6: Procesmodel vergeleken met model van geobserveerd (op basis van geobserveerd gedrag in eventlog) Aspect: Generalisatie (Generalisatie geeft het tegenovergestelde van precisie aan. Ofwel, gebeuren er dingen die niet gedocumenteerd zijn (in het procesmodel)?) Dit resultaat wordt op dezelfde manier berekend als bij de vorige vraag. Echter wordt er bij deze techniek gekeken hoeveel extra gedrag het theoretische model toestaat in vergelijking met het afgeleide procesmodel en de eventlog. Zijn de volgende stellingen waar? Stelling 1: Op 9 januari was er relatief veel gedrag dat wel in het theoretische model maar niet in het procesmodel, dat is afgeleid van de eventlog van 9 januari, zat. Stelling 2: De score van deze techniek is gemiddeld hoger dan de score van de vorige techniek. Dit wijst erop dat het theoretische model minder gedrag bevat dan in werkelijkheid uitgevoerd wordt. Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet

81 25) * Feedback 7: Overeenkomst tussen procesmodel en model van geobserveerd gedrag. Aspecten: Precisie & Generalisatie (Precisie geeft aan hoeveel gedrag is gedocumenteerd (in het procesmodel) dat niet wordt uitgevoerd. & Generalisatie geeft het tegenovergestelde van precisie aan. Ofwel, gebeuren er dingen die niet gedocumenteerd zijn (in het procesmodel)?) Voor deze techniek wordt van de eventlog een procesmodel gemaakt. Dit model wordt vergeleken met het theoretische model. Het resultaat geeft aan hoeveel procent de 2 modellen overeenkomen. Hieronder is analyse uitgevoerd voor 9 januari van Rechts staat het theoretische model. Links staat het procesmodel dat van de eventlog van 9 januari is gemaakt. Zijn de volgende stellingen waar? Stelling 1: Het resultaat van 68,83% zegt alleen iets over de overeenkomst van mogelijk gedrag in de 2 modellen, niet over de doorlopen procespaden in de eventlog. Stelling 2: Als bovenstaande modellen van plaats zouden wisselen zou het resultaat anders zijn dan degene die hier berekend is.

82 26) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 8: Structuur van procesmodel beoordeeld op basis van geboserveerd gedrag in eventlog. Aspect: Structuur (Structuur geeft aan of een bedacht procesmodel duidelijk leesbaar en goed gestructureerd is.) Het resultaat geeft aan hoe goed leesbaar het theoretische model is op basis van geobserveerd gedrag in de eventlog. Als het resultaat de waarde 1 heeft, wil dat zeggen dat er geen redundante en duplicate elementen in het procesmodel zitten met betrekking tot de geteste eventlog. Redundante elementen zijn elementen die verwijderd kunnen worden zonder het gedrag van het procesmodel te wijzigen op basis van de eventlog. Duplicate elementen zijn elementen die dubbel in het procesmodel zitten zonder dat dit nodig is op basis van de eventlog. Resultaat = Van 0 tot 1 Grafiek = resultaat van januari, februari en maart van 2009 Zijn de volgende stellingen waar? Stelling 1: Op basis van de eventlog bevatte het theoretische model veel redundante en duplicate

83 elementen op 28, 29 en 30 maart. Stelling 2: Op 10 februari bevatte het theoretische model geen elementen die op basis van de eventlog van die dag weggelaten konden worden. 27) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 9: Structuur van model van geobserveerd gedrag vergeleken met structuur van theoretisch proces model (op basis van geobserveerd gedrag in eventlog). Aspect: Structuur (Structuur geeft aan of een bedacht procesmodel duidelijk leesbaar en goed gestructureerd is.) Voor het berekenen van het resultaat wordt een procesmodel gemaakt van de eventlog van een dag. Voor dat procesmodel wordt gekeken hoeveel structurele elementen opgenomen zijn die niet in het theoretische model zitten. Met structurele elementen worden de relaties tussen de verschillende activiteiten bedoeld. Resultaat = Van 0 tot 1 Grafiek = resultaat van januari, februari en maart van 2009 Zijn de volgende stellingen waar? Stelling 1: Het aantal structurele elementen in het theoretische model hoeft op 4 februari niet gelijk te zijn geweest aan het aantal structurele elementen in het procesmodel dat is afgeleid van

84 de eventlog van 4 februari. Stelling 2: Het verschil tussen het aantal structurele elementen in het procesmodel, dat gemaakt is met de eventlog van 2 maart en het theoretische model, is kleiner dan het verschil tussen het aantal structurele elementen in het procesmodel dat gemaakt is met de eventlog van 28 februari en het theoretische model. 28) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 10: Structuur van theoretisch proces model vergeleken met structuur van model van geobserveerd gedrag (op basis van geobserveerd gedrag in eventlog). Aspect: Structuur (Structuur geeft aan of een bedacht procesmodel duidelijk leesbaar en goed gestructureerd is.) Het resultaat wordt op dezelfde manier berekend als bij de vorige vraag, alleen wordt er hier gekeken hoeveel structurele elementen opgenomen zijn in het theoretische model die niet in het procesmodel, dat af is geleid van de eventlog, zitten. Resultaat = Van 0 tot 1 Grafiek = resultaat van januari, februari en maart van 2009 Zijn de volgende stellingen waar? Stelling 1: Het resultaat is gemiddeld lager dan het resultaat van de vorige vraag. Dit houdt in dat

85 het procesmodel, dat afgeleid kan worden van de eventlog, over het algemeen gedetaileerder gestructureerd is dan het theoretische model. Stelling 2: Op 29 maart naderde dit resultaat het nulpunt. Dit houdt in dat bijna alle structurele elementen uit het procesmodel, dat afgeleid kan worden met de eventlog van 29 maart, niet terug te vinden waren in het theoretische model. 29) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 11: Testen van specifiek gedrag middels formules Aspecten: Frequentie (Met frequentie wordt aangegeven hoe vaak een specifiek procespad is doorlopen. Hiermee wordt aangegeven hoe significant een bepaald procespad is in relatie met andere doorlopen procespaden in het bedrijfsproces.) Met deze techniek kunnen formules worden gemaakt die gebruikt kunnen worden om te testen of procespaden in de eventlog (realiteit) specifiek gedrag vertonen (of juist niet). Zo kan er bijvoorbeeld worden gemeten of een pallet verplaatst is op de packarea nadat hij hier gedropt is. De eventlog die hier gebruikt is bevat gegevens over de maanden januari, februari en maart van De formule die hier gebruikt is: Is de activiteit "Drop pickdetail op Packarea" minimaal 2 maal uitgevoerd? Zijn de volgende stellingen waar? Stelling 1: Over de maanden januari, februari en maart zijn er 2268 pickdetails geweest die minimaal 2 maal op de packarea zijn neergezet. Stelling 2: Het totaal aantal pickdetails dat in de maanden januari, februari en maart is verwerkt is =

86 30) * Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar Beide stellingen zijn niet waar Ik weet het niet Feedback 11.1: Procespad view van "Testen van specifiek gedrag middels formules" Aspecten: Frequentie (Met frequentie wordt aangegeven hoe vaak een specifiek procespad is doorlopen. Hiermee wordt aangegeven hoe significant een bepaald procespad is in relatie met andere doorlopen procespaden in het bedrijfsproces.) Na het uitvoeren van de analyse in de vorige vraag is het ook mogelijk om de procespaden weer te geven die voldoen aan de formule die getest is. Ook is het mogelijk om de procespaden weer te geven die hier niet aan voldoen. In de figuur hieronder is dezelfde formule getest als in de vorige vraag. De lijst met getallen aan de linkerkant van de figuur identificeren de pickdetails die voldoen aan de gestelde formule (die daarboven staat). De getallen erachter tussen haakjes zijn het aantal pickdetails die het exact zelfde procespad hebben gevolgd. Het geselecteerde procespad wordt aan de rechterkant van het scherm getoond.

87 Zijn de volgende stellingen waar? Stelling 1: 1358 pickdetails hebben het getoonde procespad uitgevoerd waarbij de activiteit "drop pickdetail op packarea" 4 maal is doorlopen. Stelling 2: De activiteit "drop pickdetail op packarea" is 2 maal teveel uitgevoerd voor het getoonde procespad. Beide stellingen zijn waar Stelling 1 is waar, stelling 2 is niet waar Stelling 2 is waar, stelling 1 is niet waar

88 31) * Beide stellingen zijn niet waar Ik weet het niet Feedback 12: Verschillen anayse met tekstuele uitleg Aspect: Locatie (Locatie geeft aan op welke plek in het bedrijfsproces het geobserveerde gedrag afweek van het procesmodel.) Met deze analyse techniek worden 2 procesmodellen vergeleken. Het theoretische model en een model dat afgeleid is vanuit de eventlog van een bepaalde dag. Het resultaat van de techniek geeft een lijst met verschillen tussen deze 2 modellen en verklaart wat de verschillen zijn. Hieronder is het resultaat van een dergelijke analyse van 9 januari 2009 gegeven. Rechtsboven: Het procesmodel dat verkregen is uit de eventlog van 9 januari Rechtsonder: Het procesmodel dat vooraf is gedefinieerd. Linksboven: De mapping tussen de 2 procesmodellen geeft aan hoe de activiteit benoemd is in het andere procesmodel. Linksonder: De locatie en identificatie van de gevonden verschillen.

Nog meer weergeven