Experimental Studies in Empirical Software Engineering

by Praveen Kallakuri and Sebastian Elbaum

Introduction

In February 1999, a report to the President from the Presidential Information Technology Advisory Committee [9] emphasized the incredible successes of information technology as well as its dangerous failings. Citing the recommendations of the report, Frank D. Anger [1], Program Director for the Software Engineering and Languages Program (SELP), suggested a change in the way we "do" software engineering. He remarked, "We need only look at the failure of software projects (over 70%) and the number of completed systems that are never used to agree that we are far from having a science [of software engineering]." He wrote that future studies should be directed towards building a science of construction underneath software engineering, which should help practitioners to choose between viable alternatives and to predict or even simulate their process and product behavior. This science, he wrote, is largely an empirical one, since the measure of its correctness is found in its power of prediction and construction in the real world of software development, not in its elegance and logical cohesion.

This article first provides an overview of the definitions and strategies necessary when conducting an empirical study. Next, we examine the concepts and goals of experimental software engineering. Finally, we examine the structure of experimental studies. The structure of experimentation advocated in this work is based on the proposals of Perry, Porter, and Votta [8]. We will make our discussion concrete by looking at an experimental study on the textual differencing method of regression testing, currently underway at the Measurement, Analysis, Profiling, Software Testing, EXperiments Team (MAPSTEXT) [6] at the University of Nebraska-Lincoln.

Definition of Empirical Research

Empirical research can be defined as research based on observation, conducted to discover an unknown or to test a hypothesis [5]. Empirical research is characterized by an investigator gathering data and performing analyses to determine the meaning of the data, and encompasses the following research strategies:

  1. The experiment provides the researcher with control over some of the conditions in which the study takes place by manipulating independent factors to elicit responses from the dependent factors.

  2. The observation of a single data point, through anecdotes and case studies, which investigate real-life phenomena in the context of a current theory.

  3. A demonstration of technology on selected subjects.

Among these strategies, experimentation is expensive and complex, yet the most effective for testing a hypothesis. The strength of this approach lies in the ability to systematically study cause and effect by manipulating the variables of a system. In addition to control, repetition is another attribute of experimentation. Repetition allows greater certainty of observation.

Anecdotal and case studies are based on observation of actual practice; however, the researcher cannot exercise the same level of control over the study that is possible in experimentation. Another practical problem is that it is difficult to find case studies that match the researcher's goals.

The third strategy is the demonstration of a concept on selected subjects. This method is most effective when a study has the limited scope of emphasizing the utility of an existing technique.

For the rest of this discussion, we will concentrate on the first of these strategies, experimental research.

The Concept and Structure of Experimentation

A theory is well supported if its conclusions can be validated by a logical as well as physical explanation of its workings. The logical component explains the idea behind the theory and lends it credence; the physical component actually implements the methodology and derives the conclusions held by the logical component through observation and analysis.

At MAPSTEXT, we follow the experimental approach. In addition to substantiating our hypotheses logically, we also implement them on subjects often selected from industry. We use the method that best meets the requirements of the theory and observe whether the expected conclusions can be derived. Most of the doubts pertaining to the practicability of a method are quelled when we simulate it and show how it works. Empirical evaluation of a concept assumes the primary role in the way we conduct research.

In the next section, we examine the components that constitute the structure of an empirical study. Most of the steps we discuss are based on the structure proposed by Perry, Porter and Votta [8]. An article on experimentation would seem incomplete without an example. Hence, the description of each step is followed by an instance, which is actually a study being performed at MAPSTEXT. We examine this study in the light of the structure we define for experimental software engineering. Though the experiment is still underway, it serves the purpose of explaining how the structure can actually be used in an experiment.

Problem Definition

Among the expectations from a study, Anger [1] discusses the "positioning of the proposed investigation with respect to other related work." Wherever possible, the goals in our study should be based on questions not resolved by previous research.

The industry also adds perspective to the problem. By observing practices in industry, we gain significant input on the expectations of our research.

Once we process the inputs from previous research and from industry, we have the problem's background in perspective. We proceed by focusing on specific portions of the problem, which our study will investigate. The illustration that follows examines how observations and inputs from industry and previous research lead us to the problems that define the scope of our study.

Background to our study

The focus of our study at MAPSTEXT is regression testing, which is the process of testing software to verify that the modified program still works correctly on all of the test cases used to test the original program. Current practices in the industry for regression testing are primitive [7]. Normally, a test suite is executed in its totality during system testing after each code change and before a formal release. This is as inefficient as it is costly and time-consuming. Nevertheless, in the absence of cheaper and more efficient alternatives, the industry has been following this mode of testing.

In an effort to tackle this problem, the research community proposed selective regression testing, wherein only the impacted modules in the source code are targeted. Quite a few techniques have been proposed to implement selective regression testing. Bible, Rothermel, and Rosenblum [2] have described a comparative evaluation of different techniques for selective regression testing on a set of experimental subjects. Rothermel and Harrold [10] [11] have also presented a series of safe algorithms and empirical studies for regression test selection. In particular, the two methods DejaVu and TestTube developed by the authors are evaluated and compared to show that neither can be both precise and efficient.

Vokolos and Frankl [13] developed a different approach to selective regression testing, called textual differencing. Initially, a mapping is formed from test cases to basic blocks of the original source code. Next, modified statements in a program are identified by comparing the old and new versions of the source code. Finally, regression testing is performed by executing those test cases that map to the basic blocks containing the modified program statements. Vokolos and Frankl [14] also developed a tool, Pythia, based on this approach. The efficacy and precision of the tool were later empirically substantiated in [15] using a program written for the European Space Agency. Our work begins from where this study left off.

Scope of our study

The experiment by Vokolos and Frankl [15] empirically evaluated only one instance of their approach; replication of the method is deemed necessary to ensure the absence of any circumstantial results specific to that implementation. Hence, the technique needs to be tested on software from different domains. Moreover, the concept was not implemented across multiple versions of source code, which is the norm of software development in the industry. Our objective is to focus and improve upon these aspects of textual differencing.

Hypothesis

The next stage is to define a hypothesis and use it to generate high-level, abstract questions. The use of abstraction helps us break down our hypotheses into definite questions, which can be answered quantitatively. By assimilating the answers to these definite questions, we can derive conclusions to our abstract questions. Since the abstract questions are derived from the initial hypothesis, their answers validate the latter.

The hypotheses of our study

The objective of our study is to show that textual differencing could minimize the cost and time involved in the regression testing of software. The study in [13] concentrated on implementing the methodology in one subject. The concept must be tested on different subjects representative of different domains to strengthen the results. Therefore, another objective of this study is to use different subjects for validating our hypotheses.

Our first hypothesis predicts that using this technique will reduce the size of the test suite, expressed as a percentage of the original test suite. It is implied that this will also shorten the regression-testing phase. The second hypothesis is that, in using the selected test suite, there will not be any percentage decrease in the number of faults detected compared to the case when the complete test suite is used. We refer to the number of faults detected by a test suite as the power of the test suite.

From the above hypotheses, the questions posed in our study are:

  1. By using textual differencing, can we reduce the percentage size of the test suite compared to the original test suite?

  2. In using the technique, can we retain the power of the test suite?

It is important to note that these questions are interdependent. At the end of the study, it is not very useful to have a smaller test suite that detects a smaller percentage of faults in relation to the original test suite. Our idea is to keep the power of the test suite constant, while reducing its size. The best case is that we may be able to reduce the size of the test suite while retaining its power, and vice versa.

These questions can be further defined:

  1. What fraction of the original test suite is the selected test suite?

  2. What is the time taken to execute the original test suite?
  3. What is the time taken to execute the selected subset of the original test suite?

  4. For a given version, what is the number of faults detected by using the original test suite?

  5. For a given version, what is the number of faults detected by using the selected subset of test suite?

These are only some of the many concrete questions that will be encountered through the course of the study. These questions are simple; they can be answered directly by aggregating our results across various versions and subjects.
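As a minimal sketch of how the concrete questions above reduce to simple arithmetic, the following Python fragment computes the three comparisons for one version of one subject. The record type SuiteRun, its field names, and the numbers are assumptions made for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class SuiteRun:
    """Outcome of running one test suite on one version (hypothetical record)."""
    test_count: int       # number of test cases in the suite
    seconds: float        # time taken to execute the suite
    faults_detected: int  # the "power" of the suite on this version


def size_fraction(original: SuiteRun, selected: SuiteRun) -> float:
    """Question 1: what fraction of the original suite is the selected suite?"""
    return selected.test_count / original.test_count


def time_saved(original: SuiteRun, selected: SuiteRun) -> float:
    """Questions 2 and 3: difference between the two execution times, in seconds."""
    return original.seconds - selected.seconds


def power_retained(original: SuiteRun, selected: SuiteRun) -> bool:
    """Questions 4 and 5: does the selected suite detect as many faults?"""
    return selected.faults_detected >= original.faults_detected


# Made-up numbers, for illustration only.
original = SuiteRun(test_count=200, seconds=600.0, faults_detected=8)
selected = SuiteRun(test_count=50, seconds=150.0, faults_detected=8)

print(size_fraction(original, selected))   # fraction of the suite that was kept
print(time_saved(original, selected))      # seconds saved per regression run
print(power_retained(original, selected))  # whether power was preserved
```

Aggregating these values across versions and subjects would then give the data needed to address the two abstract questions.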

Subjects

In order to establish the empirical validity of a theory, we need to test it on subjects from different domains. This is to eliminate the effect of circumstantial variables in the experiment's environment. This step involves the preparation of objects that need to be processed and manipulated by our experiments. Some of the concerns we keep in mind while selecting a subject are:

  1. The subject should be amenable to the implementation of experiments. In other words, the subject should facilitate experiments that are needed to answer our questions. As we will see in the instance below, this might mean that we have various versions of the source code.

  2. The subjects should facilitate experiments that are repeatable, controlled [1], and generic. Repeated simulation and controlled variation of the independent variables are necessary for us to generalize our conclusions. Hence, we should make sure that the selected subjects allow this flexibility.

  3. Selecting the right subjects has some constraints. It is difficult to find a subject that suits the goals of our study; on the other hand, preparing a subject is usually expensive in terms of both time and money. We have to determine whether a subject is germane enough to our goals that a marginal amount of subject preparation will suffice for our needs.

The exact nature of subject preparation will be clearer from the following illustration, which presents the subjects being used in our study, their preparation for the experiments, and other data sources.

Subject identification and preparation in our study

Each research member at MAPSTEXT [6] is working on preparing and experimenting on a different subject for substantiating our hypotheses. Our subjects are the source code of programs under development. At the time of writing, three programs are being prepared for the study:

  1. The UNIX Shell Program BASH from GNU
  2. The GZIP compressor/decompressor utility for UNIX from GNU
  3. The client mail program MAILX, which comes integrated with most shell packages.

Besides these three, there are other subjects that are being examined for suitability. In this paper, we will illustrate the use of the BASH program, for which we are using versions 1.14.7 through 2.04. BASH was selected for this study because of its amenability to regression testing. Since subject preparation is expensive, it also mattered that BASH is open source software developed by the GNU project. In addition, most versions of BASH come with at least a partial test suite, which can significantly reduce the time and cost of subject preparation. BASH also met our requirement of being evolutionary software with multiple versions, which is how software is usually developed in the industry.

The characteristics of the BASH program make it a good subject for ensuring that performed experiments are repeatable and facilitate analysis (we will discuss this in the design section).

Preparing the subject, BASH in this case, includes developing test scripts, creating the function coverage tables, creating and locating fault tables, and gathering any other data that might be required for processing. The process of identifying the bugs in various versions of the software and comparing them against later versions for fixes is a part of the subject preparation stage. The group follows similar rules of analysis, selection, and preparation for all the subjects in the experiment.

Design of the Study

The chief characteristic of an experiment is the approach taken to execute it. The design of a study deals with the method followed to experiment on the subject and the process to generate data. In effect, it is a detailed plan about how the subjects will be used to test the hypotheses.

One set of components described as part of the plan for creating the data is the independent and dependent variables. The authors in [8] define independent variables as attributes that define the study setting. Manipulating these independent variables can spawn significant amounts of data, which can help us to determine whether the hypotheses are verified or not. Dependent variables are defined as the results whose values vary predictably when the independent variables are manipulated.

We begin by identifying the independent and dependent variables in our experiment. Observing and analyzing the behavior of the dependent variables should lead us to answers for our concrete questions. Our experiments are run by controlling the independent variables and measuring the effect on the dependent variables. We need to confirm that the changes observed in the dependent variables are indeed caused by regulating the independent variables. Therefore, we systematically "manipulate" the independent variables repeatedly to isolate the effect on the dependent variables. This process negates the effect of any circumstantial variables in the experiment that might affect the observed values of the dependent variables. Once we can ascertain the cause-and-effect relationship between the independent and dependent variables, we satisfy a fundamental assumption in our experiment (we will further examine this and other assumptions in the statistical data analysis section).

A study's design ties together the various entities that are involved in an experiment, like the hypotheses, subjects, and the metrics. For example, it might state how the subjects could be manipulated to produce data (metrics) that can answer our questions (hypothesis). The design also describes the environmental conditions needed to conduct the experiments. Perry, Porter and Votta [8] describe this as the physical, intellectual and cultural surroundings where the study takes place. This information is significant for better interpretation of the data when the concept is implemented in a different setting. The design also includes the resource constraints and risks that are inherent as part of the study. We will examine how the design principles of an empirical study are implemented in our running example.

Design of our study

In our study, the independent variables are subjects, versions, and test suites; the dependent variables are the percentage reduction in the size of the test suite and the percentage change in the number of defects uncovered.

We ensure that the test cases in each test suite (pertaining to each version) optimally verify the functionality of the version to which it belongs. We record the defects uncovered by the test suite. We repeat this process for all the versions so that we have coverage data for the whole subject. The next step is to run our textual differencing tool on consecutive versions of the source code. This identifies the functions in the subsequent version of the source code that have been added, modified, or deleted from the previous version. We enumerate the functions where changes occur and with the help of the coverage data, identify the test cases that validate these functions. We thus have a subset of the original test suite, which verifies the functionality of only those functions in the current version that have been added, modified or deleted since the previous version. The subset size of the original test suite is our first dependent variable. Our next step is to run this subset of the test suite on the original source code and verify the number of defects uncovered. This is our second dependent variable. Our objective is to keep a reduction in the number of defects uncovered to a minimum while trying to achieve a maximum percentage decrease in the size of the test suite. We will have achieved optimum efficiency in using this method when the number of defects uncovered remains the same for a reduction in the size of the test suite. We repeat this process for each version of the source code, and then for each subject. In this manner, we reach a stage where we have sets of values for the dependent variables across all our various subjects and constituent versions.
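The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the coverage table is assumed to be a mapping from each test case to the set of functions it executes, and the function names, test names, and helper names (select_tests, percent_reduction) are invented for the example rather than taken from the study's tools.

```python
def select_tests(coverage, changed_functions):
    """Return the test cases that execute at least one changed function.

    coverage: dict mapping a test-case name to the set of functions it executes
    changed_functions: functions added, modified, or deleted between versions
    """
    changed = set(changed_functions)
    return sorted(test for test, funcs in coverage.items() if funcs & changed)


def percent_reduction(original_size, selected_size):
    """First dependent variable: percentage reduction in test suite size."""
    return 100.0 * (original_size - selected_size) / original_size


# Hypothetical coverage data; all names are invented for illustration.
coverage = {
    "t1": {"parse_command", "expand_word"},
    "t2": {"expand_word"},
    "t3": {"read_history"},
    "t4": {"parse_command", "run_builtin"},
}
changed = {"expand_word", "run_builtin"}

selected = select_tests(coverage, changed)
print(selected)
print(percent_reduction(len(coverage), len(selected)))
```

Running the selected subset and counting the faults it uncovers would then supply the second dependent variable.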

We adhere to a factorial design in testing our hypotheses. A factorial design implies that each independent variable is isolated and manipulated to measure its effect on the relevant variables. The following is a listing of the factors involved in our study.

  1. The type of program is the first factor, since it can vary depending on the domain from which it is extracted. Different subjects from different domains can provide different values to the dependent variables. For a factorial approach to test our hypotheses, we ascertain the changes in the dependent variables across several versions of the program for several subjects.

  2. The changes in the source code between various versions are the second factor. The number of versions we have and the extent of changes between them maximize the generality of our results. Each version, depending on the magnitude of change from the previous version, can give a different value to the dependent variables.

  3. Finally, the test suite itself is another factor - the difference in functionality of a version from a previous version also affects the various test cases that constitute the test suite. Consequently, the relative size of its selected subset as well as the relative number of defects uncovered differs with respect to the original test suite.

We apply these factors in various combinations as shown in the table below. For the sake of readability, only three subjects and two versions are tabulated. For a given version of the source code, the textual differencing technique is used to derive a subset TS1' of the original test suite TS1. The size of test suite TS1 and TS1', which may be the number of test cases in each test suite, is recorded for that version; this is our first dependent variable. TS1 and TS1' are each independently run on the given version of the program. The number of faults detected is observed; this is our second dependent variable. The experiment is repeated for each of Ver1, Ver2... Ver N. The observations are repeated for each subject, Sub 1, Sub 2, Sub 3... SubM. The measurements are analyzed as described in a subsequent section on Data analysis.

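INDEPENDENT VARIABLES     | Sub 1                   | Sub 2                   | Sub 3 ...
                          | Ver 1      | Ver 2 ...  | Ver 1      | Ver 2 ...  | Ver 1      | Ver 2 ...
                          | TS1 | TS1' | TS2 | TS2' | TS1 | TS1' | TS2 | TS2' | TS1 | TS1' | TS2 | TS2'
--------------------------+-----------------------------------------------------------------------------
DEPENDENT VARIABLES       |
Size of the test suite    | OBSERVATIONS
Number of faults detected | OBSERVATIONS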

Note: For each version Ver(N), TS(X)' is a subset of TS(X) selected by the textual differencing technique.
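The layout of the table can be sketched in code as the cross product of the factors. The sketch below assumes the three subjects and two versions shown in the table and uses placeholder labels; it only builds the empty cells that the measurements would later fill.

```python
from itertools import product

subjects = ["Sub 1", "Sub 2", "Sub 3"]
versions = ["Ver 1", "Ver 2"]
suites = ["TS", "TS'"]  # the complete suite and its selected subset


def empty_observation_table(subjects, versions, suites):
    """One cell per (subject, version, suite) combination.

    Each cell holds the two dependent variables, unset until measured.
    """
    return {
        combo: {"suite_size": None, "faults_detected": None}
        for combo in product(subjects, versions, suites)
    }


table = empty_observation_table(subjects, versions, suites)
print(len(table))  # number of columns in the table above
```

With M subjects and N versions, the same function would produce 2 x M x N cells, one for each column of the full table.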

Tools

Tools are the means to conduct experiments. The strength of an implementation lies in the simplicity of the tools used, so that simulation becomes a relatively easy task. A goal of an experimental study is to be easily reproducible in various subject domains. We facilitate this goal by including a comprehensive description of the source and content of the tools used in our experiment.

Tools used in our study

We instrument the source code of each subject using the Clic tool [3] (the details can be found at the MAPSTEXT website [6]). Clic is a software profiler that instruments the targeted source code in order to provide an execution trace for each test suite. Several C programs and shell scripts have been developed by the team members to process the trace reports and generate coverage information. These are used in combination with fault tables, which enumerate the faults uncovered in each version of the program, to determine the percentage change in test suite size and faults detected by the selected test suite.

The differencing part of the experimentation technique is based on the standard UNIX utility 'diff', which is used by the coverage tool. Most of the other processing is accomplished through the use of standard UNIX tools and utilities. All the details of the tools used in the course of the experiments, such as their origins and versions, will be fully presented at the end of the study.
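To give a feel for the differencing step, here is a small sketch that uses Python's standard difflib module as a stand-in for 'diff'. It is not the study's tooling: the function name changed_lines and the two tiny C fragments are assumptions made for illustration.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Compare two versions of a source file, line by line.

    Returns two lists of 1-based line numbers:
      - lines of the old version that were replaced or deleted
      - lines of the new version that were replaced or inserted
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    old_changed, new_changed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            old_changed.extend(range(i1 + 1, i2 + 1))
        if tag in ("replace", "insert"):
            new_changed.extend(range(j1 + 1, j2 + 1))
    return old_changed, new_changed


# Invented example: f is modified and h is added in the newer version.
old_version = [
    "int f(void) {", "  return 1;", "}",
    "int g(void) {", "  return 2;", "}",
]
new_version = [
    "int f(void) {", "  return 10;", "}",
    "int g(void) {", "  return 2;", "}",
    "int h(void) {", "  return 3;", "}",
]

old_changed, new_changed = changed_lines(old_version, new_version)
print(old_changed, new_changed)
```

Mapping the changed line numbers back to the functions that contain them would require knowledge of function boundaries, which this sketch leaves out.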

Statistical Data Analysis

Statistics provide an objective mechanism to analyze the data and produce conclusions. They provide guidelines to validate the reliability of the results by attaching a confidence level to our hypotheses.

We may analyze the data generated from our experiments quantitatively or qualitatively. When we perform quantitative analysis [12], given the fact that our experiments involve more than a couple of factors across our data samples we need to determine:

We should determine whether our data samples allow these assumptions in order to arrive at the significance of our results, which determines the confidence level of our results. This varies depending on the extent of data and the precision of measurements. Where data is plentiful and measurements are precise, we have a greater confidence level; on the other hand, limited data and imprecise measurements imply lower confidence levels on the data generated by the tests. It is difficult to determine exactly what level of confidence suffices to validate a hypothesis. Hence, it is best that our study just determines the confidence level and the arbitration on whether it validates our hypothesis be left to the reviewers.

Qualitative analysis uses intangible data. The sources of such data are usually the people who are involved, including their views and impressions about the study. These observations can prove vital to the implementation of the concept in the industry. Perry, Porter and Votta [8] refer to a technique for qualitative analysis called grounded theory, proposed by Glaser and Strauss [4]. Glaser and Strauss point out, "In many instances, both forms of data are necessary -- not quantitative used to test qualitative, but both used as supplements, as mutual verification and, most important for us, as different forms of data on the same subject." The section below discusses how data analysis is performed in our example.

Data analysis techniques in our study

As per the definitions before, the independent variables in our study are:

  1. The distinct domains from which our subjects (programs) are selected.
  2. The different versions and the changes that have occurred between them for each subject.
  3. The nature and composition of test cases that comprise each version's test suite.

By altering the independent variables, we observe the changes in the dependent variables. These are:

  1. The percentage change in the size of the test suite.
  2. The percentage change in the number of faults uncovered by the selected test suite.

In other words, we have two basic measurements - one, the number of faults found in test suite execution before and after instrumentation of the source code; two, the number of test cases that are selected by the test suite after running our differencing tools on the instrumented version of the source code. We perform these measures for each of the following:

Our measures are thus the outcome of all possible combinations of the investigated factors. These measures are assimilated to generate information on function coverage. Assimilation of such information from experiments on all the subjects enables us to observe the dependent variables - the percentage reduction in test suite size and the percentage change in faults detected. From this data, we validate our assumptions for statistical significance. We use statistical packages like SYSTAT and SAS to help us with these computations.

The magnitude of statistical significance helps a reviewer validate our hypotheses, that is, whether the technique of textual differencing enables a reduction in the size of the test suite while retaining the power of the test suite. The values of these dependent variables across various subjects determine whether the method is effective in general, irrespective of the domain from which the program originates.
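As an illustration of the kind of computation a statistical package performs, the sketch below calculates the F statistic of a one-way analysis of variance by hand. It is a simplified stand-in, not the analysis used in the study: it handles a single factor only, it does not compute a significance level, and the sample values are invented.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of observations.

    F is the ratio of the variance between group means to the variance
    within the groups; larger values suggest the factor has an effect.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)      # between-group mean square
    ms_within = ss_within / (n_total - k)  # within-group mean square
    return ms_between / ms_within


# Invented values: one group of observations per subject.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(groups)
print(f_statistic)
```

In a setting like ours, each group might hold the percentage reduction in test suite size observed for the versions of one subject. Turning the F statistic into a confidence level requires the F distribution, which is where packages such as SYSTAT and SAS come in.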

Threats to Validity

The validity of our conclusions is influenced by factors that originate both from within and outside our study. Once we have performed the experiments and derived the data sets, drawing conclusions from them depends on whether we can validate our assumptions while analyzing the variance (ANOVA) of the results. Perry, Porter, and Votta [8] categorize these threats in experimental studies into construct, internal, and external validity.

Threats to the validity of our study

In our study, we determine whether the relation between the independent and dependent variables is a necessary and sufficient condition to validate our hypotheses. In other words, we seek to establish that experimenting with different subjects, numerous versions, and the many selected test suites directly affects the size of the test suite while retaining its power. We can see that the latter are in fact our hypotheses, and hence we may assume that our hypotheses are indeed validated by the action-reaction behavior of our independent and dependent variables.

Inherent in the construct validity of our study is another threat. We have to determine the confidence with which we can state that the reduction in size and preservation of the power of the test suite are a direct consequence of the controlled application of the independent variables. This is the internal validity of our study. The measures of normal distribution and variance across data sets should help us determine the extent to which these threats prevail.

The factorial design of our study should eliminate any concerns about the external validity of our study. First, we conduct this study across multiple subjects, thereby reducing the influence of any external environmental factors that pertain to the domain of our subject. Second, by using multiple versions of each subject, we should be taking into consideration the aging factor of source code. The variation in size and content of each complete test suite and its selected subset should also bring in a good measure of heterogeneity.

Conclusions

The idea of experimental analysis is that our conclusions are validated by the analysis of observations we make. The significance of our results gives us a quantitative evaluation of our concept in practice. It offers a definite measure of the efficacy of our method as opposed to the abstract analysis and extrapolation involved in most studies without an empirical basis. A good logical analysis is also involved in making sense of all the data that we possess at the end of the study. However, the strength of our conclusions is predominantly determined by the results of our experiments.

It is true that many researchers fall short of providing meaningful conclusions to their experiments [1]. Wherever possible we should map conclusions to the real problems faced by practitioners so that the study does not just remain a concept on paper, but evolves into being practiced. Another reason for emphasizing lucid conclusions is that they should behave as inputs to subsequent studies in the same line. Subsequent researchers should be able to continue where our study leaves off and for this to be possible, our conclusions should clearly state what questions have been answered and what other questions still remain to be answered.

Though our example of the study on textual differencing is far from complete, we will briefly discuss possible conclusions.

Conclusions to our study

We have not yet arrived at a stage where it is possible for us to state the conclusions of this study. However, we hope that some or all of the following will be substantiated by the study.

  1. Depending on the significance of our results, we may be able to state that application of textual differencing as a selective regression testing technique will reduce the test cycle (the duration of regression testing after each release) while retaining the power of the test suite.

  2. We hope to give further strength to the utility of textual differencing in software regression testing; if presented with the tools and implementation details, it will be a great deal easier for implementation in the industry. A corollary to this conclusion is that we expect to show how standard UNIX tools such as 'diff' can be used in our studies.

Summary

The need to develop fail-safe and cost-effective computerized systems in the face of their increasing complexity propelled researchers to advocate the use of empirical studies in software engineering. The empirical approach offered the prospect of building a science of construction underneath the existing paradigms of software engineering, which would enable practitioners to determine what is possible and what is not, given the current state of technology. Among the strategies for empirical analysis, experimentation is the most complex and expensive, but also a powerful approach. In this work, we revisited the structure of experimental studies in the context of regression testing. We illustrated our discussion with the example of a study on the textual differencing technique of selective regression testing, which is underway at MAPSTEXT [6].

Acknowledgments

The research at MAPSTEXT formed a major part of this work. We are thankful to David Gable for his contribution to the experimental setup.

References

[1] Anger F.D. Directions for the NSF Software Engineering and Languages Program. Software Engineering Notes, vol. 24, no. 4, p. 52, ACM SIGSOFT, July 1999.
[2] Bible J., Rothermel G., and Rosenblum D.S. A Comparative Study of Coarse- and Fine-Grained Safe Regression Test Selection. Computer Science Department, Oregon State University, Technical Report 99-60-05, March 1999.
[3] Elbaum S., Munson J., and Harrison M. Clic: A Tool for the Measurement of Software System Dynamics. (Clic Manual is at http://mapstext.unl.edu)
[4] Glaser B. and Strauss A. The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine Publishing, 1977.
[5] Knight J.C. and Brilliant S.S. Report on Empirical Research in Software Engineering: A Workshop. National Science Foundation, The University of Maryland, and The University of Virginia, Greenbelt, MD, June 1998.
[6] MAPSTEXT. Department of Computer Science and Engineering, University of Nebraska-Lincoln, Web: http://mapstext.unl.edu.
[7] Onoma A.K., Tsai W.-T., Poonawala M., and Suganuma H. Regression Testing in an Industrial Environment. Communications of the ACM, 41(5): 81-86, May 1998.
[8] Perry D.E., Porter A.A., and Votta L.G. Empirical Studies of Software Engineering: A Roadmap. Future of Software Engineering, Limerick, Ireland, 2000.
[9] President's Information Technology Advisory Committee. Information Technology Research: Investing in Our Future. National Coordination Office (NCO) for Information Technology Research & Development, February 1999.
[10] Rothermel G. and Harrold M.J. A Safe, Efficient Regression Test Selection Technique. ACM Transactions on Software Engineering and Methodology, 6(2): 173-210, April 1997.
[11] Rothermel G. and Harrold M.J. Empirical Studies of a Safe Regression Test Selection Technique. IEEE Transactions on Software Engineering, 24(6): 401-419, June 1998.
[12] StatSoft Inc. ANOVA/MANOVA. StatSoft Inc., 1984.
[13] Vokolos F.I. and Frankl P.G. A Regression Test Selection Technique Based on Textual Differencing. Ph.D. dissertation, Polytechnic University, January 1998.
[14] Vokolos F.I. and Frankl P.G. Pythia: A Regression Test Selection Tool Based on Textual Differencing. Proceedings of the 3rd International Conference on Reliability, Quality & Safety of Software Intensive Systems (ENCRESS '97), May 1997.
[15] Vokolos F.I. and Frankl P.G. Empirical Evaluation of the Textual Differencing Regression Testing Technique. IEEE, 1998.




Biography

Praveen Kallakuri (praveen@unlserve.unl.edu) is a first year MS student in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. He has been in the industry for more than 2 years, working on assignments at the Satyam - General Electric Global Development Center, India (http://www.satyam.com).

Sebastian Elbaum (elbaum@cse.unl.edu) is an assistant professor in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. He has a PhD in Computer Science from the University of Idaho, Moscow, ID.




Location: www.acm.org/crossroads/xrds7-4/empirical.html