Experiment Design

Achieve your objective.

Definition

Most statistical experiments attempt to understand a measurable population by observing a number of entities from that population. Simulation experiments, on the other hand, attempt to understand a non-measurable population (e.g., a jury that has not yet been selected) by observing a number of entities from a measurable population that is thought to be similar (e.g., a focus group).

Experiment design is a plan to achieve an objective efficiently. Accuracy is an important part of achieving any objective. Accuracy is increased when bias is decreased and precision is increased.

In order to design an experiment, one must

  • determine the objective,

  • select the analysis,

  • define the response variable,

  • define the population,

  • select the design, and

  • determine the sample size.

A number of conditions, each having one or more levels or values, exist during an experiment. If only one condition affects the variable of interest (usually called the response variable), the design may be very simple. However, if a number of conditions affect the variable, the design can be quite complex. Letting the levels of a number of conditions vary simultaneously is desirable because doing so may reveal interactions between conditions.


1. Introduction

The design of an experiment should always be the first step in conducting the experiment. Without a proper design, the objective cannot be achieved. Before discussing experiment design, consider the preliminary notions of

  • inductive inference,

  • populations,

  • samples, and

  • the four principles of experiment design.

1.1 Inductive Inference

Drawing conclusions about the general (i.e., the population) from knowledge of the specific (i.e., a sample from the population) is called inductive inference. Even though inductive inference results in uncertain success, the methods of statistics allow us to measure the uncertainty and reduce it to a known, tolerable level. In contrast, deductive inference always yields a correct conclusion because the conclusion results from a chain of proved conclusions (e.g., mathematical theorems are proved by deductive inference).

1.2 Populations

A population can be defined by the number of entities comprising it and by the characteristics or conditions of those entities.

1.2.1 Number of Entities

An entity is a single fact, trial, individual, etc. The population is the totality of entities of a specified type. There are two types of populations, depending upon the number of entities:

  • Real Populations. If the number of entities is finite, the population is said to be real. For example, the players on a basketball team are entities from a real population.

  • Hypothetical Populations. If the number of entities can be infinite, the population is said to be hypothetical. For example, free throw attempts by a basketball player are entities from a hypothetical population. The number of entities is said to be countable if they can correspond one-to-one to the natural numbers.

1.2.2 Characteristics or Conditions of Entities

An entity is defined by one or more characteristics or conditions. Each characteristic or condition can have one or more values (if it is quantitative) or it can have one or more levels (if it is qualitative). For example, an individual might be 35 years old (a value because it is quantitative) and Caucasian (a level because it is qualitative). (Henceforth, for simplicity, the word "conditions" will refer to either characteristics or conditions, and the word "levels" will refer to either values or levels.)

  • Fixed Conditions. If a condition has only one level, it is called fixed. If a condition can have more than one level, but has only one during the experiment, the condition is considered to be fixed.

  • Variable Conditions. If a condition has more than one level, it is called variable. It is the variable conditions of entities that provide an opportunity for design and analysis of an experiment.

    • Primary Variable Conditions. Primary variable conditions are conditions thought to have a direct effect upon the variable of interest, called the response variable. Primary variable conditions can be quantitative or qualitative. Controlled variable conditions have levels that can be controlled by the experimenter.

    • Extraneous Variable Conditions. Extraneous variable conditions are those whose effects are not of interest; they are considered to be "nuisance" variable conditions. "Time of day" and "day of week" are two examples of extraneous variables. Extraneous variable conditions can be controlled.¹

1.3 Samples

A sample is a subset of a population. The number of entities in the sample is called the sample size. An outcome is the observed level of a trial (not a court trial). A test provides outcomes from a number of trials. Outcomes from a test can provide an estimate of the (true) population level. If one is to infer something about the population from a sample, care must be taken that the sample is from the target population, and it is a random sample:

  • Target Population. Entities in the sample should come from the intended (target) population. (Otherwise the conclusions drawn from the sample are representative of another population.)

  • Random Sample. The sample should be a random sample (i.e., a sample whose outcomes are independent):

    • For real (i.e., finite) populations, a random sample can be obtained only if the entities are replaced after each sampling.

    • For hypothetical (i.e., infinite) populations, a random sample can be obtained without replacement.
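
The difference between sampling with and without replacement can be sketched in Python. This is only an illustration under stated assumptions: the roster entries and both helper functions are hypothetical names introduced here, not part of the original method.

```python
import random

def draw_with_replacement(population, n, seed=None):
    """Draw n entities independently; the same entity may be drawn more than once."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

def draw_without_replacement(population, n, seed=None):
    """Draw n distinct entities; each draw changes what remains, so draws are dependent."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical finite ("real") population.
roster = ["player_1", "player_2", "player_3", "player_4", "player_5"]

# With replacement, the sample size may even exceed the population size.
with_repl = draw_with_replacement(roster, 8, seed=1)
without_repl = draw_without_replacement(roster, 3, seed=1)
```

With replacement, every draw sees the same population, which is what makes the outcomes independent for a finite population.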

1.4 Principles of Experiment Design

Bias is the systematic difference between an estimate (from a sample) and the true value (of a population). Precision is a measure of how closely repeated estimates agree with one another. The inaccuracy of an estimate combines its bias and its imprecision. The following four principles of experiment design minimize the inaccuracy of an estimate. The first principle tends to reduce bias, and the other three principles tend to reduce imprecision:

  • Randomization. Systematic errors cause bias. Randomization guards against systematic errors, thus helping to provide an unbiased estimate. (The word pseudorandom is often used instead of random for computer-generated sequences because true randomness cannot be achieved by a deterministic procedure.)

  • Replication. Random errors cause imprecision. Replication reduces imprecision. Precision is usually measured by the length of a confidence interval about the estimate for a specified confidence level (e.g., 95%). If trials are independent, imprecision decreases as n^(-1/2), where n is the number of trials. However, imprecision is usually not decreased that much if trials are dependent - a condition that can exist if consecutive trials occur too close in time or space.

  • Blocking. If the trials are measured at multiple levels of a condition that produce similar results, precision can be increased by grouping trials. (Grouping trials from multiple levels is equivalent to increasing replication at any one of those levels.)

  • Balance. Balance is achieved by obtaining the same number of trials in each test. Balance tends to achieve the same precision for a comparison of, say, two conditions. Balance also helps determine the relative effects of different conditions upon the response variable.
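
The effect of replication on precision can be illustrated with a short Python sketch. It assumes independent trials and a normal approximation for the confidence interval of a mean; the function name, the standard deviation of 10.0, and the multiplier z = 1.96 (roughly 95% confidence) are illustrative assumptions, not values from the text.

```python
import math

def ci_half_width(std_dev, n, z=1.96):
    """Approximate half-width of a confidence interval for a mean.

    Assumes n independent trials and a normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * std_dev / math.sqrt(n)

# Hypothetical standard deviation; only the trend with n matters here.
for n in (25, 100, 400):
    print(n, round(ci_half_width(std_dev=10.0, n=n), 3))
```

Because the half-width is proportional to n^(-1/2), quadrupling the number of independent trials halves the interval. Dependent trials would not achieve this rate.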

2. Determine the Objective of the Experiment and Select Its Analysis

There are a number of possible objectives and analyses. The proper analysis depends upon the objective of the experiment. Following is a list of common experiment objectives, a list of recommended analyses, and a table showing the recommended analysis for each objective.

2.1 Common Experiment Objectives

The variable of interest, called a response variable, is analyzed to achieve the objective. There are five rather common objectives of an experiment.

  • Acceptance. An experiment for acceptance should be conducted only under the levels of the conditions that are expected to be typical. That is, the conditions are regarded as fixed. The experimenter selects a threshold as an acceptable level. This objective is appropriate

  • Characterization. A characterization experiment is conducted to estimate the response variable and its confidence limits. The observations can come from a single test or from multiple tests. In the latter case, the data are pooled, and the resulting estimate may be representative of a range of levels selected for pooling. Characterization could be an appropriate objective

    • to create profiles of potential jurors or

    • to characterize the probability of liability for any application.

  • Optimization. The objective of this experiment is to identify the optimum level of each condition for the response variable. For example, an optimization experiment would indicate the most favorable/unfavorable age of a prospective juror who would judge a certain type of crime. Tests for optimization are conducted under a combination of levels of conditions. Optimization could be an objective prior to a civil trial to select the optimum demeanor or behavior level of the counselor or a witness (as far as a jury would be concerned).

  • Selection. The selection experiment is usually conducted to choose between two or more populations, such as between two venues or two venires. Tests in this experiment are conducted under a selected combination of levels of variable conditions - the same for each sample. This analysis can

  • Verdict Prediction. Verdict prediction is an objective for court trials only. It is appropriate for

    • the appeal for a new trial,

2.2 Recommended Analyses

Although a great variety of analyses are possible, one of five analyses is recommended to achieve each objective:

  • Estimation. A response variable and its confidence limits can be estimated from either data from a single test (conducted under a single combination of levels of variable conditions) or pooled data from multiple tests (each conducted under a different combination of levels of variable conditions). By virtue of the larger number of trials and the greater variety of conditions, pooled data usually provide a more precise estimate than do data from a single test.

  • Acceptance Test. The null hypothesis for an acceptance test states that the response variable is equal to an acceptable (threshold) value. An acceptance test assesses whether the data are consistent with this hypothesis. Because of sampling error, an interval of uncertainty exists about the acceptable value. The precision of the test is defined by the length of the interval and by the probability of making an incorrect decision when the response variable falls outside that interval. An acceptance test uses data from a single test.

  • Comparison Test. The null hypothesis for a comparison test states that the means (or proportions) of two response variables are equal. The comparison test determines whether the value of a response variable from one sample is significantly different from that of another. If there is a significant difference, the experimenter can identify the preferred sample. This test is a special case of the following test.

  • Test to Determine if a Variable Condition is a Factor. A population is defined by a number of fixed conditions and a number of variable conditions; the fixed conditions have one level, and the variable conditions have more than one level. Each test is conducted under a single combination of levels of variable conditions, and multiple tests are usually conducted under a set of combinations of levels of variable conditions. The null hypothesis for these tests states that the means (or proportions) of the multiple populations are equal. A test of this hypothesis can determine whether a variable condition is a factor for (i.e., affects) a response variable. That is, if the null hypothesis is rejected by a hypothesis test, the means (or proportions) are concluded to be not equal, and one or more variable conditions are concluded to be factors.

  • Prediction. A number of analyses are employed to predict the verdict of a court trial. The results of FactLogic provide data for these analyses. Prediction provides the probability that the verdict is

    • For the Defendant or For the Plaintiff in a civil case,

    • Guilty or Not Guilty in a criminal case,

    • True Bill or No Bill in a grand jury investigation, etc.
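
The comparison test described above can be sketched in Python for the case of two proportions. This is one common approach (a pooled two-proportion z-test under a normal approximation), offered as an assumption rather than as the method the text has in mind; the function name and the counts in the example are hypothetical.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two_sided_p) for the null hypothesis that two proportions are equal.

    Uses the pooled-proportion normal approximation, which assumes
    independent trials and reasonably large samples.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("standard error is zero; the proportions are degenerate")
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical outcomes from two samples of 100 trials each.
z, p = two_proportion_z_test(60, 100, 45, 100)
```

A small p-value (for example, below 5%) would lead the experimenter to reject the hypothesis of equal proportions and identify the preferred sample.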

2.3 Select the Analysis to Meet the Objective

The analyses and objectives discussed in this section are not exhaustive: The objectives are common objectives, and the analyses are plausible analyses. The following table matches these objectives and analyses. The "x" indicates the analysis appropriate for each objective. After the analysis is selected, one can judiciously choose the number of levels of the variable conditions.

                                                   ANALYSES
COMMON OBJECTIVES   Estimation  Acceptance Test  Comparison Test  Factor Test  Prediction
Characterization        X
Acceptance                             X
Selection                                               X
Optimization                                                           X
Verdict Prediction                                                                 X
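
The pairing of objectives and analyses in the table can also be written as a simple lookup. The Python sketch below only restates the table; the dictionary and function names are hypothetical.

```python
# Recommended analysis for each common objective, restating the table above.
RECOMMENDED_ANALYSIS = {
    "Characterization": "Estimation",
    "Acceptance": "Acceptance Test",
    "Selection": "Comparison Test",
    "Optimization": "Factor Test",
    "Verdict Prediction": "Prediction",
}

def select_analysis(objective):
    """Return the recommended analysis for a common objective."""
    try:
        return RECOMMENDED_ANALYSIS[objective]
    except KeyError:
        raise ValueError(f"no recommended analysis listed for objective: {objective!r}")

analysis = select_analysis("Selection")
```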


3. Select the Response Variable

The response variable is the variable chosen to determine if the objective has been achieved. Examples of response variables might be

  • the standard of proof (which is chosen by the evaluator in a criminal case),

  • the probability an assertion is true,

  • the difference between the probability an assertion is true and the standard of proof, etc.

4. Define the Population

The population is the set of all entities. The population is defined by the conditions and levels existing during the experiment, both fixed conditions and variable conditions. A fixed condition will either have no options or the experimenter will choose a single option (e.g., a single level of a condition such as gender) for the entire experiment. Variable conditions have more than one level - and, hence, present an opportunity to design an experiment; the experiment is conducted over selected combinations of levels of the variable conditions.
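
As a rough illustration, a population definition can be recorded as a mapping from each condition to the levels it takes during the experiment, and then split into fixed and variable conditions. The Python sketch below assumes that representation; the condition names, levels, and function name are hypothetical.

```python
import math

def classify_conditions(levels_by_condition):
    """Split conditions into fixed (one level) and variable (more than one level)."""
    fixed, variable = {}, {}
    for name, levels in levels_by_condition.items():
        distinct = list(dict.fromkeys(levels))  # drop duplicates, keep order
        if not distinct:
            raise ValueError(f"condition {name!r} has no levels")
        if len(distinct) == 1:
            fixed[name] = distinct[0]
        else:
            variable[name] = distinct
    return fixed, variable

# Hypothetical population definition.
population = {
    "case_type": ["civil"],                  # fixed: one level for the whole experiment
    "venue": ["A", "B"],                     # variable
    "age_group": ["18-34", "35-54", "55+"],  # variable
}
fixed, variable = classify_conditions(population)

# Number of combinations of levels of the variable conditions.
n_combinations = math.prod(len(levels) for levels in variable.values())
```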

5. Select the Design

The design must be selected to meet the specified precision, time, and budget. Following are some common designs:

  • Complete Randomization. Complete randomization is a simple design because it requires only that the tests (each conducted at different combinations of levels) occur in a random order.

  • Randomized Blocks. There are many types of randomized block designs. They range from one that determines if a single variable condition is a factor to one that determines if all tested variable conditions are factors. (For even three variable conditions, this design can be impractical to implement.)

  • Factorial Design. A (full) factorial design requires that a test be conducted for every combination of selected levels of variable conditions. This is the most desirable design, but it is often the most time-consuming.

  • Fractional Factorial Design. This design is similar to the factorial design, but some combinations of levels are omitted.

  • Blocking Design. An extraneous variable condition is one whose effects are not of interest; it is considered to be a "nuisance" variable condition. Levels of extraneous variable conditions (such as "time of day") can be combined either randomly or systematically (i.e., in a block) to remove their effect.

  • Response Surface Design. A primary variable condition is one whose effect is of interest; it is thought to have a direct effect upon a response variable. If levels of primary variable conditions are quantitative, orthogonal composite designs can be used, and a regression surface can be fitted.
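
Two of the designs above, the full factorial design and complete randomization of the test order, can be sketched together in Python. This is a minimal illustration; the condition names, levels, and function names are hypothetical.

```python
import itertools
import random

def full_factorial(conditions):
    """Return one test (a dict of condition -> level) for every combination of levels."""
    names = list(conditions)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(conditions[name] for name in names))
    ]

def randomized_run_order(tests, seed=None):
    """Return the tests in a random order, leaving the input list unchanged."""
    rng = random.Random(seed)
    order = list(tests)
    rng.shuffle(order)
    return order

# Hypothetical variable conditions and levels.
conditions = {"venue": ["A", "B"], "age_group": ["18-34", "35-54", "55+"]}
tests = full_factorial(conditions)              # 2 x 3 = 6 combinations
run_order = randomized_run_order(tests, seed=42)
```

A fractional factorial design would run only a chosen subset of `tests`. Which combinations to omit is a design decision that this sketch does not address.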

6. Determine the Sample Size

The sample size can be determined from time, budget, or specified precision. For most objectives, FactLogic can estimate both the absolute precision (measured by the difference between the true value and the estimate) and the relative precision (measured by the percentage difference between the true value and the estimate). FactLogic estimates the precision from a preliminary test with a focus group, probably conducted over the Internet. However, for the acceptance objective, the sample size is determined from an operating characteristic curve. Because a sample has a finite number of trials, an interval of uncertainty exists about an acceptable (threshold) level. The probabilities of committing the two types of error in acceptance tests can be measured, and they can be reduced by using a larger sample:

  • Probability of Committing a Type I Error. The probability of rejecting a value of the response variable that is totally satisfactory is to be, say, α = 5% or less.

  • Probability of Committing a Type II Error. The probability of accepting a value of the response variable that is totally unsatisfactory is β. This probability is reduced if the sample size is sufficiently large.
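
The relationship between sample size and the two error probabilities can be sketched in Python for one simple kind of acceptance test. The sketch assumes a binomial model in which each trial is either satisfactory or unsatisfactory and the test accepts when at most c trials are unsatisfactory. That plan, the function names, and the example levels (5% and 30% unsatisfactory, α = 5%, β = 10%) are assumptions for illustration, not FactLogic's procedure.

```python
import math

def prob_accept(n, max_unsatisfactory, p_unsatisfactory):
    """Probability of acceptance: at most max_unsatisfactory unsatisfactory
    outcomes in n independent trials (binomial model)."""
    return sum(
        math.comb(n, k) * p_unsatisfactory**k * (1 - p_unsatisfactory) ** (n - k)
        for k in range(max_unsatisfactory + 1)
    )

def smallest_plan(p_good, p_bad, alpha=0.05, beta=0.10, n_max=500):
    """Find the smallest n (with acceptance number c) such that the Type I error
    is at most alpha at p_good and the Type II error is at most beta at p_bad."""
    for n in range(1, n_max + 1):
        for c in range(n + 1):
            type_i = 1 - prob_accept(n, c, p_good)  # rejecting a satisfactory level
            type_ii = prob_accept(n, c, p_bad)      # accepting an unsatisfactory level
            if type_i <= alpha and type_ii <= beta:
                return {"n": n, "c": c, "type_i": type_i, "type_ii": type_ii}
    return None

# Hypothetical satisfactory and unsatisfactory levels.
plan = smallest_plan(p_good=0.05, p_bad=0.30)

# Points on the operating characteristic curve for the chosen plan.
oc_curve = [(p, prob_accept(plan["n"], plan["c"], p)) for p in (0.05, 0.10, 0.20, 0.30)]
```

Each point on `oc_curve` gives the probability of acceptance at a given true level. Tightening α or β, or moving the two levels closer together, requires a larger sample.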



Summary

An experiment must be well designed in order to achieve its objective.

Drawing conclusions about the general (i.e., the population) from knowledge of the specific (i.e., a sample from the population) is called inductive inference. Even though inductive inference results in uncertain success, the methods of statistics allow us to measure the uncertainty and reduce it to a known, tolerable level. The four principles of experiment design help reduce uncertainty. They are

  • randomization,

  • replication,

  • blocking, and

  • balance.

A number of entities might affect the variable of interest (called the response variable). Each entity has a number of characteristics or conditions, and each characteristic or condition has one or more values or levels. When these values or levels are varied and the response variable is observed for each variation, we can determine which, if any, values or levels affect the response variable. In order to design an experiment, one must

  • determine the objective,

  • select the analysis,

  • define the response variable,

  • define the population,

  • select the design, and

  • determine the sample size.


Footnotes

¹Of course, time itself cannot affect an experiment, but events occurring during time may. Listing time of day and day of week as conditions is a tacit admission that unknown or unknowable events may occur during time that could affect the experiment.




© Convex Corporation 1999 - 2005