Sampling and Extrapolation in Legal Cases
- May 22, 2024
An Expert’s view by James Rothman M.A. FMRS FRSS MAE
The Importance of Sampling
Negligence and other cases often concern extremely large amounts of evidence. This places a burden on lawyers, and storage can inflate costs. At the time of writing, £6.7m is being spent to store a large quantity of unusable PPE because it might be needed as evidence in a PPE case 1. In other cases, such as those in the Technology and Construction Court, a single case may concern so many separate events, such as the issuance of variation orders, that they cannot be considered individually. This is especially likely to happen when the cost of investigating or proving a claim on each event is high. Sampling is also relevant in cases concerning ship cargoes and class actions, but these are outside the scope of this article.
The obvious method of dealing with problems like this is to select a sample, i.e. to choose a smaller number to represent the whole body of evidence or claims. The Courts are willing to accept evidence derived from samples but are likely to insist that the sample can be proved to be representative and that the extrapolation produces fair and reasonably accurate estimates. The Technology and Construction Court Guide recommends that careful thought should be given to ‘any appropriate ... sampling … that could be undertaken jointly or in collaboration with other experts.’ It also recommends a meeting of experts to agree a protocol. The Court could appoint a single joint expert to organize sampling in suitable cases.
Sampling and extrapolation are not as simple as they might appear. Unless proper procedures are used, the sample will be biased or subject to too much error to be useful. If the procedures used are not properly documented, time and cost will be wasted in argument. Even if the evidence is not rejected, a case can be lost or a claim reduced if the Court finds sample evidence to be unsatisfactory. Examples are:
• In Amey v Cumbria CC, the Judge ruled that the claim could not succeed because Cumbria had failed to satisfy him that results from the sample could be extrapolated to the overall population of repairs (para 26.49).
• In Imperial Chemical Industries v Merit Merrell Technology, the Judge decided that a sample of 412 welds was not representative (para 72) and reduced the claimed weld error rate from 38% to 5% (para 160).
• In Building Design Partnership v Standard Life Assurance, an extrapolated claim was reduced from over £20m to less than £1.3m by orders made at the case management stage (Appeal Judgement, para 96).
• In Interflora v Marks and Spencer plc, not only was a sample survey rejected by the Court because it was not statistically reliable, but witness statements from individual respondents that supported the claimant’s case were also rejected 2. The guidelines set out by Whitford J in Imperial Group v Philip Morris (1984) apply to samples in cases like this.
I note that the Building Design Partnership v Standard Life Assurance case has established the principle that samples can be used to plead a claim as well as to prove it or demonstrate its size 3.
Amey v. Cumbria Principles
The Amey case judgement lists key principles relating to sampling that had been agreed by the two experts on each side. These can be summarized as follows:
1. A sample can provide a practical and efficient means to collect data, but the sample must be representative of the population.
2. The population and the unit of analysis need to be defined.
3. A sampling frame needs to be identified or prepared. It need not be identical to the population. (Although this was not stated, my opinion is that differences between the sampling frame and the population need to be discussed).
4. Measurements made on each element in the sample should follow a reliable protocol. (It seems to me that this is true of any claim rather than being specific to one based on a sample.)
5. Data from the sample must be properly analysed to enable extrapolation.
6. Probability and non-probability samples can be used. The better method is a probability sample in which each unit is selected at random with a known probability.
Technical Terms
Before discussing these principles it is helpful to describe some terms:
The first step in sampling is to decide what is to be sampled. These are termed the elementary units or units. These units may differ in size or importance. If so it may be advantageous to arrange for the more important ones to be sampled with a higher probability. Account will need to be taken of this when calculating the results.
A sample is a set of units drawn from a population by a process that enables it to represent that population. Samples are usually drawn in order to make an estimate, e.g. the proportion of defects or the value of the work done. The word ‘sample’ is also sometimes used in place of ‘example’, e.g. when parties choose a set of units to demonstrate their case. Examples like this cannot be used for extrapolation. As His Honour Stephen Davies points out 4, ‘the results will be skewed towards the outliers at both ends’ and the parties may well be required to agree a random sample in addition.
A probability sample is one in which the probability of any given member being selected is known. Sample members can have different probabilities of selection provided that the estimates take account of this. The most common form of adjustment is by weighting 5.
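The weighting described above can be illustrated with a minimal sketch. The data, the selection probabilities and the function name `weighted_proportion` are hypothetical and chosen purely for illustration; the calculation simply weights each sampled unit by the reciprocal of its selection probability.

```python
def weighted_proportion(sample):
    """Estimate a population proportion from (has_defect, selection_probability) pairs.

    Each unit is weighted by the reciprocal of its probability of selection.
    """
    weights = [1.0 / probability for _, probability in sample]
    defect_weight = sum(
        weight for (has_defect, _), weight in zip(sample, weights) if has_defect
    )
    return defect_weight / sum(weights)


# Hypothetical sample: large units were selected with probability 0.5,
# small units with probability 0.1.
sample = [
    (True, 0.5), (False, 0.5), (True, 0.5), (True, 0.5),    # large units
    (False, 0.1), (False, 0.1), (True, 0.1), (False, 0.1),  # small units
]

unweighted = sum(1 for has_defect, _ in sample if has_defect) / len(sample)
weighted = weighted_proportion(sample)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this made-up example the unweighted defect rate is 0.5 but the weighted estimate is about 0.33, because the over-sampled large units happen to show more defects.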
A non-probability sample is any other type of sample. If a non-probability sample is to be used for extrapolation, it needs to be representative 6 and it should be possible to make reasonable judgements about the probability of selection of each sample member and to take these into account.
If a number of samples were all taken by the same method, the results would differ through chance. The expected value is the average value that would be obtained from all these samples after suitable adjustments for differing probabilities of selection have been made. The standard or sampling error is a measure of the extent to which results produced from a sample might be expected to differ from the expected value. It is possible to estimate this from a single sample by using statistical methods.
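As an illustration of estimating the sampling error from a single sample, the sketch below uses the textbook formula for a proportion. It assumes an equal-probability simple random sample drawn without replacement and a normal approximation for the interval; the figures and the function name `proportion_standard_error` are hypothetical.

```python
import math


def proportion_standard_error(defects, n, population_size):
    """Standard error of a sample proportion under simple random sampling.

    Includes the finite population correction (1 - n / N), which matters
    when the sample is a sizeable fraction of the population.
    """
    p = defects / n
    finite_population_correction = 1 - n / population_size
    return math.sqrt(finite_population_correction * p * (1 - p) / (n - 1))


# Hypothetical figures: 20 defective units found in a sample of 100
# drawn from a population of 1,000.
n, defects, population_size = 100, 20, 1000
p_hat = defects / n
se = proportion_standard_error(defects, n, population_size)

# Approximate 95% interval for the population proportion.
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimate {p_hat:.3f}, standard error {se:.4f}, interval {low:.3f} to {high:.3f}")
```

More complex designs, such as unequal probabilities or stratification, need different formulae, so this should be read only as the simplest case.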
In a perfect probability sample the expected value will be the same as the population value so sampling error can be used to measure the accuracy of population estimates. This cannot be assumed if there is non-response 7 or a non-probability sample is used because there is a risk of bias error on top of the sampling error. It is possible to design and analyse non-probability samples to reduce this bias error and to provide evidence that it is likely to be small. I will outline some of these later in this article.
Bias error can be categorised as suspected, discovered or unknown. Suspected bias exists when there are reasons to suppose that some unit types are more or less likely to be selected than others, perhaps because they are easy or difficult to access.
On the other hand, unexpected differences between the sample and the population may only be found when the two are compared. If the sample shows that these are correlated with a relevant aspect of the evidence, then this indicates discovered bias.
Even if suspected or discovered bias is eliminated, certain types of units may be over or under represented in ways not covered by the analysis. This produces unknown bias error 8, which reduces the accuracy of estimates by more than that indicated by the sampling error. Experts may have opinions on the extent of this increase and may make an allowance for it, but they cannot calculate it exactly. I expect that this is why the experts in Amey v. Cumbria stated that ‘non-probability samples cannot depend upon the rationale of probability theory to calculate confidence levels and margins of error’.
Selecting the Sample
When selecting a sample, it is tempting to choose the items that come first to hand, or to select the ones that most clearly demonstrate the case. Samples chosen in this way risk being rejected by the Court if it is thought that they are likely to be biased. A probability sample avoids this problem. In some cases it is not possible to draw a probability sample, and in others a non-probability one has been mistakenly drawn before an expert has been consulted. In these circumstances special forms of analysis need to be used to ensure that results approximate as closely as possible to those that would have been obtained if a probability sample had been used.
The first step, when sampling, is to find out what information is available about the population. This may include both where and how each unit is stored and also how it is documented 9. Documentation listing each unit in the population to be sampled is referred to as the sampling frame. It is useful if this frame contains as much potentially relevant information about each unit as possible. This will include both descriptive information such as the size of the unit and information about any known claims relating to it. The extent to which the relative importance of units of different type, location etc. varies is of particular relevance. Sampling experts will need to analyse this information to assist them in selecting the sample and interpreting the results. They will use this to estimate the likely sampling error for different sizes of sample after taking account of the variation between units and to consider how to reduce it.
The choice of sample size is based on the degree of accuracy required and the expected cost of selecting and categorising each individual unit. If the population contains a mixture of different types of unit, it may also be necessary to ensure that the number of each type of unit 10 is adequate. In legal cases involving the sampling of things rather than people, normal sample costs are likely to be small in comparison with the other costs of the case. Consequently there is seldom need to use techniques like cluster or multi-stage sampling to reduce the cost per unit. However, substantial legal time can be spent in agreeing how each individual unit in the sample should be categorised, e.g. whether it contains a fault and if so whether it supports a claim for damages. This argues against unnecessarily large samples. The statistician advises on the likely accuracy of different sample sizes but the final decision rests with the parties involved.
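The trade-off between accuracy and sample size can be sketched with the standard sample-size formula for estimating a proportion. This assumes a simple random sample, a 95% confidence level (z = 1.96) and, absent better information, a worst-case expected proportion of 0.5; the function name `required_sample_size` and the figures are hypothetical.

```python
import math


def required_sample_size(margin, population_size=None, expected_proportion=0.5, z=1.96):
    """Sample size needed to estimate a proportion to within +/- margin.

    If population_size is given, the finite population correction is applied,
    which reduces the required size for small populations.
    """
    n0 = z ** 2 * expected_proportion * (1 - expected_proportion) / margin ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)


# Hypothetical requirement: a margin of error of 5 percentage points.
n_large_population = required_sample_size(0.05)
n_population_2000 = required_sample_size(0.05, population_size=2000)
print(n_large_population, n_population_2000)
```

A calculation of this kind gives the parties a starting point only; the cost of categorising each unit, discussed above, may justify accepting a wider margin.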
It is worth remembering that accuracy increases in proportion to the square root of the sample size, e.g. quadrupling the first stage sample size, whilst otherwise keeping to the same sample design, will double the accuracy, i.e. halve the error. However, the way that the sample is chosen will also have an effect on its accuracy. In extreme cases a large, badly designed random sample can be less accurate than a smaller, well designed one.
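The square-root relationship can be checked with a short calculation. The sketch assumes a simple random sample from a large population and a hypothetical defect rate of 20%; the function name `standard_error` is illustrative.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


se_100 = standard_error(0.2, 100)   # sample of 100 units
se_400 = standard_error(0.2, 400)   # sample four times as large
ratio = se_100 / se_400
print(f"n=100: {se_100:.3f}, n=400: {se_400:.3f}, ratio: {ratio:.1f}")
```

Quadrupling the sample from 100 to 400 units halves the standard error, from about 0.04 to about 0.02.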
When dealing with physical evidence, it may also be useful to visit the site(s) where it is located to verify that the proposed sampling procedure is practical. Evidence on the ground is not necessarily stored exactly in the manner described. When the evidence is physical, the method used to select the sample may depend on the way that the evidence is stored. A procedure for drawing the sample needs to be designed and documented so that it is correctly implemented by all those involved.
Wherever possible a random procedure should be used to draw the sample to ensure that it is unbiased. This can be done by numbering the units in the population and drawing random numbers to choose the ones that should be selected. Equal interval sampling from a randomly chosen starting point can also be used. This not only makes selection easier but can also improve the accuracy of the results by making the sample more representative. If it is necessary to select a non-probability sample, the procedure should be as blind as possible to whatever measurements are relevant to the case. In addition it may be helpful to set quotas for certain types of unit to avoid suspected bias.
In either case it is prudent to analyse the selected sample to make sure that it has been selected correctly.
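The two random selection procedures mentioned above, together with a basic check on the result, might be sketched as follows. The numbered frame of 1,000 units, the sample size of 50, the seed and all function names are hypothetical; recording the seed is one way of documenting the draw so that it can be reproduced.

```python
import random


def simple_random_sample(unit_ids, n, seed):
    """Draw n distinct units, each with equal probability, reproducibly."""
    rng = random.Random(seed)
    return sorted(rng.sample(unit_ids, n))


def systematic_sample(unit_ids, n, seed):
    """Equal-interval selection from a randomly chosen starting point."""
    rng = random.Random(seed)
    interval = len(unit_ids) / n
    start = rng.random() * interval          # random start within the first interval
    last = len(unit_ids) - 1
    return [unit_ids[min(int(start + i * interval), last)] for i in range(n)]


def check_selection(selected, unit_ids, n):
    """Basic checks: right size, no duplicates, every unit is in the frame."""
    return (
        len(selected) == n
        and len(set(selected)) == n
        and set(selected) <= set(unit_ids)
    )


frame = list(range(1, 1001))                 # units numbered 1 to 1,000
random_draw = simple_random_sample(frame, 50, seed=2024)
interval_draw = systematic_sample(frame, 50, seed=2024)
print(check_selection(random_draw, frame, 50), check_selection(interval_draw, frame, 50))
```

Equal-interval selection spreads the sample across the frame, so the order in which the frame is listed affects how representative the result is.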
Extrapolation
Extrapolation refers to the production of population estimates from a sample, together with an estimate of their accuracy based on their sampling error and a discussion of the possibility of unknown bias. The extrapolation report should also cover investigations into suspected and discovered bias. The procedure is as follows:
Initial population estimates should be made from the sample by taking into account any differences in the known probability of selection of the units. These should include descriptive information as well as that directly concerning the claim. The existence of discovered bias should be checked by comparing this descriptive information with that for the population or sampling frame. In a probability sample these estimates can be used directly to make an extrapolation, if there is no discovered bias. If there is such bias, the sample should be treated as if it were a non-probability one.
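The comparison of descriptive information between the sample and the sampling frame can be sketched as below. The unit types, the counts and the function names are hypothetical, and the sketch assumes that every unit had the same probability of selection; with unequal probabilities the sample shares would first need to be weighted.

```python
from collections import Counter


def share_by_type(types):
    """Proportion of units falling into each type."""
    counts = Counter(types)
    total = len(types)
    return {unit_type: count / total for unit_type, count in counts.items()}


def compare_to_frame(frame_types, sample_types):
    """Sample share minus frame share for each type listed in the frame."""
    frame_share = share_by_type(frame_types)
    sample_share = share_by_type(sample_types)
    return {t: sample_share.get(t, 0.0) - frame_share[t] for t in frame_share}


# Hypothetical frame of 1,000 repairs and a sample of 60 of them.
frame_types = ["road"] * 600 + ["pavement"] * 400
sample_types = ["road"] * 45 + ["pavement"] * 15

differences = compare_to_frame(frame_types, sample_types)
for unit_type, difference in differences.items():
    print(f"{unit_type}: {difference:+.2f}")
```

Here road repairs are over-represented in the sample by 15 percentage points. Whether a difference of this size is more than chance, and whether it matters to the claim, remains a matter for the expert's analysis.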
If a non-probability sample is used or bias is discovered in a probability one 11, the sample should be analysed to see whether the estimates are likely to have been affected. The sample should then be re-weighted to remove these discrepancies and make an extrapolation. It may be helpful to test the procedure for robustness by employing alternative weighting schemes to show how far results could have been influenced by the failure of the sample to match the population 12.
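One common way of re-weighting, shown here only as an illustration and not necessarily the method an expert would choose in a given case, is post-stratification: the defect rate found in each type of unit is weighted by that type's share of the population. All data and names below are hypothetical, including the alternative weighting by area used as a robustness check.

```python
def stratum_rates(sample):
    """Defect rate per stratum from (stratum, has_defect) pairs."""
    totals, defects = {}, {}
    for stratum, has_defect in sample:
        totals[stratum] = totals.get(stratum, 0) + 1
        defects[stratum] = defects.get(stratum, 0) + (1 if has_defect else 0)
    return {stratum: defects[stratum] / totals[stratum] for stratum in totals}


def reweighted_rate(sample, population_shares):
    """Overall defect rate with each stratum weighted by its population share.

    Every stratum in population_shares must appear in the sample.
    """
    rates = stratum_rates(sample)
    return sum(share * rates[stratum] for stratum, share in population_shares.items())


# Hypothetical sample: 45 road repairs (18 defective), 15 pavement repairs (3 defective).
sample = (
    [("road", True)] * 18 + [("road", False)] * 27
    + [("pavement", True)] * 3 + [("pavement", False)] * 12
)

unweighted = sum(1 for _, has_defect in sample if has_defect) / len(sample)
by_count = reweighted_rate(sample, {"road": 0.6, "pavement": 0.4})  # shares by number of units
by_area = reweighted_rate(sample, {"road": 0.7, "pavement": 0.3})   # alternative scheme (assumed)
print(f"unweighted {unweighted:.2f}, by count {by_count:.2f}, by area {by_area:.2f}")
```

In this made-up case the estimate moves from 0.35 unweighted to 0.32 or 0.34 depending on the weighting scheme, which gives some indication of how sensitive the extrapolation is to the mismatch between sample and population.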
Conclusion
With suitable analyses, it is possible to make an extrapolation from almost any sample, but the procedure is more straightforward and the extrapolation is more likely to be accepted by the Court if the sampling procedure has been carried out correctly from the outset.
Notes and References
1 Times 23/12/23, ‘Inside the Michelle Mone PPE scandal’.
2 Proof by Sampling after Amey LG v Cumbria County Council, Frances Pigott, July 2017, Society of Construction Law.
3 Pleading Claims by Sampling and Extrapolation, Glen Haley & Horace Pang, March 2022, Bryan Cave Leighton Paisner.
4 Building a Claim by Extrapolation – The legal and evidential blocks, His Honour Judge Stephen Davies, June 202, Society of Construction Law.
5 i.e. multiplying the results for each sample member by a factor proportional to the reciprocal of its probability of selection.
6 Proof by Sampling after Amey LG v Cumbria County Council, Frances Pigott, July 2017, Society of Construction Law.
7 i.e. some of the sample units are omitted or not fully measured, e.g. in Amey some patches could not be located. (Proof by Sampling after Amey LG v Cumbria County Council, Frances Pigott, July 2017, Society of Construction Law).
8 By definition, the expected value of unknown bias error is zero.
9 At this stage it may be agreed to exclude some units from the sample because their difficulty of access is excessive in relation to their importance. If this is done it should, of course, be reported.
10 The sampling expert can advise on how this can be achieved through weighting, i.e. varying the probability level by type of unit and adjusting for this in the subsequent analysis, and stratification to ensure that the number of units of particularly relevant types is correct.
11 e.g. because some units were missed.
12 It should be remembered that re-weighting increases the sampling error of the estimates, and the greater the range of weights used the greater will be the increase in sampling error. It may be possible to reduce this increase in sampling error by using other statistical methods.
About the Author:
James Rothman M.A. FMRS FRSS MAE
James Rothman is an independent marketing research and statistical consultant who can provide unbiased and objective assistance in a wide range of fields. He is experienced in acting as an expert witness and has given evidence in Court and undergone Single Joint Expert training. He is able to advise on quantitative or numerical evidence of all types apart from accountancy.
He has acted as an expert witness or advisor in cases concerning: Passing Off, Licensing Applications, Advertising Claims, Probability and Coincidences, Sampling and interpreting large volumes of evidence, Incomplete Records, Estimation of Damages and Sample Surveys.
He is a Fellow of the Royal Statistical Society (Chartered Statistician), a Fellow and Honorary Member of the Market Research Society (Dip MRS), a full member of the Academy of Experts, and a member of the Operational Research Society.
He chaired the Panel of Judges for the Market Research Society Awards and was Joint Editor of the Journal of the Market Research Society. James Rothman holds the Thomson Gold Medal and Award for Media Research, the Coglan Award and two Market Research Society Gold Medals.
Contact
James Rothman
Email: jamesr@rothmancompany.com
Tel: +44 (0)207 5862925.