Following up on this post, see this Seattle Times article. Michael McDonald also questions whether the GOP-expert statistical report has failed to include measures of statistical uncertainty in the calculations. UPDATE: McDonald provides more information here. UPDATE 2: You can find the Katz and Gill reports posted here. FINAL UPDATE: Michael McDonald has now analyzed the Katz report and sends along the following analysis:
- I have reviewed the method used by Jonathan Katz. I am glad to report that he did indeed calculate a measure of uncertainty for his estimate presented
on p.8 of his report. However, I believe that he used the wrong measure of uncertainty, one that greatly understates the uncertainty of his estimates, as I
will elaborate. The correct measure of uncertainty throws the validity of
his conclusions into doubt. As Dr. Katz mentions, it is unfair to attack
his work when he can’t respond because of legal considerations. He may have
a reasonable explanation why he used his approach. So, please keep this in
mind until he has had a chance to defend his work. However, I note that all
I present below is elementary statistics that can be found in any
introductory statistics book.
The task here is to estimate how many ‘improper votes’ each of the
candidates received so that the election outcome could be known if these
votes were removed. The most appropriate method is to assume the ‘improper
votes’ are random draws from the population of voters, much as is done
in polling. In the familiar framework of polling, the best guess, or
expected value, of the percentage of support for a candidate is the
percentage from the sample. However, because random draws vary from one
sample to the next, the sample value will differ from the true value by some
amount; the likely size of that difference is what we familiarly know as the
margin of error of the poll.
The method of random sampling can be used in this situation, where we might
think of the ‘improper votes’ as draws from the population of voters. The
only significant difference from the polling framework in this case is that
the true percentage support for a candidate within a jurisdiction is known:
it is the election result. However, we can still apply the same formula to
describe the margin of error of a poll in this situation (in statistical
lingo, we use p in the place of p-hat).
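As a minimal sketch of the calculation described above, the following Python functions compute the expected count, the standard error, and the 95% margin of error when the known proportion p is used in place of p-hat. The function names and the inputs in the usage lines (p = 0.5, N = 100) are illustrative assumptions, not figures from either report; the 1.96 multiplier is the usual normal-approximation value for a 95% interval.

```python
import math


def expected_count(p, n):
    """Expected number of the n draws that go to the candidate (N * p)."""
    return n * p


def standard_error(p, n):
    """Standard error of a sample proportion, using the known p: sqrt(p(1-p)/N)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1.0 - p) / n)


def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval; z = 1.96 gives roughly 95%."""
    return z * standard_error(p, n)


# Illustrative inputs only (assumed for this sketch, not taken from the reports).
p_known = 0.5
n_draws = 100
print(expected_count(p_known, n_draws))
print(standard_error(p_known, n_draws))
print(margin_of_error(p_known, n_draws))
```

The margin of error here is on the proportion scale; multiplying it by N converts it to a number of votes.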
Here is an example of how to calculate the margin of error around the
expected value, drawn from p.7 of Dr. Katz’s report, where 157 ‘invalid
votes’ are alleged to have been cast in King County, where Gregoire received
57.7% of the vote.
Step 1) 157 ‘invalid votes’ were alleged to have been cast by felons in King
County, where Gregoire won 57.7% of the vote. The expected number of votes
for Gregoire is Np = 0.577*157 = 90.59 (Dr. Katz finds the same number).
Step 2) The standard error of the sample of ‘invalid votes’ (from any
introductory statistics book) is: s.e. = [p(1-p)/N]^.5 =
[(.577)*(.423)/157]^.5 = .039 (unreported by Dr. Katz).
Step 3) The 95% confidence interval, or margin of error = s.e.*1.96 or