How accurate are polls at predicting a winner? Not very. So long as a candidate is within 10 points, most polls shouldn't be relied on as predictors of who will win. Charles Franklin, a political science professor at the University of Wisconsin, has an interesting post today about just how important the "margin of error" really is.

On a graph, Franklin compares poll margins with actual election results and draws several observations, chief among them that polls cannot reliably pick the winner in races that are less than 10 points apart.

One interesting feature is that a margin of zero (a tied poll) produces a 50-50 split in wins with remarkable accuracy. There is nothing I did statistically to force the black trend line to go through the "crosshairs" at the (0, .5) point in the graph, but it comes awfully close. So a tied poll really does predict a coin-flip outcome.

The probability of a win rises or falls rapidly as the polls move away from a margin of zero. By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win. When we see a 10 point margin for the Rep, about 90% of Reps win. That symmetry is also not something I forced with the statistics-- it represents the simple and symmetric pattern in the data.

More practically, it means that polls rarely miss the winner with a 10 point lead, but they DO miss it 10% of the time. A 5 point lead, on the other hand, turns out to be right only about 60-65% of the time. So bet on a candidate with a 5 point lead, but don't give odds. And for 1 or 2 point leads (as in some of our closer races tomorrow) the polls are only barely better than 50% right in picking the winner. That should be a sobering thought to those enthused by a narrow lead in the polls. Quite a few of those "leaders" will lose. Of course, an equal proportion of those trailing in the polls will win.

So read the polls-- they are a lot better than nothing. But don't take that 2 point lead to the bank. That is a failure to appreciate the practical consequences of the margin for error.
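One reason narrow leads are so unreliable: the published margin of error applies to each candidate's individual share, and the error on the lead between the two candidates is roughly twice as large in a near-even two-way race. A quick sketch of the standard arithmetic (the sample size here is hypothetical):

```python
from math import sqrt

def margin_of_error(n, share=0.5, z=1.96):
    # 95% margin of error, in percentage points, on one candidate's share
    return 100 * z * sqrt(share * (1 - share) / n)

n = 800  # hypothetical poll sample size
moe_share = margin_of_error(n)
# The error on the *lead* (the difference between the two shares) is about
# double the per-candidate figure when the race is close to 50-50.
moe_lead = 2 * moe_share
print(f"n={n}: share MOE = +/-{moe_share:.1f} pts, "
      f"lead MOE = about +/-{moe_lead:.1f} pts")
```

With 800 respondents the per-candidate margin of error works out to about 3.5 points, so a 5 point lead sits comfortably inside the roughly 7-point band of noise on the margin itself, which is consistent with Franklin's finding that such leads pick the winner only 60-65% of the time.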

Where the parties themselves are competing is also the biggest indicator of which seats are in play. If you look at it as closely as Jay Cost does here, a picture emerges not of a Democratic sweep but of uncertainty:

Absent reliable polling in each district, I would say that these 35 seats, plus the 4 seats that it fails to capture, are the real battlefield. That would mean that 37 Republican seats and 4 Democratic seats are, in one way or another, up for grabs.

This might seem like a lot for the Republicans to defend, and from a certain perspective it is. However, whether or not the Democrats pick up control of the House by plucking a net of 15 of these districts really depends upon the probability of flipping we assign to each race. If, for instance, both parties have an equal shot in every seat, we should expect the Democrats to net 16 to 17 seats - and the Democrats have a 73% chance of taking the House. If the Democrats have a 40% chance in every seat, we should expect them to net 12 to 13 seats - and they have a 25% chance of taking the House. If the Democrats have a 60% chance in every seat, we should expect them to net 20 to 21 seats - and they have a 97% chance of taking the House.

This is the main reason I am skeptical of the "wave," i.e. a net of 25 or more for the Democrats. Even if we give the Democrats 2/1 odds in each contest, with this battlefield there is still only a 35% chance that they net 25 or more seats.

What are the true probabilities for these races? I honestly have not the foggiest idea beyond some basic intuitions (e.g. the GOP has a less than 50% chance in at least 6 to 7 seats). Of course, lots and lots (and lots and lots) of people are ready and willing to assign very specific probabilities to these races. But are they able? What are the data points we should use in such an endeavor?

Should we use House polls from companies we have never heard of, who are obviously pushing polls to drum up business for themselves after the election, who use samples that have strange origins, who use methods that are unpublished and probably underdetermined, who publish results that contradict other polls?

Should we use the rumors and innuendos we happen to stumble upon, the inside gossip to which almost all of us are not privy, and to which - if we are privy - we hear third, fourth, or fifth hand?

Should we use a favored set of anecdotes, interesting stories told by local news outlets on a given horse race that capture our attention, even if their actual effect on the race is undeterminable?

Assuming we can use any of this data - a condition that I think remains unfulfilled - another question presents itself: how do we use this data to assign odds? What weight should we give each data point? I honestly have no clue. It seems to me to be quite easy, in almost all districts, to write a storyline, a believable storyline, that favors one side over the other - and from that assign a probability. But, at the end of the day, what is the data that is inducing this assignment? It seems to me that it is just a set of questionable polls, unfounded rumors and potentially irrelevant anecdotes to which we have, without any real justification, assigned determinative weight.
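Cost's arithmetic can be checked under exactly the independence assumption he states. A minimal sketch, using the seat counts from the quote above (37 GOP-held, 4 Democratic-held) and treating each race as an independent coin weighted toward the Democrat with probability p_dem:

```python
from math import comb

def pmf(n, p):
    # Binomial(n, p) probability mass function, indexed by number of wins k
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def net_gain_dist(gop_seats, dem_seats, p_dem):
    # Distribution of the Democrats' net seat gain: seats flipped from the
    # GOP minus seats lost to the GOP, every race independent with the
    # Democrat winning each at probability p_dem
    dist = {}
    for flips, p_f in enumerate(pmf(gop_seats, p_dem)):
        for losses, p_l in enumerate(pmf(dem_seats, 1 - p_dem)):
            net = flips - losses
            dist[net] = dist.get(net, 0.0) + p_f * p_l
    return dist

def expected_net(dist):
    return sum(net * p for net, p in dist.items())

def p_net_at_least(dist, k):
    return sum(p for net, p in dist.items() if net >= k)

# Cost's battlefield: 37 GOP seats, 4 Democratic seats, 15 needed for control;
# the 2/1-odds "wave" scenario uses p_dem = 2/3 and a 25-seat threshold.
for p_dem, threshold in [(0.5, 15), (0.4, 15), (0.6, 15), (2/3, 25)]:
    d = net_gain_dist(37, 4, p_dem)
    print(f"p_dem={p_dem:.2f}: expected net {expected_net(d):+.1f}, "
          f"P(net >= {threshold}) = {p_net_at_least(d, threshold):.0%}")
```

Run as written, this reproduces Cost's figures: an expected net of 16.5 seats and about a 73% chance of the House at even odds, 25% at 40%, 97% at 60%, and only about a 35% chance of a 25-seat wave even at 2/1 odds per race. His skepticism is just the binomial math of this battlefield, given the independence assumption.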

Ultimately, only one poll counts: the one taken on Election Day. Don't let the others sway you out of taking part in it.