Good afternoon, sports fans – I’m going to start out by saying that the whole ‘entry every two days’ thing isn’t going to continue all summer, so if you’re getting tired of reading a novel every couple days, never fear, this rate of posting will only continue through shortly after the season ends. I’ll probably settle into a once- or twice-a-week schedule over the summer, depending on how long my ideas for analysis hold out.
Speaking of which, does anyone know the plural for ‘analysis’? Analysises? Analyses? Analysi?
Today we’re going to look at one of the two things I previewed in the last Box Score Analysis entry – how quarter differentials correlate to wins. Essentially, we’re asking the question “how often did a team winning by X points after the first quarter go on to win the game?” for every possible value of X (and for every quarter and half).
In this case, I’ll be splitting the analysis in half. Unknown to me when I set out on this part of this research, there are a lot of conclusions, some far more important than others. Putting them all in one entry would dilute the impact of the more meaningful ones, so in this entry we’ll be covering the less impactful (though still interesting) ones. Next entry we’ll cover the real heavy-hitters. So today we want to see if there’s a certain time when the probability of winning drastically increases – for example, how much more likely to win is a team leading by 7 at halftime compared to a team leading by 5? Is it significant at all?
Unlike the last entry, I’m going to spend a good bit less time covering the statistical reasoning behind the conclusions and more time covering the conclusions themselves. If you want to see the proof behind the numbers, by all means let me know and I’d be glad to send it to you; or, you can run the numbers yourself: I’m posting the data sheet that’s being used to derive all this information right here.
Statistical Significance Overview
But let me start by going back to that pesky ‘statistical significance’ idea (which, if you understand already, jump ahead three paragraphs). Again, the upcoming ‘Stats Primer for a Sports Fan’ will detail what statistical significance is, but basically if something doesn’t have it, it’s not proven. A stat is ‘statistically significant’, by definition, if it is very unlikely to have simply happened by chance. For example, if a player is listed as a 60% free-throw shooter and misses three times out of three free-throw attempts, that’s not statistically significant enough to make us doubt that he’s really a 60% shooter (because statistically there was roughly a 1-in-16 chance he’d miss all three). But, if a player is listed as a 95% free-throw shooter and misses three straight, that’s pretty significant because it’s unlikely that a shooter who was really that good would miss three out of three (statistically, it’s a 1-in-8,000 chance). (Important note: we’re saying this as if we only observed the shooter taking three free-throws. The best free-throw shooter in the world will miss three straight at some point in his career – but what are the odds that the specific time we say ‘hey, take three free-throws’ and observe only those three that he misses all three?)
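The free-throw arithmetic above is easy to check. Treating each attempt as independent (an assumption, and the function name below is purely illustrative), the chance of missing every attempt is the miss rate raised to the number of attempts. A minimal Python sketch:

```python
def prob_all_missed(make_rate: float, attempts: int) -> float:
    """Chance of missing every attempt, assuming independent shots."""
    return (1.0 - make_rate) ** attempts

# The two shooters from the example: listed at 60% and 95%, three attempts each
for rate in (0.60, 0.95):
    p = prob_all_missed(rate, 3)
    print(f"{rate:.0%} shooter misses 3 of 3: p = {p:.6f} (about 1 in {1 / p:,.0f})")
```

Worked out exactly, the 60% shooter misses all three about 1 time in 16, and the 95% shooter about 1 time in 8,000.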
Statistical significance is thrown around a lot because it’s a pretty general term, but here we’re going to mainly use it when talking about comparing two statistics. For example, Peja Stojakovic shot 92.9% from the free-throw line this year, and Dirk Nowitzki shot 87.9%. Is that difference statistically significant? If so, we can say that there’s statistical proof that Stojakovic was a better free-throw shooter than Nowitzki this year; but if not, we can’t conclusively assert that (incidentally, it’s not statistically significant, although the difference between Chauncey Billups shooting 91.8% and Dirk is significant even though Chauncey shot worse than Stojakovic. See why we call it ‘Little White Statistics’?).
And a final note: when we refer to ‘confidence’ in terms of statistical significance, it means something pretty simple: basically, we can be that confident that the observed results come from an actual difference, rather than just a random sampling error. So basically, when we say “we can conclude this at 95% confidence”, it means we’re 95% sure what we’re concluding is true.
Alright, enough fluff. The reason I bring up statistical significance is because this analysis really depends on it to make any kind of conclusions. But before we get to the takeaways, a brief background:
This portion of the study was completed by taking all the box scores from the 2007-2008 NBA regular season, computing the quarter/half differentials for each quarter (with respect to the home team, so a negative differential means the away team outscored the home team), and then looking at how many wins and losses each quarter/half differential led to. Then, we did our correlation voo-doo magic to see what increase in win percentage each point added to the differential gave. And finally, we looked to see if any of that crap was statistically significant. And if you really want to see the numbers, I can show them to you – but I’d recommend taking my word on it. If I was making stuff up, I’d make up far more conclusions than this.
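For anyone who wants to redo the bookkeeping described above, here is a rough Python sketch. The data layout (a tuple of home quarter scores, away quarter scores, and a home-won flag) and the function names are my own assumptions, not the format of the actual data sheet, and overtime is ignored for simplicity.

```python
from collections import defaultdict

def period_differentials(home_q, away_q):
    """Differentials from the home team's point of view for all seven periods."""
    d = [h - a for h, a in zip(home_q, away_q)]
    return {
        "Q1": d[0], "Q2": d[1], "Q3": d[2], "Q4": d[3],
        "first_half": d[0] + d[1],
        "second_half": d[2] + d[3],
        "through_3": d[0] + d[1] + d[2],
    }

def tally(games):
    """Map (period, differential) -> [home wins, home losses]."""
    record = defaultdict(lambda: [0, 0])
    for home_q, away_q, home_won in games:
        for period, diff in period_differentials(home_q, away_q).items():
            record[(period, diff)][0 if home_won else 1] += 1
    return record

# Two made-up games, for illustration only (not from the 2007-08 data)
games = [
    ((25, 20, 30, 22), (27, 21, 24, 20), True),   # home trails by 3 at the half, wins
    ((22, 28, 19, 25), (22, 28, 24, 27), False),  # tied at the half, home loses
]
record = tally(games)
wins, losses = record[("first_half", -3)]
print(f"Home teams down 3 at the half: {wins} wins, {losses} losses")
```

A negative key means the away team outscored the home team over that period, matching the sign convention in the text. Dividing wins by wins plus losses for each key gives the winning percentages discussed below.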
And with that, on to the results, subdivided into topics for your reading convenience:
The Halftime Differential
Let’s lead off with something bizarre. In the 2007-08 season, what halftime differential from leading-by-5 to trailing-by-5 was most likely to lead to a home team victory? Leading-by-5? No – within that range, the home team won most often (over that margin) when they were trailing by three points at halftime. This season, the home team trailing at halftime by 3 points won a bizarre 75.7% of their games (28 out of 37), compared to about 65% from margins +1 to +5, and around 55% from -1 to -2. That’s statistically significant at 95% confidence compared to differentials -2 through 1, but not statistically significant compared to 2 and higher.
Similarly bizarre, in games that were tied at halftime, the home team actually lost more often than they won – the home team won only 46% of games that were tied at halftime (24 out of 52). That’s not statistically significant compared to most negative differentials, but it is compared to that -3 halftime differential (at an excessively high confidence level, too).
So is the home team really more likely to win when they’re down by 3 at halftime than if they’re tied? I’m taking this conclusion with a grain of salt. 95% confidence is a high level, but statistically that means that for every 20 conclusions you make at 95% confidence, one will likely be wrong. I have a feeling this might be that one – but fortunately, this topic is very easy for further research (which I’ll mention later). And yes, in case you’re keeping score at home, we just used statistics to analyze statistics. To be specific, we statistically proved that statistics aren’t always reliable. But is that a reliable conclusion? And with that, this blog disappeared in a puff of logic.
But by that same token, we’re not talking 95% confidence in this statistic. According to the numbers, we can (apparently, note I’m still as skeptical as you) assume a 3-point halftime deficit leads to more home team wins (than a halftime tie) with a remarkable 99.7% confidence. So either I completely screwed up the math somewhere, or we’re on to something (if anyone’s skeptical enough to check my math, we have a proportion of .757 with 37 samples and a proportion of .462 with 52 samples). But I’m still skeptical, so this will definitely be one of the items touched on when we re-do certain parts of this analysis for all the games over the past ten years (oops, gave away the ending).
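If you do want to check the math, a pooled two-proportion z-test is one standard way to do it. Whether this is exactly the test behind the 99.7% figure is an assumption on my part, but a one-sided version of it, sketched below in Python with only the standard library, lands in the same place using the counts quoted above (28 of 37 versus 24 of 52).

```python
import math

def two_proportion_z(wins1, n1, wins2, n2):
    """Pooled two-proportion z statistic and one-sided confidence that p1 > p2."""
    p1, p2 = wins1 / n1, wins2 / n2
    pooled = (wins1 + wins2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF evaluated at z
    confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, confidence

# Home team down 3 at halftime (28 of 37) vs. tied at halftime (24 of 52)
z, conf = two_proportion_z(28, 37, 24, 52)
print(f"z = {z:.2f}, one-sided confidence = {conf:.1%}")
```

The z statistic comes out at about 2.8, which corresponds to roughly 99.7% one-sided confidence. A two-sided test would give a somewhat lower figure, so the choice of test matters for the headline number.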
I should also note I’m not implying any causation here – I’m certainly not saying it’s wise for a home team to drop down 3 points before halftime. What we’re looking at here are measures that predict what would happen anyway. We aren’t saying that trailing by three at halftime leads to a win – what we’re saying is that the conditions that lead to a 3-point halftime deficit also lead to a victory by the end of the game.
The team leading at the end of three quarters was always more likely to win this season, regardless of whether they were home or away, and regardless of the differential. Away teams leading by as little as one point after three quarters won 61.5% of the time, while the home team leading by as little as one point won 54.7% of the time. The difference in the winner is certainly statistically significant (at 94% confidence).
Also interesting (and touched on more in the next analysis) is that once you get to a meager 4-point lead going into the fourth quarter, your victory percentage is sky-high – 75% for the home team, 71% for the away at a 4-point differential, and the percentages only get higher from there.
There’s absolutely no way to phrase this section title that completely prevents any possible puns.
At the beginning, we said we wanted to see if there’s a certain differential in each quarter/half that signifies greatly increased odds of a win. And, as it turns out, one does appear. Analyzing statistical significance here is difficult (because we’d have to compare every pair of differentials’ winning percentages over a large range, for each of the seven time periods), but just some random sampling (yes, now we’re randomly sampling our statistics) for statistical significance revealed these are likely significant at the 90% confidence level, at the least.
- 1st Quarter: Home: 2; Away: 6
- 2nd Quarter: Home: 4; Away: 6
- 3rd Quarter: Home: 5; Away: 5
- 4th Quarter: Home: 3; Away: 7
- First Half: Home: -3; Away: 5
- Second Half: Home: 1; Away: 4
- Through-3: Home: 2; Away: 1
There’s some pretty interesting stuff in there, believe it or not. In most cases, those point differentials correspond to a point at which teams become around 20% more likely to win the game, and sustain that increased win percentage over higher differentials. There are a couple of notable items in this:
- First of all, it’s pretty notable how much less the home team needs to do to raise their win percentage. In most cases, a differential of -2 (the away team leading by 2) is what corresponds to an even winning percentage between the two teams.
- Even more notable is that the home team still has a strong chance of winning as long as they’re losing by 3 or fewer points at the end of the first half. We covered at great length the fact that a 3-point halftime deficit this season still resulted in a winning record for the home team – but after 3, the drop is significant – trailing by four only brings victory 41% of the time, and the ratio decreases steadily after that. And, conveniently, the difference between -3 and -4 is statistically significant, adding to the intrigue of the -3 differential.
- We mentioned this earlier, but it’s also notable how delicate the through-three differential is – one 3-pointer drastically changes the odds of victory from the home team’s favor (70% when winning by 2 entering the fourth) to the away team’s (62% when the home team trails by 1), a pretty ridiculous 32-point swing in the home team’s win percentage.
As I said above, no causation is implied here; I’m not trying to say that the act of winning the first quarter by 2 points causes the home team to be substantially more likely to win. Instead, I’m suggesting that whatever causes the home team to be up by 2 or more also causes the home team to eventually win the game. Leading by those differentials is a sign that they stand a good chance of winning the game – not the reason they do.
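One way to make the idea of a critical point concrete is sketched below in Python. The rule used here (the smallest differential whose win percentage, and every win percentage above it, sits at least 20 points higher than the differential just below) is my own guess at a reasonable definition, not necessarily how the list above was produced, and the numbers in the example are invented.

```python
def find_critical_point(win_pct, jump=0.20):
    """Smallest differential where win pct jumps by `jump` and stays up.

    win_pct maps differential -> winning percentage (0.0 to 1.0).
    Returns None if no differential satisfies the rule.
    """
    diffs = sorted(win_pct)
    for prev, cur in zip(diffs, diffs[1:]):
        threshold = win_pct[prev] + jump
        higher = [win_pct[d] for d in diffs if d >= cur]
        if all(p >= threshold for p in higher):
            return cur
    return None

# Invented percentages, for illustration only
toy = {1: 0.55, 2: 0.57, 3: 0.56, 4: 0.78, 5: 0.80, 6: 0.79}
print(find_critical_point(toy))
```

With the invented numbers, the rule picks out a differential of 4, since that is the first place the win percentage climbs by 20 points and never drops back. On real data with small samples at each differential, a rule like this would be noisy, which is part of why the significance checks matter.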
Like last time, I ran a regression analysis, seeking a correlation between differential (for each quarter and half) and winning percentage.
There is one – an incredibly strong one. The second, third and fourth quarter differentials each correlate incredibly strongly to winning percentage (the first quarter differential correlates as well, but not quite as strongly – R=.9 for the first quarter whereas R=.94 for two, three and four). What this means is basically, outscoring your opponents by more points during a certain period of time does raise your chance of winning. We’re really uncovering deep, hidden secrets now, aren’t we? I think we just statistically proved that you win a game by outscoring your opponent. Groundbreaking, absolutely groundbreaking.
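For reference, the slope and the R value both come out of the same least-squares calculation. The sketch below shows that calculation in plain Python; the function name is illustrative, and the sample points are synthetic and perfectly linear rather than taken from the season's data.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope, intercept, and Pearson r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Synthetic points: win pct rises exactly 0.02 per point of differential
diffs = [-2, -1, 0, 1, 2]
win_pct = [0.46, 0.48, 0.50, 0.52, 0.54]
slope, intercept, r = fit_line(diffs, win_pct)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}")
```

Feeding in the real differential and winning-percentage pairs for a given quarter would give that quarter's slope and R.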
The slopes of these regression lines border on relevant, though. The quarter regressions all hold slopes of roughly .023, implying that for every point added to the differential, winning percentage increases by .023. To put that in terms that make sense, it means statistically if a team outscores its opponent by 5 points in the second quarter in every game, they’ll likely win two more games (over a season) than if they outscored their opponent by only 4 points in those quarters.
More relevantly, that means that if a team raises its average differential in one quarter by 1 point, it’ll average 2 more wins over an 82-game season. For a long-term coach, that’s a great goal. Raise it by 1 point per quarter and that’s possibly an 8-game improvement. That might sound drastic, but consider how much of a difference a 4-point gap in average differential makes in the league – in 2007-08, a 4-point difference is what separated the Jazz and the Raptors.
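The wins arithmetic is just the slope times the season length. Here is a short Python sketch; it assumes the gains from each quarter simply add up, which is what the linear reading of the regression implies but is still an assumption, and the names are illustrative.

```python
SLOPE_PER_POINT = 0.023  # win-percentage gain per point of quarter differential (from the text)
GAMES = 82               # regular-season games

def extra_wins(points_per_quarter, quarters=1):
    """Expected extra wins from improving the differential in `quarters` quarters."""
    return SLOPE_PER_POINT * points_per_quarter * quarters * GAMES

print(f"1 point in one quarter:   {extra_wins(1):.1f} extra wins")
print(f"1 point in every quarter: {extra_wins(1, 4):.1f} extra wins")
```

That works out to about 1.9 wins for a single quarter and about 7.5 wins for all four, which is where the rounded figures of 2 and 8 come from.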
And beyond all of the above, there are a few things in this analysis that I just find flat-out interesting. There’s no statistical relevance to any of them, but they’re interesting observations.
- No home team recovered from being down 16, 17 or 19 points after one quarter (total of eight occurrences), but two of the three home teams down 20 after the first recovered: Minnesota against Indiana and Phoenix against Seattle. Minnesota completely erased the 20-point deficit and led at halftime by 1, whereas Phoenix trailed by only 2.
- The home team actually held a winning record when being outscored by 10 in the second quarter, or by 6, 7 or 10 in the third. They did not, however, hold a winning record when being outscored by anything more than 4 in the first quarter, or 5 in the fourth.
- The lowest quarter differential to yield a 100% winning percentage was 13, when scored in the fourth quarter by the home team. The away team required a 16-point quarter-differential, but could have it occur in either the first or third quarters.
- The away team won 3 times when being outscored by 19 points in the second half, but never won when being outscored by more than 16 unless it was 19.
One of the things I plan to do later in the summer is re-hash the more ‘controversial’ or ‘fuzzy’ conclusions from this analysis by expanding the sample pool ten-fold and looking at the statistics for every game over the past ten years. If the conclusion on halftime differential holds up then, it’ll be only a one in a trillion (in other words, impossible) chance that it’s by coincidence.
I think that’s about all the information I can beat out of this data without stepping into the second half of our analysis. If anyone has any other questions that might be answered by this data, feel free to e-mail me at the heavily disguised e-mail address on the left. Wait until tomorrow though, since I’m only half-done with this portion of the analysis. Now, on to the takeaways.
LITTLE WHITE TAKEAWAYS
So, in this analysis, we looked at how often each point differential led to a win (or, more specifically, the winning percentage associated with each point differential). As always, teams were separated by location, since it’s been thoroughly established that differential trends are very different between home and away teams.
- Halftime Differential: Crazy stuff – I recommend reading this part regardless of your knowledge or interest in statistics. Basically, there’s evidence that the home team wins more often when trailing by 3 points at halftime than if the game is tied at halftime. It sounds bizarre, but the statistics behind it are extremely straightforward. Later this summer I’ll look at this again with data from the past 10 years (or 7, depending on how far back Yahoo!’s box scores go) and see if it still holds true.
- Through-Three Differential: The team leading at the end of three quarters, regardless of home or away and regardless of the amount they lead by, is always statistically more likely to win the game (though not extensively – 82% of games are won by the team leading after three, but only around 60% are won by the leading team when the lead is fewer than 3 points).
- Critical Points: There are critical points in the differential for each quarter and half, meaning that there is a certain differential that begins to lead to a much larger chance of winning. For example, a 1-3 point advantage for the home team in the second quarter (only the second quarter, not first and second) yields about a 57% chance of victory – however, 4 and above yields a 70%+ chance.
- Regression Analysis: A regression analysis showed there’s a very strong correlation between quarter differentials (in every quarter) and final result. Especially interesting from this part is the impact that a small improvement in differential can have – this part is also interesting reading even for those not interested in statistics.
- Miscellaneous: Weird stuff happens.
Don’t miss next entry, though. In the next entry we unveil some very interesting statistics about the power of individual quarters, and what periods of the game are most important to perform well in. It’s definitely interesting even to the casual fan, so come back tomorrow when it’s finished and posted. Until then, wish me luck on my last week as a college student.