## Box Score Analysis: The Basics

As I mentioned last entry, there are lots and lots of ways to approach this – and some of them are really, really interesting. But there’s a basic foundation that should be laid first, one that covers the results in the most general sense.

How well do in-game (within and after each quarter) differentials correlate to the actual differential in the final score? While next time we’ll look at how often each differential leads to a win, this time we’re just looking at how well the periodic differentials predict the final result. If you’re unfamiliar with the idea of correlation, don’t worry – it’s pretty easy to understand what’s going on below.

We’re going to check the correlations between seven different Percent Differentials and the final differential: each quarter’s Percent Differential (for example, the differential for JUST the second quarter, not the first two quarters), each half’s Percent Differential, and the differential after the first three quarters combined (just for curiosity’s sake). So without giving myself any opportunity to be more wordy, on to the analysis:

(If items like ‘R’, ‘Slope’ and ‘Standard Error’ don’t make any sense to you, come back in a few days when I have the Statistics Primer posted – it’ll give you an overview of what these things mean. In the meantime, just know that ‘slope’ means that, on average, the quarter differential equals the slope multiplied by the final differential, and R represents how strong the correlation is.)
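For the curious, here is a rough Python sketch of how numbers like these can be computed. It is not the spreadsheet or code behind the results in this entry: the `regress` helper and the simulated games are made up for illustration, and it assumes ‘Standard Error’ means the standard error of the estimate (the typical size of the miss around the fitted line).

```python
import numpy as np

def regress(x, y):
    """Fit y = slope * x + intercept and return (r, slope, std_err)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]              # correlation coefficient R
    slope, intercept = np.polyfit(x, y, 1)   # least-squares line
    residuals = y - (slope * x + intercept)
    # Standard error of the estimate: residual spread with n - 2 degrees of freedom
    std_err = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
    return r, slope, std_err

# Simulated games, NOT real box scores: four independent quarter differentials
rng = np.random.default_rng(0)
quarters = rng.normal(0.0, 7.5, size=(1000, 4))
final = quarters.sum(axis=1)                 # final differential = sum of the quarters

# Final differential on the x-axis, first-quarter differential on the y-axis
r, slope, std_err = regress(final, quarters[:, 0])
print(f"R: {r:.2f}  Slope: {slope:.4f}  Standard Error: {std_err:.1f}")
```

With real box scores, `final` and `quarters` would come from the game data instead of the random generator.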

**Correlation #1: First Quarter Differential vs. Final Differential**

R: .44

Slope: .2462

Standard Error: 6.9

Correlating First Quarter differential with Final Differential yields a very loose correlation, as suggested by the low correlation coefficient (R). So, next time you’re tempted to say “a 4-point lead after one quarter? Why, that’s a double-digit win!” come back and look at that chart because, unfortunately, it really doesn’t work that way very often; unless your First Quarter differential is up in the high teens or low 20s, it’s probably best not to try to draw any conclusions.

**Correlation #2: Second Quarter Differential vs. Final Differential**

R: .44

Slope: .2411

Standard Error: 6.7

And the correlation between the Second Quarter differential and the Final differential is… well, essentially identical to the one with the first quarter. Don’t be misled by the graph, however – it may appear that the Second Quarter is even more jumbled and random than the first, but this is really a result of a few outliers in the top right (total blowouts) changing the appearance of the graph.

**Correlation #3: Third Quarter Differential vs. Final Differential**

R: .48

Slope: .2700

Standard Error: 6.8

Now, the observant members of our audience will notice a slight difference between these third quarter measurements and the previous two quarters: namely, R is .04 higher, and the slope is about .03 higher. Is this statistically significant (that is, do these numbers demonstrate a real difference, or could they be a result of random error)? That… is a question for the end of this analysis.

**Correlation #4: Fourth Quarter Differential vs. Final Differential**

R: .43

Slope: .2355

Standard Error: 6.8

And in the fourth quarter, we return to the results from the first two quarters – actually, even a tiny bit lower. While this small discrepancy isn’t statistically significant (basically, it doesn’t conclusively prove anything), I believe (with no statistical grounds) that it is still real, due to one type of game: blowouts. A notable portion (that I can calculate if anyone is interested) of NBA games are decided by 15 points or more. These games usually see bench players entering the game and playing the final minutes, resulting in the fourth quarter differential being completely different from the rest of the game. This would result in a lower R value, as we see here (which, again, statistically isn’t proven to actually be lower – I’m just speculating).

I’m going to pause here before moving on to the first-half and through-three correlations to analyze this a bit, given that these four studies can be directly compared (all are 12-minute periods). Above I mentioned that the third quarter yields higher values for R and slope than the other three quarters. These measurements, if accurate, would suggest two things: (a) a higher third-quarter differential means a higher final differential, compared to that of the other three quarters, and (b) third-quarter differential is a better predictor of final differential. But are these measurements statistically significant?

There’s good news and bad news on that. First, the bad news: we can’t conclude from this data that a higher third-quarter differential leads to a higher final differential compared to the other quarters; the standard error (basically, how much the data varies) is too high to really draw any statistical conclusions on the slopes of any of the quarters, other than they’re somewhere in the .22-.28 range.

There is good news, though. According to the data, we can say (with 90% confidence) that the R value for the third quarter really is higher than the R value for the others; the 90% confidence interval for the third quarter R value lies just barely outside the 90% confidence interval for the other quarters.
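One common way to put a confidence interval around a correlation coefficient is the Fisher z-transformation. The sketch below shows how such an interval could be built; it is not necessarily how the intervals above were computed. The function name and the sample sizes are placeholders, since the number of games in the data set isn’t stated in this entry.

```python
import math
from statistics import NormalDist

def r_confidence_interval(r, n, confidence=0.90):
    """Approximate confidence interval for a correlation r from n games (Fisher z)."""
    z = math.atanh(r)                                     # map r onto a roughly normal scale
    se = 1.0 / math.sqrt(n - 3)                           # standard error on that scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 1.645 for 90%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Placeholder sample sizes; the real number of games is not given in this entry.
for n_games in (500, 1500, 5000):
    third_quarter = r_confidence_interval(0.48, n_games)
    first_quarter = r_confidence_interval(0.44, n_games)
    print(n_games,
          [round(v, 3) for v in third_quarter],
          [round(v, 3) for v in first_quarter])
```

The more games in the sample, the narrower each interval gets, which is why the same .04 gap in R can be convincing in a large data set and inconclusive in a small one.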

So what does that mean? The statistics show that the third quarter differential – that is, the point differential in *only* the third quarter (not quarters one through three) – *is* a stronger predictor of the final differential than the point differentials of the other quarters. Or, in simpler terms, the third-quarter margin tracks the final margin more closely than any other quarter’s margin does. This, to me, is early evidence of something I think will be statistically proven by the time we’re done with this analysis – that is, that the third quarter is the most important quarter in the game. Obviously this hasn’t been conclusively shown here yet, but the early indicators are there.

Now let’s take a look at the halves:

**Correlation #5: First Half Differential vs. Final Differential**

R: .6466

Slope: .4873

Standard Error: 7.84

As could be expected, a half serves as a much better predictor of the game’s final differential than just a quarter, which is shown here by the higher R value. Interestingly though, this R value is still relatively low (the corresponding R-squared value is .42, which indicates a present but weak correlation). Also interesting is that the slope – .4873 – is lower than .5. Given that these data are computed from the actual regular-season results, it’s necessary for all the slopes to add to about 1 (you’ll notice the four quarters’ slopes add to roughly 1 as well), which means…

**Correlation #6: Second Half Differential vs. Final Differential**

R: .6592

Slope: .5054

Standard Error: 7.86

…that the slope for the second half should be higher. And, indeed, it is. Unfortunately, the discrepancy between the slopes is nowhere near statistically significant (thanks again to that high standard error), but that doesn’t mean it isn’t notable anyway. Lacking statistical significance means we haven’t *proven* anything, but it doesn’t mean that we haven’t found evidence possibly suggesting something. There is also a difference here in the R-values between the two halves – this difference isn’t statistically significant either (at a 90% confidence level), but it does reinforce the early idea that the third quarter may be the most significant quarter in the game (though its effects may be diluted by the comparatively weak fourth quarter, since both factor into the second half).
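The ‘slopes add to about 1’ point can be checked with plain arithmetic on the numbers reported above. The reasoning, sketched under the assumption that each slope comes from regressing a period’s differential against the final differential: the periods’ differentials add up to the final differential, so their slopes should add up to the slope of the final differential against itself, which is 1. The variable names are just for illustration.

```python
# Slopes as reported in Correlations #1 through #6
quarter_slopes = {"Q1": 0.2462, "Q2": 0.2411, "Q3": 0.2700, "Q4": 0.2355}
half_slopes = {"H1": 0.4873, "H2": 0.5054}

quarter_total = sum(quarter_slopes.values())
half_total = sum(half_slopes.values())

# Each half should also match the sum of its two quarters
first_half_from_quarters = quarter_slopes["Q1"] + quarter_slopes["Q2"]
second_half_from_quarters = quarter_slopes["Q3"] + quarter_slopes["Q4"]

print(f"four quarters: {quarter_total:.4f}")
print(f"two halves:    {half_total:.4f}")
print(f"Q1 + Q2 = {first_half_from_quarters:.4f} vs first half {half_slopes['H1']:.4f}")
print(f"Q3 + Q4 = {second_half_from_quarters:.4f} vs second half {half_slopes['H2']:.4f}")
```

Both totals land a little under 1, and each pair of quarter slopes matches its half to within rounding.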

And now, one last analysis, just for kicks and giggles…

**Correlation #7: Through-Three Differential vs. Final Differential**

R: .83

Slope: .7573

Standard Error: 6.90

This correlation isn’t as useful as the others given that it can’t be compared to any comparable time period (except the final three quarters, which wouldn’t be too useful); and additionally, it’s really just the flip side of the fourth-quarter analysis. But it’s useful for keeping our sanity while actually watching games because the differential entering the fourth quarter is strongly correlated (far more strongly than anything else we’ve looked at) with the final differential. This is likely an effect of 36 minutes having an (obviously) stronger impact on the game than any 12-minute period, but it’s still interesting to see just how close the correlation is. While even a 10-point lead after one quarter failed to correlate with a double-digit win, a 7-point lead entering the fourth strongly relates to an easy win (obviously not EVERY time, but a substantial proportion).

So, that’s about all the information I can milk from this portion of the analysis. I’ll sum everything up below in the Takeaways section, but this analysis provides us with a great jumping-off point for the next two portions of this study.

First of all, while the high standard error made it difficult to draw any conclusions about the final differential, it shows that there is a high degree of variability in the differentials after each quarter (as opposed to the majority of games having only a 4 or 5 point swing per quarter). From there, we can examine the question: do particular teams find more success in different quarters, and if so, is there a particular trend among the more successful teams?

Secondly, while we’ve shown that most differentials do a poor job of predicting the final differential, we haven’t examined whether they predict the final outcome at all. A team with a 10-point halftime lead may ease up in the second half, causing the differential to fail to correlate but preserving the win. What differentials at what milestones most often correlate with a victory?

These are our next two topics (not necessarily in that order) – they should be up within the next week at most.

**LITTLE WHITE TAKEAWAYS**

In this portion of the analysis, we’ve uncovered one fact that is actually backed up by statistics, and a handful of ideas that are suggested by the statistics, though far from being explicitly proven.

The one demonstrated fact is that the differential within the third quarter (that is, only in the third quarter, not through the first three quarters) is statistically the most accurate (of the four quarters) in predicting the final game differential. This serves as possible early evidence that the third quarter may be the most important quarter in an NBA game.

Also notable, though, was the fact that none of the quarters, nor either half, demonstrated the ability to reliably predict the final differential. There are correlations, but they are very weak; this means, for example, that an 8-point halftime lead typically predicts anywhere from a 4-point loss to a 20-point win. So next time you’re tempted to say “oh, a 5-point lead, we’re going to win by double digits!”, remember this entry.

Statistically we couldn’t actually *demonstrate* anything else, but the statistics did suggest a couple of other ideas that should be explored further. Note that these are most certainly not proven truths, just possibilities:

- That the second half differential is a better predictor than the first half.
- That the fourth quarter is the least effective predictor, though it is likely diluted by blowouts (in which the fourth quarter is played very differently from the first three).

Coming up next are two extremely interesting (in my opinion) parts of the analysis: first, which teams typically do better in which quarters, and is there a trend in which quarter elite teams outperform their competition? And second, how often do particular leads after each quarter correspond to victories, even if the margin of victory is lower?

So that’s the end of this marathon analysis. This one is likely longer than the others will be, due to its role as the jumping-off point, so if you’re out of breath after reading this epic of an analysis, don’t worry – so am I.

-DJ