The 2-3-2 Format

And now a quick break from your regularly scheduled box score analysis (next post should be up tomorrow) for a quick thought inspired by this Tim Donaghy business.

Tim Donaghy’s accusing the NBA of some shady refereeing practices in order to ensure one of its marquee series (Kings-Lakers in 2002) went to a full revenue-boosting 7 games.

But, what if that’s not the only method the NBA is using to elongate series?

Now, don’t get me wrong - there are a lot of great reasons the Finals go 2-3-2. The media is the main one, since ABC is a bit less agile than TNT and ESPN and needs a lot more time to move back and forth between two host cities (actually, I’ve always kind of wondered about that - doesn’t ABC have enough money to have a crew in each city and just fly the sportscasters back and forth? But I digress). It benefits the teams too, given that the Finals are the only playoff series to feature travel that’s clear across the country (since New Orleans and Los Angeles are within walking distance, right?). I’m not implying that what I’m about to say is the primary reason for the 2-3-2 format (although I do question the modern-day value of it - I understand why it was needed in 1984, but now I’m not quite as convinced).

But is an added bonus of the 2-3-2 series that it has a tendency to yield more games?

Think about it - more often than not, the team with home court advantage is considered, on some level, to be better than the away team. It’s obviously not always true, but if you have to make a choice without knowing the teams, you’ll likely choose the team with HCA going into the series - after all, the team with HCA has won over 75% of Finals series (I’ll refer to the teams as Home and Away, but note I’m speaking for the entire series, not just game-by-game).

So, that means it’d theoretically be more likely (in a very general sense) for the Home team to win a game away against the Away team than for the Away team to win a game away against the Home team.

Under either format (2-2-1-1-1 or 2-3-2), that means that it’s more likely for the series to enter Game 5 with the Home team leading 3 games to 1 than for the Away team to be leading 3 games to 1.

Under the 2-2-1-1-1 format, that game happens at the Home team’s arena, meaning the Home team is more likely to win (than they would be on the road) and end the series after 5 games.

But under the 2-3-2 format, that game happens at the Away team’s arena - meaning that the Home team is less likely to win (than if they were playing at home).

Considering again that it’s more likely for the Home team to lead 3 games to 1 than the Away team, it follows that Game 5 is more likely to be an elimination game for the Away team - so putting it at their arena lowers their risk of a Game 5 elimination, and thus increases the chance for a 6+ game series.
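To put rough numbers on that argument, here is a minimal sketch that enumerates every possible best-of-seven series under both formats. It assumes games are independent and that the Home team wins with one fixed probability at home (`p_home`) and another on the road (`p_road`). The names are hypothetical, and the 0.70 and 0.50 used below are made-up illustrative values, not estimates from real Finals data.

```python
# Venue of each game from the Home team's point of view (H = at home, A = away).
FORMATS = {
    "2-2-1-1-1": "HHAAHAH",
    "2-3-2": "HHAAAHH",
}


def length_distribution(venues, p_home, p_road):
    """Return {games_played: probability} for a best-of-seven series.

    p_home / p_road: assumed chance the Home team wins a game at home / on the road.
    """
    dist = {4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0}

    def walk(played, home_wins, away_wins, prob):
        # Stop as soon as either team has four wins.
        if home_wins == 4 or away_wins == 4:
            dist[played] += prob
            return
        p = p_home if venues[played] == "H" else p_road
        walk(played + 1, home_wins + 1, away_wins, prob * p)
        walk(played + 1, home_wins, away_wins + 1, prob * (1 - p))

    walk(0, 0, 0, 1.0)
    return dist


def expected_length(dist):
    """Average number of games implied by a length distribution."""
    return sum(games * prob for games, prob in dist.items())


if __name__ == "__main__":
    for name, venues in FORMATS.items():
        dist = length_distribution(venues, p_home=0.70, p_road=0.50)
        print(name, {g: round(pr, 4) for g, pr in dist.items()},
              "expected games:", round(expected_length(dist), 4))
```

Under these assumed inputs the 2-3-2 format comes out about 0.04 games longer on average, all of it from five-game series turning into six-game series. The chance of a Game 7 is identical either way, because the first six games are split three and three between the arenas in both formats.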

I should specify that I, personally, find nothing wrong with this approach if it is, in fact, true - it’s one of those rules that’s decided ahead of time, so it can’t be intended to favor any one particular team. Any team could benefit from it in any given year. But as long as we’re on the topic of the league possibly attempting to elongate series using “un-kosher” methods, is it really that odd to consider they might be doing it by more fair and balanced methods as well?

The logic checks out, but this is a statistical blog, so let’s run the numbers. In this case, however, we can’t. It’s unfeasible to straight-up average the number of games per series before and after the change - while it would give an apparently informative result, it’s likely to be heavily influenced by the increased parity and level of competition in more recent years (which, admittedly, would statistically prove this hypothesis, but it wouldn’t make it true). The alternative would be to count how many Game 5’s happened with one team holding a 3-1 lead and see whether the Road team is more likely to win its Game 5 home game after the switch - but the sample size is too small to derive anything meaningful from that (only 9 Game 5’s with the Home team leading 3-1).

So instead we’re just left with our reasoning and no real knowledge of whether it holds true; we have no reason to think it wouldn’t, but the impact of home court advantage pales in comparison to the talent levels of the teams, injuries and numerous other factors, so it would take a very large sample size to isolate those and statistically see if this theory holds up. So, come back in 150 years.

LITTLE WHITE TAKEAWAYS

Logically, it seems like the 2-3-2 format might result in longer series than the classic 2-2-1-1-1 format, if we assume that in the big picture, the team with Home Court Advantage starting out a series has a better chance of winning a road game than the Road team. With Game 5 being played in the Road team’s home arena, they have better odds to pull a series to 3-2 and force a Game 6 than if the same game were played in the Home team’s home arena. Theoretically, over time this would nudge the average NBA Finals series longer, since every series extended this way runs at least one extra game.

But, this is all theoretical since there isn’t enough data to analyze this without heavy influence from other variables, partially due to the increased parity of the league over the years, and partially due to the relative rarity of a 3-1 lead for the Home team in the finals (only 9 of the last 24 Finals series have featured a 3-1 Home team lead).

But who knows, maybe the Celtics will win tonight - then we’ll have the Lakers playing Game 5 at home to try to force a Game 6, rather than on the road (EDIT: Hey, I’m psychic).

That’s my random tangent for this week. Back to the Box Score Analysis tomorrow, or maybe Saturday.

-DJ

June 12th, 2008, posted by joyner

Introducing Elaine, our lovely blogmuse

As promised, our humble blog is now adorned with a more fitting header image. Inspired by the Brent Barry quote (and our subtitle) “Stats are like bikinis - nice to look at, but don’t tell you the whole story”, Elaine (named, naturally, after my girlfriend) is sporting a lovely white swimsuit with an array of statistical formulas as the pattern. And yes, her swimsuit bottom is in the shape of a normal distribution.

Thanks again go to Stuart, a college buddy of mine, for the image. Stuart’s an incredibly talented graphic artist and web designer, so if you’re in need head over to his web site JSL Cyberworks to check out his portfolio.

June 10th, 2008, posted by joyner

Box Score Analysis: The Basics

Like I mentioned last entry, there’s lots and lots of ways to approach this - and some of them are really, really interesting. But there’s a basic foundation that should be laid that encompasses the results in the most general sense.

How well do in-game (within and after each quarter) differentials correlate to the actual differential in the final score? While next time we’ll look at how often each differential leads to a win, this time we’re just looking at how well the periodic differentials predict the final result. If you’re unfamiliar with the idea of correlation, don’t worry - it’s pretty easy to understand what’s going on below.

We’re going to check the correlations between seven different Percent Differentials and the final differential: each quarter’s Percent Differential (for example, the differential for JUST the second quarter, not the first two quarters), each half’s Percent Differential, and the differential after the first three quarters combined (just for curiosity’s sake). So without giving myself any opportunity to be more wordy, on to the analysis:

(If the items like ‘R’, ‘Slope’ and ‘Standard Error’ don’t make any sense to you, come back in a few days when I have the Statistics Primer posted - it’ll give you an overview of what these things mean. In the meantime, just know that ‘slope’ means that, on average, the quarter differential is the slope multiplied by the final differential and R represents how strong the correlation is)
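For anyone who wants to reproduce numbers like these, here is a minimal sketch of how R, slope and a standard error can be computed from paired differentials. It assumes the slope comes from regressing the period differential on the final differential (which matches the description above) and that ‘Standard Error’ means the standard error of the estimate; the post doesn’t spell out either, and the function name and toy data are hypothetical.

```python
import math


def regression_summary(final_diffs, period_diffs):
    """Regress a period's differential on the final differential.

    Returns (r, slope, std_error), where slope is the expected period
    differential per point of final differential and std_error is the
    standard error of the estimate (typical size of the residuals).
    """
    n = len(final_diffs)
    mean_x = sum(final_diffs) / n
    mean_y = sum(period_diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in final_diffs)
    syy = sum((y - mean_y) ** 2 for y in period_diffs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(final_diffs, period_diffs))

    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation coefficient
    slope = sxy / sxx                       # least-squares slope
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(final_diffs, period_diffs))
    std_error = math.sqrt(sse / (n - 2))    # two fitted parameters
    return r, slope, std_error


# Hypothetical toy data, NOT real box scores: final margin and first-quarter margin.
final = [12, -5, 3, 20, -8, 7, -15, 2]
first_quarter = [4, 1, -2, 6, -3, 5, -1, -4]
r, slope, std_error = regression_summary(final, first_quarter)
print(f"R: {r:.2f}  Slope: {slope:.4f}  Standard Error: {std_error:.1f}")
```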

Correlation #1: First Quarter Differential vs. Final Differential

R: .44
Slope: .2462
Standard Error: 6.9

Correlating First Quarter differential with Final Differential yields a very loose correlation, as suggested by the low correlation coefficient (R). So, next time you’re tempted to say “a 4-point lead after one quarter? Why, that’s a double-digit win!” come back and look at that chart because, unfortunately, it really doesn’t work that way very often; unless your First Quarter differential’s up in the high 10s or lower 20s, it’s probably best not to try to draw any conclusions.

Correlation #2: Second Quarter Differential vs. Final Differential

R: .44
Slope: .2411
Standard Error: 6.7

And the correlation between the Second Quarter differential and the Final differential is… well, essentially identical to the one with the first quarter. Don’t be misled by the graph, however - it may appear that the Second Quarter is even more jumbled and random than the first, but this is really a result of a few outliers in the top right (total blowouts) changing the appearance of the graph.

Correlation #3: Third Quarter Differential vs. Final Differential

R: .48
Slope: .2700
Standard Error: 6.8

Now, the observant members of our audience will notice that there is a slight difference between these third quarter measurements and the previous two quarters: namely, R is .04 higher, and the slope is .03 higher. Is this statistically relevant (that is, do these statistics conclusively demonstrate something absolute, or could they be a result of random error)? That… is a question for the end of this analysis.

Correlation #4: Fourth Quarter Differential vs. Final Differential

R: .43
Slope: .2355
Standard Error: 6.8

And in the fourth quarter, we return to the results from the first two quarters - actually, even a tiny bit lower. While this small discrepancy isn’t statistically significant (basically, it doesn’t conclusively prove anything), I believe (with no statistical grounds) that it is still accurate, due to one type of game: blowouts. A notable portion (that I can calculate if anyone is interested) of NBA games are decided by 15 points or more. These games usually see bench players entering the game and playing the final minutes, resulting in the fourth quarter differential being completely different from the rest of the game. This would result in a lower R value, as we see here (which, again, statistically isn’t proven to actually be lower - I’m just speculating).

I’m going to pause here before moving on to the first-half and through-three correlations to analyze this a bit, given that these four studies can be directly compared (all are 12 minute periods). Above I mentioned that the third quarter yields higher values for R and slope than the other three quarters. These measurements, if accurate, would suggest two things: (a) a higher third-quarter differential means a higher final differential, compared to that of the other three quarters, and (b) third-quarter differential is a better predictor of final differential. But, are these measurements statistically significant?

There’s good news and bad news on that. First, the bad news: we can’t conclude from this data that a higher third-quarter differential leads to a higher final differential compared to the other quarters; the standard error (basically, how much the data varies) is too high to really draw any statistical conclusions on the slopes of any of the quarters, other than they’re somewhere in the .22-.28 range.

There is good news, though. According to the data, we can say (with 90% confidence) that the R value for the third quarter really is higher than the R value for the others; the 90% confidence interval for the third quarter R value lies just barely outside the 90% confidence interval for the other quarters.
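The post doesn’t say how those confidence intervals were calculated, so here is only a sketch of one standard approach, the Fisher z-transform, for readers who want to try the comparison themselves. The function name is hypothetical, and n = 1230 is an assumption about the sample (it is the number of 2007-08 regular-season games quoted elsewhere on this blog). Results from this method need not match the intervals described above, especially with R rounded to two decimals.

```python
import math
from statistics import NormalDist


def correlation_interval(r, n, confidence=0.90):
    """Confidence interval for a Pearson correlation via the Fisher z-transform."""
    z = math.atanh(r)                       # transform r to an approximately normal scale
    std_err = 1 / math.sqrt(n - 3)          # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - z_crit * std_err), math.tanh(z + z_crit * std_err)


# R values from the four quarter correlations above; n is an assumed sample size.
for label, r in [("Q1", 0.44), ("Q2", 0.44), ("Q3", 0.48), ("Q4", 0.43)]:
    low, high = correlation_interval(r, n=1230)
    print(f"{label}: R = {r:.2f}, 90% CI = ({low:.3f}, {high:.3f})")
```

Because all four correlations are computed from the same games, comparing intervals like this is only a rough guide; a test designed for dependent correlations would be the stricter check.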

So what does that mean? The statistics show that the third quarter differential - that is, the point differential in only the third quarter (not quarters one through three) - is a stronger predictor of the final differential than the point differentials of the other quarters. Or, in simpler terms, you’ll find the third quarter predicts the final outcome more often than any of the other quarters. This, to me, is early evidence of something I think will be statistically proven by the time we’re done with this analysis - that is, that the third quarter is the most important quarter in the game. Obviously this hasn’t been conclusively shown here yet, but the early indicators are there.

Now let’s take a look at the halves:

Correlation #5: First Half Differential vs. Final Differential

R: .6466
Slope: .4873
Standard Error: 7.84

As could be expected, a half serves as a much better predictor of the game’s final differential than just a quarter, which is shown here by the higher R value. Interestingly though, this R value is still relatively low (given the corresponding R-square value of .42, which symbolizes a present but weak correlation). Also interesting is that the slope - .4873 - is lower than .5. Given that these data are computed from the actual regular-season results, it’s necessary for all the slopes to add to about 1 (you’ll notice the four quarters’ slopes add to roughly 1 as well), which means…

Correlation #6: Second Half Differential vs. Final Differential

R: .6592
Slope: .5054
Standard Error: 7.86

…that the slope for the second half should be higher. And, indeed, it is. Unfortunately, the discrepancy between the slopes is nowhere near statistically significant (thanks again to that high standard error), but that doesn’t mean it isn’t notable anyway. Lacking statistical significance means we haven’t proven anything, but it doesn’t mean that we haven’t found evidence possibly suggesting something. There is also a difference here in the R-values between the two halves - this difference isn’t statistically significant either (at a 90% confidence level), but it does reinforce the early idea that the third quarter may be the most significant quarter in the game (though its effects may be diluted by the comparably weakest fourth quarter, both of which factor into the second half).
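The slopes-add-to-one observation is an algebraic identity rather than a coincidence. As a sketch, assume each slope $b_i$ comes from regressing a period’s differential $Q_i$ on the final differential $F$ (these symbols are mine, not the analysis’s own notation), and ignore overtime so that $F = Q_1 + Q_2 + Q_3 + Q_4$:

```latex
\begin{aligned}
b_i &= \frac{\operatorname{Cov}(Q_i, F)}{\operatorname{Var}(F)}, \\
\sum_{i=1}^{4} b_i
  &= \frac{\operatorname{Cov}\!\left(\sum_{i=1}^{4} Q_i,\; F\right)}{\operatorname{Var}(F)}
   = \frac{\operatorname{Cov}(F, F)}{\operatorname{Var}(F)}
   = 1 .
\end{aligned}
```

The same argument applies to the two halves. The reported slopes sum to 0.2462 + 0.2411 + 0.2700 + 0.2355 = 0.9928 for the quarters and 0.4873 + 0.5054 = 0.9927 for the halves; overtime points, which count toward the final differential but belong to no quarter, are one plausible source of the small shortfall.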

And now, one last analysis, just for kicks and giggles…

Correlation #7: Through-Three Differential vs. Final Differential

R: .83
Slope: .7573
Standard Error: 6.90

This correlation isn’t as useful as the others given that it can’t be compared to any comparable time period (except the final three quarters, which wouldn’t be too useful); and additionally, it’s really just the inverse of the quarter analysis. But it’s useful for keeping our sanity while actually watching games because the differential entering the fourth quarter is strongly correlated (far more strongly than anything else we’ve looked at) with the final differential. This is likely an effect of 36 minutes having an (obviously) stronger impact on the game than any 12-minute period, but it’s still interesting to see just how close the correlation is. While even a 10-point lead after one quarter failed to correlate with a double-digit win, a 7-point lead entering the fourth strongly relates to an easy win (obviously not EVERY time, but a substantial proportion).

So, that’s about all the information I can milk from this portion of the analysis. I’ll sum everything up below in the Takeaways section, but this analysis provides us with a great jumping-off point for the next two portions of this study.

First of all, while the high standard error made it difficult to draw any conclusions about the final differential, it shows that there is a high degree of variability in the differentials after each quarter (as opposed to the majority of games having only a 4 or 5 point swing per quarter). From there, we can examine the question, do particular teams find more success in different quarters, and if so, is there a particular trend among the more successful teams?

Secondly, while we’ve shown that most differentials do a poor job of predicting the final differential, we haven’t examined whether they predict the final outcome at all. A team with a 10-point halftime lead may ease up in the second half, causing the differential to fail to correlate but preserving the win. What differentials at what milestones most often correlate with a victory?

These are our next two topics (not necessarily in that order) - they should be up within the next week at most.

LITTLE WHITE TAKEAWAYS

In this portion of the analysis, we’ve uncovered one fact that is actually backed up by statistics, and a handful of ideas that are suggested by the statistics, though far from being explicitly proven.

The notably demonstrated fact is that the differential within the third quarter (that is, only in the third quarter, not through the first three quarters) is statistically the most accurate (of the four quarters) in predicting the final game differential. This serves as possible early evidence that the third quarter may be the most important quarter in an NBA game.

Also notable, though, was the fact that none of the quarters, nor either half, demonstrated the ability to reliably predict the final differential. There are correlations, but they are very weak; this means, for example, that an 8-point halftime lead typically predicts anywhere from a 4-point loss to a 20-point win. So next time you’re tempted to say “oh, a 5-point lead, we’re going to win by double digits!”, remember this entry.

Statistically we couldn’t actually demonstrate anything else, but the statistics did suggest a couple other ideas that should be explored further. Note that these are most certainly not proven truths, just possibilities:

  • That the second half differential is a better predictor than the first half.
  • That the fourth quarter is the least effective predictor, though it is likely diluted by blowouts (in which the fourth quarter is played very differently from the first three).

Coming up next are two extremely interesting (in my opinion) parts of the analysis: first, which teams typically do better in which quarters, and is there a trend to which quarter elite teams outperform their competition in? And second, how often do particular leads after each quarter correspond to victories, even if the margin of victory is lower?

So that’s the end of this marathon analysis. This one is likely longer than the others will be, due to its role as the jumping-off point, so if you’re out of breath after reading this epic of an analysis, don’t worry - so am I.

-DJ

June 10th, 2008, posted by joyner

Introducing the Box Score Analysis

It’s a trap we’ve all fallen into from time to time. Clinging to a 3-point lead after one quarter, we desperately tell ourselves, “it’s ok, it’s ok! At this rate, that’s a 12-point victory!” Fortunately most of us don’t delude ourselves into thinking that 4-point lead after one minute of play automatically translates into the most lopsided victory of all time, but I think all of us have tried to draw certainty from a first-quarter score at least a few times.

Does it ever work? Of course not. (Drumroll please - incoming is the first statistical fact ever stated on Little White Statistics) In the 2007-08 NBA season, only 47 (out of 1230, a whopping 3.8%) of games had the final score differential be the corresponding multiple of the differential at the end of the first quarter, halftime or through-three (meaning a 12-point win after a 3-point first quarter lead, a 6-point halftime lead or a 9-point through-three lead). In case you’re interested, 15 were multiples of the first-quarter differential, 20 of the halftime differential, and 14 of the through-three differential (2 games had more than one).
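The ‘corresponding multiple’ check is simple to state in code. This is a minimal sketch, not the parser or database query actually used for the count above; the function name and the three sample games are hypothetical.

```python
def matching_milestones(q1_lead, half_lead, thru3_lead, final_margin):
    """Return which milestones the final margin is the 'corresponding multiple' of.

    All arguments are point differentials from one team's point of view.
    """
    matches = []
    if final_margin == 4 * q1_lead:          # one quarter is 1/4 of regulation
        matches.append("first quarter")
    if final_margin == 2 * half_lead:        # one half is 1/2 of regulation
        matches.append("halftime")
    if 3 * final_margin == 4 * thru3_lead:   # three quarters is 3/4 of regulation
        matches.append("through three")
    return matches


# Hypothetical games, NOT real box scores:
# (Q1 lead, halftime lead, through-three lead, final margin)
games = [
    (3, 6, 9, 12),    # on pace at every milestone
    (3, 10, 4, 12),   # matches the first quarter only
    (5, -2, 8, 12),   # matches nothing
]
hits = [g for g in games if matching_milestones(*g)]
print(f"{len(hits)} of {len(games)} games matched at least one milestone")
```

The through-three comparison multiplies both sides out instead of dividing by 3/4, which keeps everything in whole numbers.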

Now, I know what you’re thinking - well of course it doesn’t match exactly, but it’s pretty close, right? Well, dear friend, that is the first question we’re going to answer. How well does the score at a particular point in the game correlate to the game’s final outcome? Not just how often does the team leading at halftime win the game - but how accurate is the assumption that a 5-point lead after one might lead to a double-digit victory?

There’s about three hundred eighty-two and a half different ways to examine this - and fortunately, we have all summer! So, after parsing out the box scores for every regular-season game (and discovering some interesting things in Yahoo!’s box scores…), I have a database of every by-quarter box score of every game for the regular season (in related news, if you need any text-parsing applications written, I have some experience).

To make this analysis easier, I’m going to introduce a couple terms to avoid elaborately explaining the same concept over and over. Actually, right now I can only think of one term: I’m calling it Percent Differential (PD). Percent Differential refers to the lead a team holds, with respect to how much of the game has been played.

For example, a team that leads by 3 points after one quarter, 7 points at halftime and 10 points at the end of the third quarter would be said to have relatively the same Percent Differential throughout the game (that is, they outscore their opponent by about the same amount during every quarter). A team that leads by 5 after one quarter, trails by 2 at halftime and ends up winning by 12 would have a very different Percent Differential throughout the game (and in case it’s not obvious, ‘differential’ just refers to how many points a team leads/trails by).
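Read literally, the definition scales a lead by how much of the game has been played. Here is a minimal sketch of that reading, assuming four equal quarters; the helper name `projected_margin` is hypothetical, and the analysis itself may compute Percent Differential differently.

```python
def projected_margin(lead, quarters_played, total_quarters=4):
    """Scale a lead to a full-game pace: lead divided by the fraction of the game played."""
    if not 0 < quarters_played <= total_quarters:
        raise ValueError("quarters_played must be between 1 and total_quarters")
    return lead * total_quarters / quarters_played


# The two example teams from the paragraph above: leads after one, two and three quarters.
steady = [projected_margin(lead, q) for q, lead in enumerate([3, 7, 10], start=1)]
streaky = [projected_margin(lead, q) for q, lead in enumerate([5, -2], start=1)]
print([round(m, 1) for m in steady])   # pace stays in a narrow band
print([round(m, 1) for m in streaky])  # pace swings from one quarter to the next
```

The first team’s pace stays between a 12-point and a 14-point game all night, while the second team goes from a 20-point pace to a 4-point deficit pace within a single quarter.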

So, next entry we’ll get the ball rolling on what’ll be an ongoing analysis of the predictive power of the game score at different points in the game. Like I said, there’s dozens of things to take away from this sort of analysis - some of the things we’ll look at include:

  • Correlation between quarter-differentials and the final differential.
  • The critical points where a certain lead begins to strongly correlate to winning percentage.
  • Which specific teams are more consistent with their Percent Differentials.
  • Whether home teams have a better chance of maintaining positive Percent Differentials (whether home teams are more likely to increase their leads and decrease their deficits).
  • Whether the statistics change as the season goes along.
  • How quarter differentials impact half differentials, and how half differentials impact games.
  • What teams are historically stronger in certain quarters, and whether that translates into real success (for example, is the best third-quarter team better overall than the best first-quarter team?).

Fortunately, as I’ve gone along with this study, I’ve already started to observe some interesting trends - so unlike some studies where the eventual usefulness of the results is up in the air until they’re obtained, in this particular instance I can already say there will be something notable coming out of this. So, if you’re as interested in this statistical orgy as I am, join us next time - the next post should be up in a couple days or so.

LITTLE WHITE TAKEAWAYS

Told you I was having a simplified ‘takeaways’ section. Most entries will have one of these sections, so that if you want to skip the statistical crap you can jump straight to what they’re supposedly proving. Or, alternatively, you can look here and check what I’m claiming - then, if you agree you can just smile and move along, and if you disagree you can thrash my reasoning to try to find a counter-argument. Point being, here we’ll sum up the results.

Well, except for this post, since we haven’t proved anything yet. The takeaway here is we’re going to do cool stuff, so you should come back. We have cake (the cake is a lie).

-DJ

June 9th, 2008, posted by joyner

Introducing ‘Little White Statistics’

So, after receiving a good bit of positive feedback on my NBA analysis on a forum I frequent (SpursReport.com), I’ve decided to throw my hat into the vast frightening abyss that is the blogosphere. So allow me to be the first to introduce (’cause who the heck else is going to?) “Little White Statistics”, an NBA blog with a statistical focus.

Why statistics? I don’t know. For some reason I just have a tendency to look at things from a statistical point of view, even though I know they’re not always useful. To me, when you analyze something statistically, one of three things happens: it proves nothing (or nothing interesting), it “proves” something that isn’t true, or it proves something that is true. Here’s hoping the latter happens most often here.

If statistics aren’t your thing, don’t worry - I’ll be posting a brief overview of the only parts of stats that are relevant to sports, and I’ll also include an offensively large ‘TAKEAWAYS’ section in most entries, where you can see what the stats showed without actually reading about how they showed it.

Kicking this thing off right before the end of the season is probably a dumb idea, but hey, that gives us an entire off-season to look back at the year before. In the coming months, there are two primary issues I’m going to look at:

  • One, which I’ve already started, is an obnoxiously in-depth look at score differentials throughout a game - that is, at what point a first-quarter lead actually starts to correlate to a win, if different teams are more likely to overcome deficits, if good production in a certain quarter more frequently translates to a win, etc.
  • The second is a more in-depth look at the topic addressed by Christopher Reina’s study, which says that “the Lakers were 38% more likely to win during the regular season when Kobe was not a high-volume shooter and their offense was more balanced, which is a staggering differential.” The study was thoroughly lambasted by the fellas at my personal favorite NBA blog (Yahoo!’s Ball Don’t Lie), and I’d like to look at the statistical viability of some of their criticisms - specifically, to correlate Kobe’s shot attempts on a spectrum to their percentage, not just an either-or over-under 20 shots criterion, and to look at whether there’s a negative correlation between Kobe’s shot attempts and his teammates’ shooting percentage. On that latter one, I’m hoping to be extra-spiffy and specifically look at his teammates’ FG% per-quarter, to try and answer the age-old question, “which came first, Kobe hogging the ball or his teammates shooting like crap?” (or, in more statistical terms, does Kobe’s high-volume shooting prevent his teammates from getting into a rhythm, or does his teammates’ lack of rhythm force Kobe into high-volume shooting?)

So, jump on for the ride, hopefully it’ll be fun. Like Brent Barry says, statistics don’t ever tell the whole story, and I’m certainly not one to tell you that everything I’m going to prove here is completely and undeniably true. I just hope to shed some light on some different areas of the game and maybe find some subtle nuances that hadn’t been observed. Like, which quarter is it most important to perform well in to win a game? We’ll answer that, and many other questions, in the next few weeks as part of our first study.

But enough about the blog, on to what really matters: me. Because naturally the reason you’re here is to read whatever it is that I write. I could spew Lorem Ipsums all day and you’d be hanging on my every dolor-sit-amet, right? …right? Ok, so no. But in case you’re interested, I’m a soon-to-be alumnus of the Georgia Institute of Technology with a degree in Computer Science - which is currently only being applied to writing text-parsing applications to get the data out of sites that post box scores and the like, since I don’t have the database that ESPN’s Hollinger is so fortunate to have.

I confess to a bit of favoritism in the NBA - I’m a diehard San Antonio Spurs fan and a casual (bordering on bandwagon) Atlanta Hawks fan (when they play their games a mile away from your apartment, you have to be somewhat of a fan). But while it’s pretty common knowledge that statistics do lie, they typically lie pretty evenly in everyone’s favor, so my favoritism shouldn’t affect it. We won’t have to worry about my Hawks favoritism infecting my analysis because there’s not a statistic in the world brave enough to say something good about the Hawks. Stats tend to be pretty favorable towards the Spurs though, so I’ll try to balance any Spurs-positive comments with some Spurs-criticisms too.

Or I’ll just let the stats speak for themselves - fortunately, I don’t set out to find stats that will prove a certain hypothesis, I set out to find stats that’ll answer a certain question. Going into an analysis, I don’t have any hope for what it will or will not prove (other than hoping it’ll prove something, and preferably something interesting). So with any luck my favoritism won’t actually affect anything.

I do have to take a moment to thank a couple people - my friend James for helping me come up with the title of the blog (which, in case you haven’t gotten it yet, is a play on the phrase “Little White Lies”), my friend Sunira for helping me set up the domain and hosting and software and all that, and my friend Stuart for designing the lovely lady that will soon be adorning the top of this blog.

Did I forget anything? Surely I did. But fortunately this brave new world of ours lets us literally go back and edit what we’ve already said, effectively changing the past. Is what you’ve read here really what I wrote? Have dozens of bad jokes been removed before you got the chance to groan at them? Did I change the descriptions of what I plan to do so that I can say ‘I said I was going to do this 3 months ago!’? In the words of Ball Don’t Lie, that’s for you, dear reader, to figure out.

-DJ

June 8th, 2008, posted by admin