Friday, April 11, 2014

Regular Season vs. Playoffs: Contest+ %

Do teams contest more shots in the playoffs?  I think we'd expect the answer to be a resounding "yes" since, intuitively, contesting shots seems at least in part related to effort, and don't players try a little harder in the playoffs?  But the results are not so clear.

First, let's refresh on the definition of Contest+. We define Contest+ as any shot that is altered, blocked, or contested. Contest+ % is simply the percentage of shots that are altered, blocked, or contested. For exact definitions of those terms, review this article.
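
As a minimal sketch of the definition above, Contest+ % can be computed from a list of per-shot defense labels. The label strings and the sample game below are illustrative assumptions, not Vantage's actual data format.

```python
# Labels counted as Contest+ per the definition above.
CONTEST_PLUS = {"altered", "blocked", "contested"}


def contest_plus_pct(shot_defenses):
    """Share of shots whose defense label is altered, blocked, or contested."""
    if not shot_defenses:
        raise ValueError("need at least one shot")
    hits = sum(1 for label in shot_defenses if label in CONTEST_PLUS)
    return hits / len(shot_defenses)


# Hypothetical game: the defense labels for 10 shots faced by one team.
game = ["contested", "open", "blocked", "guarded", "altered",
        "pressured", "contested", "open", "contested", "guarded"]
print(f"Contest+ %: {contest_plus_pct(game):.1%}")
```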

Now, let's explore the methodology and results. I looked at Team Contest+ % for a sample of regular season games and the playoff games in the 2011-12 season. For those interested in just seeing the results and the conclusions, skip ahead. First, I conducted Levene's test for equality of variance and found that we have to reject the hypothesis of equality of variances in the two samples (p-value < .01). Next, I conducted Welch's t-test to see if the means were statistically different. Here are the results:

t-Test: Two-Sample Assuming Unequal Variances

                               Reg Season   Playoffs
Mean                           0.4729023    0.456445
Variance                       0.0080041    0.004745
Standard Deviation             0.0894656    0.068886
Observations                   905          168
Hypothesized Mean Difference   0
df                             283
t Stat                         2.7022436
P(T<=t) one-tail               0.0036517
t Critical one-tail            1.6502557
P(T<=t) two-tail               0.0073034
t Critical two-tail            1.9683819
*Note: Playoff teams had a regular season Contest+% of 47.5%
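
For readers who want to check the table, the t statistic and degrees of freedom can be reproduced from the summary rows alone. This is a sketch using the standard Welch formulas (the function name is my own), not the spreadsheet routine that produced the table. It should give a t statistic of roughly 2.70 and degrees of freedom that round to 283.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1 = var1 / n1  # squared standard error of the first sample mean
    se2 = var2 / n2  # squared standard error of the second sample mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics copied from the table above (regular season, then playoffs).
t_stat, df = welch_t(0.4729023, 0.0080041, 905, 0.456445, 0.004745, 168)
print(f"t = {t_stat:.4f}, df = {df:.1f}")
```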

Notice that Contest+ % in the playoffs has a lower average than in the regular season. Moreover, the difference in means is statistically significant (p-value < 0.01). So what do we make of this? Does this actually mean that teams contest a lower percentage of shots in the playoffs than in the regular season? The evidence certainly seems to say "yes." However, before coming to this conclusion, I think we have to keep a few things in mind. First, the sample sizes are drastically different: there are nowhere near as many playoff games as regular season games. Second, there are fewer teams in the playoffs, and they are theoretically better teams that may simply be better at getting open looks. More than a quarter of the playoff games come from the Heat and the Celtics, who played in 23 and 20 games, respectively, that year. Finally, the actual difference in averages is not that large (about 1.6 percentage points). Regardless of whether it is statistically significant, this isn't a major difference. So with all this being said, while we may not be ready to conclude that players contest fewer shots in the playoffs, one thing we can safely say is that they aren't contesting more shots. And that is surprising.

So what about our expectation that teams give more effort in the playoffs and, therefore, should contest more shots?  The fact that players don't contest more shots in the playoffs (despite what we assume to be marginally better effort) suggests one of three things (or some combination): (1) teams can't shake bad habits in the playoffs that were developed in the regular season; (2) skill is the determining factor in contesting shots, not effort; or (3) offenses that are good at getting uncontested shots trump the marginal increase in contesting shots that greater defensive effort produces. 

I'll end this post with a graph of Contest+ % for each playoff game for the conference finalists:


As you may be able to tell from the graph, the Celtics had the smallest variance while the Spurs and Heat had the largest variances. The Thunder also had three of their four worst games in terms of contesting shots in the last three games of the Finals, and this general downward trend is consistent across the teams to a lesser degree.

In part 2 of this article, we'll explore Open+ Frequency, how it changes from the regular season to the playoffs, and its relationship with Contest+%.

If you have any questions, comments or suggestions, please reach me on twitter @knarsu3 or e-mail me at krishnanarsu3@gmail.com 

Thursday, April 10, 2014

Visualizing Vantage's Metrics


This post will be brief. I'm going to allow the reader to visualize Vantage's metrics and interpret the correlations in his or her own way. First, let's look at the correlations between Offensive Efficiency and some of the various offensive metrics that Vantage tracks in the form of a correlation matrix. 

The positive correlations are shown in blue, while the negative correlations are shown in red. The darker the color, the greater the magnitude of the correlation. For the Pie graphs, the more filled they are, the higher the correlation. In addition, notice the lines in the small graphs go forward for positive correlation and backwards for negative correlation.
In the upper left corner is Offensive Efficiency followed by Received Screen Outcome Efficiency, Open+ FG%, Set Screen Points Per Chance, Set Screens Per Chance, Set Screen Outcome Efficiency, Screens Received Per Chance, Isolation Frequency, Roll-Pop%, Solid Screen%, Contest+ FG%, Open+ Frequency and Cut Efficiency. Please see the table at the bottom of this post for definitions.

Open+ FG% and Contest+ FG% have the highest correlations with Offensive Efficiency, while the next highest correlation is Set Screen Points Per Chance. This is not in the least bit surprising since all three of these metrics are directly related to points. Likewise, we also see that statistics like Received Screen Outcome Efficiency and Set Screen Outcome Efficiency are highly correlated. However, perhaps most interesting is the correlation between Set Screens Per Chance (or Screens Received Per Chance) and Set Screen Points Per Chance. Does setting a lot of screens make teams more efficient? Or do more efficient teams just set more screens? Which metric causes the other?

Now let's look at the correlations for some of Vantage's defensive statistics.

In the upper left corner is Defensive Efficiency followed by Effective Screen Defense Rate, Contest+, Pressure Rate Per 100 Chances, Effective Help Rate, Turnovers Forced Per Chance, Defensive Moves Per Chance (also called Defensive Activity Rate), Inside Shots Against %, Close-Out Points Allowed, Effective Double Team Rate, Keep in Front % and Deflections Per 100 Chances. Again, please refer to the bottom of the post for definitions.

Turnovers Forced Per Chance and Close-Out Points Allowed are the most correlated with Defensive Efficiency. However, Close-Out Points Allowed is also the only metric that is on the scale of points allowed. For example, Effective Help Rate measures the number of help attempts that don't end in a score, assist+ or a missed Open+ shot. It is not directly correlated with points allowed. This does not mean it's not a useful statistic, as it's more likely to be predictive than reflective. 

Now let's look at the relationship between switching on screens and how effective teams are at defending screens (Effective Screen Defense Rate). 

Each point on the graph is an individual game. Teams with more points in the upper right of the graph switch on screens a lot while still defending screens effectively. Theoretically, these teams need versatile defenders who can guard multiple positions to be able to switch on screens effectively. As you can tell from the graph, not many teams switch on screens very often. One team that is pretty interesting is the Knicks, who might come the closest to having a number of points in the upper right corner. They certainly appear to switch on screens more than most teams while still playing effective defense on screens. Another interesting team is the Nuggets, whose points are scattered all over the place. Their graph appears to be the most spread out (sometimes they switch, sometimes they don't, and they play both good and bad defense on screens). Finally, it's worth remembering the scale of this graph, which goes from 0 to 0.4 with occasional games near or above 0.5. In almost all of the games, switching on screens is the less likely event.

Let's take a closer look at the graph above with a subset of 6 teams (the Bulls, Celtics, Clippers, Heat, Lakers and Thunder).
Each team is fit with a regression line as well as a shaded region that shows the 95% confidence interval for the fit. For most teams, we see that an increase in switch% on screens is associated with a decrease in Effective Screen% (Effective Screen Defense Rate). What this graph really adds is an idea of the magnitude of that decrease. For example, the Lakers are significantly worse at defending screens the more they switch, but a team like the Thunder plays pretty consistent screen defense whether they switch or not. In fact, we can see a slight upward slope in their regression line (meaning they play better screen defense when they switch).
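
To make the per-team fit concrete, here is a minimal least-squares sketch. The team names and per-game values are hypothetical, and it fits only the line, not the shaded confidence band. A negative slope means Effective Screen% drops as switch% rises, and the size of the slope is the magnitude discussed above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-game (switch%, Effective Screen%) values for two teams.
games = {
    "Team A": ([0.05, 0.10, 0.20, 0.30], [0.60, 0.55, 0.45, 0.35]),
    "Team B": ([0.05, 0.10, 0.20, 0.30], [0.50, 0.50, 0.51, 0.52]),
}
fits = {team: fit_line(xs, ys) for team, (xs, ys) in games.items()}
for team, (slope, intercept) in fits.items():
    print(f"{team}: slope = {slope:+.2f}, intercept = {intercept:.2f}")
```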

If you have any questions, comments or suggestions, please reach me on twitter @knarsu3 or e-mail me at krishnanarsu3@gmail.com 

Friday, March 14, 2014

Screening Ability and Offensive Efficiency



Vantage recently collaborated with Amin Elhassan of ESPN for a very interesting article on the 10 best screen setters in the league on a budget.

Due to the interest this piece generated, I wanted to follow up by providing some clarity on the importance of screens in generating efficient offense.

Tracking Screens

The "big news" out of the MIT/Sloan conference is that researchers can now recognize 80% of on-ball screens (with a sensitivity of 82%) using SportVU location data. I guess this makes them the Carmelo Anthony of analytics (OK, fine, even Carmelo probably recognizes more than 80% of on-ball screens ... he just doesn't use them).

While we have no doubt that the algorithms to accomplish this are very interesting, recognizing 80% of on-ball screens is like having 80% of a ball -- you're not getting any roll there.  This is a perfect example of the disconnect between the analytics bubble and the real world.  People spend all this time and effort in producing something that has no value to players and coaches.

Even if we were talking about recognizing 100% of screens, this information isn't helpful at all without the context of screen effort (did the screener make contact with the defender or reroute the defender?), screen defense (did the defender hedge, soft show, etc.?), screen usage (did the ball-handler use the screen or split the defenders?), and screen outcome (did the player get an open shot, get a teammate an open shot, or develop a play that resulted in an otherwise effective outcome?).

Spoiler alert - Vantage already tracks every screen and all this context to boot.  If you missed it, here is a background on Vantage screen analysis.

Recognizing screens is actually the easy part. We're analyzing why they are important and who is good at them. 

Who Cares About Screens? 

As the only legal method for blocking in basketball, screens are vital to creating open shots, favorable matchups, and forced rotations. Brand new research using Vantage data is verifying what coaches (and smart fans) have long understood: with a league full of world-class athletes, teams that don't screen effectively cannot score efficiently.

The guiding star for any offense is "Offensive Efficiency" or, in other words, Points Per 100 Possessions. Research by Vantage Contributor Lorel Buscher verifies that two Vantage metrics, "Received Screens Per Chance" and "Set Screen Points Per Chance," are significant predictors of Offensive Efficiency.

Set Screen Points Per Chance is the number of points generated through screens per offensive chance, while Received Screens Per Chance is the number of screens set per offensive chance. The upshot is that teams that set a lot of high-quality screens have more efficient offenses.
More detail is provided below for the statistics-inclined.

Spurs & Thunder vs. Jazz 

The top two teams in Set Screen Points Per Chance over the past 2.5 seasons are the Spurs and Thunder, each at .032. In other words, these teams have generated .032 points per offensive chance from their screens.

Both Tiago Splitter and Tim Duncan generated .101 points per chance through screens. For the Thunder, Kendrick Perkins (.115) and Nick Collison (.081) led the way.

Here is an example from each screen setter for the Thunder:



These are simple plays that require (1) a screener who creates space, (2) a ball handler defenders must respect (or do respect even if they should not), and (3) a shooter to make a non-contested shot. We see these plays every night, and they are the backbone of efficient teams.

On the other end of the spectrum, the Jazz have struggled to generate points through screens. As a team, they've only generated .015 points per offensive chance through screens, less than half of what the top teams generate. The Jazz are also in the bottom four in number of screens set.

Even a team like the Heat that can rely on a dominant one-on-one player is still in the top eight in Set Screen Points Per Chance.

The Nitty-Gritty

Researcher Lorel Buscher led our research analysis on this project by employing a Bootstrapped Regression Model to verify the importance of screens. Bootstrapping is a resampling method that attaches measures of accuracy to the estimates for the included metrics.

By bootstrapping, we were able to take 100 random samples from the original data, allowing each sample to be considered independent of the others. We required a metric to appear in at least 70% of the models built to be deemed a significant predictor.

Once a metric was found to be not only significant but also a positive predictor of Offensive Efficiency, we calculated its relative importance (a weighted average of its recorded estimate across all 100 models).

Both "Received Screens Per Chance" and "Set Screen Points Per Chance" were positive predictors with relative importance measures exceeding 0.8.
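
The resampling and the 70% rule can be sketched as follows. This is a loose illustration, not the model described above: the post does not say how each of the 100 models chose its metrics, so the sketch substitutes a simple stand-in rule (a metric is "selected" when its absolute correlation with the target is at least 0.3). The data, the column names, and that threshold are all assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def selection_frequency(rows, target, metrics, n_boot=100, min_abs_r=0.3, seed=0):
    """Fraction of bootstrap samples in which each metric passes the stand-in rule."""
    rng = random.Random(seed)
    counts = {m: 0 for m in metrics}
    for _ in range(n_boot):
        # Resample rows with replacement, same size as the original data.
        sample = [rng.choice(rows) for _ in rows]
        ys = [row[target] for row in sample]
        for m in metrics:
            if abs(pearson([row[m] for row in sample], ys)) >= min_abs_r:
                counts[m] += 1
    return {m: c / n_boot for m, c in counts.items()}


# Synthetic data: one metric drives the target, the other is pure noise.
data_rng = random.Random(1)
rows = []
for _ in range(200):
    signal = data_rng.gauss(0, 1)
    rows.append({
        "signal_metric": signal,
        "noise_metric": data_rng.gauss(0, 1),
        "off_eff": 2 * signal + data_rng.gauss(0, 0.5),
    })

freq = selection_frequency(rows, "off_eff", ["signal_metric", "noise_metric"])
significant = [m for m, f in freq.items() if f >= 0.7]  # the 70% requirement
print(freq, significant)
```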

Keeping It Rolling 

Our clients have moved beyond the data-collection problem to the integration and analysis problems, and it is time for fans (and the analytics community itself) to do the same. Stay tuned as we roll out more discoveries from our data.



Friday, February 7, 2014

The Intersection of Shot Defense, Location, and Clock (DLC)

Is it better to shoot early in the shot clock? What if the shot is contested, is it still better to shoot early? How does shot location influence the decision to shoot? Is it better to attempt an open mid-range shot early in the shot clock versus a contested shot near the basket late in the shot clock?

These are likely some of the decisions going through players' minds when they decide to shoot, even if they don't realize it. The last example is particularly interesting because it gets at the crux of shot selection: the player has an open mid-range shot, but is it worth passing that up to get a potentially contested shot near the basket later in the shot clock?

There's been previous research into whether teams should be shooting early in the shot clock. However, we can take this a step further and look at the shot defense faced. WARNING: this will be a lengthy exercise but there will be a number of visuals to help you along.

First, before we start, I suggest refreshing yourself on the definitions for each type of shot defense. You can find that here. Done? Let's start off by looking at how FG% changes given the time in the shot clock and the type of shot defense.

Note: Shot Clock values are time remaining



As we can see in the graph, the earlier you can shoot in the shot clock, the more likely you are to make the shot. This is true for basically all forms of shot defense (the exception being altered, where the sample size is pretty small and filled with bad shots taken after shot-clock resets often in desperation). Is there a particular reason for this? Certainly, we can posit that players rush their shots at the end of the shot clock, which may lead to the diminished FG% (labeled Panic Room in the graph). However, what about the increase in FG% even as we get away from the "late in the shot clock" area? Well, one reasonable assumption is that players who shoot early in the shot clock are likely in transition (labeled Transition in the graph). It could be that a majority of the open/guarded/pressured shots that are being taken early in the shot clock are shots that are close to the basket causing a higher FG%. But what about the contested shots? Whether the player is in transition or not, a contested shot has a lower chance of going in, especially near the basket.

There are a few issues with the graph presented above. First, there is no adjustment for the 3 being worth one more point. This is easily correctable: we'll look at points per shot instead.


The graph above looks mostly the same, with the differences between open and contested shots better defined. The second issue we have with both graphs is that the sample sizes at the ends are considerably smaller. This is particularly clear with altered shots, where only four altered shots were taken at 24 seconds on the shot clock. This is why you see that big nosedive for altered shots at 24 seconds. So let's fix this issue by creating 5-second bins for each type of shot defense. The graph is below:


We can see that regardless of shot defense, PPS goes up if you shoot earlier in the shot clock. There is also a drastic increase from the 16-19 second bin to the 20-24 second bin for guarded, open and pressured shots. As I alluded to earlier, it's likely that many of those shots are occurring close to the basket in transition. So we'll need to break it down even further by shot location. But before we do that, I think it's important to note that for altered and contested shots, we still see a steady increase in PPS as you shoot earlier in the shot clock. It's possible that a lot of the shots in the 20-24 second interval are still in transition, but because the shot is being contested, I'd argue that whether it comes in transition is irrelevant. We know contested shots near the basket have a much lower chance of going in.
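
The binning and the points-per-shot calculation behind this graph can be sketched as below. The bin edges are an assumption pieced together from the bin labels mentioned in this post, and the shot records are made up for illustration.

```python
from collections import defaultdict

# Assumed bin edges in seconds remaining, based on the labels used in this post.
BINS = [(1, 5), (6, 10), (11, 15), (16, 19), (20, 24)]


def clock_bin(seconds_remaining):
    """Return the label of the shot-clock bin that contains seconds_remaining."""
    for low, high in BINS:
        if low <= seconds_remaining <= high:
            return f"{low}-{high}"
    raise ValueError(f"shot clock value out of range: {seconds_remaining}")


def pps_by_bin(shots):
    """Points per shot for each (bin, defense) pair.

    shots: iterable of (seconds_remaining, defense, shot_value, made).
    """
    points = defaultdict(int)
    attempts = defaultdict(int)
    for seconds, defense, value, made in shots:
        key = (clock_bin(seconds), defense)
        attempts[key] += 1
        if made:
            points[key] += value  # a made three adds 3, a made two adds 2
    return {key: points[key] / attempts[key] for key in attempts}


# Hypothetical shots: (seconds remaining, defense, shot value, made?).
shots = [
    (22, "open", 2, True),
    (21, "open", 3, False),
    (3, "contested", 2, True),
    (4, "contested", 3, False),
    (2, "contested", 2, False),
]
pps = pps_by_bin(shots)
for key in sorted(pps):
    print(key, round(pps[key], 3))
```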

Before we break down the graphs by shot location, let's first look at the distribution of shot defense at each second:


This graph may seem a bit confusing, but what it shows is that approximately 50% of the shots taken at one second were contested while almost 20% of the shots taken at one second were pressured. If you add up the points for every level of shot defense at one second, the total will be 100%. So what does this graph tell us? The majority of the shots taken are contested, and as the shot clock winds down, the rate of contested shots goes up. In regard to the end points (such as 24 seconds), keep in mind that the sample sizes for these points are much smaller, which is why you see some odd spikes or nosedives at that time interval.
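
Put another way, each second on the shot clock gets its own breakdown of defense types, and those shares sum to 100%. A small sketch with made-up counts, chosen only to mirror the rough 50% and 20% figures above:

```python
from collections import Counter, defaultdict


def defense_shares(shots):
    """Share of each defense type at each second.

    shots: iterable of (seconds_remaining, defense).
    Returns {seconds_remaining: {defense: share}}.
    """
    counts = defaultdict(Counter)
    for seconds, defense in shots:
        counts[seconds][defense] += 1
    shares = {}
    for seconds, counter in counts.items():
        total = sum(counter.values())
        shares[seconds] = {d: n / total for d, n in counter.items()}
    return shares


# Hypothetical shots taken with one second left on the shot clock.
shots_at_one = [(1, "contested")] * 5 + [(1, "pressured")] * 2 + [(1, "open")] * 3
shares = defense_shares(shots_at_one)
print(shares[1])
```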

Now let's move onto looking at the shot clock and shot defense with respect to the shot location. As I mentioned earlier, it's likely that a lot of the guarded, open and pressured shots early in the shot clock were coming near the basket in transition.

So we can remove the shots that were near the basket and look at only mid-range shots and three point shots. This should essentially remove the effects of transition offense since most players attack the basket in transition.


Again, we can see that shooting at the end of the shot clock is not recommended. However, perhaps what is most interesting is that at one second, the chances of making the shot are essentially the same across all levels of shot defense. The sample size is not ideal (about 220 or more shot attempts for open and guarded shots), but it's large enough that we should feel comfortable drawing some conclusions. If you are wondering why the graph only goes to 23 seconds, it is because I removed shots taken at 24 seconds due to a very small sample size (22 total shots).

Let's also take a look at the distribution of these shots at each time interval:


Like the first graph, we see that the majority of shots taken from mid-range or three are contested.

Now, let's look at mid-range shots and threes separately since the value of each shot is significantly different. First, we'll look at the dreaded mid-range shot. How does FG% vary for mid-range shots over time in the shot clock?


We see that FG% declines as the shot clock winds down, with the exception of open and guarded shots at one second. However, this is simply due to a small sample size (fewer than 100 shot attempts for both open and guarded shots at one second). In fact, we have some sample size issues on both ends of the graph, so let's look at the same graph but with five-second binned intervals:


We've removed some of the sample size issues from the previous graph (although, as you can tell, there are still some issues with guarded and open shots from 20-24 seconds, where there are only 140 and 165 attempts, respectively), and we can continue to see that it is advantageous to shoot earlier in the shot clock. In fact, contested mid-range shots taken with 16-24 seconds remaining are more likely to go in than guarded shots taken with 1-5 seconds remaining and almost as likely to go in as guarded shots taken with 6-10 seconds remaining.

How about the distribution of shots taken at each second?


Again, we see more and more contested shots taken as the shot clock winds down. After looking at this graph, I'm sure you are left wondering how often the "worst shot" is taken. (If you are wondering what the worst shot is, it's a contested mid-range shot taken with 1-5 seconds remaining.) Over the last two years of shots that Vantage has tracked, the "worst shot" was attempted 4% of the time. If we include pressured mid-range shots taken with 1-5 seconds remaining, that number jumps to 5.2%.

Let's move on to exploring three-point shots and how the FG% varies over the shot clock. Like before, we'll look at the FG% for the different types of shot defense:


As I mentioned in the previous article, we can see the difference between open and guarded shots versus contested shots. We also see that for pressured shots, there seems to be a pretty high fluctuation from second to second. As I did in a previous graph, I had to remove the 24 second interval as well as some of the values for the 23 second interval due to sample size issues. Let's fix those issues by looking at the 5-second binned intervals:


This is the first graph where we can't really see any effect of shooting earlier in the shot clock. We do see that in the 1-5 second bin, FG% is at its lowest. However, there is no trend among any of the other bins. So if a team wants a three, outside of the final five seconds it does not appear to matter when in the shot clock they shoot it. Another interesting conclusion we can draw is that there seems to be a significant difference in the FG% for shots taken in the 1-5 second bin across each level of shot defense. We see that going from a contested three to a guarded three can raise your FG% by about 7 percentage points, and going from a guarded three to an open three can raise your FG% by 5.5 percentage points. So if a shooter is able to get a wide open three within the last five seconds of the shot clock, it will still be very beneficial to the offense. We also see that a guarded three taken in the last five seconds of the shot clock doesn't have a much higher chance of going in than a pressured or contested three taken earlier in the shot clock.

Let's look at the distribution of shot defense at each time interval:


Again, contested threes are the most likely shot and the rate of contested shots goes up as the clock winds down. Players are also attempting more open threes with 15-22 seconds remaining on the shot clock.

Finally, let's look at shots close to the basket. When looking at the FG%, keep in mind that many of the shots early in the clock may come in transition. However, if the level of shot defense is the same, should the shot being in transition really inflate the FG%? I'll let the reader decide when looking at the graph.


This graph may be the best visualization of the research conducted in my first article. The difference between getting a contested shot near the basket and a pressured shot near the basket is night and day. And the difference between getting a pressured shot and an open shot (which will mostly be layups or dunks) is also night and day. In regard to the shot clock, there doesn't appear to be any difference between getting an open shot early in the shot clock as opposed to late: it's about a 90% proposition either way. This isn't entirely surprising since a dunk with five seconds left in the shot clock is as likely to go in as a dunk with 20 seconds left in the shot clock. However, there is a difference for every other level of shot defense. We see that guarded shots taken early in the shot clock are as likely to go in as open shots taken late in the shot clock while pressured shots taken early in the shot clock are as likely to go in as guarded shots taken late in the shot clock. We also see that the FG% for contested shots seems to gradually increase as well. Still, like many of these graphs before, we do have a bit of a sample size issue (especially the guarded and open shots). So let's look at the same graph with five second binned intervals:


This graph is a bit easier to interpret and we see that for every level of shot defense with the exception of open shots, FG% goes up as you shoot earlier in the shot clock. We can also see that as you move "up" each level of shot defense, a player is more likely to make the shot.

As we've done before, let's look at the distribution of shot defense at each time interval:


This graph is different from some of the past ones we've looked at. Early in the shot clock, there are actually more pressured shots attempted than contested shots. There are also more open shots attempted early in the shot clock. Like the past graphs, the rate of contested shots goes up as the time on the clock elapses. Players also foul less as the shot clock winds down, clearly a bright idea. Interestingly, the block rate increases as the shot clock winds down, so despite the higher rate of blocked shots, players are still able to foul less.

Let's answer some of the questions posed at the beginning of this article.

Is it better to shoot early in the shot clock?
I think the evidence certainly points to this. However, for three point shots, there doesn't appear to be any advantage for shooting early.

What if the shot is contested, is it still better to shoot early? How does the shot location influence this?
At first glance, it is certainly never a good idea to shoot a contested shot. However, for mid-range shots and threes, there is a cutoff point where contested shots early in the shot clock are basically as good as guarded or open shots late in the shot clock. For mid-range shots, that cutoff appears to be around 10 seconds. For threes, that cutoff point only applies to guarded shots and occurs at around five seconds. Of course, I'm only eyeballing the cutoff point based on the binned graphs, so it would be an interesting exercise to determine the exact cutoff point. But to answer the question above, it is not better to shoot a contested shot early. However, if you are going to wait for the shot clock to go all the way down, you may as well shoot that contested shot.

Is it better to attempt an open mid-range shot early in the shot clock versus a contested shot near the basket late in the shot clock?
We can take this last question one step further and look at contested threes as well. In order to compare the three types of shots, let's look at our last graph:


In looking at the graph, it is pretty clear that open mid-range shots are never preferable over contested threes or close shots. However, at each end, we do see that the PPS for contested threes and open mid-range shots are nearly the same. If the shot clock is winding down to near zero, it isn't a horrible idea to step in a few feet and shoot the mid-range jumper. Likewise, if you have an open mid-range shot early in the clock, instead of stepping back and risking it being contested, you may as well take the open shot. Otherwise, contested threes and contested shots near the basket are better bets than open mid-range shots.

Unfortunately, for those clamoring for the mid-range shot to remain a big part of the game, there isn't much here to support your claim. In fact, the evidence here seems to point to mid-range shots being completely abandoned. Still, it is important to break this down further by each specific shot location. Perhaps there are areas of the mid-range game where shooting an open shot is better than shooting a contested three/close shot. Additionally, each player is going to have different percentages. While the league as a whole should be shooting fewer mid-range shots, for a specific player, this graph may look much different.

Also, since I mentioned that the league should be shooting fewer mid-range shots, what exactly does the distribution of shots look like?


As expected, the rate of close shots goes down as defenses are able to set up and protect the basket. We also see the rate of mid-range shots increase till about 15 seconds at which point it levels off at around 40%. Instead, we see the rate of threes increase as the shot clock winds down.

Bonus question: What about contested threes versus contested shots near the basket?
Perhaps the more interesting part of the first graph is looking at contested threes versus contested shots close to the basket. If we look at the best fit lines (the solid lines with no points) for contested threes and contested shots close to the basket, we see that contested threes are preferable to contested shots near the basket from about 19 seconds remaining to about 5 seconds remaining. However, if it is either early in the shot clock or late in the shot clock, it is better to shoot a contested shot near the basket than to take a contested three.

If you have any questions, comments or suggestions, please reach me on twitter @knarsu3

Note: Shot locations were defined as follows:
Close shots were shots taken in locations t, u, and v.
Threes were shots taken in locations a, b, c, d, and e. Locations aa, bb and cc were removed because they are past half-court.
Mid-range shots were shots taken in locations f, g, h, i, j, k, l, m, n, o, p, q, and r.
Shots in locations s and w were ignored because they do not really fit in any category. The total shots taken from those locations were about 130 apiece, i.e. not enough to make a significant difference.



Friday, January 31, 2014

Defending the Three

Before Paul George broke out as an offensive force, he was widely known for his suffocating defense. But is he a good enough defender to make shooters miss wide-open threes?

video

That was a trick question. Of course not! This video is not an example of good defense but rather Paul George getting lucky that an open shooter missed a wide-open shot.

In a previous article, I developed a framework for calculating defensive XPPS based on Ian Levy's Expected Points per Shot. Let's apply this method but focus specifically on three-point shooting and determine which players are getting lucky. Have Paul George's opponents missed a ton of open shots?

In order to understand what constitutes an open shot, let's revisit some of Vantage's definitions for shot defense.

Altered shot: Defender is within 3 feet of shooter and hand is up. Offensive player must change shooting angle or release point while in the air.
video

Altered shot:
video

Contested shot: Defender is within 3 feet of shooter and hand is up. Offensive player does not alter shooting angle or release point.
video

Pressured shot: Defender is within 3 feet of shooter but does not have his hand up.
video

Guarded shot: Defender is within 5 feet of shooter but not within 3 feet.
video

Open shot: Defender is not within 5 feet of shooter.
video
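The definitions above can be read as a small decision rule. The sketch below is one way to write it, assuming that "within 3 feet" and "within 5 feet" include the boundary and that we already know whether the shooter changed his angle or release point. The function and parameter names are hypothetical, and blocked shots are not covered.

```python
def classify_shot_defense(distance_ft, hand_up, shooter_adjusted=False):
    """Classify shot defense from the defender's distance and hand position.

    distance_ft: distance between defender and shooter, in feet.
    hand_up: True if the defender has a hand up.
    shooter_adjusted: True if the shooter changed his shooting angle or
        release point while in the air.
    Assumption: the 3-foot and 5-foot boundaries are inclusive.
    """
    if distance_ft <= 3:
        if not hand_up:
            return "pressured"
        return "altered" if shooter_adjusted else "contested"
    if distance_ft <= 5:
        return "guarded"
    return "open"
```

Note that the hand only matters inside 3 feet: by these definitions a defender 4 feet away is "guarding" the shot whether or not his hand is up.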

So what are the league's percentages on threes when left open versus contesting a shot?

Shot Location   contested   altered*   guarded   open     pressured
a               34.97%      0.00%      44.21%    42.50%   35.20%
b               34.51%      0.00%      39.58%    41.04%   32.36%
c               32.37%      12.50%     35.81%    39.95%   30.35%
d               32.69%      0.00%      39.65%    41.18%   33.51%
e               35.53%      0.00%      38.81%    48.27%   42.22%
*there were only 35 altered 3 point attempts in the sample of data
(Note: See location chart at the bottom of this post.)

For the most part, the biggest differences in three-point shooting occur when pressuring/contesting the shooter as opposed to guarding him. The one exception to this is for threes attempted in the right corner, where we see some odd splits. We also see that straightaway threes are the worst type of three, with lower percentages at every level of shot defense (besides altered shots, where the sample size is small).

We can use these percentages - in the form of points per shot - to calculate defensive XPPS and develop a Luck statistic. Based on the percentages above, we know it is important for defenders to at least pressure the opposing shooter. So to calculate our Luck statistic, we'll add up the points that a defender gave up on guarded and open shots and subtract the number of points they would be expected to give up based on the league wide rates for guarded and open shots. 

Let's look at an example using Paul George. First, we need to calculate the points per shot for each type of shot defense at each three-point location. We can do this by multiplying all the values in the table above by 3. Next, we multiply these values by the number of field-goal attempts against Paul George. For example, in the left corner, opponents attempted 3 open shots against George. We multiply this value by the league points per shot value on open shots: from the table, .425 * 3 = 1.275 points per shot, and 1.275 * 3 open shots = 3.825 expected points. We do this for both open and guarded shots and get the following table for Paul George:

Location   points   exp points
a          9        5.151
b          9        13.368
c          3        7.892
d          12       12.171
e          3        8.434
Total      36       47.016
Luck       -11.016

As shown in the table, after calculating the points and expected points for each shot location, we add them up and then subtract the expected points from the actual points to get a "Luck" statistic. A negative value means that Paul George gave up fewer actual points than he was expected to give up on guarded and open shots given the league-wide rates. A positive value would indicate the player has been unlucky, giving up more points than expected.
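Here is a minimal sketch of the Luck arithmetic, using the league percentages from the table earlier in this post and the Paul George numbers quoted above. The function names and the dictionary layout are my own assumptions for illustration; only the percentages, the 3 open left-corner attempts, and the table totals come from the post.

```python
# League three-point percentages copied from the table earlier in this post.
LEAGUE_PCT = {
    "a": {"guarded": 0.4421, "open": 0.4250},
    "b": {"guarded": 0.3958, "open": 0.4104},
    "c": {"guarded": 0.3581, "open": 0.3995},
    "d": {"guarded": 0.3965, "open": 0.4118},
    # Per the note at the end of this post, pressured shots from the
    # right corner are also counted in the Luck statistic.
    "e": {"guarded": 0.3881, "open": 0.4827, "pressured": 0.4222},
}


def expected_points(attempts):
    """attempts maps (location, defense type) -> number of three-point attempts."""
    return sum(
        n * LEAGUE_PCT[location][defense] * 3  # 3 points per made three
        for (location, defense), n in attempts.items()
    )


def luck(actual_points, expected):
    """Negative means the defender gave up fewer points than expected."""
    return actual_points - expected


# Left corner, open: 3 attempts against Paul George (from the text).
left_corner_open = expected_points({("a", "open"): 3})  # about 3.825

# Totals from the table: 36 actual points, 47.016 expected points.
george_luck = luck(36, 47.016)  # about -11.016
```

Keeping the attempts keyed by location and defense type makes it easy to fill in a full season of shots for any defender once those counts are available.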

Of course, there are a few issues with the Luck statistic calculated above. First, we are using a counting statistic, which will reward those who give up more open/guarded shots. The more open/guarded shots the defender gives up, the more likely it is that his Luck statistic will be larger in either direction. This will also favor those players who have defended the most shots in the sample because they are also more likely to have defended more open/guarded shots. We can address this by dividing the Luck statistic by field goal attempts. However, those players who give up a ton of open/guarded shots should be penalized and we get a better idea of which players those are by looking at "Luck" as a counting statistic.

With the flaws of Luck in mind, let's look at Luck versus Points per Shot:

note: minimum 50 3pt FGAs defended

As we see here, Paul George has gotten lucky with his opponents missing about 11 points on open shots. Other notable players who have gotten lucky with their opponents missing more open shots than they should include Steve Nash, Rudy Gay, Avery Bradley, Kevin Durant, Paul Pierce and Jrue Holiday. Tayshaun Prince leads the way with opponents missing about 23 points on open shots.

How about if we divide Luck by field goal attempts? (note: this will adjust for how many shots a player defended) What does Luck/FGA versus Points per Shot look like?

Many of the same players appear in the same places but some high minute players like Kevin Durant and Paul George are closer to average (zero) after the conversion of Luck to a rate statistic.

Let's now look at Contest+ (defined as blocking, altering or contesting a shot). Which players contest the most three point shots? What is the range for Contest+?

To answer the questions above, Willie Green contests the highest percentage of the three-point shots he defends (80.3%) while Chris Paul contests the lowest percentage (34.1%). The range for contest frequency is 46.3%, which we can see in the histogram below:


The range in contest+ for defending threes was larger than it was for rim protection rate (37.8%) but we also had a smaller sample for rim protection rate. We can look at the distribution of both rim protection rate and contest+ for defending threes in the histogram below:


Keep in mind, the sample size is smaller for rim protection rate so the distribution will be smaller. However, we can see there were more players with contest+ rates in the 30%-35% bin for shots close to the basket (light green). This is not surprising as the league average contest+ frequency is lower for shots near the basket than for three-point shots.

Finally, let's plot XPPS (expected points per shot) versus actual PPS and see which defenders are under-performing or over-performing their expected points per shot: 

In the graph above, the x- and y-axis scales are the same so we can visualize the magnitude of "luck" or "hidden skill" that many of these defenders have in regard to their actual points per shot versus what would be expected given league average shot defense rates. As we see in the graph above, there is a much wider distribution in actual points per shot versus XPPS.

Let's increase the scale for the x-axis so it is easier to determine the different players:

Tony Allen is the best player at defending threes according to XPPS and his actual points allowed per shot is pretty much the same. We can also tell from the size of the circle that Allen contests a large majority of his three-point attempts which isn't surprising since Allen is regarded as one of the best wing defenders in the league. We can also see that players such as Nicolas Batum and Jeremy Lin (upper left quadrant) have been unlucky (or lack a "hidden skill") with their PPS being much higher than their XPPS. Conversely, we see that JJ Redick and Kyle Lowry (bottom right quadrant) have been lucky (or possess a "hidden skill") with their PPS being much lower than their XPPS.


Note: Due to the odd splits for threes attempted from the right corner, we decided to include pressured shots from the right corner in the luck statistic. In general, pressured shots are considered to be open shots (see this article) but for three-point shots, there is almost no difference between contesting a shot and pressuring it. However that is not the case for shots from the right corner.

Disclaimer: These statistics include the previous two years but not this year (2013-14).

Wednesday, January 15, 2014

Is LeBron Getting Lucky on Defense?

LeBron is widely regarded as the best wing defender in the NBA. But has he actually been getting lucky on defense over the last two years? Wait, LeBron getting lucky on defense? The man who has inspired a video on his chasedown blocks?


That can't be right, can it?

Over at Hickory High, Ian Levy calculates Expected Points per Shot (XPPS) based on shot locations. The calculation is simple: multiply each player's total FGA from each location by the leaguewide expected points per shot at that location, add it up and then divide by the player's total FGA. Levy then adjusts the total by the number of fouls a player draws using the average value of a pair of free throws (1.511). Then he compares this to the players' actual points per shot.

Here at Vantage, we can take this metric one step further. Not only do we have shot location data but we have shot defense data as well. Additionally, we even know who is guarding the shooter. So we can calculate a defender's XPPS. Using this metric, we can see if the defender is getting lucky (or unlucky) with his defense. If the defender is contesting shots and forcing the shooter to take the shot from more difficult shot locations, we'd expect to see a very low XPPS.

Let's look at LeBron's defense and see if he has been lucky. Is he forcing his shooters into tough shots? First, we need to calculate the XPPS for the league.

contested shots:
Shot Location  XPPS         fga
a 1.049048 1733
b 1.035159 3214
c 0.971154 2496
d 0.980715 3215
e 1.065789 1748
f 0.923077 26
g 0.777579 1677
h 0.700235 2125
i 0.781012 5119
j 0.719453 2046
k 0.738318 1712
l 1 12
m 0.47619 21
n 0.76643 2252
o 0.769585 868
p 0.783333 840
q 0.760017 2371
r 0.736842 19
s 1.0625 32
t 0.989032 5197
u 1.0197 5127
v 0.967966 4901
w 0.807692 52
(Note: See location chart at the bottom of this post.)

As we can see in the table, some locations have significantly higher XPPS for contested shots than others. We can do this for each type of shot defense.

Now let's look at LeBron. How has he defended each location?

Location  XPPS           fga
a 1.241379 29
b 0.833333 54
c 1.241379 29
d 1.090909 66
e 1.2 25
g 0.526316 19
h 0.727273 22
i 0.878049 41
j 0.545455 22
k 0.714286 14
n 0.48 25
o 0.857143 14
p 0.571429 14
q 1 18
s 2 1
t 1.125 64
u 1 100
v 0.8 50
w 1 2
Total 0.934319 609
(Note: See location chart at the bottom of this post.)
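As a sanity check on the table above, the Total row looks like an FGA-weighted average of the per-location values. The sketch below recomputes it from the rows as printed; the variable and function names are mine, and the weighted-average reading is my assumption about how the Total was produced.

```python
# (value in the XPPS column, FGA) for each location, copied from the table above.
LEBRON = {
    "a": (1.241379, 29), "b": (0.833333, 54), "c": (1.241379, 29),
    "d": (1.090909, 66), "e": (1.2, 25), "g": (0.526316, 19),
    "h": (0.727273, 22), "i": (0.878049, 41), "j": (0.545455, 22),
    "k": (0.714286, 14), "n": (0.48, 25), "o": (0.857143, 14),
    "p": (0.571429, 14), "q": (1.0, 18), "s": (2.0, 1),
    "t": (1.125, 64), "u": (1.0, 100), "v": (0.8, 50), "w": (1.0, 2),
}


def weighted_points_per_shot(rows):
    """Return (FGA-weighted average points per shot, total FGA)."""
    total_points = sum(pps * fga for pps, fga in rows.values())
    total_fga = sum(fga for _, fga in rows.values())
    return total_points / total_fga, total_fga


overall, total_fga = weighted_points_per_shot(LEBRON)
```

With the rows as printed, this comes out to 609 attempts and roughly 0.934 points per shot, which agrees with the Total row.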

We can then break this down even further looking at the type of shot defense LeBron has played at each location. 


After calculating LeBron's expected points and actual points for each shot location and type of shot defense (his FGA*points), we add it all up, include expected points on fouls and then divide by total attempts plus missed foul shots. We find that LeBron's XPPS is 1.01 compared to his actual PPS of 0.96.

What does this mean? Our intuition is that LeBron has gotten slightly lucky with his actual points per shot allowed being lower than what you would expect given where and what type of shot defense he played against the opposing player. We suspect the difference isn't significant enough to say he's due for any type of regression.

I plan to do a full analysis of all players in the NBA to ground the intuition with data, and to identify the luckiest and unluckiest defenders in the NBA. Stay tuned.

Note: And 1's were given an additional 0.755 value to their XPPS since the value of two FT attempts is 1.51. So for 1 FT attempt, the value would be exactly half. As an example, the league had 634 fouled And 1's in the left low post. Since these shots are always made, their XPPS is going to be 2. However, a free throw attempt follows making the shot, so that is given the value of 0.755 so the XPPS for And 1's in the left low post would be 2.755.
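The And 1 adjustment in the note above is simple enough to write out. This is a minimal sketch using the 1.51 value from the note; the constant and function names are hypothetical.

```python
PAIR_OF_FREE_THROWS = 1.51                  # value of two FT attempts (from the note)
ONE_FREE_THROW = PAIR_OF_FREE_THROWS / 2    # 0.755


def and_one_xpps(made_shot_value):
    """XPPS for an And 1: the made basket plus the value of one free throw."""
    return made_shot_value + ONE_FREE_THROW


low_post_and_one = and_one_xpps(2)  # 2 + 0.755 = 2.755
```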


Tuesday, January 7, 2014

Who Protects the Basket?

Over the last few months, Omer Asik has been involved in lots of trade discussion. However, the question to most fans is why would a team want to trade for a player who has career averages of 5 points and 7 rebounds? The analytically savvy may be aware of Asik's worth on defense, but how do you quantify it? Here's another question: with 2 bigs who are so-called rim protectors, which one of the Rockets' big men is actually better? Before screaming "Dwight," continue reading below.

In a previous article, we looked at the value of contesting shots. However, rim protection entails more than just contesting shots. We need to look at how often the defender blocks shots as well. Together, by combining a player's contest/altered frequency with his blocks frequency, we can develop a rim protection rate statistic. So who is protecting the basket?
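To make the definition concrete, here is a minimal sketch of rim protection rate. It assumes that "combining" the two frequencies means adding them, with both measured against the same denominator (shots defended near the basket). The function name and the example counts are hypothetical, not Vantage data.

```python
def rim_protection_rate(contested_or_altered, blocked, close_shots_defended):
    """Share of close shots defended that were contested, altered or blocked.

    Assumption: contest/altered frequency and block frequency share the same
    denominator, so combining them is a simple sum.
    """
    if close_shots_defended <= 0:
        raise ValueError("close_shots_defended must be positive")
    return (contested_or_altered + blocked) / close_shots_defended


# Hypothetical defender: 100 close shots defended, 40 contested or altered, 10 blocked.
example_rate = rim_protection_rate(40, 10, 100)  # 0.5
```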

We can look at this a few ways. First, rather than just presenting a table of information, let's look at rim protection in the form of a heat map.

We see some interesting if not completely surprising names here. Brandan Wright has long been a per-minute star, averaging a 21.2 Player Efficiency Rating over the last 2 years. He also finished with a 2.2 xRAPM last year and was on the plus side in the previous year. After missing the first 23 games with an injury, Wright is back and playing well. However, the assumption with Wright has long been that he's too thin to muscle with the big boys down low. Yet, despite his skinny frame, he has a long wingspan that helps him block an above-average number of shots and contest whatever he doesn't block. If Wright is able to stay healthy, he's a player to keep an eye on.

We see that Asik is one of the better rim protectors as well. In fact, we see that he protects the rim at a better rate AND allows a lower FG% at the rim on those contested shots than Dwight Howard. As has been mentioned in many rumors surrounding Asik, the Rockets are actually better defensively when he is on the court than when Dwight is. Does this mean Houston is trading away the wrong center? That would be taking it a bit far -- Dwight is still much better offensively and while he hasn't been as good as Asik at protecting the rim, he isn't chopped liver either.

Brook Lopez finished with the highest Contest+ so it is not surprising to see him leading the way in rim protection. While he only blocks shots 0.6% more than the average big man (PFs and Cs), he is able to get his hand up and at least contest or alter most shots.

When I first ran the numbers in the previous article for contest frequency, I was surprised to see Serge Ibaka with a relatively low contest/altered frequency. Alas, it was only because he was too busy blocking nearly 25% of the shots attempted against him near the basket!

However, perhaps the biggest surprise on this list is Tyson Chandler, the one-time DPOY. With a 45.8% rim protection rate, he comes in as one of the worst players in our sample. He is also 8% below the average big man, which you can see visually with the "cold" blue box.

One thing I'm sure we all noticed when looking at rim protection rate is that many of the names we would expect to see at the top are actually not at the top. So are guys like Roy Hibbert, Tim Duncan, and Marc Gasol not as good as we expect? Is rim protection rate perhaps misleading? Is rim protection FG% actually the better statistic? It's hard to answer the last question without having a larger sample size and observing how both rim protection rate and FG% vary over time. However, we can plot rim protection rate vs. rim protection FG% to see if there are significant differences.

Well, that is pretty messy. The correlation coefficient is actually not as bad as the graph looks (-0.26) and the fact that the two statistics are negatively correlated is certainly a good thing -- i.e. as rim protection rate goes up, rim protection FG% should go down. Theoretically at least. Still, it's not a very strong correlation and leaves us asking the question: which of these metrics is stronger in predicting a team's defensive efficiency?
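For readers who want to run the same kind of check on their own data, here is a minimal sketch of a correlation calculation. I am assuming the coefficient quoted above is the usual Pearson correlation; the function name and the sample numbers below are made up for illustration and are not the Vantage data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Undefined (division by zero) if either sequence is constant.
    return covariance / (spread_x * spread_y)


# Hypothetical rim protection rates and FG% allowed for five big men.
rp_rate = [0.46, 0.52, 0.55, 0.60, 0.63]
rp_fg_pct = [0.55, 0.58, 0.49, 0.53, 0.50]
r = pearson_r(rp_rate, rp_fg_pct)
```

A value near -1 would mean that higher rim protection rates go almost hand in hand with lower FG% allowed, while a value near 0, like the -0.26 above, means the relationship is weak.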

We also have one other metric we can look at: close frequency. What is close frequency? It is the percentage of a player's shots defended that came near the basket. We can get an idea of which players are always near the basket and which players wander around a bit more and are guarding shots further away from the hoop. 

In the graph above, the players with larger bubbles are guarding a higher percentage of their shots near the basket. We see that some of the past DPOYs like Marc Gasol, Tyson Chandler, Tim Duncan and perhaps this year's DPOY Roy Hibbert do not have the largest bubbles. The percentage of shots they have defended near the basket ranges from about 60%-65%, which means that about 35%-40% of the shots they have defended are at mid-range or near the 3 point line. Of course, with the 3-seconds-in-the-paint rule, no one can maintain a 100% rate here because ultimately you have to leave the paint. In the future, we will look at the distribution of each player's shots defended along with his defensive usage (Vantage tracks shots defended per chance). We can also use the distribution of shots to develop an expected points per shot metric.

For now though, we can see which players are near the basket the most (close freq), which players attempt to protect the basket (RP rate) and which players successfully protect the basket (RP FG%).

Finally, as a refresher, it's worth seeing visually the difference between an open shot under the basket and a contested one:


Almost every player in the league allows a higher FG% when the shot is open than when it is contested. And we can see a clear difference between the two types of shot defense.