Saturday, December 6, 2008

Comp and Circumstance

Boz has his 1,961st column today about how the Nationals should sign Mark Teixeira. That means it's time for me to put on my hat as the Self-Appointed Ombudsman for Boz (S.O.B.).

My first big problem with this column (and it's a frequent problem with Boz's stuff) is that he blatantly cherry-picks data that backs his point while ignoring all the rest.

Arguing in favor of signing Tex for as long as eight or even ten years (which would make him a National through his age 38 season), he visits Baseball-Reference (without crediting them at all, which I think some people would call plagiarism, but whatever) and grabs their list of "age-based similar players."

Similarity score is a Bill James concept that basically compares a bunch of stats for every pair of major-leaguers and subtracts points for every difference between them. The players with the most similar stats across the board come up as "most similar." The age-based comps from Baseball-Reference do the same thing, except that instead of looking at career-long numbers, they look only at each player's numbers through the same age. That way you can get a sense of the career path of that player so far and what similar players have done going forward.
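For the curious, here's a rough sketch of how a subtract-for-every-difference score works. The weights are a subset of the hitter weights as I remember them from the Bill James formula, so treat them as assumptions rather than gospel; the three hitters are made up, and the positional adjustment is left out entirely.

```python
# Points are subtracted per this many units of difference in each stat.
# ASSUMPTION: illustrative weights recalled from the Bill James hitter
# formula, not checked against the published version.
WEIGHTS = {
    "G": 20, "AB": 75, "HR": 2, "RBI": 10,
    "BB": 25, "SO": 150, "AVG": 0.001, "SLG": 0.002,
}


def similarity_score(a, b, weights=WEIGHTS):
    """Start at 1000 and subtract points for every difference in stats."""
    score = 1000.0
    for stat, units_per_point in weights.items():
        score -= abs(a[stat] - b[stat]) / units_per_point
    return score


# Hypothetical stat lines through the same age (not real players).
hitter_a = {"G": 900, "AB": 3300, "HR": 200, "RBI": 650,
            "BB": 450, "SO": 700, "AVG": 0.290, "SLG": 0.540}
hitter_b = {"G": 920, "AB": 3375, "HR": 210, "RBI": 670,
            "BB": 475, "SO": 850, "AVG": 0.285, "SLG": 0.530}
hitter_c = {"G": 700, "AB": 2500, "HR": 90, "RBI": 380,
            "BB": 250, "SO": 400, "AVG": 0.265, "SLG": 0.420}

score_ab = similarity_score(hitter_a, hitter_b)
score_ac = similarity_score(hitter_a, hitter_c)
print(f"A vs B: {score_ab:.1f}")
print(f"A vs C: {score_ac:.1f}")
```

Hitter B comes out as the much closer comp for hitter A. An age-based version is the same calculation, just fed each player's totals through a given age instead of career totals.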

There are better ways to do this. Not all stats are of equal predictive value; for instance, RBI and batting average tend to fluctuate pretty wildly based on factors outside a hitter's control, while walk rate, strikeout rate, and ISO Power (SLG minus batting average) tend to correlate strongly from year to year. There are some systems that are more sophisticated in factoring in these elements. It would also make sense to factor in some body type data, injury history, and adjust for era. But let's not nit-pick too much. This is a decently useful way to put together a list of ten reasonably comparable players.

The list we get from Baseball-Reference, in order of similarity, is Carlos Delgado, Kent Hrbek, Fred McGriff, Jim Thome, Will Clark, Jeff Bagwell, Willie McCovey, Richie Sexson, Shawn Green, and Paul Konerko.

OK, so what does this tell us? First, you will notice that Boz conveniently chopped off the last three names on this list. Yes, they are marginally "less comparable" based on this metric, but we really shouldn't get too wrapped up in who's "most comparable" versus who's 8th, 9th, or 10th on this list. None of these guys are truly "clones," as Boz says, and there are enough random variables and sample size problems here that it's best to look at the full list and just try to see if any patterns emerge.

So let's take a look:
  • Delgado: Looked like he'd hit the wall in '07 and the first months of '08, but then had a great second half. For the most part has remained an elite player through age 36.
  • Hrbek: Cratered at age 34, out of baseball at age 35.
  • Thome: Has remained an elite offensive player to age 37, but only as a DH. He hasn't been a full-time first-baseman since age 34, a year in which his numbers cratered and he got hurt, then got Pipped by Ryan Howard.
  • McGriff: He was a legitimate starter until age 38, but for the most part he was a below-average first-baseman after age 30.
  • Clark: Was an average to good 1B but not elite after age 30, out of baseball at 37.
  • Bagwell: He had a nice, gradual decline, rather than a cratering, but was below average at 36 and out of baseball at 38.
  • McCovey: Played to age 42, but his last really good season was at age 36. He was terrible more often than good after 37.
  • Sexson: Cratered at age 31, scrap-heaped at age 33.
  • Green: Wasn't any good after age 29; retired at 35.
  • Konerko: Has faded badly after age 30, is probably done as an above-average first-baseman.
In other words, at least four of these career paths (Hrbek, Sexson, Green, Konerko) would be unmitigated disasters. For three (Bagwell, McGriff and Clark), we'd be getting maybe 3-4 years of elite value and then really expensive vanilla after that. And just two of these guys (McCovey and Delgado) provided very good to elite value to age 36, the age Tex would be at the end of an eight-year deal. Thome could be counted as a third in this group, but since he really needed the shift to DH to maintain that performance he belongs in a separate category. None of these guys would have justified a ten-year deal, though I suppose Delgado could still break that mark.

Put another way, the fair conclusion is that there's about a 20% chance of success in signing Tex for eight years. There's a 40% chance of utter disaster. And there's a 30-40% chance of getting decidedly mixed returns.
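If you want to check my math, here's the tally as a quick sketch. The bucket labels are my own shorthand for the groupings above, and remember that ten comps is a tiny sample, so these are rough odds at best.

```python
from collections import Counter

# Each comp sorted into the bucket described above.
# ASSUMPTION: the labels are my own shorthand, and Thome gets his own
# bucket because he needed the move to DH.
OUTCOMES = {
    "Delgado": "success",
    "McCovey": "success",
    "Bagwell": "mixed",
    "McGriff": "mixed",
    "Clark": "mixed",
    "Thome": "dh_only",
    "Hrbek": "disaster",
    "Sexson": "disaster",
    "Green": "disaster",
    "Konerko": "disaster",
}

counts = Counter(OUTCOMES.values())
shares = {label: n / len(OUTCOMES) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} of {len(OUTCOMES)} ({share:.0%})")
```

Whether "mixed" comes out at 30% or 40% depends on whether you fold Thome into that group.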

Does that mean we shouldn't sign Teixeira? No, but it does mean that it's nowhere near the no-brainer that Boz suggests, and that if we do this we should assume that we're going to be really overpaying (and hurting the team) on the back-end years. That's why you generally only do this kind of deal if you think the player will put you over the top right now, in those first 3-4 years when the odds of getting value are still mostly in your favor.

I have one other nit to pick with this column, which is this:
"Why should the Nats want him? Because he's the second coming of the Capital Punisher, Frank Howard -- only Dunn's better. Bad teams need drawing cards and credibility as they improve. That'd be Dunn.

"In his career, the 6-foot-7, 270-pound Howard hit 40 homers three times. The 6-foot-6, 240-pound Dunn has already had five 40-homer years -- in a row."
OK, people, with all the Hondo love spewing forth these days, and the fact that Boz's one remaining redeeming quality is supposed to be his respect for and knowledge of DC baseball history, this is truly an outrageous apostasy. You all should let him have it in the next chat.

First, this 40-HR stat is absurdly misleading; there are park and era adjustments you have to make to understand these stats in context. Hondo hit in RFK and Dodger Stadium in the 60s and 70s. Dunn has hit almost his whole career in the Great American Small Park in the '00s. You're comparing two guys who hit in literally two of the most extreme hitters' and pitchers' environments in baseball history.

The park factor is easy to quantify--just look at what each guy did at home versus on the road. Dunn has 150 homers at home, and 128 away. Hondo while in DC hit 116 homers at home and 121 on the road. That might not sound like a big difference, but spread over roughly seven seasons apiece, it means Dunn is getting about three extra GASP-inflated HRs a year, while Hondo was losing about one RFK-depressed HR a year. So chalk up a 4-HR difference right there.
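Here's that home/road arithmetic as a back-of-the-envelope sketch. It assumes seven seasons for each guy (Dunn's totals also include a partial rookie year, so his per-season number is a touch generous) and it treats the road total as the "neutral" baseline, which is a simplification and not a real park factor.

```python
def home_boost_per_season(home_hr, road_hr, seasons):
    """Extra home runs per season hit at home compared with the road."""
    return (home_hr - road_hr) / seasons


# Home/road totals quoted above; seasons=7 is an ASSUMPTION for both players.
dunn_boost = home_boost_per_season(home_hr=150, road_hr=128, seasons=7)
howard_boost = home_boost_per_season(home_hr=116, road_hr=121, seasons=7)

# The gap between the two parks' effects, in home runs per season.
park_gap = dunn_boost - howard_boost

print(f"Dunn at home:   {dunn_boost:+.1f} HR per season")
print(f"Howard at home: {howard_boost:+.1f} HR per season")
print(f"Park gap:       {park_gap:.1f} HR per season")
```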

And then whether it's the juice or the ball or the ballparks or expansion or what, home runs have been far, far more frequent in Dunn's era than Howard's. From 1965-1971, Hondo's seven years in DC, there were a total of 22 40-homer seasons in MLB, including Howard's three. Conveniently, Dunn also has seven full seasons under his belt. From 2002-2008, Dunn's seven full seasons, there have been 54(!) 40-homer seasons. I was an English major, and even I can see--that's more than twice as many!

Let's put these two factors together--era and park factor. Baseball-Reference has another really neat toy that Boz apparently hasn't figured out how to plagiarize yet--the "neutralize stats" feature. Using it, you can translate every player's numbers into what they would have done in a totally average environment, as well as the most extreme hitters' (2000 Rockies) and pitchers' (1968 Dodgers) environments. Just go to the player page, click on neutralize, and choose the translation you want (Babe Ruth's 1927 season translates into 75 HRs in Colorado '00, Barry Bonds's 2001 season turns into 63 HR in LA '68).

Translated into a dead-average hitting environment, here's Howard in DC:

Year HR AVG OBP SLG
1965 24 0.315 0.387 0.522
1966 20 0.298 0.371 0.470
1967 42 0.288 0.376 0.571
1968 54 0.316 0.384 0.635
1969 53 0.316 0.425 0.615
1970 49 0.304 0.441 0.588
1971 29 0.300 0.391 0.507

And here's Dunn:

Year HR AVG OBP SLG
2002 28 0.262 0.418 0.479
2003 30 0.235 0.380 0.504
2004 48 0.276 0.400 0.589
2005 43 0.263 0.408 0.574
2006 41 0.236 0.369 0.497
2007 40 0.264 0.386 0.554
2008 42 0.243 0.395 0.531
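To compare peaks rather than whole careers, here's a small sketch that finds each guy's best three-year run in the neutralized numbers above. I'm assuming the columns are year, HR, AVG, OBP, and SLG, and "best" here just means the most neutralized homers over three consecutive seasons, which is my own choice of yardstick.

```python
# (year, HR, AVG, OBP, SLG), copied from the neutralized tables above.
# ASSUMPTION: that column order.
HOWARD = [
    (1965, 24, 0.315, 0.387, 0.522),
    (1966, 20, 0.298, 0.371, 0.470),
    (1967, 42, 0.288, 0.376, 0.571),
    (1968, 54, 0.316, 0.384, 0.635),
    (1969, 53, 0.316, 0.425, 0.615),
    (1970, 49, 0.304, 0.441, 0.588),
    (1971, 29, 0.300, 0.391, 0.507),
]
DUNN = [
    (2002, 28, 0.262, 0.418, 0.479),
    (2003, 30, 0.235, 0.380, 0.504),
    (2004, 48, 0.276, 0.400, 0.589),
    (2005, 43, 0.263, 0.408, 0.574),
    (2006, 41, 0.236, 0.369, 0.497),
    (2007, 40, 0.264, 0.386, 0.554),
    (2008, 42, 0.243, 0.395, 0.531),
]


def best_three_year_window(seasons):
    """Return (start_year, total_hr, mean_obp, mean_slg) for the three
    consecutive seasons with the most home runs."""
    best = None
    for i in range(len(seasons) - 2):
        window = seasons[i:i + 3]
        total_hr = sum(s[1] for s in window)
        mean_obp = sum(s[3] for s in window) / 3
        mean_slg = sum(s[4] for s in window) / 3
        if best is None or total_hr > best[1]:
            best = (window[0][0], total_hr, mean_obp, mean_slg)
    return best


for name, seasons in (("Howard", HOWARD), ("Dunn", DUNN)):
    start, total_hr, mean_obp, mean_slg = best_three_year_window(seasons)
    print(f"{name}: {start}-{start + 2}, {total_hr} HR, "
          f"{mean_obp:.3f} OBP, {mean_slg:.3f} SLG")
```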

Now, as you probably know, Howard's career path was a strange one. For many years before he came to DC and even his first couple years with the Senators, he was a very good but not really great player. But for a window of about three years (1968-1970), Hondo was a truly magnificent player. If Boz wanted to say that Dunn was better in his early- to mid-20s than Howard was at that age, clearly he'd be right. If he said Dunn has been more consistent, he'd have a case.

But Dunn has never been, and I dare say will never be, anywhere close to the dominant offensive force that Hondo was at his best in the era of the "Capital Punisher." Not close.

