Starting with the 2011 MLB season, Diamond Mind Baseball's official projection disk has been produced using Dan Szymborski's ZiPS projections.
Note about ZiPS at Baseball Think Factory (BBTF)
"ZiPS projections are Dan Szymborski's computer-based projections of performance. Performances have not been allocated to predicted playing time in the majors - many of the players listed above are unlikely to play in the majors at all in 2009. ZiPS is projecting equivalent production - a .240 ZiPS projection may end up being .280 in AAA or .300 in AA, for example. Whether or not a player will play is one of many non-statistical factors one has to take into account when predicting the future. Players are listed with their most recent teams unless Dan has made a mistake. This is very possible as a lot of minor-league signings are generally unreported in the offseason. ZiPS is projecting based on the AL having a 4.46 ERA and the NL having a 4.41 ERA. Players that are expected to be out due to injury are still projected. More information is always better than less information and a computer isn't what should be projecting the injury status of, for example, a pitcher with Tommy John surgery. Positional offense is ranked by RC/27 and divided into quintiles based on what the most frequent starting players at each position did in 2007-2009. Excellent is the top quintile, Very Good the 2nd quintile and so on."
- from Baseball Think Factory (BBTF) [/stextbox] • ZiPS FAQ (2008); BBTF link. -- As noted, starting with the 2011 season, Dan Szymborski will be producing the official projection disk for Diamond Mind Baseball, and the disk will be available for purchase from Diamond Mind. [stextbox id="custom" bgcolor="f4f4f4"]
"Dan Szymborski of BaseballThinkFactory.org puts ZiPS out annually. They're based on three or four years of weighted data depending on a player's age and he uses various 'growth and decline' curves based on the type of player. 'I don't try to find particularly similar players but instead large groups with similar characteristics, such as K rate for pitchers, Speed Score for batters, [batting average on balls in play] BABIP for batters, handedness, and a lot of other stuff.' Pitching projections do take DIPS theory into account by not only regressing BABIP toward the mean but also by taking into account handedness, knuckleballs, and groundball-to-fly ball ratios. It's worth noting that ZiPS does not attempt to project playing time and of the four projection systems, it has the most players with 995 batters and 989 pitchers, many of whom have yet to play in the majors." [from SI.com]
[/stextbox]
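The BBTF note ranks positional offense by RC/27 and splits it into quintiles based on what each position's most frequent starters did, with Excellent as the top quintile and Very Good as the second. A minimal sketch of that bucketing in Python, assuming RC/27 values are already computed for one position's regular starters; the three lower labels are assumed (the note only says "and so on") and the sample figures are invented.

```python
from bisect import bisect_right
from statistics import quantiles

# Only "Excellent" and "Very Good" come from the BBTF note; the three
# lower labels are assumed for illustration.
LABELS = ["Poor", "Fair", "Average", "Very Good", "Excellent"]


def quintile_cutoffs(starter_rc27):
    """Four cut points that split the starters' RC/27 values into quintiles."""
    return quantiles(starter_rc27, n=5)


def rate_offense(rc27, cutoffs):
    """Label a projected RC/27 by which quintile of the starters it falls in."""
    return LABELS[bisect_right(cutoffs, rc27)]


# Made-up RC/27 figures for one position's most frequent starters.
starters = [3.9, 4.2, 4.5, 4.8, 5.1, 5.4, 5.7, 6.0, 6.3, 6.6]
cuts = quintile_cutoffs(starters)
print(rate_offense(6.5, cuts))  # Excellent
print(rate_offense(5.0, cuts))  # Average
```

The cut points come from the starters alone, so a projected bench player is judged against the regulars at his position rather than against everyone.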
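The SI.com summary says ZiPS uses three or four years of weighted data and, for pitchers, regresses BABIP toward the mean. The sketch below shows that general technique (a weighted multi-season rate with league-average balls in play mixed in), not ZiPS's actual formula: the season weights, the .300 league BABIP, and the 1,000-ball regression amount are all hypothetical placeholders.

```python
# Hypothetical weights for the three most recent seasons, newest first.
# ZiPS's real weights (and its 3-vs-4-year rule by age) are not given here.
SEASON_WEIGHTS = (1.0, 0.8, 0.6)


def regressed_babip(seasons, league_babip=0.300, regress_bip=1000.0,
                    weights=SEASON_WEIGHTS):
    """Weighted multi-season BABIP, regressed toward the league mean.

    seasons: (hits_on_balls_in_play, balls_in_play) per season, newest first.
    regress_bip: hypothetical number of league-average balls in play mixed in;
    the less weighted data a pitcher has, the closer he lands to league BABIP.
    """
    pairs = list(zip(weights, seasons))  # zip stops at the shorter sequence
    hits = sum(w * h for w, (h, _) in pairs)
    bip = sum(w * b for w, (_, b) in pairs)
    return (hits + league_babip * regress_bip) / (bip + regress_bip)


# A made-up pitcher with three seasons of unusually low BABIP (.250 to .270).
seasons = [(125, 500), (130, 500), (135, 500)]
print(round(regressed_babip(seasons), 3))  # 0.277
```

Regression pulls this pitcher's weighted .258 up to about .277; a larger `regress_bip` would pull harder. Passing a four-element `weights` tuple is one way to use a fourth season, as the summary describes for some ages.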
-- SG on differences between official DMB projection disks (through 2008) and ZiPS: [stextbox id="custom" bgcolor="f4f4f4"]
"One of the biggest differences that I am aware of between the two projection systems is that ZiPS uses Voros McCracken's controversial DIPS theory when projecting pitchers. DIPS basically focuses on a pitcher's strikeouts, walks, and home runs allowed, and assumes that their control of hits on balls in play(non-homer hits and outs) is minimal. Tom Tippet of Diamond Mind did his own research on this theory, and concluded that pitchers "have more influence over in-play hit rates than McCracken suggested", so he uses a pitcher's hits allowed totals in his projections. ZiPS also uses comparisons with similar players in building its projections, whereas Diamond Mind uses a Marcel type projection system which only focuses on what a player himself has done. I am also pretty sure that ZiPS is harsher to older players than Diamond Mind...."
------------------------------------------------------------- -- Dan Szymborski on ZiPS and platoon splits: [stextbox id="custom" bgcolor="f4f4f4"]
"It's a question of math. If I take the 500 players that have the longest careers that I did not give splits to, I would expect 2009 platoon splits to be closer to generic platoon splits than platoon splits generated from their careers for 79% of players. The odds that percentage drops below 50% are 1-in-86, so I'm pretty confident that I'm likely to be closer with more players. Jed Lowrie has a better chance of hitting 45 home runs than his platoon splits from 2009 being, over a full season, as large as they were in 2008. I'm sure you wouldn't take his projection seriously if I projected Lowrie to hit 45 home runs, so why would you like it if I gave him splits that were even less likely to be an accurate representation of his abilities? It's probably a philosophical difference that we can't resolve, but I don't know what's so fun about codifying randomness into fact. Carlos Zambrano threw no-hitters in 3.3% of his starts last season; would you really want there to be a 3.3% chance of him throwing a no-hitter in his DMB starts? It's false precision. I know when I play a projection disk, I want to be dealing with the same questions that a manager would face. Projected platoon splits for the majority of players, because they're turning randomness into a reality, provide a layer of exploitation that nobody would have available to them in real life."
[/stextbox] -- 2011 ZiPS, now with (regressed) platoon splits for all: [stextbox id="custom" bgcolor="f4f4f4"]
"What's changed? I've added projected platoon splits for everyone. These are heavily regressed (as platoon splits ought to be). There's also about a dozen new players, guys who are on 40-man rosters and are projectable (sorry, Bryce Harper fans), like Trystan Magnuson and Joe Paterson. These dozen players do not have projected splits as they were last minute additions."
....
"... Not so much a shift in my thinking, but a more rigorous model. Before, I was simply doing players over a certain threshold. This year, I've included splits regressed towards generic (the underlying fact that generic platoon splits are better predictors of future platoon splits than actual platoon splits remains true). I wrote a generalized regression model so there's more shading. Instead of either/other, a player with 1200 professional PA will have his platoon splits move most of the way to generic while a player with 3000 PA will be about 50/50 while guys like Jim Thome will have platoon splits that more reflect their actual than generic. Now that I have the ability to get all those platoon splits onto the disk (well, Luke does), I can do what I believe is the best possible solution. Short-term extreme platoon splits are still going to be ironed out for obvious reasons (we're looking forward, not back) and some are going to be unhappy there's less opportunity to exploit these short-term extreme platoon splits, but I feel this is the most intellectually honest way to do it. Getting switch-hitters right was the hard part - I had to get the splits for every switch-hitter going back to the start of the retrosheet era to develop a probabilistic model since you don't have anything easy to regress towards."
[/stextbox]
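Dan's description of the 2011 model gives three anchor points: about 1,200 PA moves a split most of the way to generic, about 3,000 PA lands near 50/50, and a long career like Jim Thome's stays closer to actual. A simple shrinkage weight w = PA / (PA + k) with k = 3,000 happens to hit all three (roughly 29% own split at 1,200 PA, 50% at 3,000, 75% at 9,000). The formula and k are assumptions for illustration, not Dan's generalized regression model, and the sketch does not attempt the separate switch-hitter model he mentions.

```python
def regressed_split(actual_split, generic_split, pa, k=3000.0):
    """Shrink a player's observed platoon split toward the generic split.

    actual_split / generic_split: any split measure, e.g. OPS vs. opposite-hand
    pitchers minus OPS vs. same-hand pitchers. k is a hypothetical constant
    chosen so that a player with k plate appearances lands exactly 50/50.
    """
    if pa < 0:
        raise ValueError("pa must be non-negative")
    w = pa / (pa + k)  # weight on the player's own split
    return w * actual_split + (1 - w) * generic_split


# Made-up numbers: a hitter whose observed split is .150 against a generic .060.
for pa in (1200, 3000, 9000):
    print(pa, round(regressed_split(0.150, 0.060, pa), 3))
```

A short-career hitter with an extreme split ends up close to generic, which is exactly the "ironing out" of short-term extremes the quote describes.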
Defensive and other "subjective" ratings in ZiPS Projection Disks -- From old DMB Forum: [stextbox id="custom" bgcolor="f4f4f4"]
"I use a combination of UZR, Dial's LWZR, and PMR, wherever available, with scouting reports breaking close ties between tiers. For minor leagues, scouting is a little more important because of the lack of quality defensive data, and that's combined with a minor league DR estimator from play-by-play data from Jeff Sackmann (both Sean Smith and I made our own, almost identical systems, in November '07 when Sackmann had stopped calculating his). I tend to be very conservative at assigning defensive ratings. At first, only Pujols gets an EX for range and no other position has more than 4 given out."
[/stextbox]
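The defensive process Dan describes (average whichever of UZR, Dial's LWZR, and PMR exist, then let scouting reports break close ties between tiers) can be sketched as below. Everything numeric is hypothetical: the metrics are treated as runs above average per 150 games, and the tier cut points, the 2-run "close tie" margin, and the +1/0/-1 scouting input are invented for illustration.

```python
from statistics import mean
from typing import Mapping, Optional

TIERS = ["PR", "FR", "AV", "VG", "EX"]
# Hypothetical cut points in runs above average per 150 games: a score at or
# above THRESHOLDS[i] reaches tier i + 1. The EX cut is set high on purpose.
THRESHOLDS = [-10.0, -4.0, 4.0, 12.0]
CLOSE_MARGIN = 2.0  # hypothetical width of a "close tie" around each cut


def defensive_tier(metrics: Mapping[str, Optional[float]], scouting: int = 0) -> str:
    """Average whichever metrics exist, tier the result, let scouting settle close calls.

    metrics: e.g. {"UZR": 6.1, "LWZR": None, "PMR": 3.8}; None means unavailable.
    scouting: +1 if scouts like the fielder, -1 if not, 0 for no opinion.
    """
    available = [v for v in metrics.values() if v is not None]
    if not available:
        raise ValueError("no defensive metrics available")
    score = mean(available)
    tier = sum(score >= cut for cut in THRESHOLDS)  # 0 (PR) through 4 (EX)
    if scouting:
        for i, cut in enumerate(THRESHOLDS):
            if abs(score - cut) <= CLOSE_MARGIN:
                # Cut i separates tier i (below) from tier i + 1 (at or above).
                tier = i + 1 if scouting > 0 else i
                break
    return TIERS[tier]


print(defensive_tier({"UZR": 13.0, "LWZR": 12.0, "PMR": None}))      # EX
print(defensive_tier({"UZR": 13.0, "LWZR": 12.0, "PMR": None}, -1))  # VG
```

A high EX cut is one way to reflect the conservatism he mentions, where only Pujols got an EX for range at first base.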
-- Also from Dan Szymborski: [stextbox id="custom" bgcolor="f4f4f4"]
"I evaluate the defensive ratings every single season. I'm conservative about arm rating as arms don't really change all that quickly unless there's an injury. For the defensive ratings, I use a combination of three year Dial and Lichtman ZR translations and +/- with scouting reports 'breaking ties' and for minor leaguers, a combination of scouting reports and a rough ZR I developed from PBP data from Jeff Sackmann. For the running and bunting, ZiPS actually spits out the tiers for me with the projection. I use a modified speed score for the running rating and apply EX/VG/AV/FR/PR divided among the population in percentages of 10/20/40/20/10, as with jump. I use a mix of SB% and jump to calculate steal success rates, simply because I don't want a bunch of PR jumpers with EX steal."