Retrosheet Game Logs

This weekend I was planning on updating my park factors, in preparation of Part II of “Different Factors For Different Folks”. I downloaded all the new event files from the RetroSheet website, not just 2008, but 1999, 1953 and all the updates and corrections for the other seasons. I then spent more than a few minutes waiting for my query to build a table for Split Data, and when it finished I discovered there was no 2008 data to be found.

I forgot to update the Retroheet game logs as well. With no 2008 entries in my Games table, there could be no matching output from the query. I then went back and downloaded all the game logs, but when I opened one to have a look, it didn’t match my table’s data structure.

There are actually two different versions of game logs available from RetroSheet data, and it’s important to know the differences and how to acess them. And I wasn’t using either one of them, as I will explain later.

When you go to RetroSheet’s data download section, they have game logs with an entry for every one of the 193,866 games played in major league history. There are fields for date, site, home team, away team, etc…but no game ID. The other logs are generated from information contained in the header section of each game in the event files. Ted Turoczy’s Chadwick includes cwgame.exe, which will extract the game info from the event files, and create a record for each game for which there are events on file. There is a field for game ID, plus a lot of fields the game logs don’t contain. However, the date field doesn’t have the “19” or “20” listed at the beginning of the year field – once event files for games from the 1800’s are available, they will have same dates as games 100 years later in the 1900’s.

Last year, I recognized some of these problems, and stitched the two files together to get the games table I have been using. Now I want to update my table, but neither of the original source files matches my table. Shovel the front walk off for the wife, go for a two mile walk around the neighborhood, then shower up, and I had the answer, which I will share in order that you don’t have to repeat my past mistakes.

When you have files created by others that you periodically wish to import and update, receive them in a table which exactly matches their existing data structure. Then, anytime in the future, you can append records, or overwrite the whole thing when you need to correct or update. Tom Tango has written the code for a blank CWGAME table at his wiki. I couldn’t find an equivalent for the Retro game logs, so I had to do it myself. There are 161 fields, and documentation at the RetroSheet website had a description for each. Where the fields were in common with CWGAME, I used the same name and datatype as Tango. Where the fields were unique, I was on my own, but I tried to follow the same naming conventions that Tango established. Also, there are 139 different files in the game logs, one for each season. If you don’t want to do 139 imports, at a command prompt type “copy gl*.txt gamelogs.txt”. This will copy all the files starting with “gl” and ending in “.txt” into one file, which is then the one you will import.

Once you have any original source files imported as-is, you might decide that they are not arranged in the way you would do it. Stuff isn’t normal. There are fields that aren’t necessary. You want some things from one table, and some from the other. When you want to manipulate the data, use a query. That way, when you come back to do the same thing next year, you only need to rerun the query and overwrite last year’s result, instead of comparing the files and trying to remember what you did the last time.

The Retro game logs contain a record for every game ever played in the major leagues. CWGAME outputs a record for every game where there are events on file. I would prefer to establish a one-to-one relationship between these two tables, so that CWGAME is an extension to the Game Logs, providing additional information where available.

GAME_ID is the primary key for CWGAME, but it does not exist as a seperate field in the Retro game logs. It must be created as HOME_TEAM_ID+GAME_DT+GAME_CT_CD. If this key is created only as part of a query, I do not know how to link the tables in a LEFT JOIN manner which would return all the fields in Game Logs and the matching fields in CWGAME. My solution is to use a make table query to create a modified version of the Game Logs, which is then linked to CWGAME.

Neither of the games tables has a field for year, which makes life much easier when coding many common queries. CWGAME can not be used, as it has a truncated date field which does not have left left two characters of the year. The Game Logs do have a full date. In Access I used the functions Left(GamesLOG.GAME_DT,4) AS [YEAR], Mid(GamesLOG.GAME_DT,5,2) AS [MONTH], Right(GamesLOG.GAME_DT,2) AS [DAY] to parse the field into three.

Once GAME_ID and YEAR have been added to a midified Game Logs table, it is ready to go, with CWGAME linked to it using the GAME_ID as a foreign key. Remember to use LEFT JOIN instead of INNER JOIN, so that you can return all records from Game Logs and the matching recods from CWGAME. If you wish, you can further modify the Game Logs table, mixing and matching the fields from the two input tables in whatever combination you wish, perhaps eliminating the need for a JOIN.

The Data Definition (DDL) code for CWGAME is at Tom Tango’s site (scroll down to the Games table). To use in Access, change “varchar” to “text”. Here is my DDL code for the Game Logs table in MySQL and Access format. To run the DDLs in Access, open a new query in Design View. Do not add any tables. Switch to SQL view. Paste in the DDL code, then save under a name of your choosing. To create a blank table, double click on the DDL query you’ve created.

 

Advertisements

10 Responses to Retrosheet Game Logs

  1. Chris says:

    I know this is OT, but I was wondering if I could email you a couple of questions about your Oliver projections? I couldn’t find your email address anywhere on this site.

  2. John says:

    Whenever I try loading the column headers into MySQL im getting
    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘VARCHAR2(8) NOT NULL
    ,GAME_CT_CD VARCHAR2(1) NOT NULL
    ,GAME_DY VARCHAR2(3)
    ,A’ at line 3

  3. Brian Cartwright says:

    My bad. I didn’t test the MySQL code, turns out I was using an Oracle template.

  4. john says:

    Do you have any idea what I would have to change to get it to work with MySQL?

  5. Colin Wyers says:

    My GAMES table right now looks like this:
    DROP TABLE IF EXISTS `games`;
    CREATE TABLE `games` (
    `seq_games` int(11) NOT NULL DEFAULT ‘0’,
    `GAME_ID` varchar(12) DEFAULT NULL,
    `YEAR_ID` int(4) NOT NULL DEFAULT ‘0’,
    `GAME_DT` int(11) DEFAULT NULL,
    `GAME_CT` int(11) DEFAULT NULL,
    `GAME_DY` varchar(9) DEFAULT NULL,
    `START_GAME_TM` int(11) DEFAULT NULL,
    `DH_FL` varchar(1) DEFAULT NULL,
    `DAYNIGHT_PARK_CD` varchar(1) DEFAULT NULL,
    `AWAY_TEAM_ID` varchar(3) DEFAULT NULL,
    `HOME_TEAM_ID` varchar(3) DEFAULT NULL,
    `PARK_ID` varchar(5) DEFAULT NULL,
    `AWAY_START_PIT_ID` varchar(8) DEFAULT NULL,
    `HOME_START_PIT_ID` varchar(8) DEFAULT NULL,
    `BASE4_UMP_ID` varchar(8) DEFAULT NULL,
    `BASE1_UMP_ID` varchar(8) DEFAULT NULL,
    `BASE2_UMP_ID` varchar(8) DEFAULT NULL,
    `BASE3_UMP_ID` varchar(8) DEFAULT NULL,
    `LF_UMP_ID` varchar(8) DEFAULT NULL,
    `RF_UMP_ID` varchar(8) DEFAULT NULL,
    `ATTEND_PARK_CT` int(11) DEFAULT NULL,
    `SCORER_RECORD_ID` varchar(50) DEFAULT NULL,
    `TRANSLATOR_RECORD_ID` varchar(50) DEFAULT NULL,
    `INPUTTER_RECORD_ID` varchar(50) DEFAULT NULL,
    `INPUT_RECORD_TS` varchar(18) DEFAULT NULL,
    `EDIT_RECORD_TS` varchar(18) DEFAULT NULL,
    `METHOD_RECORD_CD` varchar(18) DEFAULT NULL,
    `PITCHES_RECORD_CD` varchar(1) DEFAULT NULL,
    `TEMP_PARK_CT` int(11) DEFAULT NULL,
    `WIND_DIRECTION_PARK_CD` int(11) DEFAULT NULL,
    `WIND_SPEED_PARK_CT` int(11) DEFAULT NULL,
    `FIELD_PARK_CD` int(11) DEFAULT NULL,
    `PRECIP_PARK_CD` int(11) DEFAULT NULL,
    `SKY_PARK_CD` int(11) DEFAULT NULL,
    `MINUTES_GAME_CT` int(11) DEFAULT NULL,
    `INN_CT` int(11) DEFAULT NULL,
    `AWAY_SCORE_CT` int(11) DEFAULT NULL,
    `HOME_SCORE_CT` int(11) DEFAULT NULL,
    `AWAY_HITS_CT` int(11) DEFAULT NULL,
    `HOME_HITS_CT` int(11) DEFAULT NULL,
    `AWAY_ERR_CT` int(11) DEFAULT NULL,
    `HOME_ERR_CT` int(11) DEFAULT NULL,
    `AWAY_LOB_CT` int(11) DEFAULT NULL,
    `HOME_LOB_CT` int(11) DEFAULT NULL,
    `WIN_PIT_ID` varchar(8) DEFAULT NULL,
    `LOSE_PIT_ID` varchar(8) DEFAULT NULL,
    `SAVE_PIT_ID` varchar(8) DEFAULT NULL,
    `GWRBI_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP1_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP1_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP2_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP2_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP3_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP3_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP4_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP4_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP5_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP5_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP6_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP6_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP7_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP7_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP8_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP8_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_LINEUP9_BAT_ID` varchar(8) DEFAULT NULL,
    `AWAY_LINEUP9_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP1_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP1_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP2_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP2_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP3_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP3_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP4_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP4_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP5_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP5_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP6_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP6_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP7_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP7_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP8_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP8_FLD_CD` int(11) DEFAULT NULL,
    `HOME_LINEUP9_BAT_ID` varchar(8) DEFAULT NULL,
    `HOME_LINEUP9_FLD_CD` int(11) DEFAULT NULL,
    `AWAY_FINISH_PIT_ID` varchar(8) DEFAULT NULL,
    `HOME_FINISH_PIT_ID` varchar(8) DEFAULT NULL
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY LIST (YEAR_ID) (PARTITION p1 VALUES IN (1953) ENGINE = MyISAM, PARTITION p2 VALUES IN (1954) ENGINE = MyISAM, PARTITION p3 VALUES IN (1955) ENGINE = MyISAM, PARTITION p4 VALUES IN (1956) ENGINE = MyISAM, PARTITION p5 VALUES IN (1957) ENGINE = MyISAM, PARTITION p6 VALUES IN (1958) ENGINE = MyISAM, PARTITION p7 VALUES IN (1959) ENGINE = MyISAM, PARTITION p8 VALUES IN (1960) ENGINE = MyISAM, PARTITION p9 VALUES IN (1961) ENGINE = MyISAM, PARTITION p10 VALUES IN (1962) ENGINE = MyISAM, PARTITION p11 VALUES IN (1963) ENGINE = MyISAM, PARTITION p12 VALUES IN (1964) ENGINE = MyISAM, PARTITION p13 VALUES IN (1965) ENGINE = MyISAM, PARTITION p14 VALUES IN (1966) ENGINE = MyISAM, PARTITION p15 VALUES IN (1967) ENGINE = MyISAM, PARTITION p16 VALUES IN (1968) ENGINE = MyISAM, PARTITION p17 VALUES IN (1969) ENGINE = MyISAM, PARTITION p18 VALUES IN (1970) ENGINE = MyISAM, PARTITION p19 VALUES IN (1971) ENGINE = MyISAM, PARTITION p20 VALUES IN (1972) ENGINE = MyISAM, PARTITION p21 VALUES IN (1973) ENGINE = MyISAM, PARTITION p22 VALUES IN (1974) ENGINE = MyISAM, PARTITION p23 VALUES IN (1975) ENGINE = MyISAM, PARTITION p24 VALUES IN (1976) ENGINE = MyISAM, PARTITION p25 VALUES IN (1977) ENGINE = MyISAM, PARTITION p26 VALUES IN (1978) ENGINE = MyISAM, PARTITION p27 VALUES IN (1979) ENGINE = MyISAM, PARTITION p28 VALUES IN (1980) ENGINE = MyISAM, PARTITION p29 VALUES IN (1981) ENGINE = MyISAM, PARTITION p30 VALUES IN (1982) ENGINE = MyISAM, PARTITION p31 VALUES IN (1983) ENGINE = MyISAM, PARTITION p32 VALUES IN (1984) ENGINE = MyISAM, PARTITION p33 VALUES IN (1985) ENGINE = MyISAM, PARTITION p34 VALUES IN (1986) ENGINE = MyISAM, PARTITION p35 VALUES IN (1987) ENGINE = MyISAM, PARTITION p36 VALUES IN (1988) ENGINE = MyISAM, PARTITION p37 VALUES IN (1989) ENGINE = MyISAM, PARTITION p38 VALUES IN (1990) ENGINE = MyISAM, PARTITION p39 VALUES IN (1991) ENGINE = MyISAM, PARTITION p40 VALUES IN (1992) ENGINE = MyISAM, PARTITION p41 VALUES IN (1993) ENGINE = MyISAM, PARTITION p42 VALUES IN (1994) ENGINE = MyISAM, PARTITION p43 VALUES IN (1995) ENGINE = MyISAM, PARTITION p44 VALUES IN (1996) ENGINE = MyISAM, PARTITION p45 VALUES IN (1997) ENGINE = MyISAM, PARTITION p46 VALUES IN (1998) ENGINE = MyISAM, PARTITION p47 VALUES IN (1999) ENGINE = MyISAM, PARTITION p48 VALUES IN (2000) ENGINE = MyISAM, PARTITION p49 VALUES IN (2001) ENGINE = MyISAM, PARTITION p50 VALUES IN (2002) ENGINE = MyISAM, PARTITION p51 VALUES IN (2003) ENGINE = MyISAM, PARTITION p52 VALUES IN (2004) ENGINE = MyISAM, PARTITION p53 VALUES IN (2005) ENGINE = MyISAM, PARTITION p54 VALUES IN (2006) ENGINE = MyISAM, PARTITION p55 VALUES IN (2007) ENGINE = MyISAM, PARTITION p56 VALUES IN (2008) ENGINE = MyISAM) */;
    YEAR_ID is derived from SUBSTR(GAME_ID,4,4) – I normally load a temp table, INSERT the YEAR_ID into there, and then dump into the main table. PARTITIONING the table will greatly increase performance, but should not be done on the temp table.

  6. Brian Cartwright says:

    What Colin has is a merger of the CWGAME and the Game Logs. As he said, load each as is into temp tables, then run a query to build the working games table.

  7. john says:

    Thanks guys

  8. tieldigerse says:

    I am unable to understand this post. But well some points are useful for me.

  9. WP Themes says:

    I think your blog need a new wordpress template. Downalod it from http://genericwpthemes.com . The site has nice and unique wordpress templates.

%d bloggers like this: