This program fetches TV program data from web sources, and converts this data into a slice file which can be loaded into a TiVo. At present, it can only obtain Australian TV program data, but it can be modified to fetch and parse data from other sources.
The program can output program data in the binary slice format which the TiVo requires. Alternatively, it can output the data in a textual form which can then be converted into a slice file using tridge's writeguide program.
It's a good idea to read this README file from start to end, especially the bits about the files that get regenerated during each run. Things may not work if you don't set them up correctly.
3.5: The Yahoo, Austar and Foxtel fetchers have been crippled due to a legal ``cease & desist'' notice. A kludge was added to determine if a program is a movie when it is not explicitly tagged as such. A bug was fixed which was preventing the Live and Premiere fields from working.
3.4: Modified the NoNag variable to let any other station provide data for the NoNag channel. Added code to store a stationday database, so that we can overwrite old data with new data on the TiVo. Fixed bug that was dropping `Movie' genre from movies. Added new PlusN pseudo-fetcher. Added better support for repeats, captions, live programs, premieres and series finals. Added a few more Yahoo genres. String comparisons now ignore some punctuation and whitespace to improve matches with bad web data.
3.4-beta: Several new fetchers have been written: ABC, Foxtel, Austar, Yahoo. Several bugs have been fixed in the new fetchers as well.
3.3: A transitional version which never made it out into the wild.
3.2: Fixed bug in TiVo/EbroadcastFetcher.pm which was preventing the per-channel genre from being set. Added a few more Yahoo genres to TiVo/GenreSearch.pm. Modified TiVo/SliceBuilder.pm so that several runs of a movie all get the same series number. This groups them all under the one entry in Search By Title. Added several more Foxtel genres to TiVo/FoxtelFetcher.pm, and I now remove the Foxtel genre words from the program's description.
3.1: First release of new version.
These are now documented in a separate file.
Wktivoguide is a set of four programs, a bunch of configuration files and directories that hold intermediate files. Each program represents one of the four phases of operation.
The first phase is to fetch and parse the web data for the slice(s) that you want to build. At present there are web fetchers for five websites in Australia. Other fetchers would be reasonably easy to write, and the existing fetchers could be used as a template. The fetch phase reads a list of TV channels from a configuration file called websources, and writes a set of intermediate data files into the Data directory. These files contain only the information that is required to build the slices. The fetch phase also stores the raw web pages fetched into the Webfiles directory; if you re-run the fetch phase and there are web pages in the Webfiles directory, no web connections will occur and the files in the Data directory will be recreated from the information in the Webfiles directory.
One problem with the data found by the web search is that it often does not give us an idea of the genre of the programs found (Movie, Comedy, Drama etc.). To overcome this, Wktivoguide keeps a database of programs and their genres, which is called the programs file or the programs database. The second, optional, phase is to read through the files in the Data directory, compare the programs there with those stored in the programs file, and determine which programs don't have a known genre.
For every unknown program, this phase sends a query out to an Australian Yahoo website to see if the program's genre can be determined. The result (known or not found) is stored back into the programs file. The trimtitles file is also consulted: this is used to trim common things from the front of a program's title (e.g. Special:, Movie of the Week:, Drama: etc.)
With the basic program data in the Data directory, this optional phase can add extra data to these files. It will add in the genre of the program, the star rating and director if the program is a movie, and it will use heuristics to determine an episode name. The results are stored into a second directory called Data2.
In the last phase, the program data that we have gathered and augmented will be converted into slice format. If the Data2 directory exists, this phase will use the data files from this directory. If only the Data directory exists, then this phase will load the data from there, and perform Phase Three on the fly. The result is the final slice file which can be loaded into the TiVo.
The Wktivoguide program comes as a front-end Bourne shell script called doit.pl. This calls the four Perl scripts: fetch_data, genre_search, add_data and make_slice. These make use of a number of back-end Perl modules in the TiVo/ directory. You will need a Unix machine with Perl 5 installed and you will also need to install the HTTP::Lite and CGI::Enurl Perl modules.
The software is designed to be unpacked and run in the directory where you unpacked it. Later on, I'll describe what to do if you want to generate multiple slices in the same directory.
Get HTTP::Lite from any CPAN mirror. I found it at: ftp://mirror.aarnet.edu.au/pub/perl/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/HTTP. At the time of writing, the latest version is HTTP-Lite-2.1.6.tar.gz. Download the latest tarball somewhere. Then do:
% tar vxzf HTTP-Lite-2.1.6.tar.gz
% cd HTTP-Lite-2.1.6
% perl Makefile.PL
% make
% su root
# make install

I hope that works. I normally use the Perl CPAN auto-fetching support to install new Perl modules.
If you plan to use the genre_search phase, or if you plan to run the command ./doit.pl search, then you will also need to install the CGI::Enurl Perl module. It is installed in the same way as HTTP::Lite. I found it at ftp://mirror.aarnet.edu.au/pub/perl/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/CGI.
Wktivoguide has a top-level configuration file, and a number of other files which store a database of information between runs of the program. The top-level configuration file is called .guiderc and should be placed in the directory where you unpacked Wktivoguide. There is an example .guiderc configuration file in the distribution.
Note: The scripts doit.pl, fetch_data, genre_search, add_data and make_slice all have a -c option that allows you to specify configuration files other than .guiderc. This is useful if you want to generate multiple different slice files using the software.
The .guiderc configuration file controls the basic behaviour of the program, and also defines where the run-time files are and where to store the intermediate guide data. Lines beginning with a hash are ignored; so are blank lines. Configuration lines look like variable = value. Here is a brief description of all variables; for more information see the .guiderc file that came with the distribution.
Variable | Used For | Default Value |
Configdir | Directory where the other config files live | ./Files |
Datadir | Directory where intermediate guide data is stored by fetch_data | ./Data |
Extradir | Directory where intermediate guide data is stored by add_data | ./Data2 |
Webdir | Directory where raw web data is stored | No default |
Webproxy | Web proxy for downloads | Not used |
WebDelay | Delay in seconds between web fetches | 0 |
RepeatWebConnect | Try to get web data this many times | Not used |
NoNag | If set to any value, stop the TiVo from nagging | Not defined |
Outputformat | The format of the output file | slice |
Slicefile | Name of the slice output file | ./output.slice |
Textfile | Name of the text output file | ./output.txt |
Tempdir | The directory used to hold temporary files | /tmp |
Debugfile | File that holds debugging output | No default |
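The file format described above (variable = value lines, with hash comments and blank lines ignored) is simple enough to read from a shell script. The following is only a sketch: guiderc_get is a hypothetical helper that is not part of Wktivoguide, and it assumes that the last definition of a variable wins.

```shell
#!/bin/sh
# guiderc_get NAME: print the value of NAME from a .guiderc-style file on stdin.
# Hypothetical helper, not part of Wktivoguide.
guiderc_get() {
    awk -v want="$1" '
        /^[ \t]*#/ || /^[ \t]*$/ { next }       # skip comments and blank lines
        {
            eq = index($0, "=")
            if (eq == 0) next                    # not a "variable = value" line
            name  = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim whitespace on both sides
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == want) found = value      # assumption: last definition wins
        }
        END { if (found != "") print found }
    '
}

# Example: look up Datadir in a small sample configuration.
printf '%s\n' '# sample .guiderc' '' 'Configdir = ./Files' 'Datadir = ./Data' |
    guiderc_get Datadir
```

With a real configuration you would run guiderc_get Datadir < .guiderc instead. A variable that is not set in the file prints nothing, so the caller has to supply the default from the table above.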
The Configdir variable gives the directory name where the run-time files are stored. You will need to create this directory before you can run the program, and also populate it with the run-time files. Example run-time files are given in the Files/ directory. We will look at these files later.
Phase one of Wktivoguide (i.e. fetch_data) downloads web data and distills it into an intermediate guide format. The data in the intermediate format is stored in the directory named by the Datadir variable. You will need to create this directory before you can run the program. You should also make the directory named by the Webdir variable.
If you choose to run phase three of Wktivoguide (i.e. add_data), then you will need to create the directory named by the Extradir variable.
Wktivoguide will automatically clean out all three of these directories by removing files older than 14 days.
Details for all the other configuration parameters can be found in the .guiderc file that came with the distribution.
Wktivoguide uses the following run-time files which are kept in the Configdir directory. Some example files are provided in the Files/ directory. Warning: You will need to change most of these files.
The genretypes file lists the program categories (also known as genres) which the TiVo knows. Do not edit this file; treat it as read-only.
This file sets a genre category to a specific station. When data for a station is fetched, and a program has no genre, then the program gets the station's genre if it is defined. Here is the top of the file that is distributed:
ADV1	Action,ActionAdventureGroup
ANIMAL	Animals,Documentary,DocumentaryGroup
ANT	International
BBC	News,NewsBusinessWeather
BIOG	Biography,DocumentaryGroup
BLM	News,NewsBusinessWeather
BOOM	Children,ChildrensGroup
CART	Animated,Children,ChildrensGroup
CLUBV	Music,ArtsMusicLiving
CMDY	Comedy,ComedyGroup
The name of the station is the same as is defined in the websources file below. The genre list is a comma-separated list of genre names from the genretypes file. The two columns are tab-separated. You should not have to edit this file.
The stations file holds the names of the stations known to Wktivoguide, the TiVo internal station-id for the station, and the timezone where this station will be received. The three columns are tab-separated. Lines starting with hashes are ignored. I am using:

# List of stations and the internal TiVo id
TVQ46	Station/1/7111010	Australia/Brisbane
ABC-QLD	Station/1/7111002	Australia/Brisbane
BTQ52	Station/1/7111007	Australia/Brisbane
QTQ58	Station/1/7111009	Australia/Brisbane
SBS-QLD	Station/1/7111028	Australia/Brisbane
#TenGC	Station/1/7111055	Australia/Brisbane
#Prime	Station/1/7111064	Australia/Brisbane
#NBN	Station/1/7111067	DoNotShift
#AV	Station/1/7111099	Australia/Brisbane
NoNag	Station/1/7111099	Australia/Brisbane
The TiVo internal station-ids can be found using the method described on this web page.
The timezone is specified as it appears in most Linux or Unix boxes in /usr/share/zoneinfo.
Note: Sometimes the provider of a channel does its own time delay, so that a program appears at the same local time in all timezones, e.g. a cable channel. If this is the case, set the timezone to the magic word ``DoNotShift''. This will preserve the original times for all programs on that channel, i.e. no timezone shifting will be done. You would use the entry for NBN as shown above.
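To check which stations are active and how their times will be treated, the stations file can be summarised with a short awk filter. This is a sketch only: list_stations is a hypothetical helper that is not part of Wktivoguide, and it assumes the three tab-separated columns described above.

```shell
#!/bin/sh
# list_stations: read a stations file on stdin and print, for each active
# station, its name, its TiVo station-id and a note about timezone handling.
# Hypothetical helper, not part of Wktivoguide.
list_stations() {
    awk -F '\t' '
        /^#/ || NF < 3 { next }                  # skip comments and short lines
        {
            # DoNotShift means the original program times are preserved
            note = ($3 == "DoNotShift") ? "times kept as-is" : "shift to " $3
            printf "%s\t%s\t%s\n", $1, $2, note
        }
    '
}

# Example: one normal station, one commented out, one using DoNotShift.
printf 'TVQ46\tStation/1/7111010\tAustralia/Brisbane\n#Prime\tStation/1/7111064\tAustralia/Brisbane\nNBN\tStation/1/7111067\tDoNotShift\n' |
    list_stations
```

On a real installation you would run list_stations < Files/stations.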
The websources file holds the list of stations again, what mechanism to use to get them, the timezone where the transmitter operates, and some portion of a URL which is used to fetch the data. At present, there are five fetchers for downloading data: `Foxtel', `ABC', `Yahoo', `Ebroadcast' and `Austar'. As before, the columns are tab-separated and lines beginning with hashes are ignored.
Here is the file I use. Note that the NoNag channel must not be defined in this file. Note also that there can be many more stations defined in this file than in the stations file.
# List of web sources for each station defined in the stations file.
# Columns are: station name, web source, timezone, and any
# specific URL information that we need to use to get the data.
#
HALL	Foxtel	Australia/Sydney	HAL	# Hallmark Channel
MOV1	Foxtel	Australia/Sydney	MV1	# Movie One
ABC-QLD	Ebroadcast	Australia/Brisbane	2&state=Brisbane&fta=1
SBS-QLD	Ebroadcast	Australia/Brisbane	SBS&state=Brisbane&fta=1
BTQ52	Ebroadcast	Australia/Brisbane	7&state=Brisbane&fta=1
QTQ58	Ebroadcast	Australia/Brisbane	9&state=Brisbane&fta=1
TVQ46	Ebroadcast	Australia/Brisbane	10&state=Brisbane&fta=1
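Because the stations and websources files are edited separately, it is easy to name a station in one and forget it in the other. The sketch below lists every station in the stations file (other than NoNag, which must not appear in websources) that has no websources entry. The name missing_sources is hypothetical and the script is not part of Wktivoguide.

```shell
#!/bin/sh
# missing_sources STATIONS WEBSOURCES
# Print each station named in STATIONS that has no line in WEBSOURCES.
# Hypothetical helper, not part of Wktivoguide.
missing_sources() {
    awk -F '\t' '
        /^#/ || $1 == "" { next }                        # comments, blank lines
        FILENAME == ARGV[1] { have[$1] = 1; next }       # first file: websources
        $1 != "NoNag" && !($1 in have) { print $1 }      # second file: stations
    ' "$2" "$1"
}

# Example with two small sample files in a temporary directory.
dir=$(mktemp -d)
printf 'ABC-QLD\tStation/1/7111002\tAustralia/Brisbane\nTVQ46\tStation/1/7111010\tAustralia/Brisbane\nNoNag\tStation/1/7111099\tAustralia/Brisbane\n' > "$dir/stations"
printf 'ABC-QLD\tEbroadcast\tAustralia/Brisbane\t2&state=Brisbane&fta=1\n' > "$dir/websources"
missing_sources "$dir/stations" "$dir/websources"
rm -r "$dir"
```

An empty result means every station you intend to build a slice for has a web source defined.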
The numbers file holds numbers which have to be unique for certain records in the final TiVo slice. This file is only used by the make_slice phase. I'm using:

# List of incremental numbers
Series: 100053808
Program: 140171812
StationDay: 300005812
Slice: 394
You could probably reset them to 100000000, 200000000, 300000000 and a small number like 50. Note that once you use a number, you can't reuse it. I had to pick numbers I knew were not used by tridge's SOFCOM.slices, but if you have a factory fresh TiVo, then I guess you could use any numbers.
The programs file holds the known program names and a list of genre identifiers separated by commas. The two columns are tab-separated. You should use at least one broad category name, but you can use any of the names given in the genretypes file. We now have a communal programs file available at ftp://minnie.tuhs.org/tivo/guidefiles/programs. E-mail me if you want to help us to maintain this file and fix the Unknown genre problem!
Note on matching program names. For most titles, an exact match is done between the web title and the title in the programs file. If no matches are found, the web program is considered to be Unknown. However, if the two characters |P are at the end of a title in the programs file, then partial matches are permitted. For example, assume that the programs file holds these lines:

About Us|P	Documentary,DocumentaryGroup
Absolutely Fabulous	Comedy,ComedyGroup

and these raw titles arrive from the web data:

About Us: The Life of John Cleese
Absolutely Fabulous
Absolutely Fabulous on Stage

The first title will be matched, and ``The Life of John Cleese'' will be used as the episode title. The second title will be matched, but the third title will be treated as an Unknown program.
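The matching rule described above could be sketched as follows. This is not the code that Wktivoguide uses: match_title is a hypothetical helper, it assumes that a partial match means the programs file entry is a prefix of the web title, and the way it strips the separator before the episode name is a guess.

```shell
#!/bin/sh
# match_title TITLE: read a programs file on stdin and print
# "genres<TAB>episode" for the first entry matching TITLE, else "Unknown".
# Hypothetical helper, not part of Wktivoguide.
match_title() {
    awk -F '\t' -v title="$1" '
        /^#/ || NF < 2 { next }
        {
            name = $1
            if (name ~ /\|P$/) {                         # entry allows partial matches
                name = substr(name, 1, length(name) - 2)
                if (index(title, name) == 1) {           # assumption: prefix match
                    rest = substr(title, length(name) + 1)
                    sub(/^[ :-]+/, "", rest)             # assumption: drop separator
                    print $2 "\t" rest
                    hit = 1
                    exit
                }
            } else if (title == name) {                  # exact match, no episode
                print $2 "\t"
                hit = 1
                exit
            }
        }
        END { if (!hit) print "Unknown" }
    '
}

# Example using the entries and titles from the text above.
programs=$(printf 'About Us|P\tDocumentary,DocumentaryGroup\nAbsolutely Fabulous\tComedy,ComedyGroup\n')
for t in 'About Us: The Life of John Cleese' 'Absolutely Fabulous' 'Absolutely Fabulous on Stage'; do
    printf '%s\n' "$programs" | match_title "$t"
done
```

A plain prefix test is deliberately loose: an entry such as About Us|P would also match a title that merely begins with the same letters, so a stricter version might insist on a separator after the prefix.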
You should not need to edit this file, as Wktivoguide builds this file for you automatically. In fact, you might want to download the communal programs file before you run Phase 2 or Phase 3, i.e. the searching of genres or the adding of the extra information to the intermediate data.
The trimtitles file holds patterns which occur in program titles that are superfluous, and can be trimmed from the titles. For example, channel Ten often prepends the pattern `Movie: ' to its movie titles. By putting this pattern in here, Wktivoguide will remove it from the title when doing the conversion. Each line contains a genrelist which Wktivoguide will use in case the remaining title is not found in the programs file, and an optional word `Never' to indicate that the program is never an episode of a series. You should not have to edit this file, unless you see program titles that clearly could be trimmed.
The episodes file holds the program names which are known to be episodic; if a program name appears in here, then it must also appear in the programs file. For programs which are named in this file, you can ask the TiVo to record a `Season Pass'. Warning: Do not edit this file, as Wktivoguide builds this file for you automatically.
Hint: It's a good idea to periodically back up all of these configuration files; if they ever get lost you might be able to go back to a known point in time.
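One simple way to take such a backup is a dated tar archive of the whole Configdir directory. The sketch below is only a suggestion: backup_config and the guidefiles- archive name are made up for this example and are not part of Wktivoguide.

```shell
#!/bin/sh
# backup_config [SRC] [DEST]: archive the SRC directory (default: Files)
# into DEST (default: current directory) and print the archive's name.
# Hypothetical helper, not part of Wktivoguide.
backup_config() {
    src=${1:-Files}
    dest=${2:-.}
    stamp=$(date +%Y%m%d)                        # e.g. 20040412
    archive="$dest/guidefiles-$stamp.tar.gz"
    tar -czf "$archive" "$src" && echo "$archive"
}

# Example: back up ./Files if it exists in the current directory.
if [ -d Files ]; then
    backup_config Files .
fi
```

Running it more than once on the same day overwrites that day's archive, which is usually what you want just before a weekly run.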
These are the files you should edit only once: numbers, stations, websources. You should then download a new copy of the programs file from ftp://minnie.tuhs.org/tivo/guidefiles/programs. Then you should delete the episodes file, as this will be created for you. You can leave the other files (genretypes and trimtitles) as they are.
Each phase will rewrite different files:
The programs database can be kept either in a file (Files/programs) or in a MySQL database. If you want to keep it as a file, then make a copy of the ProgramDbFile.pm module in the TiVo/ directory:
% cd TiVo; cp ProgramDbFile.pm ProgramDb.pm

If you want to keep the database in MySQL, copy this module instead:

% cd TiVo; cp ProgramDbSQL.pm ProgramDb.pm

The SQL database schema is documented in ProgramDbSQL.pm.
Once you have configured your .guiderc and the numbers, stations and websources files, you are ready to try running the program. Most web sources have 7 days of data, so you can run the program pretty much any time.
Make sure the Webdir, Datadir and Extradir directories that you have chosen exist. Then run the doit.pl shell script to fetch some guide data:
% ./doit.pl fetch

This should give some lines indicating that data is being retrieved, and the result will be a collection of files in the Datadir which holds the guide data in intermediate format. The Webdir should also have the raw web files.

Once the fetch phase is successful, you can search Yahoo for new genres:

% ./doit.pl search

With that complete, you can augment your intermediate data:

% ./doit.pl add

Finally, you can create your slice:

% ./doit.pl make

The result is a new output.slice file which holds a week's worth of data. You can also do ./doit.pl both to both fetch the guide data from the web and produce the output.slice file (i.e. phases one and four). You can also do ./doit.pl both to run all four phases.
The doit.pl script works out the list of days to fetch, and then calls fetch_data, genre_search, add_data or make_slice. You can call these Perl scripts by hand. To run fetch_data, do:
% fetch_data [-c configfile] station start_day end_day

Options are:

-c configfile	Use the named configuration file instead of .guiderc

The station is the name of the station from column 1 of the websources file. Alternatively, you can use the word `All', and the program will operate on all the stations named in the websources file.
The start day and end day are day numbers relative to today, e.g. 0 is today, 1 is tomorrow. You can specify negative as well as positive day numbers.
You run genre_search and add_data in exactly the same way as for fetch_data.
To run make_slice, you can do:
% make_slice [-n] [-c configfile] station start_day end_day

Options are:

-c configfile	Use the named configuration file instead of .guiderc
-n	Don't rewrite the numbers file.

The station option this time refers to a station from the stations file, or the word `All' to mean all stations in the stations file.
Typically you would run fetch_data to get the web data; then you would do a genre_search; then you would edit the programs file to hand-edit any Unknown entries; then you would do an add_data; finally you would run make_slice to generate the slice file. e.g.
% ./fetch_data All 0 6
% ./genre_search All 0 6
% vi Files/programs
% ./add_data All 0 6
% ./make_slice All 0 6

which is basically what the doit.pl script does.
Although you can update the programs file by hand, it is probably best to download the latest communal programs file, which is available at ftp://minnie.tuhs.org/tivo/guidefiles/programs.
However, if you have done a fetch_data and hopefully a genre_search, you may still find some programs with no known genre in the programs file. Use the genre categories in the genretypes file to replace or improve the entries in this file. And if you see a program with obvious episode names like these:
2002 FIFA WORLD CUP - FRANCE vs URUGUAY	Unknown
2002 FIFA WORLD CUP - GERMANY vs REPUBLIC OF IRELAND	Unknown
2002 FIFA WORLD CUP - JAPAN vs BELGIUM	Unknown
2002 FIFA WORLD CUP - KNOCKOUT ROUND - GROUP E vs GROUP B	Unknown
2002 FIFA WORLD CUP - MEXICO vs ITALY	Unknown
2002 FIFA WORLD CUP - PEOPLE'S REPUBLIC OF CHINA vs COSTA RICA	Unknown
2002 FIFA WORLD CUP - POLAND vs USA	Unknown
2002 FIFA WORLD CUP - PORTUGAL vs REPUBLIC OF KOREA	Unknown
2002 FIFA WORLD CUP - REPUBLIC OF KOREA vs POLAND	Unknown
2002 FIFA WORLD CUP - RUSSIA vs TUNISIA	Unknown
A COUNTRY PRACTICE: ALL FIRED UP - PART 1	Unknown
A COUNTRY PRACTICE: ALL FIRED UP - PART 2	Unknown
A COUNTRY PRACTICE: NEVER COUNT YER CHOOKS - PART 1	Unknown
A COUNTRY PRACTICE: NEVER COUNT YER CHOOKS - PART 2	Unknown
A COUNTRY PRACTICE: RAKING OVER THE ASHES - PART 1	Unknown
CHICAGO HOPE: A COUPLA' STIFFS	Unknown
CHICAGO HOPE: EVERY DAY A LITTLE DEATH	Unknown
CHICAGO HOPE: FROM SOUP TO NUTS	Unknown
CHICAGO HOPE: FULL MOON	Unknown
CHICAGO HOPE: HELLO GOODBYE	Unknown
CHICAGO HOPE: LEAVE OF ABSENCE	Unknown
CHICAGO HOPE: RISE FROM THE DEAD	Unknown
HIGHER GROUND What Remains	Unknown

then remove the episode name and replace the multiple lines with one line ending in |P, for example:

2002 FIFA WORLD CUP|P	SportsGroup
A COUNTRY PRACTICE|P	DramaGroup
CHICAGO HOPE|P	DramaGroup
HIGHER GROUND|P	Unknown

Note: If you do edit your own programs file, then please help us out by uploading your changes to the communal programs file. E-mail me below, or join the OzTiVo Twiki if you want to help us to maintain this file and fix the Unknown genre problem.
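Spotting groups of episode titles by eye gets tedious in a large programs file. The sketch below prints a suggested |P line for every title prefix that is shared by two or more Unknown entries. It is not part of Wktivoguide: suggest_partials is a hypothetical helper, and it assumes that the episode name is separated from the title by the first `: ' or ` - ', so a title such as HIGHER GROUND What Remains still has to be handled by hand. The genre is left as Unknown for you to fill in.

```shell
#!/bin/sh
# suggest_partials: read a programs file (title<TAB>genres) on stdin and print
# "PREFIX|P<TAB>Unknown" for each prefix shared by two or more Unknown titles.
# Hypothetical helper, not part of Wktivoguide.
suggest_partials() {
    awk -F '\t' '
        $2 != "Unknown" { next }                 # only look at unresolved titles
        {
            cut = match($1, /: | - /)            # assumption: first ": " or " - "
            if (cut == 0) next                   # no separator, leave for hand editing
            count[substr($1, 1, cut - 1)]++
        }
        END {
            for (p in count)
                if (count[p] > 1) printf "%s|P\tUnknown\n", p
        }
    ' | sort
}

# Example with a few of the titles shown above.
printf '2002 FIFA WORLD CUP - FRANCE vs URUGUAY\tUnknown\n2002 FIFA WORLD CUP - JAPAN vs BELGIUM\tUnknown\nCHICAGO HOPE: FULL MOON\tUnknown\nCHICAGO HOPE: HELLO GOODBYE\tUnknown\nHIGHER GROUND What Remains\tUnknown\n' |
    suggest_partials
```

The output is only a list of candidates. Review it, choose a genre for each line, and then replace the matching episode lines in the programs file yourself.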
Each week, the TiVo nags you about being nearly out of program data. Wktivoguide can overcome the nag problem. It does so by generating data for an unused channel for 7 days ahead of now. Thus, the TiVo thinks it has 7 more days of guide data, and it won't nag you.
To make use of this feature, you need to create a channel on the TiVo that you will never use. Just choose a channel number which you can't receive, and create a TiVo station using that frequency. Then in the TiVo setup, make it a channel that you don't receive.
Next, put an extra line in the Files/stations file which has a station named `NoNag'. Put in the station-id which identifies this channel in the TiVo. For an example, see the stations file above.
Finally, define the NoNag variable in the .guiderc file; any value will do.
Each week, Wktivoguide will re-use data from a real channel, but set the date for 14 days in the future and create slice data for the NoNag channel. Thus, the TiVo will be fooled into thinking that it always has program guide data.
Once you have generated your weekly output.slice, you now need to move the resulting slice file over to your TiVo. Consult your local TiVo community to find out how to do this for your type of TiVo and the version of its system.
If you want to see slice entries with star ratings for movies, then go to the Alternate Interfaces section of the Internet Movie Database, scroll down to the Plain Text Data Files section, and download the file called ratings.list.gz. While you are there, also download the file directors.list.gz.
The format for these files is very awkward to parse, so Wktivoguide keeps the data in these files in its own format. To convert from IMDB format to Wktivoguide format, do the following:
% Misc/cvt_ratings ratings.list.gz > ratings.tivo
% Misc/cvt_directors directors.list.gz > directors.tivo
% gzip -9 ratings.tivo directors.tivo
% mv ratings.tivo.gz directors.tivo.gz Files/
The new files ratings.tivo.gz and directors.tivo.gz should be moved into the Configdir directory (i.e. where programs and trimtitles are kept). When you now make a slice, Wktivoguide will read both files, and add movie star ratings and director information to your slices where possible.
Note: The TiVo only has up to 4 stars instead of the usual 5, so Wktivoguide code does a remapping from 5 stars to 4 stars. As a simple guide, here is what you should see:
If you are rolling your own slices 100%, then you won't need to read this section. If you plan to roll some of your own slices, but want to stay compatible with the OzTiVo Emulator slices, then you should read this section.
The difficulty here is that once the series/program/stationday/slice/episode-id numbers are allocated, they cannot be re-used. So if you build your own slices it tends to be difficult if not impossible to switch back to using the OzTiVo Emulator slices.
Fortunately, the OzTiVo Emulator builds so many slices each week that the numbers there will increment faster than your numbers. So a possible solution is to:
Probably plenty. Let me know if you find any.
I do some conversion of program ratings to what the TiVo expects, but it could be further refined. I try to separate the list of actors in a program from the program's description, but it is only heuristic.
If you have any questions or comments, please e-mail me at wkt@tuhs.org.
Warren Toomey, April 2004.