Wktivoguide - Warren Toomey's Homegrown TiVo Slice Generator

Version 3.5, June 2004. (c) Warren Toomey, GPL license.

This program fetches TV program data from web sources, and converts this data into a slice file which can be loaded into a TiVo. At present, it can only obtain Australian TV program data, but it can be modified to fetch and parse data from other sources.

The program can output program data in the binary slice format which the TiVo requires. Alternatively, it can output the data in a textual form which can then be converted into a slice file using tridge's writeguide program.

It's a good idea to read this README file from start to end, especially the bits about the files that get regenerated during each run. Things may not work if you don't set them up correctly.

RECENT CHANGES

3.5: The Yahoo, Austar and Foxtel fetchers have been crippled due to a legal ``cease & desist'' notice. A kludge was added to determine if a program is a movie when it is not explicitly tagged as such. A bug was fixed which was preventing the Live and Premiere fields from working.

3.4: Modified the NoNag variable to let any other station provide data for the NoNag channel. Added code to store a stationday database, so that we can overwrite old data with new data on the TiVo. Fixed bug that was dropping `Movie' genre from movies. Added new PlusN pseudo-fetcher. Added better support for repeats, captions, live programs, premieres and series finals. Added a few more Yahoo genres. String comparisons now ignore some punctuation and whitespace to improve matches with bad web data.

3.4-beta: Several new fetchers have been written: ABC, Foxtel, Austar, Yahoo. Several bugs have been fixed in the new fetchers as well.

3.3: A transitional version which never made it out into the wild.

3.2: Fixed bug in TiVo/EbroadcastFetcher.pm which was preventing the per-channel genre from being set. Added a few more Yahoo genres to TiVo/GenreSearch.pm. Modified TiVo/SliceBuilder.pm so that several runs of a movie all get the same series number. This groups them all under the one entry in Search By Title. Added several more Foxtel genres to TiVo/FoxtelFetcher.pm, and I now remove the Foxtel genre words from the program's description.

3.1: First release of new version.

CHANGES FROM VERSION 2

These are now documented in a separate file.

OVERVIEW OF OPERATION

Wktivoguide is a set of four programs, a bunch of configuration files and directories that hold intermediate files. Each program represents one of the four phases of operation.

[Figure: phases.gif, a diagram of the four phases]

Phase One - Fetching Web Data

The first phase is to fetch and parse the web data for the slice(s) that you want to build. At present there are web fetchers for five websites in Australia. Other fetchers would be reasonably easy to write, and the existing fetchers could be used as a template. The fetch phase reads a list of TV channels from a configuration file called websources, and writes a set of intermediate data files into the Data directory. These files contain only the information that is required to build the slices. The fetch phase also stores the raw web pages fetched into the Webfiles directory; if you re-run the fetch phase and there are web pages in the Webfiles directory, no web connections will occur and the files in the Data directory will be recreated from the information in the Webfiles directory.
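
The re-use of pages already held in the Webfiles directory can be sketched as follows. This is only an illustration in Python, not Wktivoguide's own Perl code: the function name, the way cached files are named, and the download callback are all assumptions.

```python
import os

def get_page(webdir, name, download):
    """Return the raw page called `name`, using the copy in `webdir` if present.

    `download` is a hypothetical callable that performs the real web fetch;
    it is only called when no cached copy exists.
    """
    path = os.path.join(webdir, name)
    if os.path.exists(path):
        # A cached copy exists, so no web connection is made.
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    text = download(name)
    # Keep the raw page so that a later run can rebuild the Data files offline.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text
```

Calling get_page twice for the same name should therefore trigger at most one download.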

Phase Two - Searching for Unknown Genres

One problem with the data found by the web search is that it often does not give us an idea of the genre of the programs found (Movie, Comedy, Drama etc.). To overcome this, Wktivoguide keeps a database of programs and their genres, which is called the programs file or the programs database. The second, optional, phase is to read through the files in the Data directory, compare the programs there with those stored in the programs file, and determine which programs don't have a known genre.

For every unknown program, this phase sends a query out to an Australian Yahoo website to see if the program's genre can be determined. The result (known or not found) is stored back into the programs file. The trimtitles file is also consulted: this is used to trim common things from the front of a program's title (e.g. Special:, Movie of the Week:, Drama: etc.)

Phase Three - Adding Genre and Other Data

With the basic program data in the Data directory, this optional phase can add extra data to these files. It will add in the genre of the program, the star rating and director if the program is a movie, and it will use heuristics to determine an episode name. The results are stored into a second directory called Data2.

Phase Four - Making the Slice(s)

In the last phase, the program data that we have gathered and augmented will be converted into slice format. If the Data2 directory exists, this phase will use the data files from this directory. If only the Data directory exists, then this phase will load the data from there, and perform Phase Three on the fly. The result is the final slice file which can be loaded into the TiVo.

INSTALLATION

The Wktivoguide program comes as a front-end Bourne shell script called doit.pl. This calls the four Perl scripts: fetch_data, genre_search, add_data and make_slice. These make use of a number of back-end Perl modules in the TiVo/ directory. You will need a Unix machine with Perl 5 installed and you will also need to install the HTTP::Lite and CGI::Enurl Perl modules.

The software is designed to be unpacked and run in the directory where you unpacked it. Later on, I'll describe what to do if you want to generate multiple slices in the same directory.

GETTING HTTP::Lite

Get HTTP::Lite from any CPAN mirror. I found it at: ftp://mirror.aarnet.edu.au/pub/perl/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/HTTP. At the time of writing, the latest version is HTTP-Lite-2.1.6.tar.gz. Download the latest tarball somewhere. Then do:

% tar vxzf HTTP-Lite-2.1.6.tar.gz
% cd HTTP-Lite-2.1.6
% perl Makefile.PL
% make
% su root
# make install
I hope that works. I normally use the Perl CPAN auto-fetching support to install new Perl modules.

GETTING CGI::Enurl

If you plan to use the genre_search phase, or if you plan to run the command ./doit.pl search, then you will also need to install the CGI::Enurl Perl module. It is installed in the same way as HTTP::Lite. I found it at ftp://mirror.aarnet.edu.au/pub/perl/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/CGI.

TOP-LEVEL CONFIGURATION FILE

Wktivoguide has a top-level configuration file, and a number of other files which store a database of information between runs of the program. The top-level configuration file is called .guiderc and should be placed in the directory where you unpacked Wktivoguide. There is an example .guiderc configuration file in the distribution.

Note: The scripts doit.pl, fetch_data, genre_search, add_data and make_slice all have a -c option that allows you to specify configuration files other than .guiderc. This is useful if you want to generate multiple different slice files using the software.

The .guiderc configuration file controls the basic behaviour of the program, and also defines where the run-time files are and where to store the intermediate guide data. Lines beginning with a hash are ignored; so are blank lines. Configuration lines look like variable = value. Here is a brief description of all variables; for more information see the .guiderc file that came with the distribution.

Variable          Used For                                                         Default Value
Configdir         Directory where the other config files live                      ./Files
Datadir           Directory where intermediate guide data is stored by fetch_data  ./Data
Extradir          Directory where intermediate guide data is stored by add_data    ./Data2
Webdir            Directory where raw web data is stored                           No default
Webproxy          Web proxy for downloads                                          Not used
WebDelay          Delay in seconds between web fetches                             0
RepeatWebConnect  Try to get web data this many times                              Not used
NoNag             If set to any value, stop the TiVo from nagging                  Not defined
Outputformat      The format of the output file                                    slice
Slicefile         Name of the slice output file                                    ./output.slice
Textfile          Name of the text output file                                     ./output.txt
Tempdir           The directory used to hold temporary files                       /tmp
Debugfile         File that holds debugging output                                 No default
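
To make the file format concrete, here is a rough Python sketch of how a .guiderc file could be read. It is not the parser Wktivoguide uses. The defaults are copied from the table above, and the handling of lines without an equals sign (they are skipped) is my assumption.

```python
# Defaults taken from the table of variables; variables with no default are omitted.
DEFAULTS = {
    "Configdir": "./Files",
    "Datadir": "./Data",
    "Extradir": "./Data2",
    "WebDelay": "0",
    "Outputformat": "slice",
    "Slicefile": "./output.slice",
    "Textfile": "./output.txt",
    "Tempdir": "/tmp",
}

def parse_guiderc(text):
    """Parse 'variable = value' lines, ignoring blank lines and # comments."""
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: lines without '=' are skipped
        settings[name.strip()] = value.strip()
    return settings
```

A variable such as NoNag only has to be present in the result; its value does not matter.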

The Configdir variable gives the directory name where the run-time files are stored. You will need to create this directory before you can run the program, and also populate it with the run-time files. Example run-time files are given in the Files/ directory. We will look at these files later.

Phase one of Wktivoguide (i.e. fetch_data) downloads web data and distills it into an intermediate guide format. The data in the intermediate format is stored in the directory named by the Datadir variable. You will need to create this directory before you can run the program. You should also make the directory named by the Webdir variable.

If you choose to run phase three of Wktivoguide (i.e. add_data), then you will need to create the directory named by the Extradir variable.

Wktivoguide will automatically clean out all three of these directories by removing files older than 14 days.
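
You do not need to do this clean-up yourself, but for the curious it amounts to something like the Python sketch below. The function name is hypothetical, and I am assuming that file age is judged by modification time, which the text above does not say.

```python
import os
import time

def remove_old_files(directory, max_age_days=14, now=None):
    """Delete regular files in `directory` last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Take a snapshot of the directory before deleting anything from it.
    for entry in list(os.scandir(directory)):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```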

Details for all the other configuration parameters can be found in the .guiderc file that came with the distribution.

RUN-TIME FILES AND INITIAL CONFIGURATION

Wktivoguide uses the following run-time files which are kept in the Configdir directory. Some example files are provided in the Files/ directory. Warning: You will need to change most of these files.

genretypes

This file lists the program categories (also known as genres) which the TiVo knows. Do not edit this file; treat it as read-only.

stationgenre

This file assigns a genre category to a specific station. When data for a station is fetched, and a program has no genre, then the program gets the station's genre if it is defined. Here is the top of the file that is distributed:

ADV1            Action,ActionAdventureGroup
ANIMAL          Animals,Documentary,DocumentaryGroup
ANT             International
BBC             News,NewsBusinessWeather
BIOG            Biography,DocumentaryGroup
BLM             News,NewsBusinessWeather
BOOM            Children,ChildrensGroup
CART            Animated,Children,ChildrensGroup
CLUBV           Music,ArtsMusicLiving
CMDY            Comedy,ComedyGroup 

The name of the station is the same as the one defined in the websources file below. The genre list is a comma-separated list of genre names from the genretypes file. The two columns are tab-separated. You should not have to edit this file.
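
The fallback rule can be pictured with this small Python sketch. The function names are made up for the illustration, and the real code may treat malformed lines differently.

```python
def parse_stationgenre(text):
    """Map each station name to its list of genres (columns are tab-separated)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        station, _, genres = line.partition("\t")
        table[station.strip()] = [g.strip() for g in genres.split(",") if g.strip()]
    return table

def genres_for(program_genres, station, table):
    """Keep a program's own genres; otherwise use the station's genres, if defined."""
    return program_genres or table.get(station, [])
```

So a program on CART with no genre of its own would pick up Animated, Children and ChildrensGroup, while a program that already has a genre keeps it.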

stations

This file holds the names of the stations known to Wktivoguide, the TiVo internal station-id for the station, and the timezone where this station will be received. The 3 columns are tab-separated. Lines starting with hashes are ignored. I am using:

# List of stations and the internal TiVo id
TVQ46           Station/1/7111010       Australia/Brisbane
ABC-QLD         Station/1/7111002       Australia/Brisbane
BTQ52           Station/1/7111007       Australia/Brisbane
QTQ58           Station/1/7111009       Australia/Brisbane
SBS-QLD         Station/1/7111028       Australia/Brisbane
#TenGC          Station/1/7111055       Australia/Brisbane
#Prime          Station/1/7111064       Australia/Brisbane
#NBN            Station/1/7111067       DoNotShift
#AV             Station/1/7111099       Australia/Brisbane
NoNag           Station/1/7111099       Australia/Brisbane

The TiVo internal station-ids can be found using the method described on this web page.

The timezone is specified as it appears in most Linux or Unix boxes in /usr/share/zoneinfo.

Note: Sometimes the provider of a channel does its own time delay, so that a program appears at the same local time in all timezones, e.g. a cable channel. If this is the case, set the timezone to the magic word ``DoNotShift''. This will preserve the original times for all programs on that channel, i.e. no timezone shifting will be done. You would use the entry for NBN as shown above.
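
As an illustration of what timezone shifting and ``DoNotShift'' mean, here is a Python sketch. It reflects my reading of the description, namely that a listing time is converted from the transmitter's timezone to the timezone where the station is received. The function name is hypothetical, and this is not how the Perl code is necessarily written.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system zoneinfo database

def shift_time(naive_local, source_tz, station_tz):
    """Convert a listing time from the transmitter's zone to the viewer's zone.

    `naive_local` is a datetime without tzinfo, as printed in the web listing.
    If `station_tz` is the magic word DoNotShift, the time is returned unchanged.
    """
    if station_tz == "DoNotShift":
        return naive_local
    aware = naive_local.replace(tzinfo=ZoneInfo(source_tz))
    return aware.astimezone(ZoneInfo(station_tz)).replace(tzinfo=None)
```

For example, a program listed at 8:30pm Sydney time in mid-January (when Sydney is on daylight saving) would come out at 7:30pm for a Brisbane station, but would stay at 8:30pm for a DoNotShift station.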

websources

This file holds the list of stations again, what mechanism to use to get them, the timezone where the transmitter operates, and some portion of a URL which is used to fetch the data. At present, there are five fetchers for downloading data: `Foxtel', `ABC', `Yahoo', `Ebroadcast' and `Austar'. As before, the columns are tab-separated and lines beginning with hashes are ignored.

Here is the file I use. Note that the NoNag channel must not be defined in this file. Note also that there can be many more stations defined in this file than in the stations file.

# List of web sources for each station defined in the stations file.
# Columns are: station name, web source, timezone, and any
# specific URL information that we need to use to get the data.
#
HALL    Foxtel  Australia/Sydney        HAL     # Hallmark Channel
MOV1    Foxtel  Australia/Sydney        MV1     # Movie One
ABC-QLD Ebroadcast      Australia/Brisbane      2&state=Brisbane&fta=1
SBS-QLD Ebroadcast      Australia/Brisbane      SBS&state=Brisbane&fta=1
BTQ52   Ebroadcast      Australia/Brisbane      7&state=Brisbane&fta=1
QTQ58   Ebroadcast      Australia/Brisbane      9&state=Brisbane&fta=1
TVQ46   Ebroadcast      Australia/Brisbane      10&state=Brisbane&fta=1
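
A reader for this file might look like the Python sketch below. It is only a guess at the parsing rules: I am assuming that a trailing `# ...' on a line is a comment (as in the Foxtel lines above), and the function name is made up.

```python
def parse_websources(text):
    """Return (station, fetcher, timezone, url_part) tuples from websources text."""
    sources = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = [f.strip() for f in stripped.split("\t") if f.strip()]
        station, fetcher, timezone = fields[:3]
        # Assumption: anything from '#' onwards in the last column is a comment.
        url_part = " ".join(fields[3:]).split("#", 1)[0].strip()
        if station == "NoNag":
            raise ValueError("NoNag must not be defined in websources")
        sources.append((station, fetcher, timezone, url_part))
    return sources
```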

numbers

This file holds numbers which have to be unique for certain records in the final TiVo slice. This file is only used by the make_slice phase. I'm using:

# List of incremental numbers
Series:  100053808
Program: 140171812
StationDay: 300005812
Slice: 394

You could probably reset them to 100000000, 200000000, 300000000 and a small number like 50. Note that once you use a number, you can't reuse it. I had to pick numbers I knew were not used by tridge's SOFCOM.slices, but if you have a factory fresh TiVo, then I guess you could use any numbers.
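
The idea of never reusing a number can be sketched in Python as below. The function names are hypothetical, and I am assuming here that the stored value is the last number handed out. The real make_slice may instead store the next free value.

```python
def read_numbers(text):
    """Parse 'Name: value' lines, ignoring blank lines and # comments."""
    numbers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        numbers[name.strip()] = int(value)
    return numbers

def allocate(numbers, kind):
    """Hand out the next unused number of the given kind and advance the counter."""
    numbers[kind] += 1
    return numbers[kind]

def format_numbers(numbers):
    """Render the counters back into the file format shown above."""
    lines = ["# List of incremental numbers"]
    lines += [f"{name}: {value}" for name, value in numbers.items()]
    return "\n".join(lines) + "\n"
```

Writing the formatted result back after every run is what stops a number from being handed out twice. The -n option of make_slice skips that write.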

programs

This file holds the known program names and a list of genre identifiers separated by commas. The two columns are tab-separated. You should use at least one broad category name, but you can use any of the names given in the genretypes file. We now have a communal programs file available at ftp://minnie.tuhs.org/tivo/guidefiles/programs. E-mail me if you want to help us to maintain this file and fix the Unknown genre problem!

Note on matching program names. For most titles, an exact match is done between the web title and the title in the programs file. If no matches are found, the web program is considered to be Unknown. However, if the two letters |P are at the end of a title in the programs file, then partial matches are permitted. For example, assume that the programs file holds these lines:

About Us|P              Documentary,DocumentaryGroup
Absolutely Fabulous      Comedy,ComedyGroup
and these raw titles arrive from the web data:

About Us: The Life of John Cleese
Absolutely Fabulous
Absolutely Fabulous on Stage
The first program will be matched, and ``The Life of John Cleese'' will be used as the episode title. The second title will be matched, but the third title will be treated as an Unknown program.
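
The matching rule in this example can be written down as a short Python sketch. It is a simplification and not the actual Perl code: the function name is made up, partial matching is treated as a plain prefix match, and the punctuation-insensitive comparison mentioned in the 3.4 changes is not modelled.

```python
def match_title(web_title, programs):
    """Return (program_name, episode_title, genres), or None if Unknown.

    `programs` maps the first column of the programs file (including any
    trailing |P) to a list of genres.
    """
    if web_title in programs:
        # Exact match: no episode title can be derived.
        return web_title, "", programs[web_title]
    for entry, genres in programs.items():
        if not entry.endswith("|P"):
            continue
        name = entry[:-2]
        if web_title.startswith(name):
            # Whatever follows the program name becomes the episode title.
            episode = web_title[len(name):].lstrip(" :-")
            return name, episode, genres
    return None
```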

You should not need to edit this file, as Wktivoguide builds this file for you automatically. In fact, you might want to download the communal programs file before you run Phase 2 or Phase 3, i.e. the searching of genres or the adding of the extra information to the intermediate data.

trimtitles

This file holds patterns which occur in program titles that are superfluous, and can be trimmed from the titles. For example, channel Ten often prepends the pattern `Movie: ' to its movie titles. By putting this pattern in here, Wktivoguide will remove it from the title when doing the conversion. Each line contains a genrelist which Wktivoguide will use in case the remaining title is not found in the programs file, and an optional word `Never' to indicate that the program is never an episode of a series. You should not have to edit this file, unless you see program titles that clearly could be trimmed.

episodes

This file holds the program names which are known to be episodic; if a program name appears in here, then it must also appear in the programs file. For programs which are named in this file, you can ask the TiVo to record a `Season Pass'. Warning: Do not edit this file, as Wktivoguide builds this file for you automatically.

Hint: It's a good idea to periodically back up all of these configuration files; if they ever get lost you might be able to go back to a known point in time.

SETTING THESE FILES UP FOR THE FIRST TIME

These are the files you should edit only once: numbers, stations, websources. You should then download a new copy of the programs file from ftp://minnie.tuhs.org/tivo/guidefiles/programs. Then you should delete the episodes file, as this will be created for you. You can leave the other files (genretypes and trimtitles) as they are.

Each phase will rewrite different files:

  1. fetch_data: No files will be modified.
  2. genre_search: The programs file is updated.
  3. add_data: No files will be modified.
  4. make_slice: The numbers and episodes files will be updated.

SETTING WKTIVOGUIDE TO USE THE PROGRAMS DATABASE

The programs database can be kept either in a file (Files/programs) or in a MySQL database. If you want to keep it as a file, then make a copy of the ProgramDbFile.pm module in the TiVo/ directory:

% cd TiVo; cp ProgramDbFile.pm ProgramDb.pm
If you want to keep the database in MySQL, copy this module instead:

% cd TiVo; cp ProgramDbSQL.pm ProgramDb.pm
The SQL database schema is documented in ProgramDbSQL.pm.

RUNNING THE PROGRAM THE EASY WAY

Once you have configured your .guiderc and the numbers, stations and websources files, you are ready to try running the program. Most web sources have 7 days of data, so you can run the program pretty much any time.

Make sure the Webdir, Datadir and Extradir directories that you have chosen exist. Then run the doit.pl shell script to fetch some guide data:

% ./doit.pl fetch
This should give some lines indicating that data is being retrieved, and the result will be a collection of files in the Datadir which holds the guide data in intermediate format. The Webdir should also have the raw web files.

Once the fetch phase is successful, you can search Yahoo for new genres:

% ./doit.pl search
With that complete, you can augment your intermediate data:

% ./doit.pl add
Finally, you can create your slice:

% ./doit.pl make
The result is a new output.slice file which holds a week's worth of data. You can also do ./doit.pl both to both fetch the guide data from the web and produce the output.slice file (i.e. phases one and four). You can also do ./doit.pl both to run all four phases.

RUNNING THE PROGRAM THE HARD WAY

The doit.pl script works out the list of days to fetch, and then calls fetch_data, genre_search, add_data or make_slice. You can call these Perl scripts by hand. To run fetch_data, do:

% fetch_data [-c configfile] station start_day end_day
Options are:

-c configfile  Use the named configuration file instead of .guiderc
The station is the name of the station from column 1 of the websources file. Alternatively, you can use the word `All', and the program will operate on all the stations named in the websources file.

The start day and end day are day numbers relative to today, e.g. 0 is today, 1 is tomorrow. You can specify negative as well as positive day numbers.
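
To show how relative day numbers map onto calendar dates, here is a small Python sketch. The function name is hypothetical, and I am assuming that the end day is inclusive, so that 0 6 covers seven days.

```python
from datetime import date, timedelta

def days_in_range(start_day, end_day, today=None):
    """Turn relative day numbers into dates; 0 is today, 1 tomorrow, -1 yesterday."""
    today = today or date.today()
    return [today + timedelta(days=n) for n in range(start_day, end_day + 1)]
```

With today taken as 28 June 2004, the range 0 6 would run from 28 June to 4 July.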

You run genre_search and add_data in exactly the same way as for fetch_data.

To run make_slice, you can do:

% make_slice [-n] [-c configfile] station start_day end_day
Options are:

-c configfile  Use the named configuration file instead of .guiderc

-n  Don't rewrite the numbers file.

The station option this time refers to a station from the stations file, or the word `All' to mean all stations in the stations file.

Typically you would run fetch_data to get the web data; then you would do a genre_search; then you would edit the programs file to hand-edit any Unknown entries; then you would do an add_data; finally you would run make_slice to generate the slice file. e.g.

% ./fetch_data All 0 6
% ./genre_search All 0 6
% vi Files/programs
% ./add_data All 0 6
% ./make_slice All 0 6
which is basically what the doit.pl script does.

UPDATING THE PROGRAMS FILE

Although you can update this file by hand, it is probably best to download the latest communal programs file, which is available at ftp://minnie.tuhs.org/tivo/guidefiles/programs.

However, if you have done a fetch_data and hopefully a genre_search, you may still find some programs with no known genre in the programs file. Use the genre categories in the genretypes file to replace or improve the entries in this file. And if you see a program with obvious episode names like these:

2002 FIFA WORLD CUP - FRANCE vs URUGUAY   Unknown
2002 FIFA WORLD CUP - GERMANY vs REPUBLIC OF IRELAND    Unknown
2002 FIFA WORLD CUP - JAPAN vs BELGIUM  Unknown
2002 FIFA WORLD CUP - KNOCKOUT ROUND - GROUP E vs GROUP B       Unknown
2002 FIFA WORLD CUP - MEXICO vs ITALY   Unknown
2002 FIFA WORLD CUP - PEOPLE'S REPUBLIC OF CHINA vs COSTA RICA  Unknown
2002 FIFA WORLD CUP - POLAND vs USA     Unknown
2002 FIFA WORLD CUP - PORTUGAL vs REPUBLIC OF KOREA     Unknown
2002 FIFA WORLD CUP - REPUBLIC OF KOREA vs POLAND       Unknown
2002 FIFA WORLD CUP - RUSSIA vs TUNISIA  Unknown
A COUNTRY PRACTICE: ALL FIRED UP - PART 1       Unknown
A COUNTRY PRACTICE: ALL FIRED UP - PART 2       Unknown
A COUNTRY PRACTICE: NEVER COUNT YER CHOOKS - PART 1     Unknown
A COUNTRY PRACTICE: NEVER COUNT YER CHOOKS - PART 2     Unknown
A COUNTRY PRACTICE: RAKING OVER THE ASHES - PART 1      Unknown
CHICAGO HOPE: A COUPLA' STIFFS  Unknown
CHICAGO HOPE: EVERY DAY A LITTLE DEATH  Unknown
CHICAGO HOPE: FROM SOUP TO NUTS  Unknown
CHICAGO HOPE: FULL MOON Unknown
CHICAGO HOPE: HELLO GOODBYE     Unknown
CHICAGO HOPE: LEAVE OF ABSENCE  Unknown
CHICAGO HOPE: RISE FROM THE DEAD        Unknown
HIGHER GROUND What Remains      Unknown
then remove the episode name and replace the multiple lines with one line ending in |P, for example:

2002 FIFA WORLD CUP|P       SportsGroup
A COUNTRY PRACTICE|P       DramaGroup
CHICAGO HOPE|P              DramaGroup
HIGHER GROUND|P            Unknown
Note: If you do edit your own programs file, then please help us out by uploading your changes to the communal programs file. E-mail me below, or join the OzTiVo Twiki if you want to help us to maintain this file and fix the Unknown genre problem.
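
If you have a lot of Unknown titles, the manual clean-up described above could be helped along by a small script. The Python sketch below is a hypothetical helper, not part of Wktivoguide. It only suggests candidates, and it assumes that episode names are separated from the program name by `: ' or ` - ', so a title like HIGHER GROUND What Remains is not caught.

```python
import re
from collections import Counter

def suggest_partial_entries(titles, min_count=2):
    """Suggest NAME|P entries for titles sharing a prefix before ': ' or ' - '."""
    prefixes = Counter()
    for title in titles:
        # Split once, at the first separator found in the title.
        parts = re.split(r": | - ", title, maxsplit=1)
        if len(parts) == 2:
            prefixes[parts[0].strip()] += 1
    return sorted(f"{name}|P" for name, count in prefixes.items() if count >= min_count)
```

You would still need to choose the genre for each suggested line by hand.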

STOPPING THE TIVO FROM NAGGING

Each week, the TiVo nags you about being nearly out of program data. Wktivoguide can overcome the nag problem. It does so by generating data for an unused channel for 7 days ahead of now. Thus, the TiVo thinks it has 7 more days of guide data, and it won't nag you.

To make use of this feature, you need to create a channel on the TiVo that you will never use. Just choose a channel number which you can't receive, and create a TiVo station using that frequency. Then in the TiVo setup, make it a channel that you don't receive.

Next, put an extra line in the Files/stations file which has a station named `NoNag'. Put in the station-id which identifies this channel in the TiVo. For an example, see the stations file above.

Finally, define the NoNag variable in the .guiderc file; any value will do.

Each week, Wktivoguide will re-use data from a real channel, but set the date for 14 days in the future and create slice data for the NoNag channel. Thus, the TiVo will be fooled into thinking that it always has program guide data.
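
In outline, the trick looks like the Python sketch below. This is an illustration only: the function name and the dictionary keys (station, start, title) are invented for the example and do not reflect Wktivoguide's internal data format.

```python
from datetime import datetime, timedelta

def make_nonag_entries(programs, days_ahead=14):
    """Copy a real channel's programs onto the NoNag station, dated in the future.

    Each program is a dict with the hypothetical keys 'station', 'start' and 'title'.
    The input list is left unchanged.
    """
    shift = timedelta(days=days_ahead)
    return [dict(p, station="NoNag", start=p["start"] + shift) for p in programs]
```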

LOADING SLICE FILES

Once you have generated your weekly output.slice, you now need to move the resulting slice file over to your TiVo. Consult your local TiVo community to find out how to do this for your type of TiVo and the version of its system.

MOVIE STAR RATINGS AND DIRECTORS

If you want to see slice entries with star ratings for movies, then go to the Alternate Interfaces section of the Internet Movie Database, scroll down to the Plain Text Data Files section, and download the file called ratings.list.gz. While you are there, also download the file directors.list.gz.

The format for these files is very awkward to parse, so Wktivoguide keeps the data in these files in its own format. To convert from IMDB format to Wktivoguide format, do the following:

% Misc/cvt_ratings ratings.list.gz > ratings.tivo
% Misc/cvt_directors directors.list.gz > directors.tivo
% gzip -9 ratings.tivo directors.tivo
% mv ratings.tivo.gz directors.tivo.gz Files/

The new files ratings.tivo.gz and directors.tivo.gz should be moved into the Configdir directory (i.e. where programs and trimtitles are kept). When you now make a slice, Wktivoguide will read both files, and add movie star ratings and director information to your slices where possible.

Note: The TiVo only has up to 4 stars instead of the usual 5, so Wktivoguide code does a remapping from 5 stars to 4 stars. As a simple guide, here is what you should see:

The TiVo has half-star increments, so a 3 1/2 star movie is very good but not excellent.

STAYING COMPATIBLE WITH OZTIVO SLICES

If you are rolling your own slices 100%, then you won't need to read this section. If you plan to roll some of your own slices, but want to stay compatible with the OzTiVo Emulator slices, then you should read this section.

The difficulty here is that once the series/program/stationday/slice/episode-id numbers are allocated, they cannot be re-used. So if you build your own slices it tends to be difficult if not impossible to switch back to using the OzTiVo Emulator slices.

Fortunately, the OzTiVo Emulator builds so many slices each week that the numbers there will increment faster than your numbers. So a possible solution is to:

This means that you won't be lagging behind the numbering used in the OzTiVo Emulator slices. The files with numbers in them can be downloaded from http://minnie.tuhs.org/TiVo/files/guidefiles/, and the files are updated each time new slices are made on the Emulator.

BUGS AND OMISSIONS

Probably plenty. Let me know if you find any.

I do some conversion of program ratings to what the TiVo expects, but it could be further refined. I try to separate the list of actors in a program from the program's description, but it is only heuristic.

If you have any questions or comments, please e-mail me at wkt@tuhs.org.

Warren Toomey, April 2004.

