Warren Toomey's Homegrown TiVo Slice Generator

Version 1.76, June 2003. (c) Warren Toomey, BSD license.

This program fetches TV program data from web sources, and converts this data into a slice file which can be loaded into a TiVo. At present, it can only obtain Australian TV program data, but it can be modified to fetch and parse data from other sources.

The program can output program data in the binary slice format which the TiVo requires. Alternatively, it can output the data in a textual form which can then be converted into a slice file using tridge's writeguide program. The binary slice format output is still considered a bit experimental, but it seems to work fine.

It's a good idea to read this README file from start to end, especially the bits about the files that get regenerated during each run. Things may not work if you don't set them up correctly.

RECENT CHANGES

INSTALLATION

The program comes as a front-end Perl script, fetch_one, with a number of back-end modules in the TiVo/ directory. You will need a Unix machine with Perl 5 installed and you will also need to install the HTTP::Lite Perl module. More on that later.

You can either unpack the program and run it from the directory where you unpacked it (that's what I do), or you can install the modules that are in TiVo/ into your Perl 5 module hierarchy, and then install fetch_one somewhere suitable like /usr/local/bin.

To do the latter, you will need to find out where Perl 5 can keep modules. Run the following command to get the list of possible locations:

% perl -e 'foreach $i (@INC) { print("$i\n"); }'

/usr/libdata/perl/5.00503/mach

/usr/libdata/perl/5.00503

/usr/local/lib/perl5/site_perl/5.005/i386-freebsd

/usr/local/lib/perl5/site_perl/5.005

Looking at the above output on my system, I would choose to install the modules in /usr/local/lib/perl5/site_perl/5.005. As root, I would do:

# mkdir /usr/local/lib/perl5/site_perl/5.005/TiVo

# cp TiVo/* /usr/local/lib/perl5/site_perl/5.005/TiVo

# chmod 644 /usr/local/lib/perl5/site_perl/5.005/TiVo/*

# cp fetch_one /usr/local/bin

# chmod 755 /usr/local/bin/fetch_one

However, as I mentioned, you can run the software from where you unpacked it. I think that's the easiest way of doing things.

GETTING HTTP::Lite

Get it from any CPAN mirror. I found it at: ftp://mirror.aarnet.edu.au/pub/perl/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/HTTP/HTTP-Lite-0.2.5.tar.gz

Then do:

% tar vxzf HTTP-Lite-0.2.5.tar.gz

% cd HTTP-Lite-0.2.5

% perl Makefile.PL

% make

% su root

# make install

I hope that works. I normally use the Perl CPAN auto-fetching support to install new Perl modules.

TOP-LEVEL CONFIGURATION FILE

fetch_one has a top-level configuration file, and a number of other files which store a database of information between runs of the program. The top-level configuration file, called .fetchone, can be placed in your current directory or in your home directory. There is an example .fetchone configuration file in the distribution.

The .fetchone configuration file controls the basic behaviour of the program, and also defines where the run-time files are and where to store the web data. Lines beginning with a hash are ignored; so are blank lines. Configuration lines look like variable = value.

The Basedir variable gives the name of the directory where the run-time files are stored. You will need to create this directory before you can run the program, and also populate it with the run-time files. Example run-time files are given in the Files/ directory. It's probably easiest to edit the .fetchone file so that Basedir simply matches the full pathname of this Files/ directory.

When fetch_one runs, it caches copies of the incoming web data. The Webdir variable gives the name of the directory where incoming web files are stored. Again, you will need to create this directory before you can run the program.

If you need to use a web proxy, then define the hostname and port number of the proxy with the Webproxy variable.

The Websource variable selects the appropriate web site and html parser for your area. Currently the only option is `sofcom' which retrieves data for Australia from http://www.sofcom.com.au/tv. The software is now modular, so that code for other sources can be written.

As noted previously, fetch_one can output data in binary slice format, or in text format. The Outputformat variable controls this, and it can have the values `slice' or `text'. The normal value here is `slice'.

Here is an example .fetchone configuration file:

Basedir = /usr/home/wkt/TiVo/Wkt_Sofcom/Files

Webdir = /usr/home/wkt/TiVo/Wkt_Sofcom/Webfiles

Websource = sofcom

Outputformat = slice
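The file format described above is simple enough to read in a few lines. Here is a minimal sketch in Python (not the Perl code that fetch_one itself uses). The function name parse_fetchone is made up for illustration, and the sketch assumes that each line is split on its first = sign and that surrounding spaces are not significant.

```python
def parse_fetchone(text):
    """Parse .fetchone-style text into a dict of variable -> value."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and lines beginning with a hash are ignored.
        if not line or line.startswith("#"):
            continue
        # Assumption: split on the first '=' and trim surrounding spaces.
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'variable = value' line; skipped in this sketch
        config[name.strip()] = value.strip()
    return config


EXAMPLE = """\
# example configuration
Basedir = /usr/home/wkt/TiVo/Wkt_Sofcom/Files
Webdir = /usr/home/wkt/TiVo/Wkt_Sofcom/Webfiles

Websource = sofcom
Outputformat = slice
"""

if __name__ == "__main__":
    for name, value in parse_fetchone(EXAMPLE).items():
        print(name, "=", value)
```

A quick check like this can be handy for confirming that Basedir and Webdir point at directories which really exist before the first run.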

RUN-TIME FILES

fetch_one uses the following run-time files which are kept in the Basedir directory. Some example files are provided in the Files/ directory. Warning: You will need to change most of these files.

genretypes: Lists the program categories (also known as genres) which the TiVo knows. Do not edit this file; treat it as read-only.

stations: Holds the names of the stations known to fetch_one, the TiVo internal station number for each station, and the portion of the webpage URL that identifies the station. The 3 columns are tab-separated. I am using:

# List of stations, the internal TiVo id and the URL to fetch data 

Ten    Station/1/1200001    10&state=Brisbane&fta=1&fox=0&opt=0 

ABC    Station/1/1200002    2&state=Brisbane&fta=1&fox=0&opt=0 

Seven  Station/1/1200003    7&state=Brisbane&fta=1&fox=0&opt=0 

TenGC  Station/1/1200004    10&state=Brisbane&fta=1&fox=0&opt=0 

Nine   Station/1/1200005    9&state=Brisbane&fta=1&fox=0&opt=0 

SBS    Station/1/1200006    SBS&state=Brisbane&fta=1&fox=0&opt=0 

Prime  Station/1/1200007    7&state=Brisbane&fta=1&fox=0&opt=0 

NBN    Station/1/1200008    9&state=Brisbane&fta=1&fox=0&opt=0 

AV     Station/1/1300001    2&state=Brisbane&fta=1&fox=0&opt=0

The TiVo numbers were what I had in my input file to tridge's setheadend script, to configure channels on the TiVo. If you don't know your station numbers, I don't know how to get them. If someone could send me a Tcl script to do this, I would be very grateful. - Warren

To find out the Sofcom URL portion for your channels, go to http://www.sofcom.com.au/tv, select a location for today, and click on `View TV Schedule'. You should see a table with a list of station names down the left-hand side. Click on the ones you want, and grab the appropriate right-hand portion of the new URL from the Location field at the top of your browser.
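If you want to sanity-check a stations file before a run, something like the following Python sketch may help. It is not part of fetch_one. The names Station and parse_stations are hypothetical, and the sketch assumes that lines starting with a hash are comments and that runs of several tabs used for alignment count as one separator.

```python
from collections import namedtuple

# Hypothetical record type for one line of the stations file.
Station = namedtuple("Station", "name tivo_id url_part")


def parse_stations(text):
    """Return a list of Station records from stations-file text."""
    stations = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Columns are tab-separated. Assumption: empty fields produced by
        # repeated tabs are dropped rather than treated as real columns.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) != 3:
            raise ValueError("expected 3 tab-separated columns: %r" % line)
        stations.append(Station(*fields))
    return stations
```

A line which has been typed with spaces instead of tabs will raise ValueError here, which is the most likely mistake when editing this file by hand.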

numbers: Holds numbers which have to be unique for certain things. I'm using:

# List of incremental numbers

Series: 100002281

Program: 200009994

StationDay: 300000316

Slice: 206

You could probably reset them to 100000000, 200000000, 300000000 and a small number like 50. Note that once you use a number, you can't reuse it. I had to pick numbers I knew were not used by tridge's SOFCOM.slices, but if you have a factory fresh TiVo, then I guess you could use any numbers.
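The idea of never reusing a number can be sketched as a set of counters that only ever go up. The Python below is an illustration only, not the code in fetch_one. The function names are made up, and it assumes that the value stored in the file is the next unused number (it could equally be the last one used; the file format alone doesn't say).

```python
def parse_numbers(text):
    """Parse 'Name: value' lines into a dict of counters."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        counters[name.strip()] = int(value)
    return counters


def take_number(counters, name):
    """Hand out a fresh number for `name` and advance the counter."""
    # Assumption: the stored value is the next unused number.
    number = counters[name]
    counters[name] = number + 1
    return number
```

Because take_number only ever increments, a number handed out once can never come back, provided the updated counters are written back to the file after the run.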

programs: Holds the known program names and a list of genre identifiers separated by commas. The two columns are tab-separated. You should use at least one broad category name, but you can use any of the names given in the genretypes file. The genres that I have chosen for the programs I watch (or don't watch) are slightly idiosyncratic, so feel free to alter them if you want.

Note on matching program names. First an exact match is attempted. If this fails, then a partial match on all program names is attempted. So, for example, if Sofcom gives a program name as:

The Simpsons: Homer Alone
then this will be matched by `The Simpsons' if it is in the programs file. So by adding some judicious partial program names to the programs file, you can match programs better. More details on this file will be given below.
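The two-step matching can be pictured with the short Python sketch below. It is not the matching code from fetch_one. In particular, it assumes that a partial match means the known name occurs somewhere in the title, and that when several known names match, the longest one wins; the function name match_program is hypothetical.

```python
def match_program(title, known_names):
    """Return the entry from known_names that matches title, or None."""
    # First an exact match is attempted.
    if title in known_names:
        return title
    # Then a partial match. Assumption: "partial" means the known name
    # occurs somewhere in the title, and the longest such name wins.
    candidates = [name for name in known_names if name in title]
    if not candidates:
        return None
    return max(candidates, key=len)
```

Under these assumptions, a very short entry in the programs file would match a great many titles, so it pays to keep partial names as specific as you can.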

trimtitles: Holds patterns which occur in program titles that are superfluous, and can be trimmed from the titles. For example, channel Ten often prepends the pattern `Movie: ' to its movie titles. By putting this pattern in here, fetch_one will remove it from the title when doing the conversion.
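As a rough picture of what trimming does, here is a small Python sketch. It is an illustration rather than the fetch_one code: it assumes each pattern is a literal, case-sensitive prefix, and the function name trim_title and the sample titles are made up.

```python
def trim_title(title, patterns):
    """Remove the first pattern that the title starts with, if any."""
    for pattern in patterns:
        # Assumption: patterns are literal prefixes, not regular expressions.
        if title.startswith(pattern):
            return title[len(pattern):].lstrip()
    return title


if __name__ == "__main__":
    print(trim_title("Movie: Example Title", ["Movie: "]))
```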

episodes: Holds the program names which are known to be episodic; if a program name appears in here, then it must also appear in the programs file. For programs which are named in this file, you can ask the TiVo to record a `Season Pass'. I only put in here the programs that I do record as Season Passes.

To use this file, just copy in the names of the programs you want to select as `Season Pass' programs, from the programs file. Don't put in any tabs. The output file from the first fetch_one run after this will have special Series records for each series, and will start the episode numbers at 1. The episodes file will be updated to record the unique Series number for the series, and which episode number to use next. Do not edit these numbers, as it will confuse the TiVo, and you may not even be able to load the new slice files.

Hint: It's a good idea to back up all of these files periodically; if they ever get lost you might be able to go back to a known point in time.

The fetch_one program will rewrite these three files each time it runs:

numbers, programs, episodes
You should never have to edit the numbers file apart from setting some initial numbers. The episodes file will be updated as discussed above, and you should only add program names in column 1 in this file. Editing the programs file is discussed below.

RUNNING THE PROGRAM THE EASY WAY

It appears that Sofcom updates their web site on Thursday nights. I have included a short shell script called doit, which you can run to do things semi-automatically.

You run this script by going into the Basedir on Friday or Saturday, and doing:

% ./doit fetch

% [ edit the Files/programs file, see below ]

% ./doit make

The result is a new output.slice file which holds a week's worth of data. You can also run ./doit both to fetch the web pages from Sofcom and produce the output.slice file in one step.

RUNNING THE PROGRAM THE HARD WAY

fetch_one is run manually (or from the doit script) as follows:

% fetch_one [-r] [-w] [-n] [-o output] station day_list [station day_list]
Options are:

-o output Save output in this file. This option is mandatory.

-n Don't update the numbers in the numbers file. This is useful for fetching program data so that you can spot new programs and fix the programs file before you generate the slice.

-w Fetch web data and write it to the Webdir cache.

-r Read web data from the Webdir cache. You should choose one of -w or -r.

The station is the name of the station from column 1 of the stations file. Alternatively, you can use the word All, and the program will operate on all the stations named in the stations file.

The day list is a list of days for which you want to generate a slice. The list is relative to today. You can specify either a single day, e.g. 0 is today and 1 is tomorrow, or you can specify a range of days, e.g. 0-6 is today up to and including 6 days from now.
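To make the day list notation concrete, here is a small Python sketch which expands a day list into day offsets and then into calendar dates. It is only an illustration, not the option parsing that fetch_one does, and the function names are hypothetical.

```python
import datetime


def parse_day_list(spec):
    """Expand '3' to [3] and '0-6' to [0, 1, 2, 3, 4, 5, 6]."""
    first, sep, last = spec.partition("-")
    if not sep:
        return [int(first)]
    start, end = int(first), int(last)
    if end < start:
        raise ValueError("range end is before range start: %r" % spec)
    return list(range(start, end + 1))


def days_to_dates(days, today):
    """Turn day offsets relative to `today` into calendar dates."""
    return [today + datetime.timedelta(days=d) for d in days]
```

For example, days_to_dates(parse_day_list("0-6"), datetime.date.today()) gives the seven dates that a weekly run would cover.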

Typically you would run fetch_one once to get the web data; then you would edit the programs file; finally you would re-run fetch_one to generate the slice file, e.g.

% fetch_one -w -n -o /dev/null All 0-6

% vi Files/programs

% vi Files/episodes

% fetch_one -r -o outfile All 0-6

which is basically what the doit script does. Note the use of -n on the first run; this stops the updating of the sequence numbers. This is not vital, but you might as well use all the sequence numbers you can get!

The -r and -w options are mutually exclusive. If you use -r or -w, the web files saved or read will be stored in the Webdir directory.

UPDATING THE PROGRAMS FILE

NEW: Don't update this file by hand! Instead, help us update a communal file, and then you will be able to download a file with everybody's changes. The communal programs file is available at ftp://minnie.tuhs.org/tivo/programs. E-mail me if you want to help us to maintain this file and fix the Curling genre problem! Back to the old docs ....

When fetch_one doesn't know what genre a program is, it goes off and searches Sofcom for the information. Most of the time, Sofcom gives a pretty rough category, but sometimes it just returns `Other Programs', i.e. we haven't classified it yet. I am using the TiVo genre of `curling' to represent such an unknown genre. So you might end up with entries in the programs file like this:

Africa's Child    Curling

Agony Aunts     Curling

BLOSSOM       Situation

Charmed        Curling,SportsGroup

As you can see, these can be improved on. If you use the doit script to fetch the web pages, then it will start the vi editor for you after the fetch, so that you can immediately edit the programs file to fix some of the Sofcom `guesses' up. Use the categories in the genretypes file to replace or improve the entries in the programs file.

Another thing you will need to do with the programs file is to find and replace all the multiple episode entries with a single entry that the TiVo can use to give you a Season Pass to an on-going series. For example, when you run doit or fetch_one, you might end up with entries like these:

2002 FIFA WORLD CUP - FRANCE vs URUGUAY SportsGroup

2002 FIFA WORLD CUP - GERMANY vs REPUBLIC OF IRELAND    SportsGroup

2002 FIFA WORLD CUP - JAPAN vs BELGIUM  SportsGroup

2002 FIFA WORLD CUP - KNOCKOUT ROUND - GROUP E vs GROUP B       SportsGroup

2002 FIFA WORLD CUP - MEXICO vs ITALY   SportsGroup

2002 FIFA WORLD CUP - PEOPLE'S REPUBLIC OF CHINA vs COSTA RICA  SportsGroup

2002 FIFA WORLD CUP - POLAND vs USA     SportsGroup

2002 FIFA WORLD CUP - POLAND vs USA (Cont.)     SportsGroup

2002 FIFA WORLD CUP - PORTUGAL vs REPUBLIC OF KOREA     SportsGroup

2002 FIFA WORLD CUP - REPUBLIC OF KOREA vs POLAND       SportsGroup

2002 FIFA WORLD CUP - RUSSIA vs TUNISIA SportsGroup

A COUNTRY PRACTICE: ALL FIRED UP - PART 1       DramaGroup

A COUNTRY PRACTICE: ALL FIRED UP - PART 2       DramaGroup

A COUNTRY PRACTICE: NEVER COUNT YER CHOOKS - PART 1     DramaGroup

A COUNTRY PRACTICE: NEVER COUNT YER CHOOKS - PART 2     DramaGroup

A COUNTRY PRACTICE: RAKING OVER THE ASHES - PART 1      DramaGroup

CHICAGO HOPE: A COUPLA' STIFFS  DramaGroup

CHICAGO HOPE: EVERY DAY A LITTLE DEATH  DramaGroup

CHICAGO HOPE: FROM SOUP TO NUTS DramaGroup

CHICAGO HOPE: FULL MOON DramaGroup

CHICAGO HOPE: HELLO GOODBYE     DramaGroup

CHICAGO HOPE: LEAVE OF ABSENCE  DramaGroup

CHICAGO HOPE: RISE FROM THE DEAD        DramaGroup

HIGHER GROUND What Remains      Curling

All of these are episodes of programs that form a series, even the HIGHER GROUND entry. The latter is an episode of a weekly program, so there is only one entry in the programs file now. However, next week you will find two entries.

If you leave the entries above as is, you won't see these as episodes of a series, and you won't be able to set up a Season Pass for the series. To fix this, replace these entries with lines that will catch all episodes, e.g

2002 FIFA WORLD CUP  Football,SportsGroup

A COUNTRY PRACTICE:  Soap,DramaGroup

CHICAGO HOPE:        Crimedrama,DramaGroup

HIGHER GROUND       Curling

Because the information from Sofcom doesn't include details about episodes of series, it's up to you to fix the information in the programs file each week. Yes it is a chore, but there is no alternative yet.

CONSTRAINING SEASON PASSES

As fetch_one runs, it creates new series numbers in the `episodes' file, so that you can set up a Season Pass on the TiVo for a series. So let's assume that you have set a Season Pass for a series on your TiVo. But for some reason, you want to skip certain episodes of this series.

The `episodes' file can be set up to contain constraints for your Season Passes. All you need to do is to add extra columns in this file which hold a station name, a start time in 24-hour form, or a day of the week, as in the examples below.

A program which belongs to a Season Pass will only be recorded if all the constraints you have put in actually match. Here are some examples. Firstly note that vertical bars (|) separate columns in this file, and you have to put the extra columns in after fetch_one has set up the series and episode numbers.

Star Trek  | 100014872 | 2 | Seven
will only record Star Trek if it is on channel Seven.

Red Dwarf  | 100035461 | 23 | 2030
will only record Red Dwarf if it starts at 20:30 or 8:30pm. Note that this is the advertised start time, and the fudge factors below do not apply.

The Movie Show  |  100011223 | 15 | Wednesday
will only record The Movie Show on Wednesdays.

TENNIS    | 1000445567 | 102 | Ten | Friday
will only record TENNIS on channel Ten on Fridays, assuming you have such a short title in your `programs' file. This is probably a catch-all title for anything with the word TENNIS in it.
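The rule that every constraint has to match can be sketched in Python as follows. This is an illustration and not the fetch_one code. It assumes a constraint is recognised purely by its shape: a day name is a day constraint, four digits are a start time, and anything else is taken to be a station name. All function names here are made up.

```python
DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"}


def parse_episode_line(line):
    """Split an episodes line into (name, series, next_episode, constraints)."""
    fields = [f.strip() for f in line.split("|")]
    name, series, episode = fields[0], int(fields[1]), int(fields[2])
    return name, series, episode, fields[3:]


def constraint_matches(constraint, station, start_hhmm, weekday):
    """Check one constraint against a program's station, time and day."""
    # Assumption: the kind of constraint is decided by its shape alone.
    if constraint in DAYS:
        return constraint == weekday
    if len(constraint) == 4 and constraint.isdigit():
        return constraint == start_hhmm
    return constraint == station


def should_record(constraints, station, start_hhmm, weekday):
    """A program is recorded only if all the constraints match."""
    return all(constraint_matches(c, station, start_hhmm, weekday)
               for c in constraints)
```

With the TENNIS line above, should_record returns True for a program on Ten on a Friday and False for the same program on any other station or day. A line with no extra columns has no constraints, so everything matches.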

PREVENTING SERIES AND EPISODES

The fetch_one program tries to make everything in the programs file a series, so that you can always set up Season Passes. However, not everything should be a series. To prevent a certain title from being made into a series, put the word Never as an extra field in the episodes file, just like a station name as shown in the constraints section above.

Here is an example of what you will find in the provided run-time files. In the episodes file, you will find:

DOCUMENTARY SERIES: | 1 | 1 | Never

DOCUMENTARY SPECIAL | 1 | 1 | Never

DOCUMENTARY/DRAMA: | 1 | 1 | Never

DOCUMENTARY: | 1 | 1 | Never

DRAMA SERIES: | 1 | 1 | Never

DRAMA: | 1 | 1 | Never

MOVIE: | 1 | 1 | Never

MUSIC DOCUMENTARY: | 1 | 1 | Never

Movie: | 1 | 1 | Never

SPECIAL: | 1 | 1 | Never

Special: | 1 | 1 | Never

Any programs which match these titles will not be treated as episodes of a series. Similarly, in trimtitles you will find:

DOCUMENTARY SERIES:

DOCUMENTARY/DRAMA:

DOCUMENTARY:

DRAMA SERIES:

DRAMA:

MUSIC DOCUMENTARY:

Movie:

SPECIAL:

SPORT SPECIAL:

which removes these words from the beginning of a program's title. Finally, in the programs file, you will find:

DOCUMENTARY SERIES:     Documentary,DocumentaryGroup

DOCUMENTARY SPECIAL     Documentary,DocumentaryGroup

DOCUMENTARY/DRAMA:      Documentary,DocumentaryGroup

DOCUMENTARY:    Documentary,DocumentaryGroup

DRAMA SERIES:   Drama,DramaGroup

DRAMA:  Drama,DramaGroup

MOVIE:  Movie,Movies            S0E5

Movie:  Movie,Movies            S0E5

SPECIAL:        Specials

Special:        Specials

which set up blanket genre types for programs which have these words in their titles. Also note that all movies are recorded with an extra 5 minutes at the end, using the S0E5 fudge factor.

FUDGE FACTORS

I find that certain programs always start late and end late. It's very annoying to continually miss the end of a program because it didn't start on time. So there is a config file called fudge which has two tab-separated columns. The first column holds a program name as per the programs file. The second column holds the start and end time fudge factors. These alter the start and end times, compared to the values given from the web.

The syntax for the second column is SnnnEnnn, where nnn are integers representing minutes. So the value S3E5 means that the program starts 3 minutes late and ends 5 minutes late.

You can use positive or negative values for the End number. If the program goes over the original end time, fetch_one will adjust the start time of the following program. I would recommend that you stick to positive values for the Start number, as fetch_one won't alter the previous program's end time; therefore, the TiVo will have overlapping programs on the same channel, and I don't know how it deals with that.

Quick example: a program normally starts at 10pm and ends at 11pm. With a fudge factor of S3E5, its start time becomes 10:03pm and its end time becomes 11:05pm.
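The arithmetic behind the SnnnEnnn syntax can be sketched in Python as below. This is not the fetch_one code, and the function names are hypothetical. It assumes that a negative value is written with a leading minus sign, e.g. S0E-5, and it does not try to adjust the start time of the following program.

```python
import datetime
import re

# Assumption: negative values are written with a leading minus, e.g. S0E-5.
FUDGE_RE = re.compile(r"^S(-?\d+)E(-?\d+)$")


def parse_fudge(spec):
    """Return (start_minutes, end_minutes) from a string like 'S3E5'."""
    match = FUDGE_RE.match(spec)
    if match is None:
        raise ValueError("not a fudge factor: %r" % spec)
    return int(match.group(1)), int(match.group(2))


def apply_fudge(start, end, spec):
    """Shift the advertised start and end times by the fudge factor."""
    start_minutes, end_minutes = parse_fudge(spec)
    return (start + datetime.timedelta(minutes=start_minutes),
            end + datetime.timedelta(minutes=end_minutes))
```

For a program advertised from 22:00 to 23:00, apply_fudge with S3E5 gives 22:03 to 23:05, and S0E5 leaves the start alone while adding 5 minutes to the end.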

STOPPING THE TIVO FROM NAGGING

Each week, the TiVo nags you about being nearly out of program data. fetch_one can overcome the nag problem. It does so by generating data for an unused channel for 3 weeks ahead of now. Thus, the TiVo thinks it has 3 more weeks of guide data, and it won't nag you.

To make use of this feature, you need to create a channel on the TiVo that you will never use. Just choose a channel number which you can't receive, and create a TiVo station using that frequency. Then in the TiVo setup, make it a channel that you don't receive.

Next, put an extra line in the Files/stations file which has a station named `NoNag'. Put in the Station ID which identifies this channel in the TiVo, and in the last column put the name of a real station that you watch, e.g

# List of stations, the internal TiVo id and the URL to fetch data

Ten             Station/1/7111010       10&state=Brisbane&fta=1&fox=0&opt=0

ABC             Station/1/7111002       2&state=Brisbane&fta=1&fox=0&opt=0

Seven           Station/1/7111007       7&state=Brisbane&fta=1&fox=0&opt=0

#TenGC          Station/1/7111055       10&state=Brisbane&fta=1&fox=0&opt=0

Nine            Station/1/7111009       9&state=Brisbane&fta=1&fox=0&opt=0

SBS             Station/1/7111028       SBS&state=Brisbane&fta=1&fox=0&opt=0

#Prime          Station/1/7111064       7&state=Brisbane&fta=1&fox=0&opt=0

#NBN            Station/1/7111067       9&state=Brisbane&fta=1&fox=0&opt=0

#AV             Station/1/7111099       2&state=Brisbane&fta=1&fox=0&opt=0

NoNag           Station/1/7111099       Ten

Each week, fetch_one will re-use the Channel Ten data, but set the date for 3 weeks in the future and create slice data for the NoNag channel. Thus, the TiVo will be fooled into thinking that it always has program guide data.
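The date shifting itself is simple. Here is a rough Python sketch of the idea, which is not taken from fetch_one. It assumes that program data is held as (start time, title) pairs, which is a made-up representation for illustration, and the function name nonag_entries is hypothetical.

```python
import datetime


def nonag_entries(programs, weeks_ahead=3):
    """Copy (start, title) pairs, moving each start weeks_ahead weeks on."""
    shift = datetime.timedelta(weeks=weeks_ahead)
    return [(start + shift, title) for start, title in programs]
```

A program which starts at 6pm on 6 June would therefore reappear on the NoNag channel at 6pm on 27 June, while the original entry is left untouched.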

LOADING SLICE FILES

NOTE: This is out of date for the 2.5.x systems. Use dbload25 by itself to do all the work.

Once you have generated your weekly output.slice, you now need to move the resulting slice file over to your TiVo. I use rsync as described elsewhere in the TiVo Hack FAQ, and place files in /var/hack. Assuming that I have just copied output.slice to the TiVo, from the Bash shell there I load this file by doing:

# dbload output.slice

# reindex

Each operation takes several minutes, and overall it can take upwards of 10 minutes to complete. You should also have a look at the Scripts/loadguide script which helps to automate the job.

BUGS AND OMISSIONS

Probably plenty. Let me know if you find any. The script fetch_one should have a more relevant name.

I do some conversion of Sofcom program ratings to what the TiVo expects, but it could be further refined. I try to separate the list of actors in a program from the program's description, but it is only heuristic.

The raw binary slice output generator doesn't produce output which is identical to the text generator. I will be working on this soon.

Because a channel/day's worth of data can only be inserted once on the TiVo, some data from the last day on the command-line will be discarded, i.e the programs from 12am to 6am the next day. If we inserted these programs, then when we fetched the rest of the day from 6am (next time), the TiVo would complain that the channel/day had already been done. However, if you cache the web files with -w, then next week these 6 hours of programs will be included in the next slice file.

If you have any questions or comments, please e-mail me at wkt@tuhs.org.

Warren Toomey, June 2003.

