We need some fun, too, so let's download a few strips from some well-known comics. To simplify things we will use a tool called netcomics to get the comics and then use a local description file to build the database. Installing netcomics is beyond the scope of this tutorial, but it is a Perl script and should work on any platform that has Perl support (for Linux users there exist pre-built packages). After you have installed netcomics, create a small shell script called netcomics.sh to be used by the parser:
#!/bin/sh
netcomics -D -d /tmp/Comics/ -c "ch dilbert dilbertcl uf"
( cd /tmp/Comics ; \
  mv Dilbert-*.gif Dilbert.gif ; \
  mv Dilbert_Classics-*.gif Dilbert_Classics.gif ; \
  mv Calvin_and_Hobbes-*.gif Calvin_and_Hobbes.gif ; \
  mv User_Friendly-*.gif User_Friendly.gif )
On OS/2 and Windows the script looks like the following. Name it netcomics.cmd on OS/2 and netcomics.bat on Windows:
perl netcomics.pl -D -d \temp\Comics\ -c "ch dilbert dilbertcl uf"
cd \temp\Comics
move Dilbert-*.gif Dilbert.gif
move Dilbert_Classics-*.gif Dilbert_Classics.gif
move Calvin_and_Hobbes-*.gif Calvin_and_Hobbes.gif
move User_Friendly-*.gif User_Friendly.gif
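The two scripts differ only in the shell they target, so a single portable script could in principle cover all three platforms. The following is a minimal sketch in Python (which is needed for Spider.py anyway); it is not part of netcomics or Plucker. The names rename_latest and fetch are made up for this example, and the sketch assumes that netcomics writes files named NAME-<date>.gif, as the rename commands above imply. Unlike mv or move, it picks the file that sorts last when several dated files match.

```python
import glob
import os
import subprocess

# Base names of the four strips, taken from the rename commands above.
COMICS = ["Dilbert", "Dilbert_Classics", "Calvin_and_Hobbes", "User_Friendly"]


def rename_latest(directory, names=COMICS):
    """Rename each dated file NAME-*.gif in directory to NAME.gif.

    If several dated files match, the one that sorts last is used.
    Returns the generic file names that were written.
    """
    written = []
    for name in names:
        pattern = os.path.join(glob.escape(directory), name + "-*.gif")
        matches = sorted(glob.glob(pattern))
        if not matches:
            continue  # nothing was downloaded for this strip
        target = os.path.join(directory, name + ".gif")
        os.replace(matches[-1], target)  # overwrites an older generic copy
        written.append(os.path.basename(target))
    return written


def fetch(directory, command=("netcomics",)):
    """Run netcomics with the options used above, then rename the results."""
    os.makedirs(directory, exist_ok=True)
    args = list(command) + ["-D", "-d", directory, "-c", "ch dilbert dilbertcl uf"]
    subprocess.run(args, check=True)  # raises if netcomics fails
    return rename_latest(directory)
```

On Linux this would be called as fetch("/tmp/Comics/"). On OS/2 or Windows, where the script is started through Perl, something like fetch(r"\temp\Comics", command=("perl", "netcomics.pl")) should correspond to the batch file above.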
Whichever version you use, the script downloads Calvin & Hobbes, Dilbert, Dilbert Classic and UserFriendly to a separate directory (/tmp/Comics/, or \temp\Comics\ on OS/2 and Windows) and renames the date-specific files to fixed names that can be used in the local description file:
<HTML>
<BODY>
<H1>Comics Home Page</H1>
<A HREF="file:/tmp/Comics/Dilbert.gif">Dilbert</A><P>
<A HREF="file:/tmp/Comics/Dilbert_Classics.gif">Dilbert Classic</A><P>
<A HREF="file:/tmp/Comics/Calvin_and_Hobbes.gif">Calvin & Hobbes</A><P>
<A HREF="file:/tmp/Comics/User_Friendly.gif">UserFriendly</A><P>
</BODY>
</HTML>
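If the list of strips changes now and then, the description file can also be generated instead of edited by hand. The following is only a sketch: the function name description_page and the STRIPS list are hypothetical, and the directory is assumed to be the /tmp/Comics used above. Note that it escapes the link text, so the ampersand in Calvin & Hobbes is written as &amp;, which is the strict HTML spelling.

```python
from html import escape

# (file base name, link text) pairs, matching the listing above.
STRIPS = [
    ("Dilbert", "Dilbert"),
    ("Dilbert_Classics", "Dilbert Classic"),
    ("Calvin_and_Hobbes", "Calvin & Hobbes"),
    ("User_Friendly", "UserFriendly"),
]


def description_page(strips, directory="/tmp/Comics"):
    """Return the HTML of a description file linking to each renamed strip."""
    lines = ["<HTML>", "<BODY>", "<H1>Comics Home Page</H1>"]
    for base, title in strips:
        # One file: link per strip, pointing at the fixed file name.
        url = "file:%s/%s.gif" % (directory.rstrip("/"), base)
        lines.append('<A HREF="%s">%s</A><P>' % (url, escape(title)))
    lines += ["</BODY>", "</HTML>"]
    return "\n".join(lines) + "\n"
```

Writing the string returned by description_page(STRIPS) to the description file gives the same links as the hand-written page.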
To simplify things even further we will also add a new section for the comics,
[comics]
bpp = 4
home_url = plucker:/HTML/comics.html
maxwidth = 600
maxheight = 200
db_file = DB/Comics
before_command = "netcomics.sh"
NOTE: On OS/2 or Windows, set before_command to the name of your batch file (netcomics.cmd or netcomics.bat).
As you can see, we have added the shell script as a command that is
run before the description file is parsed. Every day (except on
Sundays, when the strips are too large for these options; a solution
is shown later in this section) we now only have to run,
% Spider.py -v -s comics
Executing 'before_command': "netcomics.sh"
Working for pluckerdir /home/pilot/.plucker
Processing file:/home/pilot/.plucker/HTML/comics.html.
  0 collected, 0 still to do
  Retrieved ok
Processing file:/tmp/Comics/Dilbert.gif.
  1 collected, 3 still to do
  Retrieved ok
Processing file:/tmp/Comics/Dilbert_Classics.gif.
  2 collected, 2 still to do
  Retrieved ok
Processing file:/tmp/Comics/Calvin_and_Hobbes.gif.
  3 collected, 1 still to do
  Retrieved ok
Processing file:/tmp/Comics/User_Friendly.gif.
  4 collected, 0 still to do
  Retrieved ok
Writing out collected data...
Writing db 'Comics' to file /home/pilot/.plucker/DB/Comics.pdb
Converted file:/home/pilot/.plucker/HTML/comics.html
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= file:/home/pilot/.plucker/HTML/comics.html
Wrote 3 <= plucker:/~special~/pluckerlinks
Wrote 11 <= file:/tmp/Comics/Calvin_and_Hobbes.gif
Wrote 12 <= file:/tmp/Comics/Dilbert.gif
Wrote 13 <= file:/tmp/Comics/Dilbert_Classics.gif
Wrote 14 <= file:/tmp/Comics/User_Friendly.gif
Wrote 15 <= plucker:/~special~/links1
Done!
To be able to use it also on Sundays we add yet another section to the configuration file,
[sunday]
bpp = 2
maxwidth = 550
maxheight = 400
db_file = DB/SundayComics
Using a lower bit depth for the images we are now able to include
larger versions of the comics. Each Sunday we would run,
% Spider.py -s comics -s sunday
and since the parser applies the sections in the given order, the values changed in sunday override the ones in comics. Options that sunday does not set, such as home_url and before_command, keep their values from comics.
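The override rule can be illustrated with a few lines of Python. This is a model of the behaviour described above, not the parser's actual code: effective_options and sections_for are hypothetical helpers, and the standard configparser module merely stands in for however the parser reads its configuration file. The second helper shows how a wrapper script might choose the -s arguments depending on the day of the week.

```python
import configparser
import datetime

# The two sections from this tutorial.
CONFIG = """
[comics]
bpp = 4
home_url = plucker:/HTML/comics.html
maxwidth = 600
maxheight = 200
db_file = DB/Comics
before_command = "netcomics.sh"

[sunday]
bpp = 2
maxwidth = 550
maxheight = 400
db_file = DB/SundayComics
"""


def effective_options(config_text, sections):
    """Merge the named sections in order; later sections override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    merged = {}
    for name in sections:
        merged.update(parser[name])  # same key: the later section wins
    return merged


def sections_for(day):
    """Return the section names to pass with -s for the given date."""
    # date.weekday() counts from Monday = 0, so Sunday is 6.
    return ["comics", "sunday"] if day.weekday() == 6 else ["comics"]


if __name__ == "__main__":
    today = datetime.date.today()
    for key, value in sorted(effective_options(CONFIG, sections_for(today)).items()):
        print(key, "=", value)
```

With the sections given as comics followed by sunday, the merged result takes bpp, maxwidth, maxheight and db_file from sunday, while home_url and before_command still come from comics. Reversing the order would let the comics values win instead.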