4.1 Description file
Before we take a closer look at the parser itself, we will describe
the format of the description file, also known as the Home Page document
(home.html by default, although the name can be changed). On a Unix/Linux
system this file is stored by default in $HOME/.plucker.
On OS/2 the environment variable HOME is used to find the
location of your home directory (drive letters may also be used).
The installer should set this environment variable for
you and also create the necessary directories on your system. You
can check the location by typing set home at a
command prompt.
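As a rough illustration of the lookup described above, the following Python sketch builds the default path of the description file from the HOME environment variable. The function name default_home_page is hypothetical, and the sketch assumes the same .plucker subdirectory is used on every platform; the parser's own lookup may differ, for instance when the file name has been changed.

```python
import os


def default_home_page(filename="home.html"):
    """Return the assumed default location of the description file.

    Hypothetical helper: assumes the file lives in a .plucker
    directory inside the directory named by the HOME variable.
    """
    home = os.environ.get("HOME")
    if not home:
        # On OS/2 the value can be inspected with 'set home'.
        raise RuntimeError("HOME is not set")
    return os.path.join(home, ".plucker", filename)
```

With HOME set to /home/alice, default_home_page() would return /home/alice/.plucker/home.html.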
The description file is an ordinary HTML document in which the link
references (<A HREF=...> tags) may carry the following optional attributes:
- MAXDEPTH=n: This specifies how
 deep the parser should follow the links embedded in a web page.
 If MAXDEPTH is not given, the parser defaults to a depth of 1,
 that is, it downloads only the page itself and does not follow any
 of its links. To follow the links in the page, use MAXDEPTH=2;
 to also follow the links in those pages, use MAXDEPTH=3, and so
 on. High values used without any of the available filtering
 mechanisms can result in an excessive amount of data.
Hint:  MAXDEPTH=2 is very useful for a page that contains only
 headlines linking to the full-text versions of the articles. Many
 news tickers use this format.
 
- NOIMAGES: Use this attribute if you are
 not interested in downloading images. If it is specified, every image
 is replaced with its ALT text if available, otherwise with [img].
Hint:  NOIMAGES is an effective way to decrease the size of
 databases.
 
- STAYONHOST: Most web sites contain
 references both to locally stored articles and to articles stored on
 other hosts, so a MAXDEPTH of 2 or higher can result in a
 lot of unwanted data. To prevent this, add the STAYONHOST
 attribute to your link. The parser will then download only content that
 resides on the same server as the top page. Together with
 exclusionlist.txt, this is a handy way to avoid downloading
 the pages that banners link to.
 
- STAYBELOW=text: Similar to
 STAYONHOST, this attribute tells the parser to fetch only pages whose
 URLs start with text. For example, it can be used when the
 articles listed on a page are stored on another server, in which case
 STAYONHOST would not work. You can also use it to grab selected
 articles out of a large listing, so that you get all the headlines but
 only the articles on specific subjects (provided the web server
 offering the information organizes its URLs by subject).
NOTE: If =text is omitted, it defaults to the value
 of the HREF attribute (the URL the link points to).
 
- BPP=n: This attribute specifies
 the bit depth (bits per pixel) to use for images. Valid values are 0 (i.e.
 no images), 1, 2, 4, and 8.
NOTE:  BPP=8 is currently supported only when the parser runs
 on a Windows system.
 
- MAXWIDTH=width: Used to set the maximum
 width of images.
 
- MAXHEIGHT=height: Used to set the maximum
 height of images.
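To make the combined effect of MAXDEPTH, STAYONHOST and STAYBELOW concrete, here is a minimal Python sketch of a breadth-first crawl over an in-memory link graph. It is not the Plucker parser's code: the function pages_to_fetch and the links dictionary (a stand-in for downloading and parsing real pages) are hypothetical. It assumes that STAYONHOST compares the host part of each URL with that of the top page and that STAYBELOW is a plain prefix test on the URL; a bare STAYBELOW would correspond to passing the top page's URL as staybelow.

```python
from collections import deque
from urllib.parse import urlparse


def pages_to_fetch(start, links, maxdepth=1, stayonhost=False, staybelow=None):
    """Return the URLs a crawl would fetch, in breadth-first order.

    Hypothetical sketch: links maps each URL to the URLs it links to.
    """
    start_host = urlparse(start).netloc
    seen = {start}
    order = []
    queue = deque([(start, 1)])  # the top page itself has depth 1
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= maxdepth:
            continue  # MAXDEPTH reached: do not follow this page's links
        for target in links.get(url, []):
            if target in seen:
                continue
            # Assumed STAYONHOST rule: same host as the top page only.
            if stayonhost and urlparse(target).netloc != start_host:
                continue
            # Assumed STAYBELOW rule: URL must start with the given prefix.
            if staybelow is not None and not target.startswith(staybelow):
                continue
            seen.add(target)
            queue.append((target, depth + 1))
    return order
```

For a top page that links to a local article and to a banner on another host, MAXDEPTH=1 yields only the top page, while MAXDEPTH=2 with STAYONHOST adds the local article but skips the banner.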
 
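The image-related attributes can be summarised in code as well. The Python sketch below is an illustration only: the helper names are hypothetical, and the way it shrinks images (proportionally, never enlarging them, using whole-pixel integer sizes) is an assumption about what MAXWIDTH and MAXHEIGHT do, not a description of Plucker's actual image conversion. The Windows-only restriction on BPP=8 is not modelled.

```python
VALID_BPP = (0, 1, 2, 4, 8)  # values listed for the BPP attribute


def image_placeholder(alt=None):
    """Text shown instead of an image when NOIMAGES is given."""
    return alt if alt else "[img]"


def check_bpp(bpp):
    """Reject bit depths other than those listed for BPP."""
    if bpp not in VALID_BPP:
        raise ValueError(f"BPP must be one of {VALID_BPP}, got {bpp}")
    return bpp


def fit_image(width, height, maxwidth=None, maxheight=None):
    """Shrink (width, height) proportionally to respect the limits.

    Assumption: images are only scaled down, never enlarged.
    Integer division keeps the result in whole pixels.
    """
    if maxwidth is not None and width > maxwidth:
        height = height * maxwidth // width
        width = maxwidth
    if maxheight is not None and height > maxheight:
        width = width * maxheight // height
        height = maxheight
    return max(1, width), max(1, height)
```

Under these assumptions an 800x600 image with MAXWIDTH=150 would become 150x112, and adding MAXHEIGHT=100 would shrink it further to 133x100.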
A simple example of a description file:
<HTML>
<HEAD>
  <TITLE>Plucker Home Page</TITLE>
</HEAD>
<BODY>
  <A HREF="http://plucker.gnu-designs.com" MAXDEPTH=2 STAYONHOST NOIMAGES>Plucker home page</A>
</BODY>
</HTML>
This would download the front page of our web site and also follow any
links on the page if they are local to the host. No images would be
downloaded.
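Because the description file is ordinary HTML, Python's standard html.parser module is enough to read the extra attributes back out of it. The sketch below parses the example above; the class and helper names are made up for illustration, and this is not how the Plucker parser itself is implemented. It relies on the fact that HTMLParser lower-cases attribute names and reports a bare attribute such as STAYONHOST with the value None. The defaults applied in link_options (depth 1, STAYBELOW falling back to the HREF value) follow the descriptions in this section.

```python
from html.parser import HTMLParser

SAMPLE = """<HTML>
<HEAD>
  <TITLE>Plucker Home Page</TITLE>
</HEAD>
<BODY>
  <A HREF="http://plucker.gnu-designs.com" MAXDEPTH=2 STAYONHOST NOIMAGES>Plucker home page</A>
</BODY>
</HTML>"""


class LinkCollector(HTMLParser):
    """Collect the attributes of every <A HREF=...> in a description file."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; bare attributes map to None.
        if tag != "a":
            return
        options = dict(attrs)
        if "href" in options:
            self.links.append(options)


def read_description(text):
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


def link_options(attrs):
    """Turn raw attributes into settings (hypothetical representation)."""
    href = attrs["href"]
    return {
        "url": href,
        "maxdepth": int(attrs.get("maxdepth") or 1),  # default depth is 1
        "stayonhost": "stayonhost" in attrs,
        "noimages": "noimages" in attrs,
        # A bare STAYBELOW (no =text) falls back to the HREF value.
        "staybelow": (attrs["staybelow"] or href) if "staybelow" in attrs else None,
    }
```

Calling link_options(read_description(SAMPLE)[0]) gives a depth of 2 with STAYONHOST and NOIMAGES enabled and no STAYBELOW restriction.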
The description file (home.html) that is installed when your
Plucker directory is set up also contains a few examples.
The Plucker Team