To: Fall 2005, COMP421 students
From: Prof. Jeff Wiegley
Subject: Project 2 Specifications: Rugs
Date: Monday, November 7th, 2005
Due: Monday, November 28th, 2005
________________________________________________________________________________________________

1 Task

You need to help Prof. Wiegley find the perfect rug for his office. (If he were allowed to have a rug, that is. Unfortunately, only big-wigs get carpeting in their offices. *sigh*...)

There is an online area rug store, http://www.arearugs2u.com/, that has a wide selection of area rugs, but its search abilities are atrocious.

We don’t want to hammer that site though because we aren’t going to actually buy anything. Instead, use

http://rugs.cyte.com/

(Don’t attempt any other shenanigans against this site. It runs other services that other companies depend on for survival. And, on a teacher’s salary, I can only afford pitiful bandwidth.)

You are to write a spider program in Perl that searches through the http://rugs.cyte.com/area-rugs.html site and builds up a list of standard SQL statements that create and populate a database that can be searched for the perfect rug.
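The spidering described above might be organized along the following lines. This is a minimal sketch under stated assumptions, not a reference solution: the sub names (extract_links, lwp_fetch, crawl), the user-agent string, and the one-second pause between requests are all made up for illustration, and the regex-based link extraction is a crude stand-in for a real HTML parser. It relies on the LWP::UserAgent and URI modules that come with libwww-perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull href values out of anchor tags. A regex is a crude stand-in here;
# a real HTML parser copes better with messy markup.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gis) {
        push @links, $2;
    }
    return @links;
}

# Default fetcher (an assumption): one GET through LWP::UserAgent, with a
# pause so the server is not hammered. Returns the body, or undef on failure.
sub lwp_fetch {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(agent => 'comp421-rug-spider/0.1', timeout => 30);
    sleep 1;
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Breadth-first walk from $start, staying on $start's host.
# $fetch is an optional code ref taking a URL and returning HTML or undef.
# Returns a hash ref mapping each URL visited to its HTML.
sub crawl {
    my ($start, $fetch) = @_;
    $fetch ||= \&lwp_fetch;
    require URI;
    $start = URI->new($start)->canonical->as_string;
    my $host = URI->new($start)->host;
    my (%seen, %pages);
    my @queue = ($start);
    $seen{$start} = 1;
    while (defined(my $url = shift @queue)) {
        my $html = $fetch->($url);
        next unless defined $html;
        $pages{$url} = $html;
        for my $href (extract_links($html)) {
            my $abs = URI->new_abs($href, $url);      # resolve relative links
            next unless $abs->scheme && $abs->scheme =~ /^https?$/;
            next unless $abs->host eq $host;          # never leave the site
            $abs->fragment(undef);                    # "#top" is the same page
            my $next = $abs->canonical->as_string;
            push @queue, $next unless $seen{$next}++;
        }
    }
    return \%pages;
}
```

Calling crawl('http://rugs.cyte.com/area-rugs.html') would return the fetched pages keyed by URL. Passing the fetcher in as a code ref keeps the traversal logic testable against canned HTML, without touching the network. Turning each page into rows still requires parsing that is specific to the site's markup.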

The schema for the database consists of a single table named “rugs”. The SQL commands for properly creating this schema are:
    DROP TABLE IF EXISTS rugs;  
    CREATE TABLE IF NOT EXISTS rugs (  
      url VARCHAR(256) NOT NULL,  
      collection VARCHAR(256) NOT NULL,  
      model VARCHAR(256) NOT NULL,  
      thickness INT NOT NULL,  
      density INT NOT NULL,  
      width INT NOT NULL,  
      length INT NOT NULL,  
      cost DECIMAL(7,2) NOT NULL  
    );

For instance, one insert that should be generated from the page

http://rugs.cyte.com/aubusson/493-Cannes.html

would be:
    INSERT INTO rugs  
      (url, collection, model, thickness, density, width, length, cost)  
    VALUES  
      (  
        'http://rugs.cyte.com/aubusson/493-Cannes.html',
        'Aubusson', '493 Cannes',
        '4', '5', '60', '96', '817.97'
      );

More inserts would, of course, be generated from this page: one for each of the sizes available. They would all share the same thickness and density, as this information is obtained from the parent page that leads to the example given.
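Generating one INSERT per size could look something like the sketch below. The helper names (sql_quote, insert_statement, inserts_for_model) and the hash layout are assumptions made for illustration. The values are quoted as strings to match the example above, and embedded single quotes are doubled, which is the standard SQL way to escape them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order matches the CREATE TABLE statement for "rugs".
my @COLUMNS = qw(url collection model thickness density width length cost);

# Quote a value as a SQL string literal, doubling embedded single quotes.
sub sql_quote {
    my ($value) = @_;
    $value =~ s/'/''/g;
    return "'$value'";
}

# Build one INSERT statement from a hash ref keyed by column name.
sub insert_statement {
    my ($rug) = @_;
    my $values = join ', ', map { sql_quote($rug->{$_}) } @COLUMNS;
    return 'INSERT INTO rugs (' . join(', ', @COLUMNS) . ") VALUES ($values);";
}

# One statement per size; the shared fields (url, collection, model,
# thickness, density) are repeated in every row.
sub inserts_for_model {
    my ($shared, @sizes) = @_;
    return map { insert_statement({ %$shared, %$_ }) } @sizes;
}

# The single size given in the example above.
my %shared = (
    url        => 'http://rugs.cyte.com/aubusson/493-Cannes.html',
    collection => 'Aubusson',
    model      => '493 Cannes',
    thickness  => 4,
    density    => 5,
);
print "$_\n" for inserts_for_model(\%shared,
    { width => 60, length => 96, cost => '817.97' });
```

Each further size found on the page would be passed as one more hash ref in the list.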

The last SQL command output should be:
  SELECT url FROM rugs  
  WHERE width>'96' AND length>'120' AND density>='4' AND COST<'600.00'
  ORDER BY url;

2 Specifications

For a quick tutorial, a good site is http://www.perl.com/pub/a/2002/08/20/perlandlwp.html. It is the instructor’s belief that the entire contents of the O’Reilly book “Perl & LWP” are available online for free through O’Reilly’s “Safari” service as long as you access Safari from a campus IP (i.e., use the campus VPN or be on campus).

You’ll probably need to install the LWP Perl modules.

If you are on a Linux distribution, then your distribution probably has a package for LWP. (There is another “LWP” package for providing “light weight” processes. Don’t install that! Make sure you are getting the Library for WWW in Perl package.) Under Gentoo, Debian or Ubuntu the libwww-perl package is what you want. Under RedHat/Fedora it is the perl-libwww-perl package.

If you do not have a pre-built package, Perl has a system called the Comprehensive Perl Archive Network (CPAN). CPAN is a semi-automatic, platform-independent system. It’s sort of like a mini-distribution system that takes care of just Perl. Most Perl installations have the CPAN module installed by default, so to install LWP you should be able to just do:
    perl -MCPAN -e 'install Bundle::LWP'

Required dependencies and such will be pulled in and installed.
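One way to confirm that the install worked is to ask Perl whether the modules load. The short check below is only a sketch: module_available is a hypothetical helper, and the list of modules checked is an assumption about what a spider is likely to need.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 when the named module can be loaded by this perl, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # LWP::UserAgent -> LWP/UserAgent.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(LWP::UserAgent URI)) {
    printf "%-16s %s\n", $module, module_available($module) ? 'ok' : 'MISSING';
}
```

If a module is reported as MISSING, rerun the package or CPAN install before starting on the spider.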

3 Deliverables

Email your files to jeffw@csun.edu

Files should be attached as individual attachments to the message.

Include the string “[COMP421] Project 2” in the subject line.