| To: | Fall 2005, COMP421 students |
| From: | Prof. Jeff Wiegley |
| Subject: | Project 2 Specifications: Rugs |
| Date: | Monday, November 7th, 2005 |
| Due: | Monday, November 28th, 2005 |
You need to help Prof. Wiegley find the perfect rug for his office. (If he were allowed to have a rug, that is. Unfortunately, only bigwigs get carpeting in their offices. *sigh*...)
There is an online area rug store at http://www.arearugs2u.com/ that has a wide selection of area rugs, but its search capabilities are atrocious.
We don't want to hammer that site, though, because we aren't actually going to buy anything. Instead, use http://rugs.cyte.com/area-rugs.html.
(Don't attempt any other shenanigans against this site. It runs other services that other companies depend on for survival. And, on a teacher's salary, I can only afford pitiful bandwidth.)
You are to write a spider program in Perl that searches through the http://rugs.cyte.com/area-rugs.html site and builds up a list of standard SQL statements that create and populate a database, which can then be searched for the perfect rug.
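As a rough illustration of the spidering idea only, the sketch below walks a set of pages breadth-first while remembering which ones it has already seen, so no page is requested twice. The `%fake_site` map, the `crawl` function and the `/aubusson.html` page name are hypothetical stand-ins; a real spider would fetch each page with LWP and pull the links out of the returned HTML, and the actual link structure of the site may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real site: each page maps to the links it
# contains. A real spider would fetch the page with LWP and extract the
# links from its HTML instead of looking them up here.
my %fake_site = (
    '/area-rugs.html'           => [ '/aubusson.html', '/area-rugs.html' ],
    '/aubusson.html'            => [ '/aubusson/493-Cannes.html', '/area-rugs.html' ],
    '/aubusson/493-Cannes.html' => [ '/aubusson.html' ],
);

# Breadth-first crawl. $links_for is a callback that returns an array
# reference of the links found on a given page.
sub crawl {
    my ($start, $links_for) = @_;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my @order;

    while (@queue) {
        my $page = shift @queue;
        push @order, $page;
        for my $link (@{ $links_for->($page) }) {
            next if $seen{$link}++;    # skip pages already queued or visited
            push @queue, $link;
        }
    }
    return @order;
}

my @visited = crawl('/area-rugs.html', sub { $fake_site{ $_[0] } || [] });
print "$_\n" for @visited;
```

Keeping the `%seen` hash is what stops the spider from looping forever on pages that link back to each other, and it also keeps the load on the server down.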
The schema for the database consists of a single table named rugs. The SQL commands for properly creating this schema are:
DROP TABLE IF EXISTS rugs;
CREATE TABLE IF NOT EXISTS rugs (
    url VARCHAR(256) NOT NULL,
    collection VARCHAR(256) NOT NULL,
    model VARCHAR(256) NOT NULL,
    thickness INT NOT NULL,
    density INT NOT NULL,
    width INT NOT NULL,
    length INT NOT NULL,
    cost DECIMAL(7,2) NOT NULL
);
For instance, one insert that should be generated from the page
http://rugs.cyte.com/aubusson/493-Cannes.html
would be:
INSERT INTO rugs
    (url, collection, model, thickness, density, width, length, cost)
VALUES (
    'http://rugs.cyte.com/aubusson/493-Cannes.html',
    'Aubusson',
    '493 Cannes',
    '4',
    '5',
    '60',
    '96',
    '817.97'
);
More inserts would, of course, be generated from this page: one for each of the sizes available. They all share the same thickness and density, because that information is obtained from the parent page that links to the example page.
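One possible way to produce such a statement is sketched below. The helper names `sql_quote` and `insert_statement` are hypothetical, and the sketch assumes the field values have already been scraped from the page; it only shows how to quote them safely and assemble the INSERT in the column order given above.

```perl
use strict;
use warnings;

# Column order taken from the schema in this handout.
my @COLUMNS = qw(url collection model thickness density width length cost);

# Quote a value as a SQL string literal, doubling any embedded single
# quotes so that a name containing an apostrophe does not break the SQL.
sub sql_quote {
    my ($value) = @_;
    $value =~ s/'/''/g;
    return "'$value'";
}

# Build one INSERT statement from a hash of column => value pairs.
sub insert_statement {
    my (%rug) = @_;
    my $columns = join ', ', @COLUMNS;
    my $values  = join ', ', map { sql_quote($rug{$_}) } @COLUMNS;
    return "INSERT INTO rugs ($columns) VALUES ($values);";
}

print insert_statement(
    url        => 'http://rugs.cyte.com/aubusson/493-Cannes.html',
    collection => 'Aubusson',
    model      => '493 Cannes',
    thickness  => 4,
    density    => 5,
    width      => 60,
    length     => 96,
    cost       => '817.97',
), "\n";
```

Calling `insert_statement` once per available size, with the same thickness and density each time, would give the full set of inserts for a page.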
The last SQL command output should be:
SELECT url FROM rugs
WHERE width>'96' AND length>'120' AND density>='4' AND COST<'600.00'
ORDER BY url;
# Early in your program:

use LWP; # Loads all important LWP classes, and makes
         # sure your version is reasonably recent.

my $browser = LWP::UserAgent->new;

...

# Then later, whenever you need to make a get request:
my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';

my $response = $browser->get( $url );
die "Can't get $url -- ", $response->status_line
    unless $response->is_success;

die "Hey, I was expecting HTML, not ", $response->content_type
    unless $response->content_type eq 'text/html';
    # or whatever content-type you're equipped to deal with

# Otherwise, process the content somehow:

if($response->content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
} else {
    print "Fresh Air is apparently jazzless today.\n";
}
For a quick tutorial, a good site is http://www.perl.com/pub/a/2002/08/20/perlandlwp.html. It is the instructor's belief that the entire contents of the O'Reilly book Perl & LWP are available online for free through O'Reilly's Safari service, as long as you access Safari from a campus IP (i.e. use the campus VPN or be on campus).
You'll probably need to install the LWP Perl modules.
If you are on a Linux distribution, then your distribution probably has a package for LWP. (There is another LWP package that provides lightweight processes. Don't install that! Make sure you are getting the Library for WWW in Perl package.) Under Gentoo, Debian or Ubuntu the libwww-perl package is what you want. Under RedHat/Fedora it is the perl-libwww-perl package.
If you do not have a pre-built package available, Perl has a system called the Comprehensive Perl Archive Network (CPAN). CPAN is a semi-automatic, platform-independent system. It's sort of like a mini-distribution system that takes care of just Perl. Most Perl installations have the CPAN module installed by default, so to install LWP you should be able to just do:
perl -MCPAN -e 'install Bundle::LWP'
Required dependencies and such will be pulled in and installed.
Email your files to jeffw@csun.edu
Files should be attached as individual attachments to the message.
Include the string [COMP421] Project 2 in the subject line.