Tails has asked for the wisdom of the Perl Monks concerning the following question:
I know it's pretty messy, I was going to clean up the REGEX after I figured out how to search through the webpages.

    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use diagnostics;

    print "$ARGV[0]\n";
    my $first = $ARGV[0];
    print "$first This is my First Argument!\n";

    my $string = do { local $/; <> };
    $string=~ s/[\n\r]//g;
    $string=~ s/.*(<title>.*?<\/title>).*?(<body.*?<\/body>).*/$1,$2/gsi;
    $string=~ s/<title>(.*?)<\/title>/$1/gsi;
    $string=~ s/<body.*?>(.*?)<\/body>/$1/gsi;
    $string=~ s/24°//gsi;
    $string=~ s/<!--.*?-->//gsi;
    $string=~ s/<a.*?<\/a>//sgi;
    $string=~ s/<form.*?<\/form>//sgi;
    $string=~ s/<iframe.*?<\/iframe//sgi;
    $string=~ s/<noscript.*?<\/noscript>//sgi;
    $string=~ s/<script.*?<\/script>//sgi;
    $string=~ s/<select .*?<\/select>//sgi;
    $string=~ s/<textarea.*?<\/textarea>//sgi;
    $string=~ s/<li.*?<\/li>//sgi;
    $string=~ s/<IMG.*?>//gsi;
    $string=~ s/<div.*?>//gsi;
    $string=~ s/<\/div.*?>//gsi;
    $string=~ s/<b.*?>|<\/b>//gsi;
    $string=~ s/<h1.*?>|<\/h1>//gsi;
    $string=~ s/<h2.*?>|<\/h2>//gsi;
    $string=~ s/<h3.*?>|<\/h3>//gsi;
    $string=~ s/<h4.*?>|<\/h4>//gsi;
    $string=~ s/<h5.*?>|<\/h5>//gsi;
    $string=~ s/<h6.*?>|<\/h6>//gsi;
    $string=~ s/<head.*?>|<\/head>//gsi;
    $string=~ s/<html.*?>|<\/html>//gsi;
    $string=~ s/<li.*?>|<\/li>//gsi;
    $string=~ s/<option.*?>|<\/option>//gsi;
    $string=~ s/<script.*?>|<\/script>//gsi;
    $string=~ s/<p.*?>|<\/p>//gsi;
    $string=~ s/<span.*?>//gsi;
    $string=~ s/<\/span.*?>//gsi;
    $string=~ s/<\/ul.*?>//gsi;
    $string=~ s/<ul.*?>//gsi;
    $string=~ s/<hr.*//gsi;
    $string=~ s/<input.*?>//gsi;
    $string=~ s/[^\x{00}-\x{7E}]//gsi;
    $string=~ s/ | / /gsi;
    $string=~ s/'/'/gsi;
    $string=~ s/&gt;/>/;
    $string=~ s/&amp;/&/gsi;
    $string=~ s/&lt;/</gsi;
    $string=~ s/CClear//gsi;

    my @list = split(/\s+/, $string);
    my $word_count = $#list;
    my @sentence = split (/\.|\?|\!/, $string);
    print "@list\n";
    print "There are $#sentence sentences in the list\n";
    print "There are $#list words.\n";
I was thinking of using something like this to count the lines and find out where the word is located, but to no avail. I also don't know how to match in an if statement. Any and all information or direction would be highly appreciated. I've hit a cap for today's work with perl lol. Thanks!

    my $count;
    foreach (@sentence){
        $count++;
        if (@sentence=~ m/$first/gsi){
            print "Matched! at line $count\n";
            print "@sentence[10]\n";
        }
    }
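For reference, here is a minimal sketch of the per-sentence match that the loop above seems to be reaching for. It is only one possible approach, not a definitive fix: `find_sentences` is a hypothetical helper name, and the sample text is made up to stand in for the cleaned-up page. The idea is to test each element (`$sentences[$i]`) against the search term rather than the whole array, and to use `scalar(@sentence)` when a count is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 1-based positions of the sentences that contain $term.
# find_sentences is a hypothetical helper name, not from the original post.
sub find_sentences {
    my ($term, @sentences) = @_;
    my @hits;
    for my $i (0 .. $#sentences) {
        # \Q...\E quotes any regex metacharacters in the search term;
        # /i makes the match case-insensitive.
        push @hits, $i + 1 if $sentences[$i] =~ /\Q$term\E/i;
    }
    return @hits;
}

# Made-up sample text standing in for the cleaned-up page (assumption).
my $text = 'Perl is fun. Is regex hard? I like Perl!';
my @sentence = split /[.?!]/, $text;

# Report each matching sentence by position, then the total count.
for my $n (find_sentences('perl', @sentence)) {
    print "Matched! at sentence $n: $sentence[$n - 1]\n";
}
print "There are ", scalar(@sentence), " sentences.\n";
```

Three details differ from the loop in the question. `@sentence =~ m/.../` binds the array in scalar context, so the pattern is tested against the number of elements rather than against any sentence. `@sentence[10]` is a one-element slice, while a single element is written `$sentence[10]`. And `$#sentence` is the index of the last element, which is one less than the number of sentences.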

Replies are listed 'Best First'.

Re: Searching through a document and reporting results.
by GrandFather (Saint) on Jan 30, 2011 at 01:22 UTC

Re: Searching through a document and reporting results.
by mvaline (Friar) on Jan 30, 2011 at 05:40 UTC