EagerforPerl has asked for the wisdom of the Perl Monks concerning the following question:
    use strict;
    use warnings;
    use LWP::Simple;
    use File::Compare;
    use File::Copy;

    $| = 1;

    sub main {
        #Create a file with current content, compare with all present files in directory if same, delete, if not, keep.
        unless(-e('filesaves') or mkdir('filesaves')) {
            die("Directory Couldn't Be Created.\n");
        } #create directory if it does not already exist
        my $fileName;
        print("Enter Site Directory: "); #Test input: http://caveofprogramming.com #Gather site URL with directory
        my $siteDirectory = <STDIN>;
        print("Number of Times to Run: "); #Test input: 10
        my $runAmount = <STDIN>; #Gather the number of times to check the web address
        unless(opendir(DIR, 'C:\\Program Files\\OSNE')) {
            die("Unable to open directory 'C:\\Program Files\\OSNE'\n");
        }
        for(my $i = 0; $i <= $runAmount; $i++) {
            my $file = readdir(DIR); closedir(DIR);
            $file = grep(/\.txt$/i, $file); #Filter as to only look for .txt files
            my $searchTable = get($siteDirectory); #Get HTML code from website
            if(defined($searchTable)) {
                $fileName = localtime() . '.txt'; #Set file name to the time it will be created
                $fileName =~ s/:/-/g; #remove the disallowed characters and replace them so that it can be the file name
                open(my $outputFile, '>', $fileName) or die("Couldn't Create File.\n");
                while($searchTable =~ m|<\s*a\s+[^>]*href\s*=\s*['"]([^>"']+)['"][^>]*>\s*([^<>]*)</|sig) { #HTML code title filter regex
                    print $outputFile ("$2: $1\n"); #print the titles to the text file
                }
                if(compare($fileName, $file) == 0) {
                    close($outputFile); #close output
                    unlink($fileName); #delete file
                } else {
                    close($outputFile);
                    move("C:\\Program Files\\OSNE\\'$file'","C:\\Program Files\\OSNE\\filesaves\\'$file'"); #Move the old file to filesave folder and keep the new file in the same directory as the script
                    print("Change Detected.\n");
                }
            } else {
                print("URL Unaccessible: $siteDirectory\n");
            }
        }
    }

    main();
I'm new to Perl, and I am trying to write a program that repeatedly reads a site's HTML (specifically the titles), as many times as the user specifies, and compares each scan with the previous scan of the website by comparing files. If the new file is the same as the old one, delete the newer file. If the files differ, move the old file into the filesaves folder and keep the newer file in the same directory as the script.

The program runs, but it doesn't create the number of files specified by the for loop, doesn't move them to the correct location, and doesn't delete them. For example, if you specify the number of times to run as 10, you end up with only 7 text files.

Console Log:

    readdir() attempted on invalid dirhandle DIR at C:\Program Files\OSNE\OSNE.pl line 23, <STDIN> line 2.
    closedir() attempted on invalid dirhandle DIR at C:\Program Files\OSNE\CPMonitor.pl line 23, <STDIN> line 2.
    Use of uninitialized value $_ in pattern match (m//) at C:\Program Files\OSNE\CPMonitor.pl line 24, <STDIN> line 2.
    Change Detected.
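The warnings in the log are consistent with two things visible in the posted code: `closedir(DIR)` runs inside the loop, so every pass after the first calls `readdir` on a handle that is already closed, and `grep` is called in scalar context, which yields a match count rather than a file name. Note also that `compare` is called before the output handle is closed, and that `$| = 1` only unbuffers the currently selected handle (STDOUT here), not `$outputFile`. The following is a minimal sketch of the compare-then-keep-or-archive step, not the poster's or any replier's solution. The helper names `list_snapshots` and `save_snapshot`, their parameters, and their return values are hypothetical, and the sketch assumes at most one earlier snapshot is kept in the working directory and that each new file name differs from it.

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Copy qw(move);
use File::Spec;

# Hypothetical helper: return the names of the .txt files in $dir, sorted.
sub list_snapshots {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Unable to open directory '$dir': $!\n";
    # In list context readdir returns every entry; grep keeps matching names.
    my @names = sort grep { /\.txt$/i } readdir($dh);
    closedir($dh);    # close once, after all entries have been read
    return @names;
}

# Hypothetical helper: write $content to $dir/$name, then compare it with the
# earlier snapshot. Returns 'first', 'unchanged', or 'changed'.
sub save_snapshot {
    my ($dir, $archive, $name, $content) = @_;

    # Look for the earlier snapshot before the new file exists.
    # Assumption: at most one earlier snapshot lives in $dir.
    my ($previous) = list_snapshots($dir);

    my $new_path = File::Spec->catfile($dir, $name);
    open(my $out, '>', $new_path) or die "Couldn't create '$new_path': $!\n";
    print {$out} $content;
    # Close before comparing so buffered output has reached the disk.
    close($out) or die "Couldn't close '$new_path': $!\n";

    return 'first' unless defined $previous;

    my $old_path = File::Spec->catfile($dir, $previous);
    my $diff = compare($new_path, $old_path);    # 0 same, 1 different, -1 error
    die "Couldn't compare '$new_path' with '$old_path'\n" if $diff < 0;

    if ($diff == 0) {
        unlink($new_path) or die "Couldn't delete '$new_path': $!\n";
        return 'unchanged';
    }

    # Create the archive directory on demand, then move the old snapshot.
    -d $archive or mkdir($archive) or die "Couldn't create '$archive': $!\n";
    move($old_path, File::Spec->catfile($archive, $previous))
        or die "Couldn't move '$old_path': $!\n";
    return 'changed';
}
```

In the posted script, something like `save_snapshot` would be called once per loop pass with the text extracted from the fetched page. Two further points are worth checking under the same assumptions: the loop condition `$i <= $runAmount` runs one pass more than the number entered, and a file name built from `localtime()` has one-second resolution, so passes that finish within the same second would reuse the same name.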
Re: Help with Web Scraping Script
by 1nickt (Canon) on Oct 19, 2017 at 11:07 UTC
by EagerforPerl (Novice) on Oct 19, 2017 at 19:17 UTC

Re: Help with Web Scraping Script
by stevieb (Canon) on Oct 19, 2017 at 01:18 UTC
by EagerforPerl (Novice) on Oct 19, 2017 at 01:56 UTC

Re: Help with Web Scraping Script
by marto (Cardinal) on Oct 19, 2017 at 10:24 UTC
by hippo (Archbishop) on Oct 19, 2017 at 10:33 UTC
by marto (Cardinal) on Oct 19, 2017 at 10:40 UTC