    use strict;
    use warnings;
    use LWP::Simple;
    use File::Compare;
    use File::Copy;
    $| = 1;

    sub main {
        #Create a file with current content, compare with all present files in directory if same, delete, if not, keep.
        unless(-e('filesaves') or mkdir('filesaves')) {
            die("Directory Couldn't Be Created.\n");
        } #create directory if it does not already exist
        my $fileName;
        print("Enter Site Directory: "); #Test input: http://caveofprogramming.com
        #Gather site URL with directory
        my $siteDirectory = <STDIN>;
        print("Number of Times to Run: "); #Test input: 10
        my $runAmount = <STDIN>; #Gather the number of times to check the web address
        unless(opendir(DIR, 'C:\\Program Files\\OSNE')) {
            die("Unable to open directory 'C:\\Program Files\\OSNE'\n");
        }
        for(my $i = 0; $i <= $runAmount; $i++) {
            my $file = readdir(DIR); closedir(DIR);
            $file = grep(/\.txt$/i, $file); #Filter as to only look for .txt files
            my $searchTable = get($siteDirectory); #Get HTML code from website
            if(defined($searchTable)) {
                $fileName = localtime() . '.txt'; #Set file name to the time it will be created
                $fileName =~ s/:/-/g; #remove the disallowed characters and replace them so that it can be the file name
                open(my $outputFile, '>', $fileName) or die("Couldn't Create File.\n");
                while($searchTable =~ m|<\s*a\s+[^>]*href\s*=\s*['"]([^>"']+)['"][^>]*>\s*([^<>]*)</|sig) { #HTML code title filter regex
                    print $outputFile ("$2: $1\n"); #print the titles to the text file
                }
                if(compare($fileName, $file) == 0) {
                    close($outputFile); #close output
                    unlink($fileName); #delete file
                }
                else {
                    close($outputFile);
                    move("C:\\Program Files\\OSNE\\'$file'","C:\\Program Files\\OSNE\\filesaves\\'$file'"); #Move the old file to filesave folder and keep the new file in the same directory as the script
                    print("Change Detected.\n");
                }
            }
            else {
                print("URL Unaccessible: $siteDirectory\n");
            }
        }
    }

    main();
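You can test the link-extraction step inside the loop by itself, without fetching a page. That helps separate parsing problems from file-handling problems. Below is a minimal sketch. It assumes a hypothetical helper `extract_links` that wraps the script's regex unchanged and returns the same `title: href` lines the loop writes to its output file. The sample HTML is made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): applies the script's
# link regex unchanged and collects the "title: href" lines it would print.
sub extract_links {
    my ($html) = @_;
    my @lines;
    while ($html =~ m|<\s*a\s+[^>]*href\s*=\s*['"]([^>"']+)['"][^>]*>\s*([^<>]*)</|sig) {
        push @lines, "$2: $1";    # $2 = link text, $1 = href value
    }
    return @lines;
}

# Made-up sample markup; the second link uses upper-case tags and single quotes.
my $sample = '<p><a href="/intro">Intro</a> and <A HREF=\'/perl\'>Perl basics</a></p>';
print "$_\n" for extract_links($sample);
# Intro: /intro
# Perl basics: /perl
```

Because of `([^<>]*)</`, a link whose text contains nested tags, such as `<a href="/x"><b>X</b></a>`, produces no line at all. An HTML parser module would handle that more robustly than a regex.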

I'm new to Perl. I'm trying to write a program that reads a site's HTML (specifically the titles) repeatedly, as many times as the user specifies, and compares each scan of the website with the previous one by comparing files. If the new file is the same as the old one, the newer file should be deleted. If it is different, the old file should be moved into the filesaves folder and the newer file kept in the same directory as the script.

The program runs, but it doesn't create the number of files specified by the for loop, doesn't move them to the correct folder, and doesn't delete them. For example, if you specify the number of times to run as 10, you end up with only 7 text files.

Console log:

    readdir() attempted on invalid dirhandle DIR at C:\Program Files\OSNE\OSNE.pl line 23, <STDIN> line 2.
    closedir() attempted on invalid dirhandle DIR at C:\Program Files\OSNE\CPMonitor.pl line 23, <STDIN> line 2.
    Use of uninitialized value $_ in pattern match (m//) at C:\Program Files\OSNE\CPMonitor.pl line 24, <STDIN> line 2.
    Change Detected.
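Several things in the script line up with that log:

- `closedir(DIR)` runs inside the loop, so every `readdir` after the first pass hits a closed handle.
- `$file = grep(...)` stores a count (0 or 1), not a file name. `compare` is therefore asked about a file named `0`, which most likely returns -1 (error), never 0. So nothing is ever deleted and "Change Detected." is always printed.
- `compare` also runs while `$outputFile` is still open and possibly unflushed.
- The `move` paths contain literal `'` characters around the name.

Below is a minimal sketch of one pass of the compare-and-rotate step with those points changed. The helper `save_snapshot`, its `$stamp` argument and the demo file names are hypothetical. The demo uses a temporary directory instead of `C:\Program Files\OSNE`, so it runs without network access.

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Copy qw(move);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper, not part of the original script. One pass of the
# compare-and-rotate step: write $content to "$stamp.txt" in $dir, compare it
# with the newest older .txt file there, then delete the new file if they match
# or move the old one into $dir/filesaves if they differ.
# Assumes $stamp sorts chronologically as a plain string.
# Returns 'first', 'same' or 'changed'.
sub save_snapshot {
    my ($dir, $content, $stamp) = @_;

    my $saves = File::Spec->catdir($dir, 'filesaves');
    -d $saves or mkdir($saves) or die "Couldn't create $saves: $!\n";

    # Open, read completely and close the directory on every pass, instead of
    # closing a handle that later passes still try to read from.
    opendir(my $dh, $dir) or die "Unable to open $dir: $!\n";
    my @old = sort grep { /\.txt$/i && -f File::Spec->catfile($dir, $_) } readdir($dh);
    closedir($dh);

    my $new = File::Spec->catfile($dir, "$stamp.txt");
    open(my $out, '>', $new) or die "Couldn't create $new: $!\n";
    print $out $content;
    # Close (and so flush) before comparing; compare() reads the file from disk.
    close($out) or die "Couldn't close $new: $!\n";

    return 'first' unless @old;
    my $prev = File::Spec->catfile($dir, $old[-1]);    # newest older snapshot

    my $cmp = compare($new, $prev);                    # 0 same, 1 different, -1 error
    die "Couldn't compare $new with $prev\n" if $cmp < 0;
    if ($cmp == 0) {
        unlink($new) or die "Couldn't delete $new: $!\n";
        return 'same';
    }
    # Paths are built without literal quote characters around the file name.
    move($prev, File::Spec->catfile($saves, $old[-1]))
        or die "Couldn't move $prev: $!\n";
    return 'changed';
}

# Offline demo in a throwaway directory standing in for C:\Program Files\OSNE.
my $dir = tempdir(CLEANUP => 1);
my @results;
for my $run (['0001', "Home: /\n"],
             ['0002', "Home: /\n"],
             ['0003', "Home: /\nNews: /news\n"]) {
    push @results, save_snapshot($dir, $run->[1], $run->[0]);
}
print "@results\n";   # first same changed
```

In a real run, the stamp could come from `strftime('%Y-%m-%d_%H-%M-%S', localtime)` (from `POSIX`) plus a zero-padded run counter such as `sprintf('_%03d', $i)`. With `localtime() . '.txt'`, two runs in the same second get the same name, and `open` with `'>'` overwrites the earlier file. That is a likely reason 10 runs leave only 7 files. Separately, `chomp` both `<STDIN>` values. Also note that `$i <= $runAmount` runs the loop `$runAmount + 1` times.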


In reply to Help with Web Scraping Script - Updated by EagerforPerl
