Hi. Please read Site How To before you submit code next time and save the editors and yourself a lot of work. Thanks.

I just appended the data lines from above to the end of the code I gave you at pulling by regex, and it parsed them correctly.

Output:

C:\test>218961.pl
The line '10/Dec/2002:18:05:09 -0500 http://www.indystar.com/help/help/available.html' was logged 7290.05 minutes ago.
Discarding previous line
The line '10/Dec/2002:18:08:13 -0500 http://www.indystar.com/help/help/available.html' was logged 7286.98 minutes ago.
Discarding previous line
The line '10/Dec/2002:18:11:19 -0500 http://www.indystar.com/help/help/available.html' was logged 7283.88 minutes ago.
Discarding previous line
The line '' was logged 120127291175.20 minutes ago.
Discarding previous line
The line '15/Dec/2002:14:52:12 -0500 http://www.indystar.com/print/articles/?S=D' was logged 283.00 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:12 -0500 http://www.indystar.com/print/articles/6/008596-6466-040.html' was logged 283.00 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.fark.com/' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/6/008596-6466-040.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/6/008596-6466-040.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/2/008227-9652-031.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/2/008227-9652-031.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/2/008227-9652-031.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/2/008227-9652-031.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/2/008227-9652-031.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/2/008227-9652-031.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/forums/showthread.php?s=&postid=177044' was logged 282.98 minutes ago
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/forums/showthread.php?s=&postid=177044' was logged 282.98 minutes ago
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:13 -0500 http://www.indystar.com/print/articles/2/008227-9652-031.html' was logged 282.98 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:14 -0500 -' was logged 282.97 minutes ago.
The previous line is within the window. Counting...
The line '15/Dec/2002:14:52:14 -0500 http://www.indystar.com/forums/showthread.php?s=&postid=177044' was logged 282.97 minutes ago
The previous line is within the window. Counting...
These where the referrers logged:
$VAR1 = {
          'http://www.indystar.com/forums/showthread.php?s=&postid=177044' => '3',
          'http://www.indystar.com/print/articles/2/008227-9652-031.html' => '6',
          'http://www.indystar.com/print/articles/6/008596-6466-040.html' => '3',
          '-' => '1',
          'http://www.fark.com/' => '1',
          'http://www.indystar.com/print/articles/?S=D' => '1'
        };

C:\test>

Then I looked at your version of the code and noticed this:

open LOGFILE, "datafile.html" || die "Can't open file";

The problem with this line is that you have no parentheses around the arguments to open, and || has relatively high precedence, so the line is parsed as

open( LOGFILE, ("datafile.html" || die "Can't open file") );

Because the first operand of the || (the file name) is always true, the second operand (die "Can't open file") is never evaluated; it is simply optimised away. So even if the open fails (because the input file does not exist, is not in the current directory, etc.), you will never see an error message. Could this be your problem?
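If you want to see this for yourself, here is a minimal sketch. The file name no_such_file.txt is my own invention and is assumed not to exist in the current directory; the eval is only there so the script can report what happened.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed NOT to exist in the current directory.
my $missing = 'no_such_file.txt';

# Written the same way as the line in your code: '||' grabs the file
# name, which is a true value, so the die is never evaluated.
my $survived = eval {
    open LOGFILE, $missing || die "Can't open file";
    1;
};

my $exists = -e $missing ? 'yes' : 'no';
my $msg = $survived
    ? "No error raised (file exists: $exists)\n"
    : "died: $@";
print $msg;
```

If the file really is missing, I would expect the "No error raised" branch to print, which is exactly the silent failure described above.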

The fix is to use either

open(LOGFILE, "datafile.html") || die "Can't open file: $!";

or

open LOGFILE, "datafile.html" or die "Can't open file: $!";

Please also note the inclusion of $! in the error message. If the open fails, this tells you why it failed, not just that it did. See Error Indicators for further details.
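For what it's worth, here is a sketch of the same check wrapped in a small helper. The helper name open_log is hypothetical, and I have used a lexical filehandle with three-argument open (perl 5.6 or later) rather than the bareword form in your code. $! gives the system's message in string context and the error number in numeric context.

```perl
use strict;
use warnings;

# Hypothetical helper: open a log file for reading, or die with the reason.
sub open_log {
    my ($file) = @_;
    # Parentheses plus 'or' leave no room for a precedence surprise.
    open( my $fh, '<', $file )
        or die "Can't open '$file': $! (errno " . ( $! + 0 ) . ")\n";
    return $fh;
}

# eval is used here only so the sketch can report the failure and carry on.
my $fh = eval { open_log('datafile.html') };
if ($fh) {
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    print "Read $count lines\n";
}
else {
    print "Open failed: $@";
}
```

The exact wording of the reason depends on your operating system, so I have not shown any output here.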

The second thing I noticed was the name of the file: "datafile.html". If this is a log file, why is it named .html? If the file contains HTML tags, the regex supplied will not parse the data.

You're not by any chance viewing and saving the logs via a web interface, are you? If so, you need to cut and paste from the screen to a file, or use "Save as... type *.txt" if your browser has that option, in order to remove the HTML tags from the file.
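A quick way to check whether markup has crept into the data is to test a few lines for anything that looks like a tag. This is only a rough sketch: looks_like_html is a hypothetical helper, the pattern is a heuristic rather than a real HTML parser, and the second sample line is made up to show what a saved web page might contain.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line appears to contain an HTML tag
# such as <br> or </td>. A heuristic only, not an HTML parser.
sub looks_like_html {
    my ($line) = @_;
    return $line =~ m{</?[a-zA-Z][^>]*>} ? 1 : 0;
}

my @samples = (
    # A plain log line in the format shown in the output above.
    '15/Dec/2002:14:52:13 -0500 http://www.fark.com/',
    # Invented example of the same data wrapped in table markup.
    '<tr><td>15/Dec/2002:14:52:13 -0500</td><td>http://www.fark.com/</td></tr>',
);

for my $line (@samples) {
    my $verdict = looks_like_html($line) ? 'HTML' : 'ok';
    printf "%-4s %s\n", $verdict, $line;
}
```

If lines from your file come back flagged as HTML, that would explain why the regex finds nothing to match.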

If that doesn't explain the problem and allow you to fix it, come back and post the error message or otherwise describe what you are seeing (e.g. no output, wrong output, etc.).

No need to re-post the code or data again unless it has changed substantially.

Good luck.


Examine what is said, not who speaks.


In reply to Re: Re: Re: pulling by regex by BrowserUk
in thread pulling by regex by mkent
