PerlMonks  

Re^2: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

by wrog (Friar)
on Jun 17, 2015 at 04:58 UTC ( [id://1130746] )


in reply to Re: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
in thread Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

split is fine as long as the character you're splitting on does not occur in your various fields.

If it can, then there has to be a scheme for quoting such fields or escaping such characters, and at that point we're pretty much beyond what split can do. You then need either to build the Regular Expression From Hell (which can be plenty fast if you do it right, but you have to do it right) or to use Text::CSV or some such.
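As a rough illustration of the regex route, here is a minimal sketch for a line in Common Log Format. The sub name parse_clf and the hash keys are made up for this example, and the pattern assumes that any double quote inside the request arrives backslash-escaped; check that against your own logs before relying on it.

```perl
use strict;
use warnings;

# Assumed layout (Common Log Format):
#   host logname user [timestamp] "request" status size
my $CLF = qr{
    ^ (\S+)                            # host
    \s+ (\S+)                          # logname
    \s+ (.+?)                          # user; non-greedy, so it may contain spaces
    \s+ \[ ([^\]]+) \]                 # timestamp between square brackets
    \s+ " ( (?: [^"\\] | \\. )* ) "    # request; backslash-escaped characters allowed
    \s+ (\d{3})                        # status
    \s+ (\d+|-)                        # size in bytes, or a lone dash
    \s* \z
}x;

# Hypothetical helper: returns a hash ref on success, nothing on failure.
sub parse_clf {
    my ($line) = @_;
    my @f = $line =~ $CLF or return;
    my %rec;
    @rec{qw(host logname user time request status size)} = @f;
    return \%rec;
}

my $line = q(127.0.0.1 - - [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509);
my $rec  = parse_clf($line) or die "line did not match\n";
print "$_: $rec->{$_}\n" for qw(host logname user time request status size);
```

The point of anchoring on the brackets and the quotes, rather than on whitespace, is that a space inside the user or the request no longer shifts the later fields.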

Replies are listed 'Best First'.
Re^3: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
by karlgoethebier (Abbot) on Jun 17, 2015 at 09:46 UTC
    ...we're pretty much beyond what split can do...

    Mh, we know the format:

        use Data::Dump;
        use feature qw(say);

        my $line = qq(127.0.0.1 - - [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509);
        my @bits = split /\s/, $line;
        dd \@bits;
        say qq(Host: $bits[0]);
        say qq(Logname: $bits[1]);
        say qq(User: $bits[2]);
        say qq(Time: $bits[3] $bits[4]);
        say qq(Request: $bits[5] $bits[6] $bits[7]);
        say qq(Status: $bits[8]);
        say qq(Size: $bits[9]);

        __END__

        monks>apache.pl
        [
          "127.0.0.1",
          "-",
          "-",
          "[22/Apr/2015:13:35:04",
          "+1000]",
          "\"GET",
          "/bin/admin.pl",
          "HTTP/1.1\"",
          401,
          509,
        ]
        Host: 127.0.0.1
        Logname: -
        User: -
        Time: [22/Apr/2015:13:35:04 +1000]
        Request: "GET /bin/admin.pl HTTP/1.1"
        Status: 401
        Size: 509

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Thanks for your reply!

      A quick question dealing with the internal workings of what you wrote:

      I understand that split can take any expression as its pattern and then operate on the scalar, but what are the more nuanced differences, if any, between using split and a general pattern match, particularly in memory usage and processing speed?

      Thanks!

        You can measure the speed of your code with the time command. Or use Time::HiRes. Or Benchmark. See also Devel::Size and Devel::NYTProf.

        And don't forget to try Super Search. I'm sure that you will find many examples that use time, Benchmark, Time::HiRes, Devel::Size and Devel::NYTProf.
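For what it's worth, a comparison with the core Benchmark module might look roughly like the sketch below. The sub names by_split and by_regex are invented here, and the /(\S+)/g match is only a stand-in for "a general pattern match", so treat any numbers it prints as specific to this one line on your machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = q(127.0.0.1 - - [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509);

# Two hypothetical candidates that both return the number of fields found.
sub by_split { my @f = split ' ', $_[0];    return scalar @f }
sub by_regex { my @f = $_[0] =~ /(\S+)/g;   return scalar @f }

# A negative count means: run each candidate for at least that many CPU seconds.
cmpthese(-1, {
    split => sub { by_split($line) },
    regex => sub { by_regex($line) },
});
```

cmpthese prints a small table of rates and relative percentages. For memory rather than speed, Devel::Size would be the tool to reach for, but it is not a core module, so it is left out of this sketch.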

        Regards, Karl


      I'm not sure I'd want to bet my life that none of logname, user or the request URI can have spaces in them.
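To make the worry concrete, here is a small sketch with a made-up log line in which the authenticated user name contains a space. Both lines and the user name "john doe" are hypothetical, and fields is a helper invented for this example.

```perl
use strict;
use warnings;

# Hypothetical input: the second line has a user name with an embedded space.
my $ok  = q(127.0.0.1 - karl [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509);
my $odd = q(127.0.0.1 - john doe [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509);

# Same split as in the post above, wrapped in a helper.
sub fields { return split /\s/, $_[0] }

for my $line ($ok, $odd) {
    my @bits = fields($line);
    printf "%d fields, \$bits[8] is %s\n", scalar @bits, $bits[8];
}
```

The first line yields 10 fields with the status in $bits[8]. The second yields 11, and every index after the user is off by one, so $bits[8] holds the tail of the request instead of the status.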
        "I'm not sure I'd want to bet my life..."

        I guess a beer would be fair. Please see also URI scheme ;-)

        Best regards, Karl


Re^3: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
by BrowserUk (Patriarch) on Jun 17, 2015 at 10:08 UTC

    He did ask for a learning exercise; not a pre-solved solution.

    Plus, chances are that he'll need to break the composite fields down further anyway, before he can do any analysis or storage.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      "...He did ask for a learning exercise..."

      Yes, but I didn't reply to the OP.

      "...break the composite fields down..."

      Yes, sure. Perhaps like this:

          karls-mac-mini:monks karl$ perl -E 'say split /[\[\]]/, qq([22/Apr/2015:13:35:04 +1000])'
          22/Apr/2015:13:35:04 +1000
          karls-mac-mini:monks karl$ perl -E 'say split /"/, qq("GET /bin/admin.pl HTTP/1.1")'
          GET /bin/admin.pl HTTP/1.1
          karls-mac-mini:monks karl$ perl -E 'say join "\t", split /\s/, qq(GET /bin/admin.pl HTTP/1.1)'
          GET /bin/admin.pl HTTP/1.1
          # and so on...
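In script form, the same split-only breakdown could be sketched as two small helpers. The names split_time and split_request and the hash keys are invented for this example, and the inputs are assumed to be the bracketed time and the quoted request exactly as they appear in the sample line.

```perl
use strict;
use warnings;

# Hypothetical helper: "[22/Apr/2015:13:35:04 +1000]" -> hash ref of parts.
sub split_time {
    my ($field) = @_;
    (my $stamp = $field) =~ tr/[]//d;      # drop the surrounding brackets
    my %t;
    @t{qw(day month year hour min sec zone)} = split m{[/:\s]}, $stamp;
    return \%t;
}

# Hypothetical helper: "\"GET /bin/admin.pl HTTP/1.1\"" -> hash ref of parts.
sub split_request {
    my ($field) = @_;
    (my $req = $field) =~ tr/"//d;         # drop the surrounding quotes
    my %r;
    # A limit of 3 keeps everything after the second space in the last piece.
    @r{qw(method uri protocol)} = split ' ', $req, 3;
    return \%r;
}

my $t = split_time('[22/Apr/2015:13:35:04 +1000]');
my $r = split_request('"GET /bin/admin.pl HTTP/1.1"');
print "$t->{day} $t->{month} $t->{year}, zone $t->{zone}\n";
print "$r->{method} $r->{uri} via $r->{protocol}\n";
```

This still inherits the caveat raised elsewhere in the thread: a request with an unescaped space in the URI would put part of the URI into the protocol slot.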

      I just wanted to show wrog that a solution that only uses split is possible.

      Another question is whether this is desirable. I guess some may call it abuse.

      Edit: Better wording.

      Best regards, Karl


        Sorry Karl. My reply was intended as a reply to wrog.

        I must have clicked the wrong link; which from memory is a first for me. I make plenty of other stupid user errors here, but never (that I recall) replying to the wrong post.

        Effectively, I was making the same point that you already made.


