in reply to Splitting squid log lines with perl

Put your cursor on the u in vim and type ga (mnemonic: "get ascii") to have the character code displayed in the status line in a number of formats.

Can you also post a short(!) sample of Squid log lines? What's important is to pay attention to whether any of the fields can have embedded whitespace - in that case you have to do more precise work than just simply splitting.

Makeshifts last the longest.

Re: Re: Splitting squid log lines with perl
by blm (Hermit) on Sep 16, 2002 at 11:13 UTC

    By typing ga while the cursor was positioned over the micro in

    @cache = split 'µ';

    I get  <µ>  <|5>  <M-5>  181,  Hex b5,  Octal 265 at the bottom of the screen (in the ruler?). So that makes it a single byte with the value 0xb5?

    Anyway my squid logs look like this:

    1031902298.709 609 10.0.14.117 TCP_MISS/302 376 GET http://ad.doubleclick.net/ad/max.starwarskids/ros;sz=468x60;num=443509536434963200 fred DIRECT/204.253.104.95 -

    There are one or more spaces between fields (OT: cut -f2 -d' ' doesn't work :-( ). I was using:

    while (<LOG>) { @line_elements = split(' '); ... }
    but it seems to work better with
    @line_elements = split(/\s+/);

    Is this bad? \s is whitespace (tabs as well)? I am actually reading the Friedl book (Mastering Regular Expressions) atm.

      That would be 0xB5, yes. I have no idea how one arrives at using that as a separator, though.
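
      If you want to double-check the value from inside Perl rather than vim, something like the following should do. It is only a sketch: it assumes the separator really is the single byte 0xB5 (as ga reports for a Latin-1 file) and writes it as an escape so the script does not depend on the editor's encoding. The sample string is made up.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;

      # The separator written as an escape: one byte, value 0xB5.
      my $sep = "\xb5";

      # Same three formats that vim's ga shows.
      printf "%d, Hex %x, Octal %o\n", ord($sep), ord($sep), ord($sep);
      # prints: 181, Hex b5, Octal 265

      # Splitting on that byte, with the pattern spelled out explicitly.
      # The input string is invented for the example.
      my @cache = split /\xb5/, "a\xb5b\xb5c";
      print scalar(@cache), " fields\n";   # prints: 3 fields
      ```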

      If there are one or more spaces between fields, but none inside fields, then /\s+/ is indeed what you want to use and probably better than ' ' which is a special case. It means almost the same as /\s+/ - with a subtle difference.

      #!/usr/bin/perl -wl
      use strict;
      # qq// rather than q//, so that $_ is interpolated inside the quotes
      sub joinprint { print join " ", map qq/"$_"/, @_ }
      $_ = " blah blah";
      joinprint split ' ';
      joinprint split /\s+/;
      __END__
      "blah" "blah"
      "" "blah" "blah"
      The split " " will omit an empty initial field. perldoc -f split carefully points this out. I recommend you write split /$char/ in the future, since that's what really happens to all literal strings other than the single blank. If you don't, you can easily confuse yourself with something like split "." which is the same as split /./ and as such most certainly not what you wanted.
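
      To tie this back to your log line, here is a rough sketch of both points. The variable names for the fields are just labels I made up for illustration, not official Squid terminology, and it assumes (as you said) that no field contains embedded whitespace.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;

      # The sample line from the post above, on one line.
      my $line = '1031902298.709 609 10.0.14.117 TCP_MISS/302 376 GET '
               . 'http://ad.doubleclick.net/ad/max.starwarskids/ros;sz=468x60;num=443509536434963200 '
               . 'fred DIRECT/204.253.104.95 -';

      # split ' ' skips leading whitespace and splits on runs of whitespace.
      # Field names are hypothetical labels chosen for this example.
      my ($time, $elapsed, $client, $status, $bytes,
          $method, $url, $user, $peer, $type) = split ' ', $line;

      print "$client $method $status\n";   # prints: 10.0.14.117 GET TCP_MISS/302

      # Any literal string other than ' ' is used as a pattern:
      my @wrong = split ".",  $client;     # same as split /./, every char is a separator
      my @right = split /\./, $client;     # splits on literal dots

      print scalar(@wrong), " vs ", scalar(@right), "\n";   # prints: 0 vs 4
      ```

      The first split comes back empty because every character matches /./, which leaves nothing but empty fields, and split drops trailing empty fields.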

      Makeshifts last the longest.