nanouk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, new to the forum and rather a noob in Perl matters. Here's the problem. I'd like to process /var/log/messages on a Linux system and replace all whitespace with commas up to a certain position (to be able to export the file to CSV format). Here's a sample from /var/log/messages:
Oct  1 13:23:25 smoothwall-swe3 kernel: Allocating PCI resources starting at 10000000 (gap: 08000000:f6c00000)
Oct  1 13:23:25 smoothwall-swe3 kernel: Built 1 zonelists
Oct  1 13:23:25 smoothwall-swe3 kernel: Kernel command line: BOOT_IMAGE=SmoothWall ro root=804 ramdisk_size=8192 no-scroll panic=30
Oct  1 13:23:25 smoothwall-swe3 kernel: Enabling fast FPU save and restore... done.
Oct  1 13:23:25 smoothwall-swe3 smoothd: Loading Plugins for Module "/usr/lib/smoothd/sysinstall.so
I want to replace all whitespace up to and including the one after the program name (that is, after kernel: and smoothd:). What I have tried so far has proved useless, namely
#!/usr/bin/perl
open (FILE,"</var/log/messages") || die 'Unable to open log file';
open (TMP, ">/tmp/messages.csv") || die 'Unable to write to file';
while (<FILE>) {
    $_ =~ s/\s{1,6}/,/;
    print TMP $_;
}
It just replaces the first occurrence, after the leading Oct. Any help would be greatly appreciated, since googling has proved rather frustrating. TIA

Replies are listed 'Best First'.
Re: perl substitution difficulty
by graff (Chancellor) on Oct 15, 2007 at 12:11 UTC
    I think split and join would make a better approach here:
    while (<FILE>) { print join ",", split( " ", $_, 6 ); }
    The first arg in the split call is a "magic" space -- which means break on one or more whitespace characters (update: for the split to use a "magic" space, it needs to be quoted, not a regex -- thanks and apologies, guys!). The third arg to split says that only 6 pieces should be returned -- that means upon seeing the sixth non-whitespace string, it will ignore all remaining whitespace characters, and the sixth element will contain everything to the end of the string.
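
    A minimal sketch of the split/join idea applied to one of the sample lines. The helper name syslog_to_csv is made up for illustration, and the limit of 6 assumes the five leading fields shown in the question (month, day, time, host, program):

```perl
use strict;
use warnings;

# Hypothetical helper: comma-separate the five leading fields of a syslog
# line and leave the message text untouched.
sub syslog_to_csv {
    my ($line) = @_;
    chomp $line;
    # " " (a quoted single space) is split's special case: fields are
    # separated by runs of whitespace. The limit of 6 means at most six
    # fields, so the sixth keeps the rest of the line, spaces included.
    return join ",", split(" ", $line, 6);
}

my $sample = "Oct  1 13:23:25 smoothwall-swe3 kernel: Built 1 zonelists\n";
print syslog_to_csv($sample), "\n";
# prints: Oct,1,13:23:25,smoothwall-swe3,kernel:,Built 1 zonelists
```

    The double space after "Oct" collapses into a single comma because the quoted space treats any run of whitespace as one separator.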

    Update: you might need to watch out for lines in the log file that contain commas; in those cases, double-quotes will be needed around each field that contains a comma, and if any field contains a double-quote, that needs to be escaped by doubling it:

    # input:
    Oct 1 13:23:25 smoothwall-swe3 kernel: Enabling fast FPU save, restore and "foo"... done.
    # should produce as output:
    Oct,1,13:23:25,smoothwall-swe3,kernel:,"Enabling fast FPU save, restore and ""foo""... done."
    Not hard to do if you split into an array, then loop over the array elements. But you may want to look at a module for that (e.g. Text::xSV). Or you could just delete commas and quotes from the input before doing the split/join. ;)
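
    As a rough sketch of the "split into an array, then loop over the elements" route, assuming the usual CSV convention of wrapping a field in double-quotes and doubling any embedded double-quote. The names csv_field and syslog_to_quoted_csv are hypothetical, and a CSV module remains the safer choice for real data:

```perl
use strict;
use warnings;

# Hypothetical helper: quote a single field only when it needs it.
sub csv_field {
    my ($field) = @_;
    # Fields without commas, quotes or line breaks pass through unchanged.
    return $field unless $field =~ /[",\r\n]/;
    (my $escaped = $field) =~ s/"/""/g;   # double every embedded quote
    return qq{"$escaped"};
}

# Hypothetical helper: split into six fields, quote each, join with commas.
sub syslog_to_quoted_csv {
    my ($line) = @_;
    chomp $line;
    return join ",", map { csv_field($_) } split(" ", $line, 6);
}

print syslog_to_quoted_csv(
    qq{Oct  1 13:23:25 smoothwall-swe3 kernel: Enabling fast FPU save, restore and "foo"... done.\n}
), "\n";
# prints: Oct,1,13:23:25,smoothwall-swe3,kernel:,"Enabling fast FPU save, restore and ""foo""... done."
```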
      It's " " that triggers the magic behavior, as in split(" ", $str). Your code, split(/ /, $str), is just a split on a single space.
      Hi,
      I can't reproduce that magic on my machine though - it joins on every space and therefore adds too many commas. Sure you don't mean /\s+/?
      svenXY
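
      To see the three separators from this exchange side by side, here is a small comparison sketch. The helper name split_variants is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split the same line three ways and join with commas.
sub split_variants {
    my ($line) = @_;
    return (
        magic  => join(",", split(" ",   $line, 6)),  # quoted space: runs of whitespace
        single => join(",", split(/ /,   $line, 6)),  # regex: exactly one space
        runs   => join(",", split(/\s+/, $line, 6)),  # regex: runs of whitespace
    );
}

my %v = split_variants("Oct  1 13:23:25 smoothwall-swe3 kernel: Built 1 zonelists");
print "$_: $v{$_}\n" for qw(magic single runs);
# magic:  Oct,1,13:23:25,smoothwall-swe3,kernel:,Built 1 zonelists
# single: Oct,,1,13:23:25,smoothwall-swe3,kernel: Built 1 zonelists
# runs:   Oct,1,13:23:25,smoothwall-swe3,kernel:,Built 1 zonelists
```

      With / / the double space after "Oct" produces an empty field, which uses up one of the six slots. " " and /\s+/ agree on these log lines; they differ only when a line starts with whitespace, where /\s+/ yields an empty first field and " " does not.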
Re: perl substitution difficulty
by McDarren (Abbot) on Oct 15, 2007 at 12:32 UTC
    Of course, if you wanted to avoid this regular expression silliness altogether, then you could draw upon the power of CPAN.

    Parse::Syslog looks like it might be just the ticket.

    Cheers,
    Darren :)

Re: perl substitution difficulty
by svenXY (Deacon) on Oct 15, 2007 at 12:12 UTC
    Hi,
    better use split for this type of stuff. It lets you specify the number of fields that you want to read and fills up the last one (field 6 here) with the rest of the line.
    #!/usr/bin/perl
    use strict;
    use warnings;

    open (DATA,"</var/log/messages") || die 'Unable to open log file';
    open (TMP, ">/tmp/messages.csv") || die 'Unable to write to file';
    while (<DATA>) {
        chomp;
        my (@cols) = split(/\s+/, $_, 6);
        print TMP join(',',@cols), "\n";
    }
    close TMP;
    close DATA;
    __DATA__
    Oct 1 13:23:25 smoothwall-swe3 kernel: Allocating PCI resources starting at 10000000 (gap: 08000000:f6c00000)
    Oct 1 13:23:25 smoothwall-swe3 kernel: Built 1 zonelists
    Oct 1 13:23:25 smoothwall-swe3 kernel: Kernel command line: BOOT_IMAGE=SmoothWall ro root=804 ramdisk_size=8192 no-scroll panic=30
    Oct 1 13:23:25 smoothwall-swe3 kernel: Enabling fast FPU save and restore... done.
    Oct 1 13:23:25 smoothwall-swe3 smoothd: Loading Plugins for Module "/usr/lib/smoothd/sysinstall.so
    prints:
    Oct,1,13:23:25,smoothwall-swe3,kernel:,Allocating PCI resources starting at 10000000 (gap: 08000000:f6c00000)
    Oct,1,13:23:25,smoothwall-swe3,kernel:,Built 1 zonelists
    Oct,1,13:23:25,smoothwall-swe3,kernel:,Kernel command line: BOOT_IMAGE=SmoothWall ro root=804 ramdisk_size=8192 no-scroll panic=30
    Oct,1,13:23:25,smoothwall-swe3,kernel:,Enabling fast FPU save and restore... done.
    Oct,1,13:23:25,smoothwall-swe3,smoothd:,Loading Plugins for Module "/usr/lib/smoothd/sysinstall.so

    Regards,
    svenXY
Re: perl substitution difficulty
by moritz (Cardinal) on Oct 15, 2007 at 12:16 UTC
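    Your regex reads like this: "find a run of 1 to 6 whitespace characters and replace it with a comma". Without the /g modifier, s/// stops after the first match, which is why only the gap after Oct changes.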

    What you want to do looks like this: substr($_, 0, 6) =~ s/\s/,/g;
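
    A hedged sketch of how the substr idea might be made to cover the leading fields. Note that the second and third arguments of substr are a character offset and a length, not a field count, so this version first measures how long the prefix is. The helper names are hypothetical, and five leading fields are assumed from the sample in the question:

```perl
use strict;
use warnings;

# The original attempt: without /g, only the first whitespace run changes.
sub first_match_only {
    my ($line) = @_;
    $line =~ s/\s{1,6}/,/;
    return $line;
}

# Hypothetical helper: replace whitespace runs with commas, but only inside
# the prefix holding the first five fields and their trailing whitespace.
sub commas_in_prefix {
    my ($line) = @_;
    if ($line =~ /^(?:\S+\s+){5}/) {
        my $len = $+[0];                       # end offset of the matched prefix
        substr($line, 0, $len) =~ s/\s+/,/g;   # substr used as an lvalue
    }
    return $line;
}

my $sample = "Oct  1 13:23:25 smoothwall-swe3 kernel: Built 1 zonelists";
print first_match_only($sample), "\n";
# prints: Oct,1 13:23:25 smoothwall-swe3 kernel: Built 1 zonelists
print commas_in_prefix($sample), "\n";
# prints: Oct,1,13:23:25,smoothwall-swe3,kernel:,Built 1 zonelists
```

    Lines with fewer than five leading fields are returned unchanged, since the prefix pattern does not match them.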

Re: perl substitution difficulty
by Krambambuli (Curate) on Oct 15, 2007 at 12:31 UTC
    Adding just a thought to the solutions already given.

    The output .csv file might be in error/unusable if some fields written to it (the actual message part) can contain commas too. No problem if you know for sure that that cannot happen, but otherwise, later problems are lurking.

    Take care.
Re: perl substitution difficulty
by svenXY (Deacon) on Oct 15, 2007 at 12:14 UTC
    ++graff for being faster and shorter ;-)
    didn't know of the magic space either, thanks!