Persib has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need to parse an httpd access_log, apply some regexes, sort the records by cellphone number, and then save them to a CSV file. But Text::CSV_XS seems to fail on some characters. Here's an example from access_log.gz (1 record):

198.168.0.4 - - [13/Nov/2006:03:51:36 +0700] "GET /pool-apps.php?mesg=%00C%00o%00c%00o%00k%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00i%00k%00b%00a%00l%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00n%00a%00s%00t%00i%00t%00i&src=%2B218544448366&dest=3422&ts=2006-11-12+20:51:36&smsc=as3&service=default&udh=&tid= HTTP/1.1" 200 5

I do not know why Text::CSV_XS fails to combine this string.
I need some advice.

And here's my code:

#!/usr/bin/perl -w
use strict;
use Text::CSV_XS qw(combine);
use URI::Escape;

die "Usage: perl $0 [input_file.gz] \n" unless @ARGV;
my($input_file) = @ARGV;

open(my $FH, "zless $input_file| grep as3|") or die $!;

my $csv = Text::CSV_XS->new;

print
    map  { if($csv->combine(@$_)) { $csv->string, "good\n"} else { "failed \n" }; }
    sort { $a->[2] <=> $b->[2] }
    map  {
        $_ = uri_unescape(scalar $_);
        #tr/\x00//d;
        #tr/\xE4\xF1\xEC//d;
        #tr/\xA7//d;
        #tr/\xF6\xF6\xF9\xE9\xC7//d; #because Text::CSV_XS "failed to combine"
        [ m{ (\[       # start date
             [^\]]+    # date
             \])       # end of date
             [^=]+     # stuff which is not a =
             =
             (.+?)     # The message
             &src=\+
             (\d+)     # The number
           }x ]
    }
    <$FH>;

Thanks - Zak

Replies are listed 'Best First'.
Re: Text::CSV_XS failed to combine
by clinton (Priest) on Nov 23, 2006 at 14:58 UTC
    From the docs:

    This module is based upon a working definition of CSV format which may not be the most general.
    1. Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde). In binary mode all characters are accepted, at least in quoted fields:
    2. A field within CSV may be surrounded by double-quotes. (The quote char)
    3. A field within CSV must be surrounded by double-quotes to contain a comma. (The separator char)
    4. A field within CSV must be surrounded by double-quotes to contain an embedded double-quote, represented by a pair of consecutive double-quotes. In binary mode you may additionally use the sequence "0 for representation of a NUL byte.
    5. A CSV string may be terminated by 0x0A (line feed) or by 0x0D,0x0A (carriage return, line feed).
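
Rule 1 can be checked directly against the record in the question. The following is a minimal sketch using only core Perl. Both `is_plain_csv_field` and `percent_decode` are hypothetical helpers written for illustration; they are not part of Text::CSV_XS, and the hand-rolled decoding only stands in for URI::Escape.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::CSV_XS: true if every character
# is a tab (0x09) or lies in the range 0x20 (space) through 0x7E (tilde).
sub is_plain_csv_field {
    my ($field) = @_;
    return $field =~ /\A[\x09\x20-\x7E]*\z/ ? 1 : 0;
}

# Hand-rolled percent-decoding, for illustration only.
sub percent_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $encoded = '%00C%00o%00c%00o%00k';    # start of the mesg= value in the log record
my $decoded = percent_decode($encoded);  # now contains literal NUL bytes

for my $pair ([encoded => $encoded], [decoded => $decoded]) {
    my ($label, $value) = @$pair;
    printf "%s: %s\n", $label,
        is_plain_csv_field($value) ? 'plain' : 'needs binary mode';
}
```

The encoded form consists only of printable ASCII, so it passes the check. Once the `%00` escapes are decoded the field holds NUL bytes, which fall outside the allowed range.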

    But if you do this, it works:

    my $csv = Text::CSV_XS->new ({binary=>1});
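
To compare the two settings side by side, here is a minimal sketch that combines the same fields with and without the binary attribute. The sample fields are made up to resemble the decoded record from the question, and the exact CSV output is not shown because it depends on the module's quoting and escaping settings.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up fields resembling the decoded record:
# date, message containing NUL bytes, phone number.
my @fields = ('[13/Nov/2006:03:51:36 +0700]', "\0C\0o\0o\0k", '218544448366');

for my $binary (0, 1) {
    my $csv = Text::CSV_XS->new({ binary => $binary });
    if ($csv->combine(@fields)) {
        print "binary=$binary: combined ok\n";
    }
    else {
        # error_input returns the argument that combine() could not handle
        my $bad = $csv->error_input;
        printf "binary=%d: failed on a field of length %d\n",
            $binary, length(defined $bad ? $bad : '');
    }
}
```

With binary off, combine should fail on the message field. With binary on, the same call should succeed.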
Re: Text::CSV_XS failed to combine
by madbombX (Hermit) on Nov 23, 2006 at 15:50 UTC