Hi Monks,

I need to parse an httpd access_log, apply some regexes, sort the records by cellphone number, and then save them to a CSV file. But Text::CSV_XS seems to fail on some characters. Here is an example (1 record) from access_log.gz:

198.168.0.4 - - [13/Nov/2006:03:51:36 +0700] "GET /pool-apps.php?mesg=%00C%00o%00c%00o%00k%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00i%00k%00b%00a%00l%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00+%00n%00a%00s%00t%00i%00t%00i&src=%2B218544448366&dest=3422&ts=2006-11-12+20:51:36&smsc=as3&service=default&udh=&tid= HTTP/1.1" 200 5

I do not know why Text::CSV_XS fails to combine this string.
I need some advice.

And here's my code:

#!/usr/bin/perl -w
use strict;
use Text::CSV_XS qw(combine);
use URI::Escape;

die "Usage: perl $0 [input_file.gz] \n" unless @ARGV;
my($input_file) = @ARGV;
open(my $FH, "zless $input_file| grep as3|") or die $!;

my $csv = Text::CSV_XS->new;
print
    map {
        if($csv->combine(@$_)) { $csv->string, "good\n"}
        else { "failed \n" };
    }
    sort { $a->[2] <=> $b->[2] }
    map {
        $_ = uri_unescape(scalar $_);
        #tr/\x00//d;
        #tr/\xE4\xF1\xEC//d;
        #tr/\xA7//d;
        #tr/\xF6\xF6\xF9\xE9\xC7//d;  #because Text::CSV_XS "failed to combine"
        [ m{
            (\[         # start date
            [^\]]+      # date
            \])         # end of date
            [^=]+       # stuff which is not a =
            =
            (.+?)       # The message
            &src=\+
            (\d+)       # The number
        }x ]
    }
    <$FH>;

Thanks - Zak
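
One possible direction, offered as a sketch rather than a confirmed fix: after uri_unescape the message field contains NUL bytes (the %00 pairs look like UTF-16BE text), and Text::CSV_XS rejects such bytes in combine unless the object is created with its documented binary attribute. The shortened $raw value and the variable names below are made up for illustration; only combine, string, the binary attribute, and uri_unescape come from the modules themselves.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use URI::Escape qw(uri_unescape);

# Shortened, hypothetical stand-in for the "mesg" value in the log record.
my $raw = '%00C%00o%00c%00o%00k%00+%00i%00k%00b%00a%00l';

# Assumption: '+' is a form-encoded space. Convert it, then undo the %XX escapes.
(my $msg = $raw) =~ tr/+/ /;
$msg = uri_unescape($msg);

my @fields = ('[13/Nov/2006:03:51:36 +0700]', $msg, '218544448366');

# Default constructor: binary is off, so a field holding NUL bytes is rejected.
my $strict    = Text::CSV_XS->new;
my $strict_ok = $strict->combine(@fields) ? 1 : 0;

# binary => 1 permits NUL and other non-printable bytes inside a field.
my $binary    = Text::CSV_XS->new({ binary => 1 });
my $binary_ok = $binary->combine(@fields) ? 1 : 0;

print "default: ", ($strict_ok ? "good" : "failed"), "\n";
print "binary:  ", ($binary_ok ? "good" : "failed"), "\n";
print $binary->string, "\n" if $binary_ok;
```

If readable text is wanted in the CSV instead of raw bytes, decoding the field first (for example with the core Encode module, assuming the data really is UTF-16BE) may be a cleaner route than deleting bytes with tr.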


In reply to Text::CSV_XS failed to combine by Persib
