rlastinger has asked for the wisdom of the Perl Monks concerning the following question:

I need to parse some data and sort it by a number that starts at character 28, but that number can be anywhere from 1 to 5 digits long. I'm not quite sure where to start with something like that.

Below is a sample of the data

FULL_TRACE,1,1343599221,114,2.0.51.96,1:195.176.255.130:0.42|2:130.59.36.138:0.95|3:130.59.36.129:0.47|4:213.248.79.189:0.33|5:213.155.133.214:13.21|6:213.155.133.141:38.21|7:213.248.93.114:13.64|8:0.0.0.0:-1.00|9:193.251.128.210:15.88|10:0.0.0.0:-1.00|11:0.0.0.0:-1.00|12:0.0.0.0:-1.00
FULL_TRACE,2,1343599221,118,2.0.52.77,1:195.176.255.130:0.38|2:130.59.36.138:0.80|3:130.59.36.129:1.78|4:213.248.79.189:0.31|5:80.91.249.115:11.86|6:80.91.252.254:12.10|7:213.248.77.206:16.36|8:0.0.0.0:-1.00|9:0.0.0.0:-1.00|10:0.0.0.0:-1.00
FULL_TRACE,3,1343599221,114,2.0.70.40,1:195.176.255.130:0.54|2:130.59.36.138:0.55|3:130.59.36.129:114.58|4:213.248.79.189:0.46|5:213.155.133.214:11.86|6:80.91.246.225:12.07|7:213.248.93.114:14.85|8:0.0.0.0:-1.00|9:0.0.0.0:-1.00|10:0.0.0.0:-1.00
FULL_TRACE,4,1343599221,114,2.0.79.129,1:195.176.255.130:0.58|2:130.59.36.138:0.47|3:130.59.36.129:0.43|4:213.248.79.189:0.30|5:80.91.249.115:11.85|6:80.91.246.219:15.89|7:213.248.93.114:13.39|8:0.0.0.0:-1.00|9:0.0.0.0:-1.00|10:0.0.0.0:-1.00

In this data set, the number is 114, but could be 1 or 17000.

#!/opt/local/bin/perl -w
$file = "trace.csv";
$file1 = "rgextract.txt";
open (FH, "< $file") || die ("Cannot open file!");
while ($line = <FH>) {
    $line2 = substr $line, 28, 3;
    print "$line2\n";
    if ($line2 =~ /d++/) {
        open (FH2,">>$file1") || die ("Cannot open file!");
        print FH2 "$line\n";
        close (FH2);
    }
}
close (FH);

Please help

Replies are listed 'Best First'.
Re: substring/regex question
by rjt (Curate) on Aug 06, 2013 at 05:37 UTC

    Update: I may have misread your question. I missed the one little word in your description that changes the whole premise: "sort". I now believe you actually want to print the input file sorted on the 4th column (index 3 when counting from zero), not just print that column. I'd use the Schwartzian Transform for a pure-Perl solution:

    print for map { pop @$_ } sort { $a->[0] <=> $b->[0] } map { [ (split /,/)[3], $_ ] } <>;
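For readers new to the idiom, the one-liner above can be unrolled into its three steps (decorate, sort, undecorate). This is only a sketch: the sub name `sort_by_fourth_field` and the shortened records in `@sample` are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Sort whole lines numerically by their fourth comma-separated field.
# (Hypothetical helper name, not from the original reply.)
sub sort_by_fourth_field {
    my @lines = @_;

    # Step 1 (decorate): pair each line with its sort key as [ key, line ].
    my @decorated = map { [ (split /,/, $_)[3], $_ ] } @lines;

    # Step 2 (sort): order the pairs by the numeric key.
    my @sorted = sort { $a->[0] <=> $b->[0] } @decorated;

    # Step 3 (undecorate): drop the key and keep only the original line.
    return map { $_->[1] } @sorted;
}

# Shortened, made-up records in the same layout as the sample data.
my @sample = (
    "FULL_TRACE,1,1343599221,17000,2.0.51.96,1:0.0.0.0:-1.00\n",
    "FULL_TRACE,2,1343599221,1,2.0.52.77,1:0.0.0.0:-1.00\n",
    "FULL_TRACE,3,1343599221,114,2.0.70.40,1:0.0.0.0:-1.00\n",
);

print for sort_by_fourth_field(@sample);
```

The point of the decoration step is that `split` runs once per line rather than once per comparison, which matters more as the file grows.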

    Of course, the following non-Perl solution works just as well on your UNIX-ish OS:

    sort -t, -k4 -n <trace.csv >rgextract.txt
    use strict; use warnings; omitted for brevity.
Re: substring/regex question
by Loops (Curate) on Aug 06, 2013 at 04:12 UTC

    Since your data is in comma separated format, using an absolute offset to extract the number you want doesn't make sense. You can use the Text::CSV module to parse the data into separate fields:

    use Text::CSV;

    my $file = "trace.csv";
    my $file1 = "rgextract.txt";
    open my $io, '<', $file or die "$file: $!";
    open my $out, '>>', $file1 or die "$file1: $!";
    my $csv = Text::CSV->new({ binary => 1, eol => $/ });
    while (my $row = $csv->getline ($io)) {
        my $number = $row->[3];
        print $out "$number\n";
    }
    With the 4 data lines you posted, this will create the rgextract.txt file:
    114
    118
    114
    114
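If installing Text::CSV is not an option, a regex anchored on the commas may be enough for data this regular. The sketch below assumes the fields never contain quoted or embedded commas; the sub name `trace_number` and the shortened records are invented for illustration. Note also that `/d++/` in the original script matches the letter "d" rather than digits, since the backslash is missing; `\d{1,5}` is what expresses "1 to 5 digits".

```perl
use strict;
use warnings;

# Return the fourth comma-separated field if it is a 1-5 digit number,
# or undef if the line does not have that shape.
# (Hypothetical helper name, not from the original thread.)
sub trace_number {
    my ($line) = @_;
    # Skip three fields, then capture 1 to 5 digits followed by a comma.
    return $line =~ /^(?:[^,]*,){3}(\d{1,5}),/ ? $1 : undef;
}

# Shortened, made-up records in the same layout as the sample data.
my @sample = (
    "FULL_TRACE,1,1343599221,1,2.0.51.96,1:0.0.0.0:-1.00",
    "FULL_TRACE,2,1343599221,114,2.0.52.77,1:0.0.0.0:-1.00",
    "FULL_TRACE,3,1343599221,17000,2.0.70.40,1:0.0.0.0:-1.00",
);

for my $line (@sample) {
    my $number = trace_number($line);
    print defined $number ? "$number\n" : "no match\n";
}
```

Because the match counts commas instead of characters, it keeps working when earlier fields change width, which a fixed `substr` offset cannot do.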