substring/regex question

rlastinger has asked for the wisdom of the Perl Monks concerning the following question:

I need to parse data and sort it based on a number that starts at character 28 and is between 1 and 5 digits long. I'm not quite sure where to start with something like that.

Below is a sample of the data:

FULL_TRACE,1,1343599221,114,2.0.51.96,1:195.176.255.130:0.42|2:130.59.36.138:0.95|3:130.59.36.129:0.47|4:213.248.79.189:0.33|5:213.155.133.214:13.21|6:213.155.133.141:38.21|7:213.248.93.114:13.64|8:0.0.0.0:-1.00|9:193.251.128.210:15.88|10:0.0.0.0:-1.00|11:0.0.0.0:-1.00|12:0.0.0.0:-1.00
FULL_TRACE,2,1343599221,118,2.0.52.77,1:195.176.255.130:0.38|2:130.59.36.138:0.80|3:130.59.36.129:1.78|4:213.248.79.189:0.31|5:80.91.249.115:11.86|6:80.91.252.254:12.10|7:213.248.77.206:16.36|8:0.0.0.0:-1.00|9:0.0.0.0:-1.00|10:0.0.0.0:-1.00
FULL_TRACE,3,1343599221,114,2.0.70.40,1:195.176.255.130:0.54|2:130.59.36.138:0.55|3:130.59.36.129:114.58|4:213.248.79.189:0.46|5:213.155.133.214:11.86|6:80.91.246.225:12.07|7:213.248.93.114:14.85|8:0.0.0.0:-1.00|9:0.0.0.0:-1.00|10:0.0.0.0:-1.00
FULL_TRACE,4,1343599221,114,2.0.79.129,1:195.176.255.130:0.58|2:130.59.36.138:0.47|3:130.59.36.129:0.43|4:213.248.79.189:0.30|5:80.91.249.115:11.85|6:80.91.246.219:15.89|7:213.248.93.114:13.39|8:0.0.0.0:-1.00|9:0.0.0.0:-1.00|10:0.0.0.0:-1.00

In this sample the number is 114 (118 in the second record), but it could be as small as 1 or as large as 17000.


#!/opt/local/bin/perl -w

$file = "trace.csv";
$file1 = "rgextract.txt";

open (FH, "< $file") || die ("Cannot open file!");

while ($line = <FH>) {
        $line2 = substr $line, 28, 3;
        print "$line2\n";

        if ($line2 =~ /d++/) {
                open (FH2,">>$file1") || die ("Cannot open file!");
                print FH2 "$line\n";
                close (FH2);
        }
}

close (FH);

Please help


Replies are listed 'Best First'.
Re: substring/regex question by rjt (Curate) on Aug 06, 2013 at 05:37 UTC
Update: I may have misread your question. I missed the one little word in your description that changes the whole premise: "sort". I now believe you actually want to print the input file sorted on the 4th column (index 3), not just print that column. I'd use the Schwartzian Transform for a pure-Perl solution:

`print for map { pop @$_ } sort { $a->[0] <=> $b->[0] } map { [ (split /,/)[3], $_ ] } <>;`

Of course, the following non-Perl solution works just as well on your UNIX-ish OS:

`sort -t, -k4 -n <trace.csv >rgextract.txt`

`use strict; use warnings;` omitted for brevity.
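For readers unfamiliar with the Schwartzian Transform used in the one-liner above, it can be unrolled into its three stages: decorate, sort, undecorate. This is only a sketch. The `sort_by_field` helper name and the shortened sample records are made up for illustration, and it assumes plain comma-separated lines with no quoted fields.

```perl
use strict;
use warnings;

# Hypothetical helper: return the given lines sorted numerically on one
# comma-separated field (0-based index), using a Schwartzian Transform.
sub sort_by_field {
    my ($index, @lines) = @_;

    # 1. Decorate: pair each line with its sort key, computed once.
    my @decorated = map { [ (split /,/, $_)[$index], $_ ] } @lines;

    # 2. Sort numerically on the precomputed key.
    my @sorted = sort { $a->[0] <=> $b->[0] } @decorated;

    # 3. Undecorate: keep only the original line.
    return map { $_->[1] } @sorted;
}

# Shortened, made-up records in the same layout as the sample data.
my @sample = (
    "FULL_TRACE,1,1343599221,17000,2.0.51.96\n",
    "FULL_TRACE,2,1343599221,1,2.0.52.77\n",
    "FULL_TRACE,3,1343599221,114,2.0.70.40\n",
);

print sort_by_field(3, @sample);
```

The point of the decorate step is that `split` runs once per line rather than once per comparison, which matters more as the file grows.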
Re: substring/regex question by Loops (Curate) on Aug 06, 2013 at 04:12 UTC
Since your data is in comma-separated format, using an absolute offset to extract the number you want doesn't make sense. You can use the Text::CSV module to parse the data into separate fields:

    use Text::CSV;

    my $file  = "trace.csv";
    my $file1 = "rgextract.txt";

    open my $io,  '<',  $file  or die "$file: $!";
    open my $out, '>>', $file1 or die "$file1: $!";

    my $csv = Text::CSV->new({ binary => 1, eol => $/ });
    while (my $row = $csv->getline ($io)) {
        my $number = $row->[3];
        print $out "$number\n";
    }

With the 4 data lines you posted, this will create the rgextract.txt file:

    114
    118
    114
    114
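If installing Text::CSV is not an option, the point about offsets can also be illustrated with core Perl alone. The sketch below assumes the number is always the fourth comma-separated field, as in the sample, and that no field contains a quoted comma. `extract_number` is a hypothetical helper name, and the records are shortened, made-up variants of the posted data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the fourth comma-separated field if it is
# a 1 to 5 digit number, or undef otherwise.
sub extract_number {
    my ($line) = @_;
    # Skip the first three fields, then capture 1 to 5 digits
    # that are immediately followed by a comma.
    return $line =~ /^(?:[^,]*,){3}(\d{1,5}),/ ? $1 : undef;
}

# The second field grows from 1 to 3 digits, so the number of interest
# starts at a different character offset in each record.
for my $line (
    "FULL_TRACE,1,1343599221,114,2.0.51.96",
    "FULL_TRACE,22,1343599221,17000,2.0.52.77",
    "FULL_TRACE,333,1343599221,1,2.0.70.40",
) {
    my $number = extract_number($line);
    print defined $number ? "$number\n" : "no match\n";
}
```

Because the pattern counts commas instead of characters, it keeps working when an earlier field changes width, which is exactly where `substr $line, 28, 3` goes wrong.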