aash08 has asked for the wisdom of the Perl Monks concerning the following question:

My first time posting something here.

I am facing a little problem. I have written some code that assembles data from different files into an output file. The problem is that I now want to write another script that puts the data in this output file in sequence with respect to one field.
For instance, my data looks like


......
2.225:0:1248266065752:Y:282
2.232:0:1248266069770:Y:500
2.225:1:1248266072861:Y:438
2.232:1:1248266075785:Y:328
2.225:1:1248266081283:Y:297
2.232:1:1248266082035:Y:328
2.232:1:1248266087410:Y:281
2.225:1:1248266088768:Y:296
2.232:1:1248266091426:Y:281
....

What I would want, keeping the first field in mind, is all the lines belonging to the ID 2.225 in sequence, and when those are finished, the lines for 2.232, and so on. In short, I want to sort the lines according to the first field, which you could call the ID.

I had some ideas, but I'm afraid I was not able to make them work; can anyone out there provide an effective solution? (: I would be so grateful!!

Replies are listed 'Best First'.
Re: a little problem with sorting my data
by ikegami (Patriarch) on Jul 27, 2009 at 17:17 UTC
    From the prompt,
    sort data > sorted_data

    Alternatively, the following would be an efficient Perl solution:

    my %grouped;
    while (...) {
        my @fields = ...;
        push @{ $grouped{$fields[0]} }, \@fields;
    }

    for my $group (values %grouped) {
        ...
    }

      On your second one: That will group by ID, but I don't believe it will sort by ID. That is to say: you will get all the records with a certain ID together, but you won't get the IDs themselves in any particular order. (Unless values does some subtle sorting I'm not aware of.)

        I had written a line about that, but I must have deleted it by accident.

        Indeed, it groups but it doesn't sort. I had already provided a solution for sorting. If the OP also wants to sort the ids and then do something with them, he can do:

        for ( sort <$fh> ) {
            my @fields = ...;
            ...
        }
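
        Putting the two snippets together, here is a minimal runnable sketch; the file name data.txt is a placeholder and the field layout is assumed from the sample data in the question. It groups the records by the first field (keeping each ID's original line order) and then walks the IDs numerically:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Hypothetical input file holding records like "2.225:0:1248266065752:Y:282"
        open my $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";

        my %grouped;
        while (my $line = <$fh>) {
            chomp $line;
            my @fields = split /:/, $line;
            push @{ $grouped{$fields[0]} }, \@fields;   # group records by the first field (the ID)
        }
        close $fh;

        # Walk the IDs in sorted (numeric) order, then each record within an ID
        for my $id (sort { $a <=> $b } keys %grouped) {
            for my $record (@{ $grouped{$id} }) {
                print join(':', @$record), "\n";
            }
        }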
Re: a little problem with sorting my data
by moritz (Cardinal) on Jul 27, 2009 at 17:18 UTC
    It would be helpful to know what exactly you have tried, and how it failed. Just feeding the data as-is line by line to sort groups them by ID (although it doesn't sort them numerically if the ID is of variable width).
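
    If the IDs ever do become variable width, a hedged sketch of a numeric sort on the first field (the file name 'output.txt' is only a stand-in for whatever file holds the records shown above):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sort numerically on the first colon-separated field of each line.
    open my $fh, '<', 'output.txt' or die "Cannot open: $!";
    print sort { (split /:/, $a)[0] <=> (split /:/, $b)[0] } <$fh>;
    close $fh;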
      OK, here is what makes my output file:
      #!usr/bin/perl
      $k = 1;
      $file_name = "QoEWeb_DB_Log.txt"; #An input LOG file which holds information about different users.
      open(SW,$file_name) or die "ERROR OPENING IN FILE";
      open FILE, ">output.txt" or die "ERROR..unable to write" ; #Will write the result into an OUTPUT file.
      while (<SW>)
      {
          chomp($eachline);
          @file_name1 = ("Carlo_Arnold_2.232_Final.txt", "Sohaib_Ahmad_2.225_Final.txt"); #Input LOG files; Each file holds information about Individual user. I will be adding about 30 files here.
          @logarray = split(/:/,$_); # Taking required fields from the first input file.
          $field1 = @logarray[2];
          $field2 = @logarray[4];
          $key1 = @logarray[8];
          $field5 = @logarray[6];
          $x = scalar(@file_name1);
          for($j=0;$j<$x;$j++)
          {
              open(RW,@file_name1[$j]) or die "ERROR OPENING IN FILE";
              while (<RW>)
              {
                  chomp($eachline);
                  @ff_array = split(/:/,$_); #Taking required fields from the second set of input files.
                  $key2 = @ff_array[0];
                  $field3 = @ff_array[1];
                  if( $key1 == $key2) # Finding a match between the two input files
                  {
                      print FILE "$field1:$field2:$key1:$field5:$field3"; #Printing the desired result from both batch of input files.
                  }
              }
          }
      }
      close FILE;
      close SW or die "Cannot close";
      close RW or die "Cannot close";

      The problem is:
      - I want the data in the output file to be in sequence according to the field $field1.
      - Let me assure you that $field1 is not of variable length; it always lies in the range 2.221 to 2.252, i.e. it is 2.XXX where only the XXX part changes.

      - It would be so much better if these lines were arranged in order in my OUTPUT file. Any views?
        Instead of while (<RW>) you can write for (sort <RW>) and be done.

        But I strongly recommend that you use strict; use warnings; and declare your variables with my.
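
        A minimal sketch of that change, with strict/warnings and lexical variables; the file name user_file.txt is a hypothetical stand-in for the files in the script above:

        #!/usr/bin/perl
        use strict;
        use warnings;

        open my $rw, '<', 'user_file.txt' or die "Cannot open user_file.txt: $!";
        for my $line (sort <$rw>) {            # read every line, sorted, before the loop body runs
            chomp $line;
            my ($key2, $field3) = (split /:/, $line)[0, 1];
            print "$key2:$field3\n";           # placeholder for the matching/printing from the original script
        }
        close $rw;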

Re: a little problem with sorting my data
by bichonfrise74 (Vicar) on Jul 27, 2009 at 21:41 UTC
    Based on your question, I thought of using the Schwartzian Transform to solve the problem. I'm not sure if this is overkill.

    Here's the code.
    #!/usr/bin/perl
    use strict;

    my $string;
    while( <DATA> ) {
        $string = $string . join " ", split( /\:/ );
    }

    my $data = join "\n",
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map  { [$_, (split)[0]] }
        split( /\n/, $string);

    print $data;

    __DATA__
    2.225:0:1248266065752:Y:282
    2.232:0:1248266069770:Y:500
    2.225:1:1248266072861:Y:438
    2.232:1:1248266075785:Y:328
    2.225:1:1248266081283:Y:297
    2.232:1:1248266082035:Y:328
    2.232:1:1248266087410:Y:281
    2.225:1:1248266088768:Y:296
    2.232:1:1248266091426:Y:281
Re: a little problem with sorting my data
by i-blis (Novice) on Jul 28, 2009 at 00:03 UTC

    The use of a Schwartzian Transform does indeed give you more flexibility: you can perform the sort on any field, handle cases where the sort is not trivial, sort on many fields, etc. It is certainly an idiom you won't regret having learnt.

    A common way to read a whole file into a scalar is to "slurp" it by locally undefining the input record separator $/ (a newline by default).
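
    In isolation, the slurp idiom looks like this (the file name is just a placeholder):

    open my $fh, '<', 'file.txt' or die "$!\n";
    my $raw = do { local $/; <$fh> };   # with $/ undefined inside the do block, one read returns the whole file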

    I rewrote it with clean file opening, a slurp, and a sort on the first and fourth fields, to help you follow the logic in case you haven't already.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    open my $fh, '<', 'file.txt' or die "$!\n";
    my $raw = do { local $/; <$fh> };    # slurp the whole file we just opened

    my $sorted = join "\n",
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }     # outer sort: field at index 0 (relies on sort being stable)
        sort { $a->[2] <=> $b->[2] }     # inner sort: field at index 4
        map  { [$_, (split /:/)[0,4]] }
        split( /\n/, $raw);

    print $sorted;
Re: a little problem with sorting my data
by ig (Vicar) on Jul 28, 2009 at 18:39 UTC

    Re-reading all your user data files once for each record in the log file is inefficient. It would be more efficient to build a hash of the data on individual users first, then process the log file, pulling user data from the hash.
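
    A hedged sketch of that approach, reusing the file names and field positions from the code posted above (they are assumptions about the real data layout):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @user_files = ('Carlo_Arnold_2.232_Final.txt', 'Sohaib_Ahmad_2.225_Final.txt');

    # Pass 1: read each user file once and index its records by the key field.
    my %user_data;
    for my $file (@user_files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($key, $value) = (split /:/, $line)[0, 1];
            push @{ $user_data{$key} }, $value;   # keep every record for this key
        }
        close $fh;
    }

    # Pass 2: a single pass over the log file, looking matches up in the hash.
    open my $log, '<', 'QoEWeb_DB_Log.txt' or die "Cannot open log: $!";
    open my $out, '>', 'output.txt'        or die "Cannot write output: $!";
    while (my $line = <$log>) {
        chomp $line;
        my @f   = split /:/, $line;
        my $key = $f[8];                          # key position as in the original script
        next unless exists $user_data{$key};
        print {$out} "$f[2]:$f[4]:$key:$f[6]:$_\n" for @{ $user_data{$key} };
    }
    close $log;
    close $out;

    The resulting output file can then be sorted with any of the approaches shown earlier in the thread.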