Sorting, arrays and other problems

TrinityInfinity has asked for the wisdom of the Perl Monks concerning the following question:

Re: Sorting, arrays and other problems by mirod (Canon) on Mar 06, 2001 at 23:14 UTC
It seems like we are answering this kind of question a lot these days... So here is a simplified Guttman-Rosler Transform, which sticks the field to sort on at the beginning of each line, sorts on it (using the speed of the native sort) and then removes the field again:

```perl
# this should be read from the last line to the top one
my @sorted = map  { substr( $_, 8 ) }           # [3]: remove the date
             sort                               # [2]: sort alphabetically on the date
             map  { (split /\s/, $_)[2] . $_ }  # [1]: get the date (field 2 in the sample lines
                                                #      below) and add it at the beginning of the string
             <DATA>;
print @sorted;    # each line still carries its own newline
__DATA__
<b><a href="url">text</a></b> 20000407 text
<b><a href="url">text</a></b> 20000409 text
<b><a href="url">text</a></b> 20000408 text
<b><a href="url">text</a></b> 20000507 text
```

Note that this breaks _really_ easily, especially since it uses split to extract the date: any extra whitespace shifts the field index. If I wanted to use this for anything other than a one-shot script I would definitely use some kind of HTML parser here.
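To illustrate the fragility mentioned above, here is a minimal sketch of the same prefix-sort-strip idea with the date pulled out by a regex instead of a field index. It assumes (and this is only an assumption, since the real file layout is not shown in the thread) that each line contains exactly one standalone 8-digit date; the sample lines in `@lines` are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample lines; note the irregular whitespace and the
# space inside the link text, both of which would shift a split index.
my @lines = (
    qq{<b><a href="url">text</a></b> 20000407 text\n},
    qq{<b><a href="url">text</a></b>   20000409 text\n},
    qq{<b><a href="url">some text</a></b> 20000408 text\n},
);

my @sorted =
    map  { substr( $_, 8 ) }                  # [3] strip the 8-character key again
    sort                                      # [2] plain string sort on the prefixed key
    map  { /\b(\d{8})\b/ ? $1 . $_ : () }     # [1] prefix each line with its date,
                                              #     silently skipping lines without one
    @lines;

print @sorted;
```

A string sort is enough here only because every key has the same width (8 digits). This is still pattern matching on HTML, so it is a stopgap rather than a replacement for a real parser.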
Re: Sorting, arrays and other problems by arturo (Vicar) on Mar 06, 2001 at 23:01 UTC
I advocate this a lot these days: if you can, use `lynx -dump "http://site.i.have.to.use/CHARTS/etc"` and work on plain text. The rest of your problem sounds a little vague. Using `$a` and `$b` is pretty simple, unless you're playing with references (which one might do in this situation): `@dates = sort { $a <=> $b } @dates; # numerical sort of dates`. It would help us a lot more if you could post your code and tell us the error message, or how the output you're getting diverges from what you want.

Philosophy can be made out of anything. Or less -- Jerry A. Fodor
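As a small sketch of the point about `$a` and `$b` with references: when the list holds array references, the comparator has to dereference before comparing. The `@records` data below is hypothetical, and the second half shows why `<=>` rather than `cmp` is the right operator for numbers.

```perl
use strict;
use warnings;

# Hypothetical records: [ label, date as YYYYMMDD ].
my @records = (
    [ 'first',  20000407 ],
    [ 'second', 20000409 ],
    [ 'third',  20000408 ],
);

# $a and $b are array references here, so dereference to reach the date.
my @by_date = sort { $a->[1] <=> $b->[1] } @records;

print "$_->[0] $_->[1]\n" for @by_date;

# <=> compares numbers, cmp compares strings; they disagree as soon as
# the values have different widths.
my @numeric = sort { $a <=> $b } ( 10, 9, 100 );    # 9, 10, 100
my @string  = sort { $a cmp $b } ( 10, 9, 100 );    # 10, 100, 9
```

For fixed-width dates such as `20000407` both operators give the same order, which is why a plain string sort also works on them.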
I got it! by TrinityInfinity (Scribe) on Mar 07, 2001 at 00:37 UTC
I got it! I had the right idea, but didn't get things split right! Here's my code now:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# first we open the temp file w/the records to be sorted, place in an array
my $input = 'frecord.tmp';
open my $in, '<', $input or die "Can't open file $input: $!\n";
my @data = <$in>;
close $in;

# sort numerically on the 4th whitespace-separated field
my @sorted = sort {
    my @a_fields = split /\s+/, $a;
    my @b_fields = split /\s+/, $b;
    $a_fields[3] <=> $b_fields[3];
} @data;

# each line still ends with its own newline, so print it as-is
print for @sorted;
```

Thanks for your help!! I do appreciate it! =)
(dkubb) Re: (2) Sorting lines in a file using the Schwartzian Transform by dkubb (Deacon) on Mar 07, 2001 at 15:17 UTC
You're performing the split for each comparison of the elements in `@data`. Since sort makes on the order of N log N comparisons and each one splits two lines, this could add a significant amount of time to your file processing. The data does not change, so it makes sense to do the split one time and cache the results. The Schwartzian Transform is one solution. It combines the sorting on the 4th column that you need with a more efficient one-time split:

```perl
#!/usr/bin/perl -w
use strict;

open INPUT, '<', $ARGV[0]
    or die "Could not open file $ARGV[0]: $!";

my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, (split /\s+/)[3] ] }
             <INPUT>;
close INPUT;

print for @sorted;    # lines keep their trailing newline
__END__
```

Here's an explanation of how this works. Remember to read the algorithm backwards, from the bottom to the top, to follow it:

`<INPUT>` is read in list context, and every line is passed through the first `map`. That `map` builds a temporary anonymous array for each line: the original line, held in `$_`, goes into position 0, and the 4th column, pulled out by splitting the line, goes into position 1.

Once every line from `<INPUT>` has been processed by `map`, the results are all passed on to `sort`. Remember that each element arriving in `sort` is a reference to an anonymous array built in the previous step. We dereference these and compare the dates (in position 1) with each other.

Now that all of the lines are sorted in memory, they are passed into `map` a second time. This `map` pulls out every position 0 element, which is the original line, and the resulting list is assigned to `@sorted`.
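The three stages described above can be sketched one at a time, with named intermediate arrays, and compared against a comparator that splits on every call. Everything here is illustrative: the `@lines` data is made up, and `date_of` is a hypothetical helper that exists only so the splits can be counted.

```perl
use strict;
use warnings;

# Hypothetical in-memory lines standing in for <INPUT>;
# the 4th whitespace-separated field (index 3) holds the date.
my @lines = (
    "rec a x 20000409 tail\n",
    "rec b x 20000407 tail\n",
    "rec c x 20000507 tail\n",
    "rec d x 20000408 tail\n",
);

my $splits = 0;    # how many times a line has been split so far

# Hypothetical helper: return the 4th field and count the split.
sub date_of {
    my ($line) = @_;
    $splits++;
    return ( split /\s+/, $line )[3];
}

# Naive version: the comparator splits both lines on every comparison.
my @naive        = sort { date_of($a) <=> date_of($b) } @lines;
my $naive_splits = $splits;

# Schwartzian Transform, one stage per statement.
$splits = 0;
my @pairs     = map  { [ $_, date_of($_) ] } @lines;    # [ line, date ]
my @ordered   = sort { $a->[1] <=> $b->[1] } @pairs;    # compare cached dates
my @st_sorted = map  { $_->[0] } @ordered;              # back to plain lines
my $st_splits = $splits;

print @st_sorted;
print "splits in comparator: $naive_splits, with the transform: $st_splits\n";
```

The transform splits each line exactly once, while the naive count grows with the number of comparisons sort happens to make, so the gap widens as the file gets longer.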