VladP has asked for the wisdom of the Perl Monks concerning the following question:
Hi - How can I modify the code below to sort and output the numerials+alpha characters at the bottom of the output.txt file I'm writing to? See example output below of what I require. Code is slopppy. Just learning.
require 5.000;
use warnings;
use strict;
use POSIX;
my %tags = ();
my $input = $ARGV[0];
my $output = $ARGV[1];
open (FILE, "< $input") or die "cannot open $input: $!\n";
while (my $tag = <FILE>) {
$tag =~ m/<t id=(\d*)(.*)((\d*)([[:alpha:]]*))>/;
if ($tag eq "\n") {
# BLANK LINE
} else {
$tag =~ m/<t id=\((\d*)(.*)((\d*)([[:alpha:]]*))\)>/;
$tags{sprintf("%04d%6s",$1 || 9999,$2)} = $tag;
}
}
close NEWFILE;
close FILE;
open (NEWFILE, "> $output") or die "cannot open $output: $!\n";
foreach my $id ( sort keys %tags )
{
print NEWFILE $tags{$id};
print $tags{$id};
}
close NEWFILE;
close FILE;
input.txt
<t id=2bc>Only the...</t>
<t id=1>Only the...</t>
<t id=12>Only the...</t>
<t id=21>Only the...</t>
<t id=1>Only the...</t>
<t id=1a>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=3>Only the...</t>
<t id=35>Only the...</t>
<t id=31>Only the...</t>
<t id=2b>Only the...</t>
<t id=4>Only the...</t>
<t id=42>Only the...</t>
<t id=5>Only the...</t>
<t id=51>Only the...</t>
<t id=2ac>Only the...</t>
<t id=52>Only the...</t>
<t id=6>Only the...</t>
<t id=7>Only the...</t>
Required results: output.txt
<t id=1>Only the...</t>
<t id=2>Only the...</t>
<t id=3>Only the...</t>
<t id=4>Only the...</t>
<t id=5>Only the...</t>
<t id=6>Only the...</t>
<t id=7>Only the...</t>
<t id=12>Only the...</t>
<t id=21>Only the...</t>
<t id=31>Only the...</t>
<t id=35>Only the...</t>
<t id=42>Only the...</t>
<t id=51>Only the...</t>
<t id=52>Only the...</t>
<t id=1a>Only the...</t>
<t id=2ac>Only the...</t>
<t id=2b>Only the...</t>
<t id=2bc>Only the...</t>
Re: Sorting numerials first and then numerials with alpha characters last
by haukex (Archbishop) on Oct 29, 2021 at 13:31 UTC
|
Though there are ways to do it in a single pass, I found it a bit easier to think about if one first splits the list out into two and then sorts them individually. Note I'm using Sort::Key::Natural to sort the mixed values in a hopefully intuitive way (Update: you may note I added the test case "12c" to demonstrate the difference to a simple sort).
use warnings;
use strict;
use Test::More;
use List::Util qw/uniq/;
use Sort::Key::Natural qw/natsort/;
my @in = ( "2bc", "1", "12c", "12", "21", "1", "1a", "2", "2", "2",
"2", "3", "35", "31", "2b", "4", "42", "5", "51", "2ac",
"52", "6", "7" );
my @expect = ( "1", "2", "3", "4", "5", "6", "7", "12", "21", "31",
"35","42", "51", "52", "1a", "2ac", "2b", "2bc", "12c" );
my (@nums, @mixed);
/^\d+$/ and push @nums, $_ or push @mixed, $_ for @in;
my @out = uniq(
sort( { $a <=> $b } @nums),
natsort(@mixed),
);
is_deeply \@out, \@expect or diag explain \@out;
done_testing;
As opposed to your previous thread, these tags look much more like HTML/XML, so you really should use an appropriate parser! | [reply] [d/l] [select] |
|
| [reply] [d/l] [select] |
|
That code assumes sort is stable. Use use sort 'stable'; to guarantee that.
what?
| [reply] |
|
|
|
Re: Sorting numerials first and then numerials with alpha characters last
by hippo (Bishop) on Oct 29, 2021 at 13:43 UTC
|
Code is slopppy. Just learning.
That's fine - I will leave most of your code as it stands and put a Schwartzian Transform into your sort line. This produces your specified output - let's hope that's representative. :-)
require 5.000;
use warnings;
use strict;
use POSIX;
my %tags = ();
my $input = $ARGV[0];
my $output = $ARGV[1];
open (FILE, "< $input") or die "cannot open $input: $!\n";
while (my $tag = <FILE>) {
$tag =~ m/<t id=(\d*)(.*)((\d*)([[:alpha:]]*))>/;
if ($tag eq "\n") {
# BLANK LINE
} else {
$tag =~ m/<t id=\((\d*)(.*)((\d*)([[:alpha:]]*))\)>/;
$tags{sprintf("%04d%6s",$1 || 9999,$2)} = $tag;
}
}
close FILE;
open (NEWFILE, "> $output") or die "cannot open $output: $!\n";
foreach my $id ( map { $_ = $_->[2] }
sort { $a->[1] cmp $b->[1] || $a->[0] <=> $b->[0] }
map { /(\d+)([a-z]*)/; $_ = [$1, $2, $_] }
keys %tags )
{
print NEWFILE $tags{$id};
print $tags{$id};
}
close NEWFILE;
| [reply] [d/l] |
|
use strict;
use warnings;
use Data::Dump;
use Test::More;
my @in =
( "2bc", "1", "12c", "12", "21", "1", "1a", "2", "2", "2",
"2", "3", "35", "31", "2b", "4", "42", "5", "51", "2ac",
"52", "6", "7"
);
my @exp =
( "1", "2", "3", "4", "5", "6", "7", "12", "21", "31",
"35","42", "51", "52", "1a", "2ac", "2b", "2bc", "12c",
);
my %in;
@in{@in}=(); # unique
my @got =
map { $_->{orig} }
sort { $a->{alpha} cmp $b->{alpha} || $a->{num} <=> $b->{num} }
map { /(?<num>\d+)(?<alpha>[a-z]*)/; { orig => $_, %+ } }
keys %in;
is_deeply(\@exp,\@got) or ddx [\@exp,\@got];
done_testing;
| [reply] [d/l] [select] |
Re: Sorting numerials first and then numerials with alpha characters last
by johngg (Canon) on Oct 29, 2021 at 15:45 UTC
|
An alternative to the Schwartzian Transform shown by hippo is the Guttman Rosler Transform where the sort keys are pulled from the text and then keys and text are formed into a single string that can be sorted alphanumerically. Once sorted the original text can be extracted from the end of the string. The string can be formed using something like sprintf or pack, which I use here.
In the first map the regular expression captures one or more digits in $1 then zero or more letters in $2; note that no error checking for bad lines is done here. Then $2 is checked to see if there were letters, an indicator of their presence (1) or not (0) being the first sort key, pack'ed as an unsigned char in 1 byte using the "c" template; the second key is the numerical value packed in big-endian order (which sorts correctly) and the third is the letters, if present, packed with NULL padding to the right. Finally the original text is tacked on the end, also NULL padded. Then the string is sort'ed and the original line is unpack'ed in the second map, using the "x" template to skip over the keys and the "a" template to strip off any NULL padding. This code:-
use strict;
use warnings;
open my $inFH, q{<}, \ <<__EOD__ or die $!;
<t id=2bc>Only the...</t>
<t id=1>Only the...</t>
<t id=12>Only the...</t>
<t id=21>Only the...</t>
<t id=1>Only the...</t>
<t id=1a>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=3>Only the...</t>
<t id=17d>Only the...</t>
<t id=35>Only the...</t>
<t id=31>Only the...</t>
<t id=2b>Only the...</t>
<t id=4>Only the...</t>
<t id=42>Only the...</t>
<t id=5>Only the...</t>
<t id=51>Only the...</t>
<t id=2ac>Only the...</t>
<t id=52>Only the...</t>
<t id=6>Only the...</t>
<t id=7>Only the...</t>
__EOD__
print for
map { unpack q{x13a*}, $_ }
sort
map { m{(?x) (?<=id=) ( \d+ ) ( [a-z]* ) (?=>)}
&&
pack q{cNa8a*}, ( length $2 ? 1 : 0 ), $1, $2, $_
}
<$inFH>;
close $inFH or die $!;
Produces this output.
<t id=1>Only the...</t>
<t id=1>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=3>Only the...</t>
<t id=4>Only the...</t>
<t id=5>Only the...</t>
<t id=6>Only the...</t>
<t id=7>Only the...</t>
<t id=12>Only the...</t>
<t id=21>Only the...</t>
<t id=31>Only the...</t>
<t id=35>Only the...</t>
<t id=42>Only the...</t>
<t id=51>Only the...</t>
<t id=52>Only the...</t>
<t id=1a>Only the...</t>
<t id=2ac>Only the...</t>
<t id=2b>Only the...</t>
<t id=2bc>Only the...</t>
<t id=17d>Only the...</t>
I hope this is of interest.
Update: Expanded wording about the primary sort key.
| [reply] [d/l] [select] |
Re: Sorting numerials first and then numerials with alpha characters last
by tybalt89 (Monsignor) on Oct 29, 2021 at 16:26 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11138205
use warnings;
open my $fh, '<', \<<END or die;
<t id=2bc>Only the...</t>
<t id=1>Only the...</t>
<t id=12>Only the...</t>
<t id=21>Only the...</t>
<t id=1>Only the...</t>
<t id=1a>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=3>Only the...</t>
<t id=35>Only the...</t>
<t id=31>Only the...</t>
<t id=2b>Only the...</t>
<t id=4>Only the...</t>
<t id=42>Only the...</t>
<t id=5>Only the...</t>
<t id=51>Only the...</t>
<t id=2ac>Only the...</t>
<t id=52>Only the...</t>
<t id=6>Only the...</t>
<t id=7>Only the...</t>
END
print sort {
my ($anum, $achars, $bnum, $bchars) = map /<t id=(\d+)(\w*)>/, $a, $
+b;
$achars cmp $bchars || $anum <=> $bnum
} <$fh>;
| [reply] [d/l] |
Re: Sorting numerials first and then numerials with alpha characters last
by ikegami (Patriarch) on Oct 29, 2021 at 18:34 UTC
|
use Sort::Key::Multi qw( iiskeysort );
my @sorted =
iiskeysort { /id=(\d+)(\w*)/ or die; $2 ne "", $1, $2 }
@inputs;
| [reply] [d/l] |
|
|