Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Sorting numerials first and then numerials with alpha characters last

by VladP (Novice)
on Oct 29, 2021 at 12:30 UTC ( [id://11138205]=perlquestion: print w/replies, xml ) Need Help??

VladP has asked for the wisdom of the Perl Monks concerning the following question:

Hi - How can I modify the code below to sort and output the numerials+alpha characters at the bottom of the output.txt file I'm writing to? See example output below of what I require. Code is slopppy. Just learning.

require 5.000; use warnings; use strict; use POSIX; my %tags = (); my $input = $ARGV[0]; my $output = $ARGV[1]; open (FILE, "< $input") or die "cannot open $input: $!\n"; while (my $tag = <FILE>) { $tag =~ m/<t id=(\d*)(.*)((\d*)([[:alpha:]]*))>/; if ($tag eq "\n") { # BLANK LINE } else { $tag =~ m/<t id=\((\d*)(.*)((\d*)([[:alpha:]]*))\)>/; $tags{sprintf("%04d%6s",$1 || 9999,$2)} = $tag; } } close NEWFILE; close FILE; open (NEWFILE, "> $output") or die "cannot open $output: $!\n"; foreach my $id ( sort keys %tags ) { print NEWFILE $tags{$id}; print $tags{$id}; } close NEWFILE; close FILE;

input.txt

<t id=2bc>Only the...</t>
<t id=1>Only the...</t>
<t id=12>Only the...</t>
<t id=21>Only the...</t>
<t id=1>Only the...</t>
<t id=1a>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=2>Only the...</t>
<t id=3>Only the...</t>
<t id=35>Only the...</t>
<t id=31>Only the...</t>
<t id=2b>Only the...</t>
<t id=4>Only the...</t>
<t id=42>Only the...</t>
<t id=5>Only the...</t>
<t id=51>Only the...</t>
<t id=2ac>Only the...</t>
<t id=52>Only the...</t>
<t id=6>Only the...</t>
<t id=7>Only the...</t>

Required results: output.txt

<t id=1>Only the...</t>
<t id=2>Only the...</t>
<t id=3>Only the...</t>
<t id=4>Only the...</t>
<t id=5>Only the...</t>
<t id=6>Only the...</t>
<t id=7>Only the...</t>
<t id=12>Only the...</t>
<t id=21>Only the...</t>
<t id=31>Only the...</t>
<t id=35>Only the...</t>
<t id=42>Only the...</t>
<t id=51>Only the...</t>
<t id=52>Only the...</t>
<t id=1a>Only the...</t>
<t id=2ac>Only the...</t>
<t id=2b>Only the...</t>
<t id=2bc>Only the...</t>
  • Comment on Sorting numerials first and then numerials with alpha characters last
  • Download Code

Replies are listed 'Best First'.
Re: Sorting numerials first and then numerials with alpha characters last
by haukex (Archbishop) on Oct 29, 2021 at 13:31 UTC

    Though there are ways to do it in a single pass, I found it a bit easier to think about if one first splits the list out into two and then sorts them individually. Note I'm using Sort::Key::Natural to sort the mixed values in a hopefully intuitive way (Update: you may note I added the test case "12c" to demonstrate the difference to a simple sort).

    use warnings; use strict; use Test::More; use List::Util qw/uniq/; use Sort::Key::Natural qw/natsort/; my @in = ( "2bc", "1", "12c", "12", "21", "1", "1a", "2", "2", "2", "2", "3", "35", "31", "2b", "4", "42", "5", "51", "2ac", "52", "6", "7" ); my @expect = ( "1", "2", "3", "4", "5", "6", "7", "12", "21", "31", "35","42", "51", "52", "1a", "2ac", "2b", "2bc", "12c" ); my (@nums, @mixed); /^\d+$/ and push @nums, $_ or push @mixed, $_ for @in; my @out = uniq( sort( { $a <=> $b } @nums), natsort(@mixed), ); is_deeply \@out, \@expect or diag explain \@out; done_testing;

    As opposed to your previous thread, these tags look much more like HTML/XML, so you really should use an appropriate parser!

      That code assumes sort is stable. Use use sort 'stable'; to guarantee that.

        That code assumes sort is stable. Use use sort 'stable'; to guarantee that.

        what?

Re: Sorting numerials first and then numerials with alpha characters last
by hippo (Bishop) on Oct 29, 2021 at 13:43 UTC
    Code is slopppy. Just learning.

    That's fine - I will leave most of your code as it stands and put a Schwartzian Transform into your sort line. This produces your specified output - let's hope that's representative. :-)

    require 5.000; use warnings; use strict; use POSIX; my %tags = (); my $input = $ARGV[0]; my $output = $ARGV[1]; open (FILE, "< $input") or die "cannot open $input: $!\n"; while (my $tag = <FILE>) { $tag =~ m/<t id=(\d*)(.*)((\d*)([[:alpha:]]*))>/; if ($tag eq "\n") { # BLANK LINE } else { $tag =~ m/<t id=\((\d*)(.*)((\d*)([[:alpha:]]*))\)>/; $tags{sprintf("%04d%6s",$1 || 9999,$2)} = $tag; } } close FILE; open (NEWFILE, "> $output") or die "cannot open $output: $!\n"; foreach my $id ( map { $_ = $_->[2] } sort { $a->[1] cmp $b->[1] || $a->[0] <=> $b->[0] } map { /(\d+)([a-z]*)/; $_ = [$1, $2, $_] } keys %tags ) { print NEWFILE $tags{$id}; print $tags{$id}; } close NEWFILE;

    🦛

      not sure why you assign to $_ here ...

      >  $_ = [$1, $2, $_]

      But I felt tempted to try a variation with named captures for clarity. (tests borrowed from Hauke's post)

      use strict; use warnings; use Data::Dump; use Test::More; my @in = ( "2bc", "1", "12c", "12", "21", "1", "1a", "2", "2", "2", "2", "3", "35", "31", "2b", "4", "42", "5", "51", "2ac", "52", "6", "7" ); my @exp = ( "1", "2", "3", "4", "5", "6", "7", "12", "21", "31", "35","42", "51", "52", "1a", "2ac", "2b", "2bc", "12c", ); my %in; @in{@in}=(); # unique my @got = map { $_->{orig} } sort { $a->{alpha} cmp $b->{alpha} || $a->{num} <=> $b->{num} } map { /(?<num>\d+)(?<alpha>[a-z]*)/; { orig => $_, %+ } } keys %in; is_deeply(\@exp,\@got) or ddx [\@exp,\@got]; done_testing;

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Sorting numerials first and then numerials with alpha characters last
by johngg (Canon) on Oct 29, 2021 at 15:45 UTC

    An alternative to the Schwartzian Transform shown by hippo is the Guttman Rosler Transform where the sort keys are pulled from the text and then keys and text are formed into a single string that can be sorted alphanumerically. Once sorted the original text can be extracted from the end of the string. The string can be formed using something like sprintf or pack, which I use here.

    In the first map the regular expression captures one or more digits in $1 then zero or more letters in $2; note that no error checking for bad lines is done here. Then $2 is checked to see if there were letters, an indicator of their presence (1) or not (0) being the first sort key, pack'ed as an unsigned char in 1 byte using the "c" template; the second key is the numerical value packed in big-endian order (which sorts correctly) and the third is the letters, if present, packed with NULL padding to the right. Finally the original text is tacked on the end, also NULL padded. Then the string is sort'ed and the original line is unpack'ed in the second map, using the "x" template to skip over the keys and the "a" template to strip off any NULL padding. This code:-

    use strict; use warnings; open my $inFH, q{<}, \ <<__EOD__ or die $!; <t id=2bc>Only the...</t> <t id=1>Only the...</t> <t id=12>Only the...</t> <t id=21>Only the...</t> <t id=1>Only the...</t> <t id=1a>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=3>Only the...</t> <t id=17d>Only the...</t> <t id=35>Only the...</t> <t id=31>Only the...</t> <t id=2b>Only the...</t> <t id=4>Only the...</t> <t id=42>Only the...</t> <t id=5>Only the...</t> <t id=51>Only the...</t> <t id=2ac>Only the...</t> <t id=52>Only the...</t> <t id=6>Only the...</t> <t id=7>Only the...</t> __EOD__ print for map { unpack q{x13a*}, $_ } sort map { m{(?x) (?<=id=) ( \d+ ) ( [a-z]* ) (?=>)} && pack q{cNa8a*}, ( length $2 ? 1 : 0 ), $1, $2, $_ } <$inFH>; close $inFH or die $!;

    Produces this output.

    <t id=1>Only the...</t> <t id=1>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=3>Only the...</t> <t id=4>Only the...</t> <t id=5>Only the...</t> <t id=6>Only the...</t> <t id=7>Only the...</t> <t id=12>Only the...</t> <t id=21>Only the...</t> <t id=31>Only the...</t> <t id=35>Only the...</t> <t id=42>Only the...</t> <t id=51>Only the...</t> <t id=52>Only the...</t> <t id=1a>Only the...</t> <t id=2ac>Only the...</t> <t id=2b>Only the...</t> <t id=2bc>Only the...</t> <t id=17d>Only the...</t>

    I hope this is of interest.

    Update: Expanded wording about the primary sort key.

    Cheers,

    JohnGG

Re: Sorting numerials first and then numerials with alpha characters last
by tybalt89 (Monsignor) on Oct 29, 2021 at 16:26 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11138205 use warnings; open my $fh, '<', \<<END or die; <t id=2bc>Only the...</t> <t id=1>Only the...</t> <t id=12>Only the...</t> <t id=21>Only the...</t> <t id=1>Only the...</t> <t id=1a>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=2>Only the...</t> <t id=3>Only the...</t> <t id=35>Only the...</t> <t id=31>Only the...</t> <t id=2b>Only the...</t> <t id=4>Only the...</t> <t id=42>Only the...</t> <t id=5>Only the...</t> <t id=51>Only the...</t> <t id=2ac>Only the...</t> <t id=52>Only the...</t> <t id=6>Only the...</t> <t id=7>Only the...</t> END print sort { my ($anum, $achars, $bnum, $bchars) = map /<t id=(\d+)(\w*)>/, $a, $ +b; $achars cmp $bchars || $anum <=> $bnum } <$fh>;
Re: Sorting numerials first and then numerials with alpha characters last
by ikegami (Patriarch) on Oct 29, 2021 at 18:34 UTC
    use Sort::Key::Multi qw( iiskeysort ); my @sorted = iiskeysort { /id=(\d+)(\w*)/ or die; $2 ne "", $1, $2 } @inputs;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11138205]
Approved by Ratazong
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-04-18 22:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found