G'day harmattan_,
Welcome to the Monastery.
There are some issues with your code which aren't helping you.
-
You haven't used the strict or
warnings pragmata.
You should include these in all of your code.
See "perlintro: Safety net".
-
You've used package variables for filehandles;
choose lexical filehandles instead (and preferably in the smallest scope possible).
If you get into the habit of using names like IN and OUT,
you'll likely continue that usage in larger programs and possibly run into all sorts of problems.
See "perlintro: Files and I/O".
-
You haven't checked if I/O operations have been successful.
The link above (perlintro: Files and I/O)
shows one way to do this (i.e. '... or die "...";').
I find hand-crafting all of those die messages tedious and, frankly, error-prone:
it's very easy to miss them out, not update them when other code changes, and so on.
A far easier way is to let Perl handle all of that for you with the
autodie pragma;
there are some cases where that's not appropriate, but mostly it is:
I use it whenever possible; including in production code.
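As a minimal sketch of the three points above taken together (the sub name count_lines and the variable names are made up for illustration, not taken from your code):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;    # open, close, etc. now die with a descriptive message on failure

# Hypothetical helper: count the lines in the file at $path.
sub count_lines {
    my ($path) = @_;

    # Lexical filehandle, scoped to this sub; three-argument open.
    # No hand-written "or die" is needed because autodie is in effect.
    open my $in_fh, '<', $path;

    my $count = 0;
    while (my $line = <$in_fh>) {
        ++$count;
    }

    close $in_fh;

    return $count;
}
```

Calling count_lines() with the name of a file that doesn't exist should die with a message naming the file and the reason, without any hand-crafted die text.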
There are some issues with your post which aren't helping us to help you.
-
It helps us greatly if you provide a small, but representative, input sample;
and exactly the output you expect from that data.
Please post such data within <code>...</code> tags
— as you did with your code —
so we can use the [download] link to get a verbatim copy of your data.
-
Prosaic descriptions of input, processing, and output, are rarely useful:
"a picture paints a thousand words" and so does data!
-
Please show us what you've tried, rather than just saying you tried lots of (unspecified) things.
You may be on the right track and we can nudge you closer to a solution;
you may be working under some misapprehension and going completely down the wrong path
— we can help with that too if we know what you're doing wrong.
-
Also show us exactly the output you're getting,
including all error and warning messages.
Please also provide that within <code>...</code> tags.
-
Have a look at these links: "How do I post a question effectively?" and "Short, Self-Contained, Correct Example".
You have two fundamental flaws in the code you have supplied.
-
You are reading all input records with '@yes_finally = <IN>'
and then attempting to read more records with 'while($in = <IN>)'.
I don't think you need to do this here; however, for future reference,
you'd need to reposition the file pointer back to the start
(see seek)
and possibly reset the record counter (see "perlvar: $.").
-
Your comments regarding sorting are all within the while loop.
You won't be able to sort the data until you have the data to sort.
I think you're completely on the wrong track here; although, without any code, I can't tell for certain
— this may be your main stumbling block.
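For future reference only, the rewind mentioned in the first point might look something like this minimal sketch. The sub name read_twice is hypothetical, and autodie is assumed for the error handling:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: read the same file twice through one filehandle.
# Returns the line count from the first pass and an arrayref of
# "line number: text" strings from the second pass.
sub read_twice {
    my ($path) = @_;

    open my $in_fh, '<', $path;

    my @all_lines = <$in_fh>;    # list context: reads everything up to end of file

    seek $in_fh, 0, 0;           # back to the start (offset 0, whence 0)
    $. = 0;                      # seek does not reset the record counter

    my @numbered;
    while (my $line = <$in_fh>) {
        chomp $line;
        push @numbered, "$.: $line";
    }

    close $in_fh;

    return (scalar @all_lines, \@numbered);
}
```

Without the '$. = 0' line, the numbering in the second pass would carry on from where the first read left off.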
In the full script below, I've shown a single pass through the input which collects the data (@data_all)
as well as other information (%data_info) that is used in various places by
sort —
there's no need to recalculate counts, or perform transformations for case-insensitive checks, multiple times.
Note: I used lc
but fc would be a far better choice;
fc requires Perl 5.16 or later — use fc if you have an appropriate Perl version.
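To illustrate the difference, here is a small sketch that assumes Perl 5.16 or later; the sub name same_ignoring_case is made up for this example:

```perl
#!/usr/bin/env perl
use v5.16;      # enables strict and the 'fc' feature
use warnings;

# Hypothetical helper: case-insensitive equality using full case folding.
sub same_ignoring_case {
    my ($left, $right) = @_;
    return fc($left) eq fc($right);
}

# "\x{DF}" is the German sharp s, which folds to "ss" under fc
# but is left unchanged by lc.
my $with_sharp_s = "stra\x{DF}e";
my $upper        = "STRASSE";

say 'fc: ', (same_ignoring_case($with_sharp_s, $upper) ? 'same' : 'different');
say 'lc: ', (lc($with_sharp_s) eq lc($upper)           ? 'same' : 'different');
```

For plain ASCII data like your sample, lc and fc give the same results; the difference only shows up with characters such as the one above.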
You'll note a map-sort-map pattern in the code.
That's called a Schwartzian Transform.
Take a look at "A Fresh Look at Efficient Perl Sorting"
for a description of that and other sorting methods.
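Stripped down to its bare bones, the pattern might look like this sketch, which sorts strings by length; the sub name sort_by_length is just for illustration. Read the three steps from the bottom up:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: sort strings by length, computing each length only once.
sub sort_by_length {
    my @strings = @_;

    return
        map  { $_->[0] }                # 3. unwrap: keep only the original string
        sort { $a->[1] <=> $b->[1] }    # 2. sort on the precomputed key
        map  { [ $_, length $_ ] }      # 1. wrap: pair each string with its sort key
        @strings;
}

print "$_\n" for sort_by_length(qw{ccc a bb});
```

The point of the wrapping step is that the key is calculated once per element, rather than every time sort compares two elements.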
I've included example code for each of the four sorts you mentioned.
I believe the first three are what you want.
The fourth may not be exactly what you're after:
this is an example where expected output, as I wrote about above, would have really helped.
#!/usr/bin/env perl

use strict;
use warnings;

use constant {
    NO_CASE => 0,
    COUNT   => 1,
};

my (@data_all, %data_info);

while (<DATA>) {
    chomp;
    push @data_all, $_;

    if (exists $data_info{$_}) {
        ++$data_info{$_}[COUNT];
    }
    else {
        $data_info{$_} = [lc, 1];
    }
}

print "Sort alphabetically - ignore case\n";
print "$_\n" for
    map { $_->[0] }
    sort {
        $a->[1] cmp $b->[1]
    }
    map { [
        $_,
        $data_info{$_}[NO_CASE]
    ] }
    @data_all;

print "Sort alphabetically - capitalisation matters\n";
print "$_\n" for
    map { $_->[0] }
    sort {
        $a->[1] cmp $b->[1]
            ||
        $a->[2] cmp $b->[2]
            ||
        $a->[0] cmp $b->[0]
    }
    map { [
        $_,
        substr($data_info{$_}[NO_CASE], 0, 1),
        substr($_, 0, 1)
    ] }
    @data_all;

print "Sort by frequency - ignore alphabetical order\n";
print "$_->[1]: $_->[0]\n" for
    sort {
        $b->[1] <=> $a->[1]
    }
    map { [
        $_,
        $data_info{$_}[COUNT]
    ] }
    keys %data_info;

print "Sort by frequency - then by alphabetical order\n";
print "$_->[1]: $_->[0]\n" for
    sort {
        $b->[1] <=> $a->[1]
            ||
        $a->[0] cmp $b->[0]
    }
    map { [
        $_,
        $data_info{$_}[COUNT]
    ] }
    keys %data_info;

__DATA__
bb
Aa
CC
dD
bb
AA
dD
aa
BB
aa
dD
aA
Output:
Sort alphabetically - ignore case
Aa
AA
aa
aa
aA
bb
bb
BB
CC
dD
dD
dD
Sort alphabetically - capitalisation matters
AA
Aa
aA
aa
aa
BB
bb
bb
CC
dD
dD
dD
Sort by frequency - ignore alphabetical order
3: dD
2: aa
2: bb
1: Aa
1: BB
1: aA
1: CC
1: AA
Sort by frequency - then by alphabetical order
3: dD
2: aa
2: bb
1: AA
1: Aa
1: BB
1: CC
1: aA