in reply to Help sorting contents of an essay
I'm confused by some of the requirements as you've stated them.
#1. Sort alphabetically (ignoring capitalization).
#2. Sort alphabetically with upper case words just in front of lower case words with the same initial characters.
[Emphases added.]
These seem like two separate requirements. Do you want to do #1 first and then use the result to do #2, or do you want to do both and save both sets of results?
#3. Sort by frequency, from high to low, (any order for equal frequency).
#4. Sort by frequency, with alphabetical order for words with the same frequency.
[Emphases added.]
Again, these requirements seem at odds. Can you please clarify?
Please see Short, Self-Contained, Correct Example for info on providing example input and desired output, and ideally the actual code you've got so far. See also How to ask better questions using Test::More and sample data for a way to present desired input/output examples.
Be that as it may, here's an approach to extracting words from a multi-line block of text and then sorting first alphabetically (upper-case first) and second by word frequency.
Note that, e.g., 'Spain' and 'Spain.' are extracted and counted separately because of the period at the end of one of them, and punctuation like , ; : ! ? ... will have a similar effect. This effect is due to the naive definition of the $rx_word regex; a better definition could eliminate such punctuation, but just what constitutes a "word" is tricky to define in general.

    c:\@Work\Perl\monks>perl
    use strict;
    use warnings;

    use Data::Dump qw(dd); # for debug

    my $text = <<'EOT';
    Now is the time, now is the hour.
    The rain in Spain falls mainly in Spain.
    The rain in Spain falls mainly in Spain.
    Foo foo foo Bar bar bar FOO BAR FOO BAR
    EOT
    print "[[$text]] \n"; # for debug

    my $rx_word = qr{ \S+ }xms;

    my @words = $text =~ m{ $rx_word }xmsg;
    # dd \@words; # for debug

    my %word_count;
    ++$word_count{$_} for @words;
    # dd \%word_count; # for debug

    my @sorted =
        sort {
            $a->[0] cmp $b->[0]    # sort first by alpha ascending
            or
            $a->[1] <=> $b->[1]    # then by frequency ascending
            }
        map [ $_, $word_count{$_} ],
        keys %word_count
        ;
    dd \@sorted; # for debug

    print "'$_->[0]' ($_->[1]) \n" for @sorted;
    __END__
    [[Now is the time, now is the hour.
    The rain in Spain falls mainly in Spain.
    The rain in Spain falls mainly in Spain.
    Foo foo foo Bar bar bar FOO BAR FOO BAR
    ]]
    [
      ["BAR", 2],
      ["Bar", 1],
      ["FOO", 2],
      ["Foo", 1],
      ["Now", 1],
      ["Spain", 2],
      ["Spain.", 2],
      ["The", 2],
      ["bar", 2],
      ["falls", 2],
      ["foo", 2],
      ["hour.", 1],
      ["in", 4],
      ["is", 2],
      ["mainly", 2],
      ["now", 1],
      ["rain", 2],
      ["the", 2],
      ["time,", 1],
    ]
    'BAR' (2)
    'Bar' (1)
    'FOO' (2)
    'Foo' (1)
    'Now' (1)
    'Spain' (2)
    'Spain.' (2)
    'The' (2)
    'bar' (2)
    'falls' (2)
    'foo' (2)
    'hour.' (1)
    'in' (4)
    'is' (2)
    'mainly' (2)
    'now' (1)
    'rain' (2)
    'the' (2)
    'time,' (1)
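For what it's worth, here is a rough sketch of one possible "better" word definition, so that 'Spain' and 'Spain.' are counted together. It rests on an assumption of mine, not anything in the original question: that a word is a run of letters, optionally joined by internal apostrophes or hyphens. Your essay may well need a different definition (digits, Unicode, etc.).

```perl
use strict;
use warnings;

my $text = "Now is the time, now is the hour.\n"
         . "The rain in Spain falls mainly in Spain.\n";

# ASSUMPTION: a "word" is one or more letters, optionally followed by
# groups of (apostrophe or hyphen + letters), e.g. don't, well-known.
# Surrounding punctuation such as , ; : ! ? . is simply never matched.
my $rx_word = qr{ [[:alpha:]]+ (?: ['-] [[:alpha:]]+ )* }xms;

# List-context match with /g and no capture groups returns every match.
my @words = $text =~ m{ $rx_word }xmsg;

my %word_count;
++$word_count{$_} for @words;

print "'$_' ($word_count{$_})\n" for sort keys %word_count;
```

With this definition the trailing period and comma are dropped, so both occurrences of Spain land under a single key. Words are still counted case-sensitively ('Now' and 'now' stay separate).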
Note also that the entire content of a file can be read to a scalar string with the idiom
my $text = do { local $/; <$filehandle>; };
See perlvar for $/ info.
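A slightly fuller sketch of the slurp idiom in context follows. To keep it self-contained I open the filehandle on an in-memory scalar; with a real essay you would open a path instead (the 'essay.txt' name in the comment is purely hypothetical).

```perl
use strict;
use warnings;

# Stand-in for a real file: an in-memory filehandle opened on a scalar.
# With a file on disk this would be something like (hypothetical name):
#     open my $filehandle, '<', 'essay.txt' or die ...;
my $data = "line one\nline two\n";
open my $filehandle, '<', \$data
    or die "Cannot open input: $!";

my $text = do {
    local $/;         # input record separator is undef inside this block only
    <$filehandle>;    # so one read returns everything up to end-of-file
};
close $filehandle;

print "[[$text]]\n";
```

Because $/ is localized, its normal value (newline) is restored as soon as the do block exits, so later line-by-line reads elsewhere in the program are unaffected.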
Update: The idiom used to produce the @sorted array

    my @sorted =
        sort {
            $a->[0] cmp $b->[0]    # sort first by alpha ascending
            or
            $a->[1] <=> $b->[1]    # then by frequency ascending
            }
        map [ $_, $word_count{$_} ],
        keys %word_count
        ;

is known as a Schwartzian Transform (ST). Please see A Fresh Look at Efficient Perl Sorting for more info on this and other sorting idioms. Also see "How do I sort an array by (anything)?" in perlfaq4 and sort.
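Pending clarification of the requirements, here is a sketch of how two of the orderings from the original question might be written: my reading of #2 (alphabetical ignoring case, with upper-case forms ahead of lower-case forms of the same word) and #4 (frequency high to low, alphabetical within equal frequency). The sample word list and the names @by_alpha and @by_freq are mine, for illustration only.

```perl
use strict;
use warnings;

my @words = qw(foo Bar FOO bar foo Foo bar BAR in in in);

my %word_count;
++$word_count{$_} for @words;

# Reading of #2: compare lower-cased copies first; when those are equal,
# fall back to a plain cmp on the originals, which puts upper-case letters
# ahead of lower-case ones (ASCII order). Written as map/sort/map:
# decorate each word with its lc() form, sort, then strip the decoration.
my @by_alpha =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0]  or  $a->[1] cmp $b->[1] }
    map  { [ lc($_), $_ ] }
    keys %word_count;

# Reading of #4: frequency descending ($b before $a), then plain
# (case-sensitive) alphabetical order for words with equal counts.
my @by_freq =
    sort { $word_count{$b} <=> $word_count{$a}  or  $a cmp $b }
    keys %word_count;

print "alpha: '$_' ($word_count{$_})\n" for @by_alpha;
print "freq:  '$_' ($word_count{$_})\n" for @by_freq;
```

The tie-break in @by_freq is case-sensitive, so all-caps words sort ahead of lower-case ones within a frequency group; swap in the comparison from @by_alpha if that is not what you want.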
Give a man a fish: <%-{-{-{-<
Re^2: Help sorting contents of an essay
by tobyink (Canon) on Apr 18, 2020 at 13:15 UTC
by AnomalousMonk (Archbishop) on Apr 18, 2020 at 20:31 UTC