harmattan_ has asked for the wisdom of the Perl Monks concerning the following question:
I need to do the 4 things commented in the code below, then save the results to a new file. I have tried everything I've seen online but can't get anywhere. My approach is to open file handles for the input essay and the output file. Following this, I believe that I should read the essay into an array to make sorting easy. When I did this, it sorted whole lines rather than individual words. Would I have to split first?
#!/usr/bin/perl
# read
$essay = '/directory/essay.txt';
$new = '/directory/new.txt';
# Open file handle to read
open(IN, "<", $essay);
# Open file handle to write
open(OUT, ">>", $new);
#read file into array
@yes_finally = <IN>;
# Things to do while reading
while($in = <IN>){
Sort alphabetically.
}
Re: Help sorting contents of an essay
by haukex (Archbishop) on Apr 18, 2020 at 06:34 UTC
"I have tried everything I've seen online but can't get anywhere."
Unfortunately, you don't show the code you've tried, so we can't help you with the specific problems you had. We're happy to help you learn if you show your efforts (= code). Here are a few general hints:
Re: Help sorting contents of an essay
by AnomalousMonk (Archbishop) on Apr 18, 2020 at 07:51 UTC
I'm confused by some of the statements of your requirements.
#1. Sort alphabetically (ignoring capitalization).
These seem like two separate requirements. Do you want to do #1 first and then use the result to do #2, or do you want to do both and save both sets of results?
#3. Sort by frequency, from high to low, (any order for equal frequency).
Again, these requirements seem at odds. Can you please clarify?
Please see Short, Self-Contained, Correct Example for info on providing example input and desired output, and maybe also the actual code you've got so far. Maybe even see How to ask better questions using Test::More and sample data for a way to posit desired input/output examples.
Be that as it may, here's an approach to extracting words from a multi-line block of text and then sorting first alphabetically (upper-case first) and second by word frequency.
Note that, e.g., 'Spain' and 'Spain.' are extracted and counted separately because of the period at the end of one of them, and punctuation like , ; : ! ? ... will have a similar effect. This effect is due to the naive definition of the $rx_word regex; a better definition could eliminate such punctuation, but just what constitutes a "word" is tricky to define in general.
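The approach and the punctuation caveat described above can be sketched roughly as follows. This is only an illustration, not the code from the original reply: the sample text, the whitespace-based definition of $rx_word, and the alphabetical tie-break for equal frequencies are all assumptions.

```perl
use strict;
use warnings;

# Sample text (an assumption) chosen so that 'Spain', 'Spain?' and 'Spain.'
# all occur.
my $text = <<'EOT';
The rain in Spain stays mainly in the plain.
Does the rain stay in Spain? No, the rain leaves Spain.
EOT

# Naive "word": any run of non-whitespace, so punctuation sticks to words.
my $rx_word = qr{ \S+ }xms;
my @words   = $text =~ m{ $rx_word }xmsg;

# Count each distinct word.
my %count;
$count{$_}++ for @words;

# Alphabetical, upper-case first: the default string sort uses code point order.
my @alpha = sort keys %count;

# Frequency, high to low, via a Schwartzian Transform (map-sort-map).
# Equal frequencies are ordered alphabetically (an arbitrary choice).
my @sorted =
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
    map  { [ $_, $count{$_} ] }
    @alpha;

printf "%-8s %d\n", $_, $count{$_} for @sorted;
```

With this definition, 'Spain', 'Spain?' and 'Spain.' end up as three separate keys in %count, which is exactly the effect the reply warns about.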
Note also that the entire content of a file can be read into a scalar string with a file-slurping idiom.
Update: The idiom used to produce the @sorted array is known as a Schwartzian Transform (ST). Please see A Fresh Look at Efficient Perl Sorting for more info on this and other sorting idioms. Also see "How do I sort an array by (anything)?" in perlfaq4 and sort.
Give a man a fish: <%-{-{-{-<
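One common way to write the file-slurping idiom mentioned in the reply above is sketched below. The exact form used in the original reply did not survive, so treat this as an assumption; the in-memory filehandle merely stands in for a real file so that the example is self-contained.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file (assumption).
my $content = "first line\nsecond line\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";

# Localizing $/ (the input record separator) leaves it undefined inside the
# block, so a single read returns everything up to end of file.
my $text = do { local $/; <$fh> };
close $fh;

print length($text), " characters read\n";
```

Because $/ is localized, its normal value is restored as soon as the do block ends, so later line-by-line reads elsewhere in the program are unaffected.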
by tobyink (Canon) on Apr 18, 2020 at 13:15 UTC
I would suggest /\w+/ would be a pretty sensible place to start for matching words. Hyphenated words will be matched as two separate words, which may or may not be what you want, depending on the task at hand.
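A small sketch of what /\w+/ does with hyphenated words; the sample sentence is made up for illustration.

```perl
use strict;
use warnings;

# Sample sentence (an assumption) containing two hyphenated words.
my $line = "A well-known, old-fashioned approach.";

# In list context with /g, the match returns every run of word characters.
my @words = $line =~ /\w+/g;

# Punctuation is dropped, but each hyphenated word is split at the hyphen.
print join('|', @words), "\n";
```

Here 'well-known' comes back as 'well' and 'known', so any per-word counts would treat the two halves as independent words.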
by AnomalousMonk (Archbishop) on Apr 18, 2020 at 20:31 UTC
Ah, what's in a word? In addition to hyphenations, I was thinking of cases like son's sons' wouldn't wouldn't've O'Brien ain't t'ain't etc, etc. And that's just ASCII English! \w+ might be perfect for harmattan_'s application, but I don't know what that application is.
Give a man a fish: <%-{-{-{-<
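To make the apostrophe cases concrete, here is a sketch comparing /\w+/ with a slightly wider pattern. The wider pattern is hypothetical and was not proposed in the thread; it only allows apostrophes and hyphens between ASCII letters, and it is not a general definition of a word.

```perl
use strict;
use warnings;

# The examples listed in the reply above.
my $text = q{son's sons' wouldn't wouldn't've O'Brien ain't t'ain't};

# Plain \w+ breaks every word at each apostrophe.
my @naive = $text =~ /\w+/g;

# Hypothetical alternative: letters, optionally joined by internal
# apostrophes or hyphens. ASCII only.
my $rx_word = qr/[A-Za-z]+(?:['-][A-Za-z]+)*/;
my @better  = $text =~ /$rx_word/g;

print scalar(@naive),  " pieces with \\w+\n";
print scalar(@better), " words with the wider pattern\n";
print join('|', @better), "\n";
```

Even this pattern loses information: the trailing apostrophe of the plural possessive sons' is dropped, which supports the point that "word" has to be defined for the application at hand.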
Re: Help sorting contents of an essay
by kcott (Archbishop) on Apr 18, 2020 at 09:00 UTC
G'day harmattan_,
Welcome to the Monastery.
There are some issues with your code which aren't helping you. There are some issues with your post which aren't helping us to help you.
You have two fundamental flaws in the code you have supplied.
In the code below, I've shown a single pass through the input which collects the data (@data_all) as well as other information (%data_info) that is used in various places by sort: there's no need to recalculate counts, or perform transformations for case-insensitive checks, multiple times.
Note: I used lc but fc would be a far better choice; fc requires Perl 5.16 or later. Use fc if you have an appropriate Perl version.
You'll note a map-sort-map pattern in the code. That's called a Schwartzian Transform. Take a look at "A Fresh Look at Efficient Perl Sorting" for a description of that and other sorting methods.
I've included example code for each of the four sorts you mentioned. I believe the first three are what you want. The fourth may not be exactly what you're after: this is an example where expected output, as I wrote about above, would have really helped.
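The single-pass idea can be sketched as follows. This is not kcott's code: the layout of %data_info (a 'folded' cache and a 'count' table), the sample text, and the tie-breaking rules are assumptions, and only two of the four sorts (case-insensitive alphabetical and frequency) are shown.

```perl
use strict;
use warnings;

# Sample input (an assumption).
my $text = "the Cat sat on The mat the cat Sat";

my (@data_all, %data_info);

# Single pass: collect every word, cache its lower-cased form once,
# and count occurrences under that form.
for my $word ($text =~ /\w+/g) {
    push @data_all, $word;
    # lc is used here; with Perl 5.16+ and "use feature 'fc'", fc is preferable.
    my $folded = $data_info{folded}{$word} //= lc $word;
    $data_info{count}{$folded}++;
}

# Alphabetical, ignoring capitalization: reuse the cached folded forms.
# Words that differ only in case are ordered by plain string comparison.
my @alpha_ci = sort {
    $data_info{folded}{$a} cmp $data_info{folded}{$b} or $a cmp $b
} @data_all;

# Frequency, high to low, using a Schwartzian Transform (map-sort-map).
my @by_freq =
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
    map  { [ $_, $data_info{count}{$_} ] }
    keys %{ $data_info{count} };

print "@alpha_ci\n";
print "$_: $data_info{count}{$_}\n" for @by_freq;
```

Because the folded forms and counts are computed once during the read loop, the sort blocks only do hash lookups, which is the saving the reply describes.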
-- Ken
Re: Help sorting contents of an essay
by BillKSmith (Monsignor) on Apr 18, 2020 at 17:51 UTC
The first two things you must do are to read the essay and to divide it into words. The best way to read it depends on how you plan to divide it. The best way to divide it depends on the format of the essay and your definition of 'word'. You probably think that this is obvious and that if there is an occasional problem, you will deal with it later. That is a big mistake.
For simplicity, let us assume that the essay consists only of English words (only ASCII letters, no numbers, no hyphenated or foreign words) with standard English punctuation (,.'"?!). I will also assume that the essay is less than 10,000 characters long and that it is divided into lines less than 80 characters long. Lines are separated by newlines. Paragraphs are separated by blank lines. Sentences are separated by two spaces (or a newline). Words are separated by a single space.
A program which handles this very well may be extremely difficult to modify. You should let us know which of these assumptions are not true and which are likely to change in the future.
You specify four outputs. Do you really want them all written to the same file? If so, how would they be identified (or at least separated)?
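Under exactly the assumptions listed above, dividing the essay into words could look something like this sketch. The sample essay is made up, and stripping punctuation only from the ends of each token is one possible choice, not something stated in the reply.

```perl
use strict;
use warnings;

# Sample essay (an assumption) following the stated format: blank line
# between paragraphs, two spaces or a newline between sentences, and
# single spaces between words.
my $essay = <<'EOT';
It rained today.  We stayed in.
Was it fun?

No, it wasn't!  "Stay," he said.
EOT

my @words;
my $sentences = 0;

for my $paragraph (split /\n{2,}/, $essay) {
    for my $sentence (split / {2}|\n/, $paragraph) {
        $sentences++;
        for my $token (split / /, $sentence) {
            # Strip the assumed punctuation set from both ends only, so an
            # internal apostrophe (wasn't) survives.
            (my $word = $token) =~ s/^[,.'"?!]+|[,.'"?!]+$//g;
            push @words, $word if length $word;
        }
    }
}

print "$sentences sentences, ", scalar(@words), " words\n";
print join(' ', @words), "\n";
```

Each split pattern encodes one of the assumptions directly, so if any of them turns out to be false (for example, sentences separated by a single space), the corresponding line is the one that has to change.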
Bill
Re: Help sorting contents of an essay
by leszekdubiel (Scribe) on Apr 18, 2020 at 21:33 UTC