In my initial response to your question, I mentioned that the Text::CSV module is a good idea if you need to parse comma-delimited text robustly. That said, I don't expect you to use it: your problem sounds homework-related, and unless you really understand the module, it's probably best to stick to the coursework rather than introduce things you haven't covered in class yet.
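To see why a real CSV parser is more robust than splitting on commas, here is a minimal sketch. The sample line is made up for illustration: its first field is quoted and contains a comma. The split-based part needs only core Perl; the Text::CSV part runs only if that module happens to be installed.

```perl
use strict;
use warnings;

# Hypothetical input: the first field is quoted and contains a comma.
my $line = qq{"red, ripe apples",oranges,pears};

# Naive approach: split on every comma and ignore quoting entirely.
my @naive = split /,/, $line;
print scalar(@naive), " fields via split: ", join( ' | ', @naive ), "\n";

# Text::CSV respects the quotes (this part is skipped if the module is absent).
if ( eval { require Text::CSV; 1 } ) {
    my $csv = Text::CSV->new( { binary => 1 } )
        or die "Cannot use CSV: " . Text::CSV->error_diag();
    $csv->parse($line) or die "Cannot parse: $line\n";
    my @fields = $csv->fields();
    print scalar(@fields), " fields via Text::CSV: ", join( ' | ', @fields ), "\n";
}
```

Splitting yields 4 fields with the quote characters left in place; Text::CSV yields the 3 intended fields.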

Your problem also raises the question of what constitutes a word. I'm going to ignore the fact that a word cannot contain two adjacent hyphens, or two apostrophes, and so on. Once I start down that road, the next thing you know I'll be looking for spelling errors, and that's beyond the scope of actual need. For the purposes of my example, I'll simply strip from each word anything that can't belong in a word (everything except letters, apostrophes, and hyphens, so punctuation and digits go), and assume that whatever is left is a word.
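The cleanup rule can be tried on its own. This sketch wraps the same substitution the spoiler uses in a hypothetical helper, `clean_word`, and runs it over a few made-up tokens. A token made only of digits or punctuation comes back empty, so the caller has to skip it rather than count it.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the spoiler's rule (keep letters, ' and -).
sub clean_word {
    my ($word) = @_;
    ( my $clean = $word ) =~ s/[^[:alpha:]'-]//g;
    return $clean;
}

# Made-up sample tokens, as split /\s+/ might produce them.
for my $token ( 'today?', "Here's", 'test3', 'well-known,', '42' ) {
    my $clean = clean_word($token);
    printf "%-12s => '%s'%s\n", $token, $clean,
        length($clean) ? '' : '  (empty: skip it)';
}
```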

I interpreted your question as follows: you have a set of comma-delimited strings, each substring might contain multiple words, and you want a total word count. I realize you might want phrase counts instead of word counts, but this is my spoiler, so I'll pick word counts because doing so adds an extra level of fun.
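The difference between the two interpretations shows up on a single line of the sample data. This core-Perl sketch uses a plain split on commas, which is safe here only because no field in this line is quoted.

```perl
use strict;
use warnings;

# One line from the spoiler's sample data.
my $line = 'hi, there, world, how, are you, today?';

# Phrase-count view: each comma-delimited field is one item.
# (Naive split on commas; fine here because no field is quoted.)
my @phrases = split /\s*,\s*/, $line;

# Word-count view: split every field further on whitespace.
my @words = map { split( ' ', $_ ) } @phrases;

printf "%d phrases, %d words\n", scalar @phrases, scalar @words;
```

"are you" is one phrase but two words, which is why the word count comes out one higher than the phrase count.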

I also took the liberty of lower-casing all words, so that "ApPleS", "apples", and "APPLES" (but not "oranges") are all counted as the same word.
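Lower-casing happens at the moment the word is used as a hash key, so differently capitalized spellings land in the same bucket. A minimal sketch:

```perl
use strict;
use warnings;

my %count;
# lc() normalizes each word before it is used as the hash key.
$count{ lc $_ }++ for qw( ApPleS apples APPLES oranges );

print "$_: $count{$_}\n" for sort keys %count;
```

This prints `apples: 3` and `oranges: 1`. On Perl 5.16 and later, `fc` (enabled with `use feature 'fc'`) performs full Unicode case folding, which can matter for non-ASCII text; plain `lc` is enough for this exercise.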

In this example, I also made sure that lexical variables fall out of their narrow scope as early as possible. That's the sole reason for the outermost { ... } block. It's really not necessary, but I was just fiddling and it came out this way.
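A bare block ends the lifetime of the lexicals declared inside it. This sketch makes that visible with a throwaway class, `Guard` (hypothetical, defined only for this demo), whose `DESTROY` method logs when its object is freed. The object plays the role of `$csv`, and `%wordlist` is declared outside the block so it survives.

```perl
use strict;
use warnings;

our @log;    # records the order of events

package Guard;    # hypothetical helper class, used only for this demo
sub new     { my ( $class, $name ) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; push @main::log, "$self->{name} released" }

package main;

my %wordlist;    # declared outside the block, so it survives it
{
    my $helper = Guard->new('helper');    # lexical, like $csv in the spoiler
    $wordlist{example}++;
    push @log, 'inside block';
}
push @log, 'after block';
print "$_\n" for @log;
```

The log reads "inside block", "helper released", "after block": the object is freed as soon as the block exits, not at the end of the program.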

If you're ready for the spoiler, read on. If you're not ready for it, don't:

    use strict;
    use warnings;
    use Text::CSV;

    my %wordlist;
    {
        my $csv = Text::CSV->new( { binary => 1 } )
            or die "Cannot use CSV: " . Text::CSV->error_diag();

        while ( my $line = <DATA> ) {
            chomp $line;
            $csv->parse( $line )
                or die "Improperly formatted CSV string: $line\n";
            foreach my $field ( $csv->fields() ) {
                foreach my $word ( split /\s+/, $field ) {
                    # Keep only letters, apostrophes and hyphens.
                    $word =~ s/[^[:alpha:]'-]//g;
                    # Skip tokens left empty (leading whitespace, pure punctuation or digits).
                    next unless length $word;
                    $wordlist{ lc $word }++;
                }
            }
        }
    }

    printf "%-16s: %d\n", $_, $wordlist{$_} for sort keys %wordlist;

    __DATA__
    hi, there, world, how, are you, today?
    What are you up to?
    Here's a word with an apostrophe.
    test3

Enjoy! Thanks for the fun question. I finally found a reason to install Text::CSV.


Dave


In reply to Re: Comma separated list into a hash (SPOILER) by davido
in thread Comma separated list into a hash by Anonymous Monk
