in reply to Comma separated list into a hash

In my initial response to your question I mentioned that, for robustly parsing comma-delimited text, the Text::CSV module is a good idea. That said, I don't expect that you'll be using it, because it sounds like your problem is homework-related. Unless you really understand the module, it's probably best to stick to the coursework and not introduce things that you haven't covered in class yet.

Within your problem, there is also the issue of what constitutes a word. I'm going to ignore the fact that a word cannot contain two hyphens next to each other, or two apostrophes, etc. For one thing, once I start down that road, the next thing you know, I'll be looking for spelling errors, and that's just beyond the scope of actual need. For the purposes of my example, I'll just strip anything that doesn't belong in a word out of a word, including punctuation, and assume that what's left is a word.

I decided to interpret your question as saying that you have a set of comma-delimited strings, that each substring might contain multiple words, and that you want a total word count. I realize that you might want phrase counts instead of word counts, but this is my spoiler, so I'll pick word counts because doing so adds an extra level of fun.

I took the additional liberty of lower-casing all words, so that "ApPleS", "apples" and "APPLES" all count as the same word (while "oranges", of course, stays a different one).
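Here is a minimal sketch of just that clean-up step, pulled out on its own. The helper name normalize_word is mine and purely for illustration; it keeps letters, apostrophes and hyphens, drops everything else, and lower-cases what remains.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not part of any module):
# reduce a raw token to a lower-case "word" made only of letters,
# apostrophes and hyphens.
sub normalize_word {
    my ($token) = @_;
    $token =~ s/[^[:alpha:]'-]//g;    # drop digits, punctuation, etc.
    return lc $token;
}

print normalize_word($_), "\n" for qw( ApPleS APPLES today? Here's test3 );
```

Note that a token made up entirely of digits or punctuation comes back as an empty string, so a caller that counts words probably wants to skip empty results.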

In this example, I also made sure that lexical variables have as narrow a scope as possible and fall out of it as early as possible. That's the sole reason for the outermost { ... } block. It's really not necessary, but I was just fiddling and it came out this way.
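If the bare-block idea is new to you, here is a small sketch of it in isolation. The sub and variable names are made up for the illustration; the point is only that a "my" variable declared inside { ... } is gone once the block ends, while the hash declared outside the block survives.

```perl
use strict;
use warnings;

# Hypothetical demonstration sub (names are assumptions for illustration).
sub count_in_block {
    my %count;                      # declared outside the block: survives it
    {
        my $scratch = 'apples';     # visible only inside this bare block
        $count{$scratch}++;
    }
    # $scratch no longer exists here. Under "use strict", mentioning it
    # at this point would be a compile-time error.
    return \%count;
}

my $result = count_in_block();
print "$_=$result->{$_}\n" for sort keys %$result;
```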

If you're ready for the spoiler, read on. If you're not ready for it, don't:

use strict;
use warnings;
use Text::CSV;

my %wordlist;
{
    my $csv = Text::CSV->new();
    while ( my $line = <DATA> ) {
        $csv->parse( $line )
            or die "Improperly formatted CSV string: $line";
        foreach my $field ( $csv->fields() ) {
            foreach my $word ( split /\s+/, $field ) {
                # Keep only letters, apostrophes and hyphens, then skip
                # anything that ends up empty (e.g. a token of pure digits).
                $word =~ s/[^[:alpha:]'-]//g;
                next unless length $word;
                $wordlist{lc $word}++;
            }
        }
    }
}

printf "%-16s: %d\n", $_, $wordlist{$_} for sort keys %wordlist;

__DATA__
hi, there, world, how, are you, today?
What are you up to?
Here's a word with an apostrophe.
test3
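If Text::CSV is off the table for your coursework, the same counting can be sketched with nothing but split. This is only a rough variant under a big assumption: no field is quoted and no field contains an embedded comma. Once that assumption fails, a plain split on commas gives wrong fields, which is exactly where the module earns its keep. The sub name count_words and the sample strings are mine, made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): count words in a list of
# comma-delimited lines using only core Perl.
# Assumes no quoted fields and no commas embedded inside a field.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        for my $field ( split /,/, $line ) {
            # split ' ' splits on whitespace and ignores leading blanks
            for my $word ( split ' ', $field ) {
                $word =~ s/[^[:alpha:]'-]//g;
                next unless length $word;   # nothing word-like was left
                $count{ lc $word }++;
            }
        }
    }
    return \%count;
}

my $count = count_words( "hi, there, world", "Are you there? test3" );
printf "%-16s: %d\n", $_, $count->{$_} for sort keys %$count;
```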

Enjoy! Thanks for the fun question. Finally I found a reason to install Text::CSV.


Dave