Yoda_Oz has asked for the wisdom of the Perl Monks concerning the following question:

#!usr/local/bin/perl
print ("Enter filename to search to punctuation characters: ");
$path=<STDIN>;
print ("\n");
open(DATA, "<$path") || die "Couldn't open $path for reading: $!\n";
while (<DATA>) {
    while (s/([\041-\057]|[\72-\100]|[\133-\140]|[\173-\176])(.*)/$2/) {
        $char = $1;
        $wordHash{$char}++;
    }
}
while ( ($punctuation, $count) = each(%wordHash) ) {
    $wordArray[$i] = "$punctuation\t$count";
    $i++;
    print ("$punctuation\t$count\n");
}

Hi, the code above searches through a text file and prints how many of each punctuation character it finds (according to the ASCII octal ranges I have put in the regular expression). I'm after a different way of doing the second loop, so that instead of printing the character it finds, it prints the word equivalent, i.e. "." becomes fullstop and "," becomes comma. I guess I'd have to use some sort of hash to create a database of , = comma, . = fullstop, ( = open bracket, etc., but I'm not quite sure how to do this and then relate it to what the punctuation search finds. What it does now:
?	1
.	6
,	5
(	10
)	10
;	21
$	19
What I want it to look like:
question mark	1
fullstop	6
comma	5
open bracket	10
close bracket	10
semicolon	21
dollar sign	19
Cheers,
PS: sorry about the ambiguity of my last post!

Replies are listed 'Best First'.
Re: punctuation search (different!)
by davido (Cardinal) on Jan 30, 2006 at 03:21 UTC

    Yes, a hash is probably your solution. If you create a hash that looks like this:

    %charnames = (
        '?' => 'question mark',
        '.' => 'fullstop',
        ',' => 'comma',
        # and so on
    );

    Then printing out the results would be as simple as this:

    print ("$charnames{$punctuation}\t$count\n");
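
    Putting the counting loop and the lookup hash together, a minimal sketch might look like the following. The sub names count_punctuation and describe, the sample text, and the fall-back to the raw character for anything missing from the table are illustrative assumptions, not part of the original script. It uses [[:punct:]], which covers the same ASCII ranges as the octal classes in the question.

```perl
use strict;
use warnings;

# Lookup table from character to name; extend it as needed.
my %charnames = (
    '?' => 'question mark',
    '.' => 'fullstop',
    ',' => 'comma',
    '(' => 'open bracket',
    ')' => 'close bracket',
    ';' => 'semicolon',
    '$' => 'dollar sign',
);

# Count every ASCII punctuation character in a string (hypothetical helper).
sub count_punctuation {
    my ($text) = @_;
    my %count;
    $count{$1}++ while $text =~ /([[:punct:]])/g;
    return %count;
}

# Return the name of a character, or the character itself if it has no name yet.
sub describe {
    my ($char) = @_;
    return exists $charnames{$char} ? $charnames{$char} : $char;
}

my %found = count_punctuation('Is it done? (yes, mostly). Cost: $5; maybe $6.');
for my $char (sort keys %found) {
    print describe($char), "\t$found{$char}\n";
}
```

    Falling back to the character itself means an incomplete table still produces readable output instead of "uninitialized value" warnings.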

    Dave

Re: punctuation search (different!)
by bobf (Monsignor) on Jan 30, 2006 at 03:25 UTC

    You answered your own question. A hash is a good way to go. Just translate the text in your question into code:

    use strict;
    use warnings;

    my %punct = (
        '?' => 'question mark',
        ',' => 'comma',
        '(' => 'open bracket',  # per your terminology
    );

    while( my ( $symbol, $descr ) = each %punct ) {
        print "[$symbol] = $descr\n";
    }

    Note the use strict; and use warnings at the top. :-)
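
    If the "database" of names is easier to maintain as plain text than as Perl source, one possible approach is to parse it into the hash at startup. This is only a sketch: the load_names sub and the tab-separated "symbol, description" layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Parse "symbol<TAB>description" lines into a hash (hypothetical helper).
sub load_names {
    my ($table) = @_;
    my %names;
    for my $line (split /\n/, $table) {
        next unless $line =~ /\S/;                    # skip blank lines
        my ($symbol, $descr) = split /\t/, $line, 2;  # split on the first tab only
        $names{$symbol} = $descr;
    }
    return %names;
}

my %punct = load_names("?\tquestion mark\n,\tcomma\n(\topen bracket\n");
print "[$_] = $punct{$_}\n" for sort keys %punct;
```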

    Update: A quick search of CPAN reveals Acme::MetaSyntactic::punctuation, which may already have what you need.

Re: punctuation search (different!)
by thundergnat (Deacon) on Jan 30, 2006 at 18:43 UTC

    Rather than build the hash yourself, let perl do it for you. This returns counts of punctuation but is extremely easy to modify to return other classes of characters.

    Update: added \p{Symbol} to filter regex.

    use warnings;
    use strict;
    use charnames();

    my %chars;

    while (my $line = <DATA>) {
        $chars{chop $line}++ while length $line;
    }

    for (keys %chars) {
        next unless /\p{Punct}|\p{Symbol}/; # Comment out or change this line to suit
        my $char_count = sprintf "%25s %8s", (lc charnames::viacode(ord $_)), $chars{$_};
        print "$char_count\n";
    }

    __DATA__
    Is a question that will probably seem absurd to those who
    are at all familiar with mineral springs or Saratoga waters.
    Nevertheless, it is a not unfrequent and amusing occur-
    rence to hear remarks from strangers and greenies who have
    a preconceived notion that the springs are doctored, and
    that a mixture of salts, etc., is tipped in every night or
    early in the morning! Strange that the art should be limited
    to the village of Saratoga! The incredulity of some people
    is the most ridiculous credulity known. Such wonders as
    the spouting springs, the "strongest" in Sara-
    toga, come from so small an orifice in the ground, as to
    preclude the least possibility of adulteration. Besides, the
    manufactured article would be too costly to allow such im-
    mense quantities to flow away unused.
    '{{{}}}]][[-0)(*&^^!@#$>>,<<>`~`'
    __END__
    _____________________________________________________________
    returns:
                        comma        9
               quotation mark        2
               less-than sign        2
            greater-than sign        3
            circumflex accent        2
                    ampersand        1
             left parenthesis        1
                 grave accent        2
                  dollar sign        1
                     asterisk        1
                        tilde        1
                 hyphen-minus        4
                    full stop        5
          left square bracket        2
                commercial at        1
             exclamation mark        3
           left curly bracket        3
         right square bracket        2
                  number sign        1
                     low line        4
            right parenthesis        1
                   apostrophe        2
          right curly bracket        3
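
    The same idea can be applied to any string, not only the DATA block. The variant below is a sketch rather than a drop-in replacement: the count_by_name sub, the sample sentence, and the sort by descending count are assumptions added for illustration. It matches punctuation and symbols directly with a global regex instead of chopping every character, and still relies on charnames::viacode for the names.

```perl
use strict;
use warnings;
use charnames ();

# Count punctuation and symbol characters in a string, keyed by
# lower-cased Unicode character name (hypothetical helper).
sub count_by_name {
    my ($text) = @_;
    my %count;
    while ($text =~ /([\p{Punct}\p{Symbol}])/g) {
        $count{ lc( charnames::viacode( ord $1 ) ) }++;
    }
    return %count;
}

my %count = count_by_name(q{Wait, what? "Yes," she said (twice).});

# Most frequent first; ties broken alphabetically by name.
for my $name (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    printf "%25s %8d\n", $name, $count{$name};
}
```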