Yoda_Oz has asked for the wisdom of the Perl Monks concerning the following question:

Hi, have a look at my code below. What I'm trying to do is search through a text file and list all the punctuation characters in it, but it doesn't do it exactly how I want. It seems to go over one line and list the characters, then start over on the next line and add anything new to the hash. I want it to print the number of times each punctuation character comes up only once, at the end...
#!usr/local/bin/perl
print ("Enter filename to search to punctuation characters: ");
$path=<STDIN>;
print ("\n");
open(DATA, "<$path") || die "Couldn't open $path for reading: $!\n";
while (<DATA>)
{
    # ASCII octet punctuation characters
    while (s/([\041-\057]|[\72-\100]|[\133-\140]|[\173-\176])(.*)/$2/)
    {
        $char = $1;
        $wordHash{$char}++;
    }
    while ( ($punctuation, $count) = each(%wordHash) )
    {
        $wordArray[$i] = "$punctuation\t$count";
        $i++;
        print ("$punctuation\t$count\n");
    }
}
I get the right output, but I get it over and over again for some characters, and I don't know why! For example, if I enter the filename "wordHash.pl", which is the file containing the code above, it prints this:
/ 3 # 1 ! 1 / 3 # 1 ! 1 / 3 : 1 # 1 " 2 ) 1 ( 1 ; 1 ! 1 $ 1 / 3 = 1 : 1 # 1 > 1 " 2 ) 1 < 1 ( 1 ; 2 ! 1 $ 1 \ 1 / 3 = 1 : 1 # 1 > 1 " 4 ) 2 < 1 ( 2 ; 3 ! 1 $ 4 \ 2 / 3 | 2 = 1 : 2 # 1 , 1 > 1 " 8 ) 3 ' 1 < 2 ( 3 ; 4 ! 2 $ 4 \ 2 / 3 | 2 = 1 : 2 # 1 , 1 > 2 " 8 ) 4 ' 1 < 3 ( 4 ; 4 ! 2 $ 4 \ 2 / 3 = 1 : 2 , 1 " 8 < 3 ; 4 ! 2 | 2 { 1 # 1 > 2 ) 4 ' 1 ( 4 $ 5 \ 10 / 6 = 1 : 2 * 1 , 1 - 4 " 8 . 1 [ 4 < 3 ; 4 ! 2 ] 4 | 5 { 1 # 1 > 2 ) 7 ' 1 ( 7 $ 5 \ 10 / 6 = 1 : 2 * 1 , 1 - 4 " 8 . 1 [ 4 < 3 ; 4 ! 2 ] 4 | 5 { 2 # 1 > 2 ) 7 ' 1 ( 7 $ 7 \ 10 / 6 = 2 : 2 * 1 , 1 - 4 " 8 . 1 [ 4 < 3 ; 5 ! 2 ] 4 | 5 { 2 # 1 > 2 ) 7 ' 1 ( 7 $ 9 \ 10 / 6 = 2 : 2 * 1 , 1 - 4 " 8 . 1 [ 4 < 3 ; 6 ! 2 ] 4 | 5 { 3 # 1 > 2 + 2 ) 7 ' 1 } 1 ( 7 $ 9 \ 10 / 6 = 2 : 2 * 1 , 1 - 4 " 8 . 1 [ 4 < 3 ; 6 ! 2 ] 4 | 5 { 3 # 1 > 2 + 2 ) 7 ' 1 } 2 ( 7 $ 9 \ 10 / 6 = 2 : 2 * 1 , 1 - 4 " 8 . 1 [ 4 < 3 ; 6 ! 2 ] 4 | 5 { 3 # 1 > 2 + 2 ) 7 ' 1 } 2 ( 7 $ 11 \ 10 / 6 = 3 : 2 * 1 , 2 - 4 " 8 . 1 [ 4 < 3 ; 6 ! 2 ] 4 | 5 { 3 % 1 # 1 > 2 + 2 ) 10 ' 1 } 2 ( 10 $ 11 \ 10 / 6 = 3 : 2 * 1 , 2 - 4 " 8 . 1 [ 4 < 3 ; 6 ! 2 ] 4 | 5 { 4 % 1 # 1 > 2 + 2 ) 10 ' 1 } 2 ( 10 $ 15 \ 11 / 6 = 4 : 2 * 1 , 2 - 4 " 10 . 1 [ 5 < 3 ; 7 ! 2 ] 5 | 5 { 4 % 1 # 1 > 2 + 2 ) 10 ' 1 } 2 ( 10 $ 16 \ 11 / 6 = 4 : 2 * 1 , 2 - 4 " 10 . 1 [ 5 < 3 ; 8 ! 2 ] 5 | 5 { 4 % 1 # 1 > 2 + 4 ) 10 ' 1 } 2 ( 10 $ 18 \ 13 / 6 = 4 : 2 * 1 , 2 - 4 " 12 . 1 [ 5 < 3 ; 9 ! 2 ] 5 | 5 { 4 % 1 # 1 > 2 + 4 ) 11 ' 1 } 2 ( 11 $ 18 \ 13 / 6 = 4 : 2 * 1 , 2 - 4 " 12 . 1 [ 5 < 3 ; 9 ! 2 ] 5 | 5 { 4 % 1 # 1 > 2 + 4 ) 11 ' 1 } 3 ( 11 $ 18 \ 13 / 6 = 4 : 2 * 1 , 2 - 4 " 12 . 1 [ 5 < 3 ; 9 ! 2 ] 5 | 5 { 4 % 1 # 1 > 2 + 4 ) 11 ' 1 } 4 ( 11
Please help... <EDIT> Sorted it, sorry. I had the second while loop inside the first. When I moved it outside the first loop, it worked! </EDIT>

20060130 Janitored by Corion: Added readmore tags

Replies are listed 'Best First'.
Re: punctuation search... using ascii
by bobf (Monsignor) on Jan 30, 2006 at 02:57 UTC

    First, use strict and warnings. Second, since you don't actually declare any of your variables, I can't tell what scope they're in. Are %wordHash and @wordArray supposed to be scoped within the outer while loop (i.e., reset with each line from DATA), or do they store data for the whole input file?

    Where does $i get set? It looks like each time you iterate through %wordHash with while ( ($punctuation, $count) = each(%wordHash) ) the array grows.

    More importantly, do you really intend to print the contents of %wordHash after every line read from DATA? Unless the hash is scoped within the outer while loop, you probably want to move the while/print block outside of that loop.
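
    As a rough sketch of that rearrangement (not the original poster's final code): the counting happens inside the loop over lines, and the report runs once afterwards. The helper name count_punctuation and the two sample lines are assumptions made for illustration; the character class folds the four octal ranges from the original regex into one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally ASCII punctuation over a list of lines and return the totals.
# count_punctuation is a hypothetical helper name, not from the original post.
sub count_punctuation {
    my @lines = @_;
    my %count;    # declared once, so totals accumulate across all lines
    for my $line (@lines) {
        # Same octal ranges as the original regex, folded into one class.
        $count{$1}++ while $line =~ /([\041-\057\072-\100\133-\140\173-\176])/g;
    }
    return %count;
}

# Sample input standing in for lines read from a file.
my %total = count_punctuation("print (\"hi\");\n", "\$x = 1;\n");

# The report runs once, after every line has been counted.
for my $char (sort keys %total) {
    print "$char\t$total{$char}\n";
}
```

    Matching with /g in a loop leaves each line untouched, whereas the original substitution consumes the line as it counts; either should give the same totals.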

    You need to give us more information about what you are trying to do in order for us to be of more help. Please see How (Not) To Ask A Question.

    Update: Thank you for adding more information to your original post. It appears several of us were on the right track, but after reading the responses it is clear that there was still quite a bit of confusion. Be as specific as you can when you post - it will allow us to help you more efficiently.

    In addition, please read Writeup Formatting Tips and note the section on readmore tags. Including 400 lines of output without them makes the thread a bit harder to read.

Re: punctuation search... using ascii
by Errto (Vicar) on Jan 30, 2006 at 02:53 UTC

    At first reading, the reason is that you're using one global hash, %wordHash, whereas what you actually want is a fresh hash for each line you read from the input. At least, I think so. If I'm right, then the solution is to put the line

    my %wordHash;
    right inside your outer while loop. I also doubt that your inner loop does what you really want it to, but I'm not sure.
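
    To make the scoping difference concrete, here is a small sketch under assumed sample data: %this_line is declared inside the loop and starts empty on every iteration, while %whole_file is declared outside and keeps accumulating. Both hash names and the two sample lines are invented for the illustration. Which behaviour is wanted depends on whether the counts are meant per line or for the whole file.

```perl
use strict;
use warnings;

# The original octal ranges, combined into one character class.
my $punct = qr/[\041-\057\072-\100\133-\140\173-\176]/;

my @lines = ("a, b.\n", "c!\n");    # stand-in for lines read from the file

my %whole_file;                     # outlives the loop: totals accumulate
my @per_line_sizes;
for my $line (@lines) {
    my %this_line;                  # fresh, empty hash on every iteration
    while ($line =~ /($punct)/g) {
        $this_line{$1}++;
        $whole_file{$1}++;
    }
    push @per_line_sizes, scalar keys %this_line;
}

print "distinct per line: @per_line_sizes\n";
print "distinct in file: ", scalar(keys %whole_file), "\n";
```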

    Which brings me to my main point. I'm sorry to be blunt, but this needs to be said. When you are asking for help on an Internet forum, a terrible way to go about it is to say "you can kinda guess what im trying to do." Instead, say what you're trying to do clearly and concisely, then show the code. When you do that, your friendly readers may notice something about your code that's not going to work, based on your stated purpose. But without that stated purpose, we have no idea whether the code is "working" or not. So please help us out.

    Update: One other thing. The way you're adding elements to the array, using $i, will work but I strongly encourage you to use push instead because it's clearer and less likely to have errors.
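
    A minimal sketch of the push version, assuming a couple of sample counts in %wordHash (the values are invented for the example). No index variable is needed, so there is nothing to forget to initialise or reset.

```perl
use strict;
use warnings;

my %wordHash = ('!' => 2, ';' => 4);    # sample counts, for illustration only
my @wordArray;

for my $punctuation (sort keys %wordHash) {
    # push appends to the end of the array, so no separate $i is required
    push @wordArray, "$punctuation\t$wordHash{$punctuation}";
}

print "$_\n" for @wordArray;
```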

Re: punctuation search... using ascii
by GrandFather (Saint) on Jan 30, 2006 at 02:46 UTC

    Perhaps you should tell us what you are trying to do. What you should do though is use strict; use warnings; and then explicitly declare your variables so that you and the compiler and we all understand what the various variables' scopes should be. I suspect that once you have done that you will then add some code to reset various variables, perhaps at the top of the main loop. But because I can't guess exactly what you want to do, you will have to figure that out for yourself :)
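
    As a small illustration of what strict buys you, the sketch below compiles two tiny snippets with a string eval: one assigns to an undeclared $count, the other declares it with my. The snippets and variable names are made up for the demonstration; under strict the first is rejected at compile time, which is how a forgotten declaration gets noticed.

```perl
use strict;
use warnings;

# Compile a tiny snippet under strict; $count is deliberately left undeclared.
my $ok    = eval q{ use strict; $count = 1; 1 };
my $error = $@;    # save the message before the next eval resets $@

print "compiled: ", ($ok ? "yes" : "no"), "\n";
print "error: $error" if !$ok;

# The same snippet with the variable declared compiles cleanly.
my $ok_declared = eval q{ use strict; my $count = 1; 1 };
print "compiled with my: ", ($ok_declared ? "yes" : "no"), "\n";
```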


    DWIM is Perl's answer to Gödel
Re: punctuation search... using ascii
by japhy (Canon) on Jan 30, 2006 at 02:48 UTC
    My guess is that you're not clearing the %wordHash hash for each line of the file.
    while (<DATA>) { my %wordHash; ... }
    should do the trick, I think.

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: punctuation search... using ascii
by McDarren (Abbot) on Jan 30, 2006 at 02:40 UTC
    you can kinda guess what im trying to do...

    erm, wouldn't it be better if you just told us?
    I mean - when was the last time you jumped in a taxi and said "hey, see if you can guess where I want to go!"

    please help...

    Sure... so what output are you expecting to see?

Re: punctuation search... using ascii
by rhesa (Vicar) on Jan 30, 2006 at 02:55 UTC
    What about moving that last while() out of the while(<DATA>) loop?