
Hi, you have two options. If you wish to retain ultimate control, split on whitespace and filter the elements in @array_codes using this (as above):

$code_key =~ s/[.?!:;"'()]//g;

This filters out all the stuff in the char class. Alternatively you can just grab alphanumerics in the first place like this:

@array_codes = <DATA> =~ m/\w+/g;
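
To make the difference between the two options concrete, here is a minimal sketch that runs both on one line of text. The sub names and the sample string are made up for illustration; only the character class and the \w+ match come from the snippets above.

```perl
use strict;
use warnings;

# Option 1: split on whitespace, then strip punctuation from each element.
sub tokens_by_filter {
    my ($line) = @_;
    my @codes;
    my @parts = split ' ', $line;
    for my $code_key (@parts) {
        $code_key =~ s/[.?!:;"'()]//g;          # same character class as above
        push @codes, $code_key if length $code_key;
    }
    return @codes;
}

# Option 2: grab runs of word characters directly (call in list context).
sub tokens_by_match {
    my ($line) = @_;
    return $line =~ m/\w+/g;
}

my $sample = q{He said: "it's (mostly) done."};   # hypothetical sample line
print join('|', tokens_by_filter($sample)), "\n";
print join('|', tokens_by_match($sample)), "\n";
```

On this sample the first option keeps "its" as one token, while the second splits it into "it" and "s", because the apostrophe is not a word character. Which behaviour you want depends on your data.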

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Re: Re: What's the best way to do a pattern search like this?
by supernewbie (Beadle) on Jul 20, 2001 at 14:16 UTC
    I followed your instructions, and I used:
    open (FILE, "file.txt") || die "Can't open data file.";
    while (<FILE>) {
        @array_codes = split /\W+/;
        foreach $code_key (@array_codes) {
            $codes{$code_key}++;
        }
    }
    printf "$_\t$codes{$_}\n" for keys %codes;
    Everything is outstanding, BUT the script also counted the blank lines between the paragraphs. It output something like:
    12 ab 42 aba 25 .......
    I tried s/\n/ /g, but it still counted the blank lines. How do I get rid of the blank lines in a text document and replace them with a space? Or is there a way to not count the blank lines? Thank you very, very much...

      As noted by MeowChow the devil is in the details:

      #!/usr/bin/perl -w
      use strict;
      my %codes;
      open (FILE, "file.txt") || die "Can't open data file, perl says $!";
      while (<FILE>) {
          my @array_codes = /[A-Za-z0-9]+/g;
          foreach my $code_key (@array_codes) {
              $codes{$code_key}++;
          }
      }
      print "$_\t$codes{$_}\n" for sort keys %codes;

      This grabs runs of characters that match A-Za-z0-9 only and then prints out the hash sorted by key. I have added the $! variable to the die message, which contains Perl's error message, and sorted the keys. I have also added use strict, -w and the my declarations. Also note that printf was a typo. I meant print.
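
      For what it's worth, here is a small sketch of where a stray empty key can come from. It assumes (the data file isn't shown) that some lines start with whitespace or punctuation: split /\W+/ then returns an empty leading field, while a list-context match never returns an empty string. The sub names and sample lines are hypothetical.

      ```perl
      use strict;
      use warnings;

      # Count tokens using split, as in the earlier loop.
      sub count_with_split {
          my %codes;
          for my $line (@_) {
              $codes{$_}++ for split /\W+/, $line;
          }
          return \%codes;
      }

      # Count tokens using a list-context match, as in the script above.
      sub count_with_match {
          my %codes;
          for my $line (@_) {
              $codes{$_}++ for $line =~ /[A-Za-z0-9]+/g;
          }
          return \%codes;
      }

      # Hypothetical input: an indented line, a blank line, a quoted line.
      my @lines = ("  ab aba\n", "\n", "\"ab\" cd\n");
      my $split = count_with_split(@lines);
      my $match = count_with_match(@lines);
      print "split sees an empty key ", ($split->{''} // 0), " time(s)\n";
      print "match sees an empty key ", ($match->{''} // 0), " time(s)\n";
      ```

      With these sample lines the split version counts the empty string twice (once for the indented line, once for the line starting with a quote) and the match version never does. The truly blank line adds nothing in either case, since split drops fields when all of them are empty.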

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print