Re: What's the best way to do a pattern search like this?

Here is an example for you using a hash:

# declare our vars
my (%codes, @array_codes);

#undef input record sep to get all data at once
local $/;

# make an array of codes by splitting DATA on whitespace
@array_codes = split /\s+/, <DATA>;

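# count each code in a hash; duplicates just increment the count
# (a plain foreach loop rather than map in void context)
foreach my $code_key (@array_codes) {
   $codes{$code_key}++;
}

# print it out (print, not printf, so a % in a code is not read as a format)
print "$_\t$codes{$_}\n" for keys %codes;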


__DATA__
baaba ba abab abab abab baaba baaba babaa. 
abab aaba ba abab ba. bababab abab abab ba aaba. 
ba bababab aaba abab babaa baaba ba baaba. 
aaba ba bababab ba bababab abab ba aaba abab baaba abab. 
ba abab abab ba.

Note that map { ... } @array in void context does the same job as for (@array) { ... }, but for is the more efficient and idiomatic way to write it. To run this on a file, all you need to do is something like:

sub count_codes {
  my $file = shift;
  my %codes;
  # three-argument open with a lexical filehandle
  open my $fh, '<', $file or die "Oops, perl says $!\n";
  local $/;   # undef the input record separator to slurp the whole file
  my @array_codes = split /\s+/, <$fh>;
  close $fh;
  foreach my $code_key (@array_codes) {
    $codes{$code_key}++;
  }
  print "$_\t$codes{$_}\n" for keys %codes;
}

# call sub
count_codes("/path/to/myfile.txt");

You have some full stops in there, which I have assumed are part of the codes. If they are not, you will need to filter them out with a regex inside the for loop, like this:

foreach my $code_key (@array_codes) {
  $code_key =~ s/[.]//g;
  $codes{$code_key}++;
}

If you want to filter out more characters, add them to the character class between the [ ].
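
As a rough sketch of what a wider character class might look like: the sample tokens and the particular punctuation set below are my own assumptions, not taken from the original data. It works on a copy of each token so the array is left untouched, and it skips anything that ends up empty after stripping.

```perl
use strict;
use warnings;

# Hypothetical sample tokens, as if already split on whitespace
my @array_codes = ('ba.', 'abab', '(ba)', 'abab!', 'ba', '...');
my %codes;

foreach my $code_key (@array_codes) {
    # Strip the unwanted characters from a copy of the token
    (my $clean = $code_key) =~ s/[.?!:;"'()]//g;

    # A token made only of punctuation is now empty; do not count it
    next unless length $clean;

    $codes{$clean}++;
}

print "$_\t$codes{$_}\n" for sort keys %codes;
```

Sorting the keys is optional; it just makes the output order predictable.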

cheers

tachyon

Update

Removed lazy and inefficient map and replaced with proper for loop. Even typed foreach to remind me not to be so slack.

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Replies are listed 'Best First'.
Re: Re: What's the best way to do a pattern search like this? by MeowChow (Vicar) on Jul 20, 2001 at 11:07 UTC
If I don't mention it, someone else will. Don't suggest the use of map in a void context. You are taking the trouble to build a whole return list, which you just throw away. It is more efficient and idiomatic to use `for` for such tasks.
MeowChow s aamecha.s a..a\u$&owag.print
Re: Re: Re: What's the best way to do a pattern search like this? by tachyon (Chancellor) on Jul 20, 2001 at 11:28 UTC
Good point, I'll update the code. It's too much Golf you know, shaving those two chars by using map instead of for.
cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Re: What's the best way to do a pattern search like this? by supernewbie (Beadle) on Jul 20, 2001 at 13:33 UTC
I tried your method. Everything works great, except the program returns something like: `ba. 1 ba 2 ........` Should I do a s/\./ / on file.txt before processing it through your function? What if there are other things like ? ! : ; " ' ( ) ... etc.?
Re: Re: Re: What's the best way to do a pattern search like this? by davorg (Chancellor) on Jul 20, 2001 at 13:57 UTC
You just need to adjust the regex a little. `my @array_codes = split /\s+/, <FILE>;` assumes that you're interested in all non-whitespace characters. Changing it to: `my @array_codes = split /\W+/, <FILE>;` means that you're only interested in word characters (where word chars are A-Z, a-z, 0-9 and '_').
-- <http://www.dave.org.uk> Perl Training in the UK <http://www.iterative-software.com>
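
To make the difference concrete, here is a small sketch that runs both splits over the same string. The sample text is made up for illustration; only the two split patterns come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample text with full stops attached to some codes
my $text = "ba abab ba. aaba.";

# Splitting on whitespace keeps the punctuation glued to the codes
my @by_space = split /\s+/, $text;

# Splitting on runs of non-word characters drops the punctuation
my @by_nonword = split /\W+/, $text;

print join('|', @by_space), "\n";    # ba|abab|ba.|aaba.
print join('|', @by_nonword), "\n";  # ba|abab|ba|aaba
```

Note that split discards trailing empty fields, which is why the final full stop does not leave an empty element at the end of the second list.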
Re: Re: Re: What's the best way to do a pattern search like this? by tachyon (Chancellor) on Jul 20, 2001 at 14:01 UTC
Hi, you have two options. If you wish to retain ultimate control, split on whitespace and filter the elements in @array_codes using this (as above): `$code_key =~ s/[.?!:;"'()]//g;` This filters out all the stuff in the char class. Alternatively you can just grab alphanumerics in the first place like this: `@array_codes = <DATA> =~ m/\w+/g;`
cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Re: Re: Re: What's the best way to do a pattern search like this? by supernewbie (Beadle) on Jul 20, 2001 at 14:16 UTC
I followed your instructions, and I used: `open (FILE, "file.txt") || die "Can't open data file."; while (<FILE>) { @array_codes = split /\W+/; foreach $code_key (@array_codes) { $codes{$code_key}++; } } printf "$_\t$codes{$_}\n" for keys %codes;` Everything is outstanding, BUT the function also counted the blank lines between the paragraphs. It output something like: `12 ab 42 aba 25 .......` I tried s/\n/ /g, but it still counted the blank lines. How do I get rid of the blank lines in a txt document and replace them with a space? Or is there a way to not count the blank lines? Thank you very very much...
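
A guess at what is going on, since file.txt itself is not shown: split keeps an empty leading field whenever the string starts with a separator, so with /\W+/ any line that begins with whitespace or punctuation contributes an empty-string key. One way around it, sketched below with made-up lines standing in for the file, is to match the words directly with /\w+/g instead of splitting, because a match can never be empty.

```perl
use strict;
use warnings;

# Hypothetical lines standing in for the contents of file.txt
my @lines = ("ab aba.\n", "\n", "  ab\n", "\"aba\" ab\n");

# For comparison: split keeps an empty leading field on an indented line
my @with_split = split /\W+/, "  ab\n";   # ('', 'ab')

my %codes;
for my $line (@lines) {
    # Collect every run of word characters; blank lines yield nothing
    $codes{$_}++ for $line =~ /\w+/g;
}

print "$_\t$codes{$_}\n" for sort keys %codes;
```

If you would rather keep the split, adding `next unless length $code_key;` at the top of the inner loop should have the same effect.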
Re: Re: Re: Re: Re: What's the best way to do a pattern search like this? by tachyon (Chancellor) on Jul 20, 2001 at 14:59 UTC