lampros21_7 has asked for the wisdom of the Perl Monks concerning the following question:

Hi to the monks. I have looked at previous posts and can't find anything similar. I want to do something and I thought Perl would be ideal because of its strengths in handling text. Basically, I want to compare some text strings and, for every pair that is the same, add 1. That is, if no two strings match I get 0, and each pair of matching strings adds 1, so the final output is a single number. I presume I would need a parser to do this, but I haven't used one before and haven't got a clue how, or whether, I could do it. Any ideas?

Hopefully what I've written makes sense. Thanks.

2006-01-29 Retitled by g0n, as per Monastery guidelines
Original title: 'Parsing Text Into Numbers'

Re: Counting Similar Strings
by McDarren (Abbot) on Jan 28, 2006 at 11:50 UTC
    What you want here is a hash.
    Basically, what you do is iterate through your list of strings, assigning each one as a key to your hash. The value of each key will be the number of times that particular string has been "seen". Here is a very simple example to demonstrate what I mean:
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper::Simple;

    my %strings;
    while (<DATA>) {
        chomp;
        $strings{$_}++;
    }
    print Dumper(%strings);

    __DATA__
    string1
    string2
    string3
    string1
    string1
    string7
    string2
    Which prints..
    %strings = (
                 'string3' => 1,
                 'string7' => 1,
                 'string1' => 3,
                 'string2' => 2
               );
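    Building on the counts above, the single number the original poster asked for (the number of matching pairs) can be derived from the hash values, since a string seen n times forms n(n-1)/2 pairs. The following is a minimal sketch under that assumption: it uses only core Perl (no Data::Dumper::Simple), holds the sample strings in a hypothetical array @lines instead of reading __DATA__, and the name $pairs is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the same sample strings as the example above,
# held in an array instead of __DATA__ so the sketch is self-contained
my @lines = qw(string1 string2 string3 string1 string1 string7 string2);

# Count how many times each string occurs (same idea as %strings above)
my %strings;
$strings{$_}++ for @lines;

# A string seen n times forms n(n-1)/2 distinct pairs; sum over all strings.
# n(n-1) is always even, so the division leaves a whole number.
my $pairs = 0;
for my $n (values %strings) {
    $pairs += $n * ($n - 1) / 2;
}

print "$pairs\n";    # prints 4 for the sample data (string1: 3 pairs, string2: 1)
```

    Strings that occur only once contribute 1(0)/2 = 0, so they need no special handling.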

    Have a look in the Q&A section under hashes for some more examples.

    Hope this helps,
    Darren :)

Re: Counting Similar Strings
by TedPride (Priest) on Jan 29, 2006 at 10:51 UTC
    Yes, but that doesn't give the output he wants, which is the total number of pairs. A string that appears n times forms n(n-1)/2 pairs (4 copies of a string give 4(3)/2 = 6 pairs), so you can either apply that formula to each count or just add up the pairs as you go:
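    use strict;
    use warnings;

    my %hash;
    my $pairs = 0;    # start at 0 so empty input prints 0 rather than undef
    while (<DATA>) {
        chomp;
        # $hash{$_}++ returns the count *before* incrementing, i.e. the
        # number of earlier copies this line forms a new pair with
        $pairs += $hash{$_}++;
    }
    print "$pairs\n";

    __DATA__
    string1
    string2
    string3
    string1
    string1
    string7
    string2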
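    Why the running total agrees with the n(n-1)/2 formula: because the post-increment returns the old count, the k-th copy of a string adds k-1 to $pairs (one new pair with each earlier copy). Summed over all n copies of that string, this is:

```latex
\sum_{k=1}^{n} (k-1) \;=\; 0 + 1 + \cdots + (n-1) \;=\; \frac{n(n-1)}{2}
```

    For the 4 copies mentioned above, 0 + 1 + 2 + 3 = 6 = 4(3)/2, so both approaches give the same total.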