Re: Newbie Q:How do I compare items within a string?

Instead of splitting into an intermediate array, you can extract the word positions directly from a regex scan. It goes like this:

my $string =
    q(So anyway, I basically need to check now across my
string whether any elements in my string are repeated, and 
if so, how many times. I've read alot about manipulating 
arrays, but they're all based on arrays that you create 
yourself, rather than arrays created by opening a textfile, 
so I'm not sure how to manipulate my array. Any help would 
be much appreciated.);

my %positions;

push @{$positions{lc($1)}}, pos() - length($1)
    while $string =~ /([A-Za-z']+)/g;

{
    local $_;
    print "$_\t@{$positions{$_}}\n"
        for keys %positions;
}

For each word found, that hash holds a reference to an array of string positions, in ascending order because the scan runs left to right. Evaluated in scalar context, each referenced array gives that word's count.
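As a rough illustration of reading counts and positions back out of such a hash, here is a minimal sketch. The sample text is made up for the example, and the sketch names the string explicitly in `pos($string)`, which is an assumption that differs from the bare `pos()` in the code above.

```perl
use strict;
use warnings;

# Hypothetical sample text, not the string from the post.
my $string = "the cat and the hat and the bat";

my %positions;

# pos($string) is the offset just past the current match; subtracting
# the length of the captured word gives the offset where the word starts.
push @{ $positions{ lc $1 } }, pos($string) - length($1)
    while $string =~ /([A-Za-z']+)/g;

for my $word (sort keys %positions) {
    my $count = @{ $positions{$word} };   # array in scalar context: the count
    print "$word\t$count\t@{ $positions{$word} }\n";
}
```

Any word whose count is greater than 1 is a repeated word, which is what the original question asked for.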

Another advantage is that you get to say directly what a word character is, instead of defining what separates words. I used that to include contractions (at the cost of messing up any single-quoted passages).
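To make that trade-off concrete, here is a small sketch with a made-up sentence. The contraction survives as one word, but the quote marks around a single-quoted passage stay attached to its first and last words.

```perl
use strict;
use warnings;

# Hypothetical sample text containing a contraction and a quoted passage.
my $text = "Don't say 'hello there' twice";

# In list context, m//g with one capture group returns every captured word.
my @words = map { lc $_ } $text =~ /([A-Za-z']+)/g;

print join('|', @words), "\n";   # prints: don't|say|'hello|there'|twice
```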

After Compline,
Zaxo

Re^2: Newbie Q:How do I compare items within a string? by johngg (Canon) on May 09, 2006 at 09:16 UTC
I am not sure that you need to subtract the length of the word you have just matched from the position in your `pos() - length($1)`. I have been playing around, combining elements of your solution and TedPride's, to come up with text annotated with occurrence number, total occurrences and offset. My suspicions were raised when the first word "I" came up with an offset of -1.

Here's the code without the subtraction (1535 bytes, not shown) and here's the output (3 kB, not shown).

Empirically, this seems to work, giving zero-based offsets. The documentation is rather terse but says that `pos` returns the position where the last match left off, implying that your subtraction would be necessary. Strange.

Cheers,
JohnGG
Re^3: Newbie Q:How do I compare items within a string? by Zaxo (Archbishop) on May 09, 2006 at 09:35 UTC
A tidier alternative to my `pos() - length($1)` is to consult `@-`:

`push @{$positions{lc($1)}}, $-[1] while $string =~ /([A-Za-z']+)/g;`

The difference in indexing is that your code is matching on separator characters instead of word characters. The end of your first match is the start of my second.

After Compline,
Zaxo
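The following sketch compares the expressions discussed in this thread on a made-up string. It relies on documented behaviour: `pos` with no argument consults `$_`, and `$-[1]` is the start offset of the first capture group. It assumes that `$_` does not hold the string being scanned, which is one possible reading of the -1 offset reported above and is not confirmed anywhere in the thread.

```perl
use strict;
use warnings;

my $string = "I think I can";   # hypothetical sample text
my (@from_pos, @from_minus, @from_bare);

{
    local $_ = "";                  # assumption: $_ is not the scanned string
    no warnings 'uninitialized';    # bare pos() is undef here, treated as 0

    while ($string =~ /([A-Za-z']+)/g) {
        push @from_pos,   pos($string) - length($1);  # end of match minus word length
        push @from_minus, $-[1];                      # start offset of capture group 1
        push @from_bare,  pos() - length($1);         # bare pos() looks at $_, not $string
    }
}

print "pos(\$string) - length: @from_pos\n";     # 0 2 8 10
print "\$-[1]:                 @from_minus\n";   # 0 2 8 10
print "bare pos() - length:   @from_bare\n";    # -1 -5 -1 -3
```

Under that assumption, `pos($string) - length($1)` and `$-[1]` agree on zero-based start offsets, while the bare `pos()` form yields the negated word length, which gives -1 for a one-letter first word such as "I".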
Re^4: Newbie Q:How do I compare items within a string? by johngg (Canon) on May 09, 2006 at 10:23 UTC
I don't think that's the difference. I `split` on separator characters when forming the array `@words`, but I negate the character class when doing the `s{ ... }{ ...}xeg` to add the annotation. Thus, like you, I am pulling out words, but by capturing one or more non-separator characters.

Cheers,
JohnGG

Update: I substituted your pattern `([A-Za-z']+)(?{++ $found{lc $1}})` for my pattern `([^.,;:?! \n]+)(?{++ $found{lc $1}})` and the results were identical.
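As a hedged sketch of when the two capture patterns agree, the code below runs both over two made-up sentences, leaving out the `(?{ ... })` counting blocks. The helper `words` is hypothetical and exists only for this illustration. The patterns should match the same words whenever the text contains nothing but letters, apostrophes and the listed separators, and can diverge once digits or hyphens appear.

```perl
use strict;
use warnings;

my $word_chars = qr/([A-Za-z']+)/;       # capture group from Zaxo's pattern
my $non_seps   = qr/([^.,;:?! \n]+)/;    # capture group from johngg's pattern

# Hypothetical helper: lower-cased list of everything the pattern captures.
sub words {
    my ($text, $re) = @_;
    return [ map { lc $_ } $text =~ /$re/g ];
}

my $plain = "Any help would be much appreciated, I'm sure.";
my $mixed = "I re-read it 42 times";

for my $text ($plain, $mixed) {
    my $by_word = join ' ', @{ words($text, $word_chars) };
    my $by_sep  = join ' ', @{ words($text, $non_seps) };
    print $by_word eq $by_sep ? "same:   $by_word\n"
                              : "differ: [$by_word] vs [$by_sep]\n";
}
```

The first sentence gives the same word list under both patterns; the second does not, because `[A-Za-z']+` splits "re-read" and skips "42".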