in reply to Re: Newbie Q:How do I compare items within a string?
in thread Newbie Q:How do I compare items within a string?

I am not sure that you need to subtract the length of the word you have just matched from the position, as you do in your pos() - length($1). I have been playing around, combining elements of your solution and TedPride's, to come up with text annotated with occurrence number, total occurrences and offset. My suspicions were raised when the first word, "I", came up with an offset of -1.

Here's the code without the subtraction

use strict;
use warnings;

my $string;
{
    local $/;
    $string = <DATA>;
}

my @words = split /[.,;:?! \n]+/, $string;
my $rhWords = {};
my $order = 0;
foreach my $word (@words)
{
    my $lcWord = lc $word;
    push @{$rhWords->{$lcWord}->{order}}, ++ $order;
    $rhWords->{$lcWord}->{count} ++;
}

my %found = ();
$string =~ s
    {
        ([^.,;:?! \n]+)(?{++ $found{lc $1}})
    }
    {
        $1 .
        (
            $rhWords->{lc $1}->{count} > 1
            ? "[$^R/$rhWords->{lc $1}->{count}/@{[pos()]}]"
            : ""
        )
    }xeg;

print "\n$string\n";

__END__
I need to know how to compare items within a string... I have dropped a textfile into an array, but now I need to check whether words in that text are repeated throughout. I have split the text; as I only want the text to be manipulated. Maybe it's better to split it like this; So anyway, I basically need to check now across my string whether any elements in my string are repeated, and if so, how many times. I've read a lot about manipulating arrays, but they're all based on arrays that you create yourself, rather than arrays created by opening a textfile, so I'm not sure how to manipulate my array. Any help would be much appreciated.

and here's the output

I[1/6/0] need[1/3/2] to[1/7/7] know how[1/3/15] to[2/7/19] compare items within a[1/4/43] string[1/3/45]... I[2/6/55] have[1/2/58] dropped a[2/4/71] textfile[1/2/73] into an array[1/2/90], but[1/2/97] now[1/2/101] I[3/6/105] need[2/3/107] to[3/7/112] check[1/2/115] whether[1/2/122] words in[1/2/136] that[1/2/139] text[1/3/144] are[1/2/149] repeated[1/2/153] throughout. I[4/6/174] have[2/2/176] split[1/2/181] the[1/2/188] text[2/3/192]; as I[5/6/201] only want the[2/2/213] text[3/3/217] to[4/7/222] be[1/2/225] manipulated. Maybe it's better to[5/7/260] split[2/2/263] it like this; So[1/3/283] anyway, I[6/6/294] basically need[3/3/306] to[6/7/311] check[2/2/314] now[2/2/321] across my[1/3/332] string[2/3/335] whether[2/2/342] any[1/2/350] elements in[2/2/363] my[2/3/366] string[3/3/369] are[2/2/376] repeated[2/2/380], and if so[2/3/398], how[2/3/402] many times. I've read a[3/4/428] lot about manipulating arrays[1/3/453], but[2/2/462] they're all based on arrays[2/3/487] that[2/2/494] you create yourself, rather than arrays[3/3/533] created by opening a[4/4/559] textfile[2/2/561], so[3/3/571] I'm not sure how[3/3/587] to[7/7/591] manipulate my[3/3/606] array[2/2/609]. Any[2/2/616] help would be[2/2/631] much appreciated.

Empirically, this seems to work, giving zero-based offsets for the start of each word. The documentation for pos() is rather terse, but it says that pos() returns the position where the last match left off, which implies that your subtraction would be necessary. Strange.
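For comparison, here is a minimal sketch of the documented behaviour in a plain m//g loop, outside of s///e. The sample sentence and the variable names ($text, @rows) are illustrative assumptions, not taken from the code above. In this setting pos($text) is the offset just past the match, so the subtraction is what yields the start of the word.

```perl
use strict;
use warnings;

# Illustrative sample text (an assumption, not the original DATA block).
my $text = "I need to know how to compare";

my @rows;
while ($text =~ /([A-Za-z']+)/g) {
    my $end   = pos($text);          # offset just past the matched word
    my $start = $end - length($1);   # offset where the matched word begins
    push @rows, [$1, $start, $end];
}

# One line per word: the word, its start offset and its end offset.
printf "%-8s start=%2d end=%2d\n", @$_ for @rows;
```

In this loop the first word "I" gets start 0 and end 1, so the -1 seen earlier suggests that pos() inside the replacement part of s///e was already reporting the start of the match rather than the end.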

Cheers,

JohnGG

Re^3: Newbie Q:How do I compare items within a string?
by Zaxo (Archbishop) on May 09, 2006 at 09:35 UTC

    A tidier alternative to my pos() - length($1) is to consult @- .

    push @{$positions{lc($1)}}, $-[1] while $string =~ /([A-Za-z']+)/g;
    The difference in indexing is that your code is matching on separator characters instead of word characters. The end of your first match is the start of my second.
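A minimal sketch of the two approaches side by side, assuming a short illustrative sentence in place of the original text ($string's contents, %positions and $mismatches are names chosen for this example only). It records $-[1] for each word and counts how often it disagrees with pos() - length($1).

```perl
use strict;
use warnings;

# Illustrative sample text (an assumption, not the original poster's data).
my $string = "I need to know how to compare items. I need help.";

my %positions;        # lower-cased word => list of start offsets
my $mismatches = 0;   # times the two ways of finding the start disagree

while ($string =~ /([A-Za-z']+)/g) {
    my $from_pos = pos($string) - length($1);   # end of match minus word length
    $mismatches++ if $from_pos != $-[1];        # $-[1] is the start of capture group 1
    push @{ $positions{lc $1} }, $-[1];
}

for my $word (sort keys %positions) {
    print "$word: @{ $positions{$word} }\n";
}
print "mismatches: $mismatches\n";
```

With a single capture group that spans the whole match, both expressions should give the same zero-based start offset, so the mismatch count stays at zero.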

    After Compline,
    Zaxo

      I don't think that's the difference. I split on separator characters when forming the array @words but I negate the character class when doing the s{ ... }{ ...}xeg to add the annotation. Thus, like you, I am pulling out words but by capturing one or more non-separator characters.

      Cheers,

      JohnGG

      Update: I substituted your pattern

      ([A-Za-z']+)(?{++ $found{lc $1}})

      for my pattern

      ([^.,;:?! \n]+)(?{++ $found{lc $1}})

      and the results were identical.