I am not sure that you need to subtract the length of the word you have just matched from the position, i.e. pos() - length($1). I have been playing around with combining elements of your solution and TedPride's to produce text annotated with the occurrence number, total occurrences and offset. My suspicions were raised when the first word, "I", came up with an offset of -1.
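For reference, in an ordinary m//g loop pos() does report the offset just past the end of the match, so there the subtraction is what you want. Here is a minimal sketch comparing pos($text) - length($1) with $-[1], the start offset of capture group 1 taken from the @- array documented in perlvar. The sample string is made up, and \w+ is used instead of the delimiter class purely to keep it short:

```perl
use strict;
use warnings;

# Made-up sample string, for illustration only.
my $text = "I need to know how to compare items";

my ( @starts_by_subtraction, @starts_from_array );
while ( $text =~ /(\w+)/g ) {
    # After a successful m//g match, pos($text) is the offset just past
    # the end of the match, so subtracting the word length gives its start.
    push @starts_by_subtraction, pos($text) - length($1);

    # @- holds match start offsets; $-[1] is where capture group 1 began.
    push @starts_from_array, $-[1];
}

print "@starts_by_subtraction\n";
print "@starts_from_array\n";
```

Both lines should print 0 2 7 10 15 19 22 30.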

Here's the code without the subtraction:

use strict;
use warnings;
# Slurp everything after __END__ into one string.
my $string;
{
    local $/;
    $string = <DATA>;
}
# First pass: split into words and record, case-insensitively,
# each word's occurrence order and total count.
my @words = split /[.,;:?! \n]+/, $string;
my $rhWords = {};
my $order = 0;
foreach my $word (@words)
{
    my $lcWord = lc $word;
    push @{$rhWords->{$lcWord}->{order}}, ++ $order;
    $rhWords->{$lcWord}->{count} ++;
}
# Second pass: the (?{...}) block bumps the running count for the
# matched word and $^R picks up that value; repeated words get
# [occurrence/total/offset] appended, with pos() used for the offset.
my %found = ();
$string =~ s
{
    ([^.,;:?! \n]+)(?{++ $found{lc $1}})
}
{
    $1 . (
        $rhWords->{lc $1}->{count} > 1
            ? "[$^R/$rhWords->{lc $1}->{count}/@{[pos()]}]"
            : ""
    )
}xeg;
print "\n$string\n";
__END__
I need to know how to compare items within a string... I have dropped a textfile into an array, but now I need to check whether words in that text are repeated throughout. I have split the text; as I only want the text to be manipulated. Maybe it's better to split it like this; So anyway, I basically need to check now across my string whether any elements in my string are repeated, and if so, how many times. I've read a lot about manipulating arrays, but they're all based on arrays that you create yourself, rather than arrays created by opening a textfile, so I'm not sure how to manipulate my array. Any help would be much appreciated.

and here's the output:

I[1/6/0] need[1/3/2] to[1/7/7] know how[1/3/15] to[2/7/19] compare items within a[1/4/43] string[1/3/45]... I[2/6/55] have[1/2/58] dropped a[2/4/71] textfile[1/2/73] into an array[1/2/90], but[1/2/97] now[1/2/101] I[3/6/105] need[2/3/107] to[3/7/112] check[1/2/115] whether[1/2/122] words in[1/2/136] that[1/2/139] text[1/3/144] are[1/2/149] repeated[1/2/153] throughout. I[4/6/174] have[2/2/176] split[1/2/181] the[1/2/188] text[2/3/192]; as I[5/6/201] only want the[2/2/213] text[3/3/217] to[4/7/222] be[1/2/225] manipulated. Maybe it's better to[5/7/260] split[2/2/263] it like this; So[1/3/283] anyway, I[6/6/294] basically need[3/3/306] to[6/7/311] check[2/2/314] now[2/2/321] across my[1/3/332] string[2/3/335] whether[2/2/342] any[1/2/350] elements in[2/2/363] my[2/3/366] string[3/3/369] are[2/2/376] repeated[2/2/380], and if so[2/3/398], how[2/3/402] many times. I've read a[3/4/428] lot about manipulating arrays[1/3/453], but[2/2/462] they're all based on arrays[2/3/487] that[2/2/494] you create yourself, rather than arrays[3/3/533] created by opening a[4/4/559] textfile[2/2/561], so[3/3/571] I'm not sure how[3/3/587] to[7/7/591] manipulate my[3/3/606] array[2/2/609]. Any[2/2/616] help would be[2/2/631] much appreciated.

Empirically, this seems to work, giving zero-based offsets. The documentation for pos is rather terse, but it says that pos returns the offset of where the last m//g search left off, which implies that your subtraction would be necessary. Strange.
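Whatever pos() is actually reporting inside the replacement part, a sketch that avoids depending on it can read the zero-based start of each match from $-[0] instead. It uses a made-up sample string and a plain hash of counts rather than the $rhWords structure, so treat it as an illustration of the idea rather than a drop-in replacement:

```perl
use strict;
use warnings;

# Made-up sample text; words are split on the same delimiter class
# used in the code above.
my $text = "I need to know how to compare items. I need help";

# First pass: total count of each word, ignoring case.
my %count;
$count{ lc $_ }++ for split /[.,;:?! \n]+/, $text;

# Second pass: annotate repeated words as word[occurrence/total/offset].
my %seen;
( my $annotated = $text ) =~ s{([^.,;:?! \n]+)}{
    my $start = $-[0];    # zero-based start of this match (see @- in perlvar)
    my $word  = lc $1;
    my $nth   = ++$seen{$word};
    $count{$word} > 1 ? $1 . "[$nth/$count{$word}/$start]" : $1
}eg;

print "$annotated\n";
```

For that sample it should print: I[1/2/0] need[1/2/2] to[1/2/7] know how to[2/2/19] compare items. I[2/2/37] need[2/2/39] help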

Cheers,

JohnGG


In reply to Re^2: Newbie Q:How do I compare items within a string? by johngg
in thread Newbie Q:How do I compare items within a string? by PerlGrrl
