Here's a solution I came up with. I don't know that it's elegant, and I'm not sure how efficient it is. It gets the job done, though, so you may find it helpful.

The code works by checking each word in the string, and seeing if it could be the start of a match. In cases where the same word could start several matches, the longer matches are tried first. When a match is found, the link is inserted, and the search continues at the next word after the new link.
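To make the "longer matches are tried first" part concrete, here's a minimal sketch of just the grouping step, pulled out on its own. The `build_lookup` helper is a made-up name for illustration (the full script below does this inline and also stores the link targets), and the alphabetical tie-break is my own addition so the order is predictable.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: group phrase keys by their
# first word, with the longest phrase first in each group.
sub build_lookup {
    my (%table) = @_;
    my %lookup;

    # bucket every key under its first underscore-separated word
    for my $key (keys %table) {
        my ($first) = split /_/, $key;
        push @{ $lookup{$first} }, $key;
    }

    # longest key first; ties broken alphabetically (an assumption, so
    # that the result does not depend on hash ordering)
    for my $first (keys %lookup) {
        @{ $lookup{$first} } =
            sort { length($b) <=> length($a) or $a cmp $b }
            @{ $lookup{$first} };
    }

    return \%lookup;
}

my $lookup = build_lookup(
    this       => 'bar.html',
    this_thing => 'baz.html',
    its_full   => 'foobar.html',
);

print join(', ', @{ $lookup->{this} }), "\n";   # prints: this_thing, this
```

With the groups ordered this way, a word like "this" is checked against "this_thing" before plain "this", so the shorter key only wins when the longer one fails.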

Enjoy!

#!/usr/local/bin/perl -w

use strict;

my %table = (
    "lines_of_text" => "foo.html",
    "this"          => "bar.html",
    "its_full"      => "foobar.html",
    "this_thing"    => "baz.html",
);

# create a lookup table, based on the first words of each key
my %lookup;
while (my($key, $val) = each %table) {
    # clean lookup keys, just in case
    $key =~ tr/A-Za-z_//cd;
    $key =~ tr/_//s;
    my($first) = split /_/, $key;
    push @{$lookup{$first}}, [$key, $val];
}

# sort each lookup array by length of matching text
while (my($first, $aref) = each %lookup) {
    $lookup{$first} = [ sort {length $b->[0] <=> length $a->[0]} @{$lookup{$first}} ];
}

my $string;
{
    local $/;
    $string = <DATA>;
}

$string =~ /^\s*/g;

# Here's where the fun starts!

my $begin = pos($string);

# for each word, see if it's the start of any of the matching texts
FIRST:
while ($string =~ /(\S+)(\s*)/g) {
    my $end = pos($string) - length $2;
    my $first = lc $1;
    $first =~ tr/A-Za-z//cd;

    my $matches = $lookup{$first};
    next unless $matches;

    # for each possible matching text, see if a match occurs
    MATCHES:
    for my $m (0 .. $#$matches) {
        my $match = $matches->[$m];
        my $words = $first;
        my $space = 0;

        # get the appropriate number of next words
        for (1 .. $match->[0] =~ tr/_//) {
            last unless $string =~ /(\S+)(\s*)/g;
            $words .= "_$1";
            $space = length $2;
        }

        if ($words eq $match->[0]) {
            # match found: put a link around the text
            my $text = substr($string, $begin, pos($string) - $begin - $space);
            my $link = qq{<A HREF="$match->[1]">$text</A>};
            substr($string, $begin, pos($string) - $begin - $space) = $link;
            pos($string) = $begin + length $link;

            # remove this match, so only the first occurrence will be linked
            splice(@$matches, $m, 1);
            next FIRST;
        }

        pos($string) = $end;
    }
} continue {
    $begin = pos($string);
}

print $string;

__DATA__
This is just lines of text here, and also there. Consider this human readable text; it's full of letters and punctuation.

In reply to Re: Mass Text Replacement by chipmunk
in thread Mass Text Replacement by tedv
