Here's a solution I came up with. I don't know that it's elegant, and I'm not sure how efficient it is. It gets the job done, though, so you may find it helpful.

The code checks each word in the string to see whether it could be the start of a match. When the same word could start several matches, the longer ones are tried first. When a match is found, the link is inserted and the search continues at the next word after the new link. Each phrase is linked only at its first occurrence.
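To make the "longer matches are tried first" step concrete, here is a small sketch of just the lookup-table part, using the same phrase table as the full script below. The helper name `build_lookup` is my own for illustration and does not appear in the script, which also stores the link target alongside each phrase.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same phrase table as the full script (underscores stand for spaces).
my %table = (
    "lines_of_text" => "foo.html",
    "this"          => "bar.html",
    "its_full"      => "foobar.html",
    "this_thing"    => "baz.html",
);

# build_lookup is a hypothetical helper name, used only in this sketch.
# It groups phrases by their first word and sorts each group so the
# longest phrase comes first.
sub build_lookup {
    my (%phrases) = @_;
    my %lookup;

    for my $key (keys %phrases) {
        my ($first) = split /_/, $key;
        push @{ $lookup{$first} }, $key;
    }

    # Longest phrase first within each group, so it is tried first.
    for my $group (values %lookup) {
        @$group = sort { length $b <=> length $a } @$group;
    }

    return \%lookup;
}

my $lookup = build_lookup(%table);
print "$_: @{ $lookup->{$_} }\n" for sort keys %$lookup;
```

This prints `its: its_full`, `lines: lines_of_text` and `this: this_thing this`, so a word like "this" is checked against "this thing" before it is checked against plain "this".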

Enjoy!

#!/usr/local/bin/perl -w

use strict;

my %table = (
    "lines_of_text" => "foo.html",
    "this" => "bar.html",
    "its_full" => "foobar.html",
    "this_thing" => "baz.html",
);


# create a lookup table, based on the first words of each key

my %lookup;

while (my($key, $val) = each %table) {

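    # clean lookup keys, just in case; lowercase them too, since the
    # words read from the text are lowercased before they are compared
    $key = lc $key;
    $key =~ tr/A-Za-z_//cd;
    $key =~ tr/_//s;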

    my($first) = split /_/, $key;

    push @{$lookup{$first}}, [$key, $val];

}


# sort each lookup array by length of matching text

while (my($first, $aref) = each %lookup) {

    $lookup{$first} = [ sort {length $b->[0] <=> length $a->[0]}
                        @{$lookup{$first}} ];

}


my $string;
{
    local $/;
    $string = <DATA>;
}

$string =~ /^\s*/g;



# Here's where the fun starts!

my $begin = pos($string);


# for each word, see if it's the start of any of the matching texts

FIRST:
while ($string =~ /(\S+)(\s*)/g) {

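    # $begin marks the start of this word and $end the position just past
    # it; rewinding pos() to $end keeps surrounding whitespace out of the
    # link text, even when a single-word key matches
    $begin = $-[1];
    my $end = pos($string) - length $2;
    pos($string) = $end;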

    my $first = lc $1;
    $first =~ tr/A-Za-z//cd;

    my $matches = $lookup{$first};
    next unless $matches;


    # for each possible matching text, see if a match occurs

   MATCHES:
    for my $m (0 .. $#$matches) {

        my $match = $matches->[$m];

        my $words = $first;

        my $space = 0;


        # get the appropriate number of next words

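        for (1 .. $match->[0] =~ tr/_//) {
            last unless $string =~ /(\S+)(\s*)/g;
            $space = length $2;

            # normalize like the first word, so case and punctuation
            # don't block a match (punctuation on the last word still
            # ends up inside the link)
            my $word = lc $1;
            $word =~ tr/A-Za-z//cd;
            $words .= "_$word";
        }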

        if ($words eq $match->[0]) {

            # match found: put a link around the text

            my $text = substr($string, $begin, pos($string) - $begin - $space);
            my $link = qq{<A HREF="$match->[1]">$text</A>};

            substr($string, $begin, pos($string) - $begin - $space) = $link;

            pos($string) = $begin + length $link;
  

            # remove this match, so only the first occurrence will be linked

            splice(@$matches, $m, 1);

            next FIRST;

        }

        pos($string) = $end;

    }

} continue {

    $begin = pos($string);

}

print $string;

__DATA__
This is just lines of text here, and also there.  Consider
this human readable text; it's full of letters and
punctuation.

In reply to Re: Mass Text Replacement by chipmunk
in thread Mass Text Replacement by tedv