
This is what I've ended up with:

    if ($field eq "comments") {        

        # Remove any links (because they break URL to link conversion)
        $$field =~ s/<A.*?HRef.*?>//isg; $$field =~ s/<\/A>//isg;

        # Extract any image links and add them to an array for safe-keeping, replace them with placeholders
        $image_database = 0;
        while ($$field =~ /<Img(.*?)>/is) {
            $$field =~ s/(<Img(.*?)>)/\[My_Image=$image_database\]/iso;
            $images[$image_database] = $1;
            $image_database ++;
        }

        # If HTML is not allowed, strip any remaining HTML
        if ($allow_html != 1) { $$field =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs; }

        # Convert URLs and e-mail addresses to links (with regex)
        $$field =~ s/(((ht|f)tp):(\/\/)[a-z0-9%&_\-\+=:@~#\/.\?]+(\/|[a-z]))/<A HRef="$1" Target="_blank">$1<\/A>/isg;
        $$field =~ s/(^\W|\s)([a-z0-9_\-.]+\@[a-z0-9_\-]+\.[a-z]+)(.*?$)/$1<A HRef="mailto:$2">$2<\/A>$3/mig;

        # Replace the image placeholders with their corresponding images
        $image_database = 0;
        while ($$field =~ /\[My_Image=(\d*)\]/) {
            $img_src = $images[$1];
            $$field =~ s/\[My_Image=(\d*)\]/$img_src/iso;
            $image_database ++;
        }

    }

(Yes, I know I'm not using "strict" - this is a prototype only).

Anyone see any problems with this code?
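
To make the idea easier to test in isolation, here is a minimal standalone sketch of just the image-placeholder and URL steps, written under "strict". It is only an illustration, not the prototype itself: the sub name linkify_keeping_images is hypothetical, it takes the text as an argument rather than going through $$field, and it swaps the while loops for single-pass s///e substitutions, which is an assumption about what would be cleaner rather than anything tested in the real script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the prototype above): hides <img> tags
# behind numbered placeholders, linkifies bare URLs, then restores the tags.
sub linkify_keeping_images {
    my ($text) = @_;
    my @images;

    # Swap every <img ...> tag (any case) for a placeholder in one pass.
    $text =~ s/(<img\b[^>]*>)/push @images, $1; '[My_Image=' . $#images . ']'/gie;

    # Convert bare http/ftp URLs to links (same character set as above).
    $text =~ s{((?:ht|f)tp://[a-z0-9%&_\-+=:\@~#/.?]+(?:/|[a-z]))}
              {<A HRef="$1" Target="_blank">$1</A>}gi;

    # Put the saved tags back; placeholders with no saved tag are left alone.
    $text =~ s/\[My_Image=(\d+)\]/defined $images[$1] ? $images[$1] : $&/ge;

    return $text;
}

print linkify_keeping_images(
    'See <img src="http://example.com/a.png"> and http://example.com/page'
), "\n";
```

The point of the placeholders is the same as in the prototype: the URL inside the image's src attribute is out of the string while the link conversion runs, so only the bare URL gets wrapped in a link.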

In theory, there is no difference between theory and practise. But in practise, there is.

Jonathan M. Hollin
Digital-Word.com

In reply to Re: HTML Parsing by DarkBlue
in thread HTML Parsing by DarkBlue
