Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

For the record, I am new to Perl. I just started working in it because it is supposed to have the best string manipulation of any language.
For a programming project I am attempting to make a pseudo-HTML language for some office workers who cannot seem to learn HTML. So I have developed a dumbed-down version for them, but I really have no idea how to parse it.
I read up on regular expressions and tried the following bit of code:
$text =~ s/(.*?)<link=(".*?")>(.*?)</link>/$1<a href=$2>$3</a>/gi;
It does not work. I really have no idea where to start trying to fix it, or what could even be wrong with it.
I really don't want to sound like I am asking or expecting you to do my work for me, but I could use some direction......

Re: capture psudo tags
by jeffa (Bishop) on Jul 23, 2004 at 20:13 UTC

    Not a Perl solution, but one I use in conjunction with Perl all the time: HTML Area. It turns a text box into a WYSIWYG HTML editor. It was written exactly for problems like yours. ;)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: capture psudo tags
by dimar (Curate) on Jul 23, 2004 at 17:26 UTC
    For the record I am new to perl.

    That's good. I wish I knew about perlmonks.org when I first started. Already you are on good footing.

    for some office workers who cannot seem to (learn HTML), I have developed a dumb version for them

    Whoa, Nelly! You mean to say you are 1) learning Perl; 2) writing a new language intended to 'simplify' HTML; 3) unleashing this new language on office workers; 4) writing a parser for this language yourself by hand using regular expressions; and 5) learning regular expressions along the way? Please realize that such a multi-faceted plan of attack, even if not done simultaneously, smells like biting off more than you can chew. This is a huge red flag.

    I could use some direction......

    If you really want advice on your "new simpler-than HTML" language, let others comment on that ... but you might want to explain a little more clearly what the syntax of your 'language' is, and then consider already-available options, like XML, YAML, or some other thing where you do not have to both invent the language and invent the parser yourself.

    As far as advice goes (my two cents): save the Perl project for your own personal exploration, and give the office workers who cannot learn HTML a 'fill in the blank' style form. You can use a spreadsheet, a web-based form, or a program like FileMaker or Microsoft Access to get the information from them. Then, if you really want, you can use Perl to prettify the user input into HTML or whatever else strikes your fancy.

Re: capture psudo tags
by Fletch (Bishop) on Jul 23, 2004 at 17:14 UTC

    a) You can't parse arbitrary (HT|SG|X)ML with just regular expressions; you need a full blown parser. Your approach is pretty much doomed to failure to begin with.

    b) You'd probably be better served as a learning experience using one of the already existing templating solutions (TT2, Mason, one of the wiki formatters; see Template, HTML::Mason, Text::WikiFormat) than attempting to roll your own from scratch.

Re: capture psudo tags
by LassiLantar (Monk) on Jul 23, 2004 at 17:25 UTC
    First, I agree, it's probably a bad idea to parse your own pseudo-html tags.

    I will note, however, that the main problem I see in this regex is the unescaped forward slashes, "/", which should be written "\/" when they appear inside a regex delimited by slashes. Also (correct me if I'm wrong; I'm not so good at regexes myself), doesn't * already match 0 instances of a character, making the ? unnecessary?

    LassiLantar

      Or you could use something other than / for your delimiters, if you're going to have a lot of slashes in your patterns. For instance,
      $text =~ s!(.*?)<link=(".*?")>(.*?)</link>!$1<a href=$2>$3</a>!gi;
      would work. Additionally, there's no need for the first capture group, so (with the remaining groups renumbered) it could just be
      $text =~ s!<link=(".*?")>(.*?)</link>!<a href=$1>$2</a>!gi;
      As for the ?, it is meaningful. *? is a non-greedy match. But in general, I agree with the others that the whole attempt could be an interesting playpen for learning, but really shouldn't be attempted seriously.
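
To illustrate the point about *? above, here is a minimal sketch comparing the non-greedy and greedy forms of the same substitution. The sample input and the variable names ($input, $lazy, $greedy) are made up for illustration and do not come from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample input: two pseudo-tags on one line.
my $input = '<link="a.html">A</link> and <link="b.html">B</link>';

# Non-greedy: each .*? matches as little as possible,
# so every <link> tag is rewritten on its own.
(my $lazy = $input) =~ s!<link=(".*?")>(.*?)</link>!<a href=$1>$2</a>!gi;

# Greedy: the first .* matches as much as possible,
# so a single match runs from the first tag into the second one.
(my $greedy = $input) =~ s!<link=(".*")>(.*)</link>!<a href=$1>$2</a>!gi;

print "non-greedy: $lazy\n";
print "greedy:     $greedy\n";
```

With the greedy form, the first capture swallows everything up to the last quote before a ">", which leaves a stray </link> and <link= inside the output. Using ! as the delimiter also means none of the slashes need escaping.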
Re: capture pseudo-tags
by pbeckingham (Parson) on Jul 23, 2004 at 18:00 UTC

    Ignoring the ill-advised venture of parsing HTML with regular expressions, the following is a reduced version of your code that works better. You really only need to replace small <link> tags, not the entire text every time.

    $text =~ s/<link=("[^"]*")>(.*?)<\/link>/<a href=$1>$2/gi;
    The reason your code isn't working is that the / characters aren't escaped as \/. Or is that PerlMonks not faithfully reproducing your input?

    Update: Code improved - thank you Eimi Metamorphoumai.

      But that doesn't accurately replace the closing </link> tags with </a>.

        Oops - thanks for pointing that out. I was careless.
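
For reference, a sketch of the same substitution with the closing tag restored might look like the following. The sample string in $text is hypothetical, and the pattern still assumes the simple, well-formed, non-nested <link="..."> syntax from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample input using the pseudo-tag syntax from the question.
my $text = 'See <link="http://www.perlmonks.org">the monastery</link> for help.';

# Same pattern as above, but the replacement now emits the closing </a>.
# Both slashes are escaped because / is the delimiter here.
$text =~ s/<link=("[^"]*")>(.*?)<\/link>/<a href=$1>$2<\/a>/gi;

print "$text\n";
```

The [^"]* inside the quotes keeps the attribute value from running past its closing quote, which is a little stricter than .*? for this case.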

Re: capture psudo tags
by jcpunk (Friar) on Jul 23, 2004 at 19:30 UTC
    Not that I can add any input beyond what has been said already, but speaking as someone who was dumb enough to try it...
    UNDER NO CIRCUMSTANCES IS IT WORTH THE EFFORT
    There is enough good stuff on CPAN; start there.

    jcpunk
    all code is tested, and doesn't work so there :p (variant on common PM sig for my own amusement)