Peamasii has asked for the wisdom of the Perl Monks concerning the following question:

What regexp would you apply to return a substring enclosed by "<" and ">"? I want to parse a URL like this:

http://www.com/index.pl?id=<idparm>&etc

and return <idparm>.

thanks
peamasii

Replies are listed 'Best First'.
Re: Matching enclosed expression
by jeffa (Bishop) on Sep 15, 2003 at 01:40 UTC
    If you know the name of the param ahead of time, you can let URI::QueryParam do the work instead:
    use URI;
    use URI::QueryParam;
    my $u = URI->new('http://www.com/index.pl?id=<idparm>&etc');
    print $u->query_param('id'), "\n";
    UPDATE:
    Actually, you don't even have to know the name of the param ahead of time. UPDATE: oops, I didn't quite do this right the first time, so here goes again:
    my $u = URI->new('http://www.com/index.pl?id=<idparm>&foo=bar&etc');
    my @val = grep /<([^>]+)/, map $u->query_param($_), $u->query_param();
    print $_,$/ for @val;
    But BrowserUk has a point ... this is waaay overkill! At this point we might as well have just used:
    @value = $URL =~ /(<[^>]+>)/;
    But what if it's the key that you want and not the value? What if both keys and values are surrounded with brackets and you only want the keys? Sometimes importing 400 lines of code is better than spending 400 minutes trying to get your own solution correct.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      300 lines of URI

      PLUS 100 of URI::QueryParam

      PLUS grep

      PLUS a regex,

      when the regex alone (slightly modified) does the job? I don't get the logic?

      If you need to parse a URI--i.e. extract the protocol, hostname, path, etc.--then use these well-written, RFC-compliant solutions--but if you need to extract one small, easily defined, nicely delimited piece of text from another piece of text, then do what the Practical Extraction & Reporting Language is good at: use the highly optimised, much-copied & envied, incredibly powerful, highly-prized regex engine to extract that one piece of text from the other.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Matching enclosed expression
by LazerRed (Pilgrim) on Sep 14, 2003 at 23:43 UTC
    While you will undoubtedly be reminded not to parse HTML with a regex... Here's something simple, and fragile :)

    my $pattern = 'http://www.blah.com/cgi.pl?id=<blah>&blah';
    my ($matched) = $pattern =~ /(<\w+>)/;   # list context returns the capture, or nothing on failure
    print "$matched\n";

    <blah>
    Whip me, Beat me, Make me use Y-ModemG.
      Well, they _did_ ask for "<idparm>", but maybe they'll say oops, they meant "idparm"? Let's throw 'em some additional pointers and say m/<([^>]*?)>/ will get what's inside the '<' '>' pair.

      And yes, the "use CGI; or die" crew hasn't weighed in yet. (I'm a light-weight and don't count, but ... Hey, why aren't you using CGI.pm?)
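
The two variants discussed in this sub-thread differ only in where the capturing parentheses sit. A minimal sketch, assuming the sample URL from the original question (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Sample URL taken from the question; variable names are hypothetical.
my $url = 'http://www.com/index.pl?id=<idparm>&etc';

# Parentheses outside the brackets: the delimiters are captured too.
my ($with_brackets) = $url =~ /(<[^>]*>)/;

# Parentheses between the brackets: only the inside is captured.
my ($inside) = $url =~ /<([^>]*)>/;

# A list-context match returns an empty list on failure, so both
# variables stay undef when the URL holds no bracketed part.
print "with brackets: $with_brackets\n" if defined $with_brackets;
print "inside only:   $inside\n"        if defined $inside;
```

The non-greedy `*?` in the pattern quoted above should not be needed, since the negated class `[^>]` already cannot run past the closing bracket.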

      He was asking to parse a URL, so why would anyone remind him not to parse HTML with a regex?

      Abigail

Re: Matching enclosed expression
by Abigail-II (Bishop) on Sep 15, 2003 at 11:46 UTC
    I'm not convinced your question is your real question. URI may not contain < and > signs, so there's no need to extract stuff like <idparm> out of a URI, because such substrings cannot exist.

    So, what's your real question?

    Abigail

      URI may not contain < and > signs

      But it may contain the escaped versions, so it's still the same question.

        Not the same question, and not the same answer. Suppose the question was, how do I match the substring between < and > in a string, the efficient answer would be:
        /<([^>]*)>/

        but for the URI escaped version, the answer certainly isn't:

        /%3C([^%3E]*)%3E/i

        Abigail
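
One reading of why the last pattern is wrong: `[^%3E]` is a character class, so it excludes the single characters "%", "3" and "E" (and "e" under /i) rather than the three-character sequence "%3E". The sketch below, which assumes a made-up escaped URL whose value contains both an "e" and a "3", shows the failure and two possible workarounds. The hand-rolled decoding substitution is for illustration only; URI::Escape's uri_unescape does that job in real code.

```perl
use strict;
use warnings;

# Hypothetical escaped URL; the value "item3" contains "e" and "3" on purpose.
my $escaped = 'http://www.com/index.pl?id=%3Citem3%3E&etc';

# Character class: excludes "%", "3", "E" individually, so this fails here.
my ($wrong) = $escaped =~ /%3C([^%3E]*)%3E/i;

# Option 1: match lazily up to the first literal "%3E" sequence.
my ($lazy) = $escaped =~ /%3C(.*?)%3E/i;

# Option 2: decode the percent escapes first, then use the plain pattern.
(my $decoded = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
my ($plain) = $decoded =~ /<([^>]*)>/;

print defined $wrong ? "class:   $wrong\n" : "class:   no match\n";
print "lazy:    $lazy\n";
print "decoded: $plain\n";
```

Decoding the whole URL before matching is a shortcut: if the value could contain an escaped "&" or "=", it is safer to split the query string first and decode each piece separately.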

Re: Matching enclosed expression
by LordWeber (Monk) on Sep 15, 2003 at 16:22 UTC