mikeblatter has asked for the wisdom of the Perl Monks concerning the following question:

I need help. How do you get the meta tag into a variable?
I understand:

    <meta name="description" content="..."/>
    $description =~ s/?/$1/;

I was hoping you guys know the regular expression for that.

Re: Reg Expressions
by silent11 (Vicar) on Feb 01, 2003 at 01:08 UTC
    Look into the HTML::TokeParser module...
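As a minimal sketch of that suggestion (not silent11's own code), the following uses HTML::TokeParser's get_tag method to jump from one <meta> start tag to the next and read its attributes from a hash. The sample HTML string and the choice to stop at the first name="description" tag are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document; in practice this would come from a file or a fetch.
my $html = <<'END_HTML';
<html><head>
<meta name="keywords" content="perl, parsing">
<meta name="description" content="Hello HTML Parsing!">
</head><body>Hello</body></html>
END_HTML

# new() accepts a reference to a scalar holding the HTML.
my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

my $description;
# get_tag('meta') skips ahead to the next <meta> start tag and returns
# [$tag, \%attr, \@attrseq, $text]; attribute names come back lowercased.
while (my $tag = $p->get_tag('meta')) {
    my $attr = $tag->[1];
    next unless defined $attr->{name} && lc($attr->{name}) eq 'description';
    $description = $attr->{content};
    last;    # stop at the first description tag
}

print defined $description ? "$description\n" : "no description found\n";
```

get_tag returns undef once no more matching tags remain, which is what ends the loop when the document has no description.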

    -Silent11
Re: Reg Expressions
by BrowserUk (Patriarch) on Feb 01, 2003 at 01:41 UTC

    If, and only if, your data contains only the meta tag, and you are not attempting to extract these from a larger set of HTML data, then this should work for you.

    $string = '<meta name="description" content="..."/>';
    print $1 if $string =~ m[^<meta .*?name\s*=\s*"([^"]+)".*?/>$]i;

    which prints:

    description

    It will only work if $string contains only the meta tag and nothing else.
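The pattern above captures the value of the name attribute. If what you actually want is the content attribute, a minimal variant under the same single-tag assumption might look like this. It additionally assumes name comes before content and that both values are double-quoted; the sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: a string holding exactly one meta tag and nothing else.
my $string = '<meta name="description" content="Hello HTML Parsing!"/>';

my $content;
# Anchored at both ends, so this only works when the string is the whole tag.
if ($string =~ m[^<meta\s.*?name\s*=\s*"description".*?content\s*=\s*"([^"]*)".*?/?>$]i) {
    $content = $1;    # text inside content="..."
}

print defined $content ? "$content\n" : "no match\n";
```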

    Next comes the question of how you would isolate the meta tag from a larger body of HTML. The answer is that you almost certainly would need to use one of the HTML::* modules, at which point the above becomes redundant, as they let you get at the attributes of the meta tags (and every other tag) without needing a regex.


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Reg Expressions
by Anonymous Monk on Feb 01, 2003 at 20:55 UTC
    Is it something like this:
    my $html = "<html><meta name=\"description\" content=\"sf\"></html>";
    print $3 if $html =~ /<meta.+?name\s*=\s*("|')description\1.+?content\s*=\s*("|')(.*?)\2/;    # prints "sf"
Re: Reg Expressions
by cLive ;-) (Prior) on Feb 01, 2003 at 02:03 UTC
    Dirty, but if you know the name always comes before the content and that all meta tags contain a name and content, you can use:
    # Matches against $_; $3 holds the value of the content attribute.
    my $description_content =
        /<meta.+?name\s*=\s*("|')description\1.+?content\s*=\s*("|')(.*?)\2/ ? $3 : undef;
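The pattern above relies on name coming before content. As a sketch that drops that assumption (still a regex hack, so the same caveats about real-world HTML apply), you can pull out each <meta ...> tag first and then collect its quoted attributes into a hash. It assumes attribute values are quoted and contain no '>' characters; the sample HTML is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample with content before name and mixed quote styles.
my $html = q{<html><head>
<meta content="perl" name="keywords">
<meta content='Hello HTML Parsing!' name='description' />
</head></html>};

my $description;
# Walk every <meta ...> tag; [^>]* assumes no '>' appears inside attribute values.
while ($html =~ /<meta\b([^>]*)>/gi) {
    my $attrs = $1;
    my %attr;
    # Collect name="value" or name='value' pairs, in whatever order they appear.
    while ($attrs =~ /([\w-]+)\s*=\s*(["'])(.*?)\2/g) {
        $attr{ lc $1 } = $3;
    }
    if (defined $attr{name} && lc($attr{name}) eq 'description') {
        $description = $attr{content};
        last;
    }
}

print defined $description ? "$description\n" : "no description found\n";
```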

    .02

    cLive ;-)

    --

Re: Reg Expressions
by Anonymous Monk on Feb 01, 2003 at 01:24 UTC
    Yeah, I saw that. I was wondering if you guys know a regular expression for getting the description from the meta tag.
      There have been plenty of working regexes posted in this thread; I thought there should be at least one reply that uses a parser. I'll assume that when you say you want the description, you really want the content from the meta tag.
      use strict;
      use warnings;
      use HTML::TokeParser::Simple;

      my $description;
      my $p = HTML::TokeParser::Simple->new(*DATA);

      while (my $token = $p->get_token) {
          if ($token->is_start_tag('meta')) {
              my $attr = $token->return_attr;
              # Only take the content of the meta tag named "description".
              if (defined $attr->{name} and $attr->{name} eq 'description') {
                  $description = $attr->{content};
                  last;
              }
          }
      }

      print "TokeParser got '$description'\n";

      __DATA__
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <meta name="description" content="Hello HTML Parsing!" />
      <meta name="keywords" content="make me the top hit!" />
      <meta name="generator" content="Perl, baby. Perl." />
      </head>
      <body>
      Hello World
      </body>
      </html>

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)