skazat has asked for the wisdom of the Perl Monks concerning the following question:

Hello All,

Was wondering if anyone has a subroutine that looks at a file and extracts the form element names?

So if I have this:

<input type=text name=hello> <select name=you>

I'll get back

hello
you

I'm particularly trying to find a BBEdit filter that does this. Those filters pretty much look like this:

while (<>) {
    # do your dirty work here
}

Just wondering :)

 

-justin simoni
!skazat!

Replies are listed 'Best First'.
Re: Perl HTML Form elements Filter
by Trimbach (Curate) on Nov 28, 2000 at 10:18 UTC
    I think this question is a little strange. I've written a lot of CGIs and I've never needed to find out the parameter names from a static file... I have, however, needed to know the names of all the parameters in the code for a CGI itself. That's easy using CGI.pm. From the documentation:
    use CGI; my $query = CGI->new; my @names = $query->param;
    Where @names is an array containing the name of every parameter item submitted to the CGI. If you really must pick out the HTML attributes (because really, that's all parameter names are) your best bet is to use HTML::Parse or somesuch.

    Gary Blackburn
    Trained Killer

Re: Perl HTML Form elements Filter
by repson (Chaplain) on Nov 28, 2000 at 07:18 UTC
    If you are just looking for every name=foo in the document you could use a regex like this (untested):
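    my @names;
    while (<>) {
        # in scalar context /g resumes after the previous match; the name is in $1
        while (/name=([^> ]*)/gi) {
            my $foo = $1;
            $foo =~ s/^"|"$//g;    # strip surrounding double quotes
            push @names, $foo;
        }
    }
    However, using regexes to parse HTML is NOT advised, and it is much better to use a module such as HTML::Parser.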

      HTML::Parser is the best way to go for any kind of HTML extraction task, but there's often a subclass which is more useful for a particular task. In this case I think you'd be better off looking at HTML::TokeParser or HTML::TreeBuilder.
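
A minimal sketch of the HTML::TokeParser route, assuming the module is installed from CPAN (it is not part of core Perl). The sub name form_names_tokeparser is made up for illustration; new() with a scalar reference and get_tag() are the module's documented calls.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN module, assumed to be installed

# Hypothetical helper: returns the name attribute of every
# input, select, or textarea start tag in the given HTML string.
sub form_names_tokeparser {
    my ($html) = @_;
    # A reference to a scalar is parsed as the literal document.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create parser";
    my @names;
    # get_tag skips ahead to the next start tag with one of these names
    # and returns [ tag, attribute hash, attribute order, original text ].
    while (my $tag = $parser->get_tag('input', 'select', 'textarea')) {
        my $attr = $tag->[1];
        push @names, $attr->{name} if defined $attr->{name};
    }
    return @names;
}

print "$_\n" for form_names_tokeparser('<input type=text name=hello> <select name=you>');
```

Because the parser does the tokenizing, quoting style and tags that span several lines should not need special handling here.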

      --
      <http://www.dave.org.uk>

      "Perl makes the fun jobs fun
      and the boring jobs bearable" - me

Re: Perl HTML Form elements Filter
by dash2 (Hermit) on Nov 28, 2000 at 17:13 UTC
    Everyone else is quite right that the sensible person uses a module for this. However, if you aren't sensible, or if you're writing for people who can't necessarily install modules on their machine, you might do something like this:

    First split the HTML into chunks, and don't process bits of HTML outside < and >. Again, the wise person would use a module for this, but the rash fool would just split on ">". Then, for each chunk:

    if (/name\s*=\s*['"]?(\w+)/i) { $name = $1; }

    The only interesting thing here is the ['"]? which makes sure that you don't grab any quotes by accident. As far as I recall, form names should only have word characters, but I am ready to be corrected. I also don't know whether the spaces between html attribute names and values are legit or not. But unless you are writing the html yourself, you can trust some foolish user to put them in.

    If you just want _form_ element names, you'll also first need to test for what sort of html tag you have:

    if (/<input|<select|<textarea/i)
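
Put together, the steps above might look roughly like the following. This is only a sketch of the "rash fool" approach using core Perl; the sub name form_element_names is invented for the example, and it inherits the assumption that names consist of word characters only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the name attribute of every
# input, select, or textarea tag found in the given HTML string.
sub form_element_names {
    my ($html) = @_;
    my @names;
    # Splitting on ">" leaves at most one tag at the end of each chunk.
    for my $chunk (split />/, $html) {
        # Keep only the text after the last "<", i.e. the tag itself.
        next unless $chunk =~ /<([^<]*)\z/s;
        my $tag = $1;
        # Only form elements are of interest.
        next unless $tag =~ /\A(?:input|select|textarea)\b/i;
        # Optional quote, then word characters, as in the regex above.
        push @names, $1 if $tag =~ /\bname\s*=\s*['"]?(\w+)/i;
    }
    return @names;
}

print "$_\n" for form_element_names('<input type=text name=hello> <select name=you>');
```

For the sample line from the original question this prints hello and you. A ">" inside a quoted attribute value will still confuse it, which is one more reason to prefer a parser module.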

    Cheers Dave

      here, I gots it:

      while(<>) { while (/(input|textarea|select).*?name="?(.*?)"?\s+/gsi) { print "$2\n" } }

      that's from markjugg.

       

      -justin simoni
      !skazat!

        I'm not sure what I was thinking when I put the while loop inside the while loop. I like it better like this:
        while(<>) { if (/(input|textarea|select).*?name="?(.*?)"?[\s>]/gsi) { print "$2\n" } }
        update: This has the known bug of failing if the tag spans multiple lines. Maybe some other ones, too. :)
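
One possible way around the multi-line failure is to match against the whole input as a single string instead of line by line. The sketch below assumes that approach; the sub name names_from_html and the slightly stricter pattern are illustrative choices, not the original filter.

```perl
use strict;
use warnings;

# Hypothetical helper: applies one pattern to the whole document,
# so a tag may span several lines.
sub names_from_html {
    my ($html) = @_;
    my @names;
    # [^>]*? stays inside a single tag and also matches newlines.
    while ($html =~ /<(?:input|textarea|select)\b[^>]*?\bname\s*=\s*["']?([^"'\s>]+)/gi) {
        push @names, $1;
    }
    return @names;
}

my $html = "<input type=text\n   name=hello>\n<select name=\"you\">";
print "$_\n" for names_from_html($html);

# As a filter, slurp all input first instead of looping line by line:
#   my $input = do { local $/; <> };
#   print "$_\n" for names_from_html($input);
```

The inner while loop is back on purpose: with the whole file in one string, each pass of the /g match picks up the next tag.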

        -mark