Cap'n Steve has asked for the wisdom of the Perl Monks concerning the following question:

I need to count the number of form fields in a string full of HTML. The problem is that this actually needs to be done in PHP, so no modules are allowed. Select, textarea, text, file and password elements are no big deal, but checkboxes and radio buttons are a problem.

So I need a regular expression that counts all input elements that have both the same name and the same type attributes as one field. I've made some progress, but it's hideously ugly so far and requires four separate calls because I can't seem to avoid hard-coding the order the attributes are expected in. Has anyone done this before?

Replies are listed 'Best First'.
Re: Counting HTML form elements with a regular expression
by tphyahoo (Vicar) on May 27, 2005 at 09:17 UTC
    Ah, you may already know why you shouldn't try to parse html with regexes, and that you should have a look at HTML::TokeParser::Simple.

    Except you need this done in php. Well, I believe there is a way to call perl from php, though it may be a bit of a pain. (A plain google search on that should turn something up.) Parsing html with regexes is such a pain that I would just do that: call perl from php and use a module.

    If it has to be regexes with no perl, post some sample input data so the monks can have a shot at it without having to create their own test data. Good luck!

Re: Counting HTML form elements with a regular expression
by wazoox (Prior) on May 27, 2005 at 10:49 UTC
    I need to count the number of form fields in a string full of HTML. The problem is that this actually needs to be done in PHP, so no modules are allowed.

    It's hard to follow you. Is your program perl, or PHP? If it's PHP, asking for help on a PHP site/newsgroup may bring better answers!
    Anyway, what I understand is that you plan to use "perl compatible regexes" within PHP; but "perl compatible regexes" AREN'T perl. Regexes are really a tiny part of perl, and PHP's so-called "compatible" implementation may vary considerably from the "Real Thing" (i.e. perl regexes). That's the first thing...

    So I need a regular expression that counts all input elements

    No, you probably don't. Parsing HTML with pure regexes is almost always a bad idea, unless you know precisely what limited HTML subset you'll actually have at input. Either you work from within PHP, in which case you'll have to find some PHP module/library/tool/whatever to parse the HTML code, or you call a perl program from your PHP code, and that perl program can use HTML::Parser or whatever perl module you need.

Re: Counting HTML form elements with a regular expression
by BUU (Prior) on May 27, 2005 at 09:16 UTC
    Yes, here in perl land we use a magical invention called *modules*. These modules contain bits of prewritten code that perform hard tasks, so that everyone doesn't have to rewrite the code to do the same thing. More to the point, why the heck are you asking on a PERL WEBSITE for php help?
Re: Counting HTML form elements with a regular expression
by gregor42 (Parson) on May 27, 2005 at 12:46 UTC

    I see that what you're really asking for is the regular expression(s) needed to do this task.

    It would help if you posted your 'hideously ugly' regex code so that we could have a look at what you've already done. That would also help to illustrate the issue you are having with attributes.

    Since your objective is to use this with PHP, you probably should also have marked this as an Off Topic thread. I do not personally know what differences in regex syntax might exist between PHP & Perl. Are there any? That would certainly affect any conversations we might have on this topic.



    Wait! This isn't a Parachute, this is a Backpack!
Re: Counting HTML form elements with a regular expression
by Cap'n Steve (Friar) on May 28, 2005 at 05:08 UTC
    Well, I see by the -2 rating that the monks aren't too happy with my question. I just assumed that since regular expressions are such an integral part of Perl, this would be the best place to find someone knowledgeable in that area. For those interested, I'll probably just end up grabbing the name and type of every input element, throwing them into arrays and then removing duplicates manually.
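
    A minimal sketch of that plan, written in Perl since that is what this thread uses: given the type and name already pulled from each input element, a hash keyed on both does the duplicate removal, so no manual array comparison is needed. The helper name count_unique_fields and the [type, name] pair layout are assumptions made for illustration, not anything from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of [type, name] pairs, one per input
# element, and returns how many distinct fields they represent.
sub count_unique_fields {
    my @pairs = @_;
    my %seen;
    for my $pair (@pairs) {
        my ($type, $name) = @$pair;
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        $seen{ lc($type) . "\0" . $name }++;
    }
    return scalar keys %seen;
}
```

    Called with two radio buttons named "booger" and one named "snot", the two "booger" entries share a key and count once. The same idea carries over to PHP by using the combined string as an associative array key.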
Re: Counting HTML form elements with a regular expression
by TedPride (Priest) on May 28, 2005 at 07:56 UTC
    PHP has Perl-compatible regex support, so this isn't quite as stupid a request as one might think. In answer to your problem, what you need is a nested regex - one level to get the tag, the next to parse the various parts. You could do it something like this:
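    use strict;
    use warnings;

    my ($text, $tag, %tag, %c);
    $text = join '', <DATA>;

    # Outer regex: grab the attribute text of each <input> tag
    while ($text =~ /<input (.*?)>/g) {
        $tag = $1;
        %tag = ();    # reset so attributes don't leak in from the previous tag
        # Inner regex: split the tag into attribute="value" pairs
        while ($tag =~ /(\w+)="(.*?)"/g) {
            $tag{lc($1)} = $2;
        }
        # Inputs sharing both type and name collapse into one key
        $c{"$tag{type}$tag{name}"} = 1;
    }

    for (sort keys %c) {
        print "$_\n";
    }
    print scalar keys %c;

    __DATA__
    <input type="radio" name="booger" value="bingo">
    <input type="radio" name="booger" value="bongo">
    <input type="radio" name="snot" value="bingo">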
    Note - this may be unforgiving to some forms of sloppily coded HTML, but it should give you the general idea of how to go about parsing your form fields. I'd write you a PHP version too, but I need sleeeeeep...
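
    As a rough illustration of making the same two-level approach a little more forgiving, the sketch below accepts attributes in any order, with double quotes, single quotes or no quotes, and matches the tag name case-insensitively. The names parse_attrs and count_input_fields are hypothetical. It assumes a missing type means "text" (the HTML default) and it lumps all unnamed inputs together under an empty name. It still breaks on a ">" inside an attribute value, so it is no substitute for a real parser.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the text inside an <input ...> tag into a hash
# of lower-cased attribute names and their values.
sub parse_attrs {
    my ($attr_text) = @_;
    my %attr;
    # Value may be "double quoted", 'single quoted' or unquoted.
    while ($attr_text =~ /([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g) {
        my $value = defined $2 ? $2 : defined $3 ? $3 : $4;
        $attr{ lc $1 } = $value;
    }
    return %attr;
}

# Hypothetical helper: count input elements, treating those that share
# both type and name as a single field.
sub count_input_fields {
    my ($html) = @_;
    my %seen;
    while ($html =~ /<input\b([^>]*)>/gi) {
        my %attr = parse_attrs($1);
        my $type = lc(defined $attr{type} ? $attr{type} : 'text');
        my $name = defined $attr{name} ? $attr{name} : '';
        $seen{"$type\0$name"} = 1;    # NUL separator avoids key collisions
    }
    return scalar keys %seen;
}
```

    Because the inner loop collects every attribute into a hash before anything is looked up, the order in which type and name appear no longer matters, which removes the need for one regex per attribute ordering.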