This was what the person wanted:
to turn into:
This is a matter of taking a badly formed html stream and making it well formed. This is sensible, and easily done in cases like your initial example, where the bad forms involve no nested tags. (Your first attempt simply stopped after handling the first tag in the stream, and used a while loop for no purpose.) The following would work over a series of non-nested tags:
    s{(<(\w+)[^>]*>[^<]*?)</.+?>}{$1</$2>}g

update: I'm using >[^<]*? instead of >.*? so that it won't corrupt streams that include properly nested tags.
But working across nested tags would take more code and more care. You'd need to work through the stream tag by tag, pushing each open-tag name onto a stack, and popping the last name off the stack each time you hit a close-tag, to make sure the output was well formed (though it might still have other problems, depending on how bad the input was).
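To make the stack idea concrete, here is a minimal sketch, not a real HTML parser. The name repair_tags and the short list of void elements are my own assumptions for illustration; it also assumes that no tag or comment contains a ">" inside it.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every close-tag so that it matches the most
# recently opened tag, and close anything still open at the end.
sub repair_tags {
    my ($html) = @_;
    my @stack;                 # names of the tags that are currently open
    my $out = '';
    # Assumed (incomplete) list of elements that never take a close-tag.
    my %void = map { $_ => 1 } qw(br hr img input meta link);

    # Split into tags and text; the capturing group keeps the tags.
    for my $piece ( split /(<[^>]*>)/, $html ) {
        next if $piece eq '';
        if ( $piece =~ m{^<\s*/} ) {
            # Close-tag: ignore the name it carries and trust the stack.
            my $name = pop @stack;
            $out .= "</$name>" if defined $name;    # stray close-tags are dropped
        }
        elsif ( $piece =~ m{^<\s*(\w+)[^>]*?(/?)>$} ) {
            # Open-tag: remember its name unless it is self-closing or void.
            push @stack, $1 unless $2 or $void{ lc $1 };
            $out .= $piece;
        }
        else {
            $out .= $piece;    # text, comments, doctype: passed through as-is
        }
    }
    # Close whatever was left open, innermost first.
    $out .= "</$_>" for reverse @stack;
    return $out;
}

print repair_tags('<font size=2><b>bold</size> plain</b>'), "\n";
```

Because each close-tag is replaced by whatever is on top of the stack, the output always nests properly, but it may not nest the way the author of the input intended; that is the "other problems" caveat above.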
But other stuff in your post makes little or no sense:
The person wanted the finishing tag to be the first parameter in the html tag. So if I had < font size=2 > I would have to end it with < /size > and not < /font >. My first instinct was to do a while loop...
My first instinct would be to say "No, you don't really want that. You're asking to have ill-formed html as the output. What makes you think you want that?"
Then, looking at your last example, I think I understood the idea; you don't want well-formed html as output. You want a form where a person reading the stream can figure out more easily what the scope is for a given tag in a densely nested html structure. Is that it?
If so, there are better ways to do this than corrupting the html tags in the odd way your friend suggested. What if the name of the first attribute is the least important information? Why have a "human-readable" form that can't be used reliably as input to a browser?
For instance, one thing that can aid human readability of html is to simply place the tags and the text content on separate lines; something like this:
    s/>\s*</>\n</g;       # normalize whitespace between adjacent tags
    s/([^\n])</$1\n</g;   # make sure every tag begins a new line
    s/>([^\n])/>\n$1/g;   # make sure every tag is followed by newline

More code and more care could be used to good effect, e.g. to indent the tag lines to reflect nesting depth, to eliminate newlines from within long open-tags, etc.
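As a rough sketch of the indentation idea, the following puts tags and text on separate lines and then indents each line by its nesting depth. The name indent_html and the two-space indent are my own choices, and the sketch assumes the input is already well formed, that each tag sits on one line, and that there are no void elements such as br (which would throw the depth count off).

```perl
use strict;
use warnings;

# Hypothetical helper: one tag or text run per line, indented by depth.
sub indent_html {
    my ($html) = @_;
    for ($html) {
        s/>\s*</>\n</g;        # normalize whitespace between adjacent tags
        s/([^\n])</$1\n</g;    # make sure every tag begins a new line
        s/>([^\n])/>\n$1/g;    # make sure every tag is followed by newline
    }
    my $depth = 0;
    my $out   = '';
    for my $line ( split /\n/, $html ) {
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        next if $line eq '';
        # A close-tag is printed one level shallower than its content.
        $depth-- if $line =~ m{^</} && $depth > 0;
        $out .= ( '  ' x $depth ) . $line . "\n";
        # An open-tag (not self-closing) deepens everything that follows.
        $depth++ if $line =~ m{^<\w[^>]*>$} && $line !~ m{/>$};
    }
    return $out;
}

print indent_html('<ul><li>one</li><li>two</li></ul>');
```

The result is still valid input for a browser, which is the point: the scope of each tag is visible from the indentation, and nothing in the markup itself has been altered.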
In reply to Re: regex and html tags by graff, in thread regex and html tags by Parham