
Monks,

I've recently been reading with interest some of the previous discussions on the use of .* and .*? that are scattered around the Monastery (Death to Dot Star!, Dot star okay, or not? and Ovid, Long Live .*? (dot star question-mark), among others). These have gone into why .* and its friends are considered bad, and I think I understand the reasoning behind this point of view.

I've recently had to write some code at work, though, which got me thinking about this. The code is simple enough - it parses some XML tags to grab data from a file. (Aside: see Production Environments and "Foreign" Code for why I can't just use the XML modules, which I'd much rather do.) The code, however, uses a regex to grab the data from the file - and as part of this it uses the dreaded .*, albeit in a non-greedy fashion.

I've thought about this long and hard, and I can't see a straightforward, easy-to-read way of implementing the same code without the .*? construct. The regex I wrote and an example of its use are below.

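# Single quotes, so the double quotes inside the tag need no escaping
my $example = '<ClientID type="String">A1234BX</ClientID>';

# Capture the tag name in $1 and the data between the tags in $2
if ( $example =~ /^\s*<(\w+)\s[\w"=]+>(.*?)<\// ) {
    my $tag  = $1;
    my $data = $2;

    # do something with the data
}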

I've considered using character classes and look-aheads to pull the data between the two XML tags (which can include a wide and interesting array of alphanumeric and other characters), but I can't see how these would be either beneficial or efficient for a large set of data.
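
For comparison, here is a minimal sketch of what the character-class version might look like. It assumes the data between the tags never contains a literal '<' (which may not hold for my real data), and parse_element is just a hypothetical name for illustration, not something from the production code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): pulls the tag name
# and the data out of a single-line element like the example above.
sub parse_element {
    my ($line) = @_;

    # [^<]* stops at the first '<', so it cannot run past the start of
    # the closing tag. Assumption: the data itself contains no '<'.
    if ( $line =~ m{^\s*<(\w+)\s[\w"=]+>([^<]*)</} ) {
        return ( $1, $2 );
    }
    return;    # empty list when the line does not match
}

my ( $tag, $data ) = parse_element('<ClientID type="String">A1234BX</ClientID>');
print "$tag => $data\n";    # prints "ClientID => A1234BX"
```

The only change from the .*? version is the second capture group, so this would behave differently only if the data could itself contain a '<'.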

I guess I'm interested to know what the general consensus on the use of .* is. Is it something to be avoided at all costs, or is it a powerful, oft-misused tool that can be useful and beneficial in carefully controlled circumstances?

While I'm at it *grin*, does anyone have a "better idea" for pulling the data out of the tags? Would this count as an acceptable exception to the "Don't Use Dot Star" rule that seems to be prevalent throughout the Monastery?

Any opinions, suggestions and comments are welcome :)

-- Foxcub
#include www.liquidfusion.org.uk

In reply to An "ethical" use of dot-star ..? by Tanalis
