Re: Pattern matching with \t

Your problem is the greedy matching of the (.+) construct, which 'eats up' too much of the string: the dot matches anything that is not an end-of-line character, tabs included (see also Death to Dot Star!). Because your last part (the tab and the 2nd number) is completely optional, the regex can succeed without it, so the engine never backtracks far enough for that part to match.
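To make the effect visible, here is a minimal sketch. The original pattern is not quoted in this reply, so the regex below is only an assumed stand-in with the same two ingredients (a greedy (.+) and a fully optional tail), and the sample line is made up:

```perl
use strict;
use warnings;

# Hypothetical sample line: id, free text and two numbers, tab-separated.
my $line = "ID1\tsome text\t1.5\t2.5";

# Assumed stand-in for the original pattern (not taken from the question):
# greedy (.+) for the text, followed by an optional tail.
my @greedy = $line =~ /^(\S+)\t(.+)\t([0-9.]+)\t*([0-9.]*)/;

# (.+) first runs to the end of the line and then backtracks only as far
# as the LAST tab, so the text group swallows the first number as well.
for my $field (@greedy) {
    (my $shown = $field) =~ s/\t/<TAB>/g;   # make tabs visible
    print "[$shown]\n";
}
# prints:
# [ID1]
# [some text<TAB>1.5]
# [2.5]
# []
```

The match succeeds, but the first number ends up inside the text group and the optional tail stays empty.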

The easiest solution, if you are sure that your normal text ($1) contains no tab character, is to write the following:

$_[0] =~ m/([a-zA-Z0-9\-\_\/\.\,]+)
           \s*\t
           ([^\t]+)
           \t
           ([0-9\.]+)
           \t?
           ([0-9\.]+)?/x; # note the x modifier to allow for
                          # multiline format

I also changed your '*' to '?' in some places. '?' means 'match zero or one time', which I think is the correct multiplicity here.
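As a quick check, the pattern above can be tried on a line with and without the optional second number. This is only a sketch: parse_line is a hypothetical helper name, and the sample lines (including their \r\n endings) are invented.

```perl
use strict;
use warnings;

# parse_line is a hypothetical wrapper; the pattern is the one shown above.
# In list context the match returns the captured groups ($1 .. $4).
sub parse_line {
    return $_[0] =~ m/([a-zA-Z0-9\-\_\/\.\,]+)
                      \s*\t
                      ([^\t]+)
                      \t
                      ([0-9\.]+)
                      \t?
                      ([0-9\.]+)?/x;
}

my @with    = parse_line("ID1\tsome text\t1.5\t2.5\r\n");
my @without = parse_line("ID1\tsome text\t1.5\r\n");

for my $fields (\@with, \@without) {
    # The optional last group is undef when the second number is missing.
    print join('|', map { defined $_ ? $_ : '(none)' } @$fields), "\n";
}
# prints:
# ID1|some text|1.5|2.5
# ID1|some text|1.5|(none)
```

Because none of the groups can match \r or \n, the captured fields come out without line endings.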

If tabs are allowed in the text, you have to use the non-greedy variant of '+' instead:

$_[0] =~ m/([a-zA-Z0-9\-\_\/\.\,]+)
           \s*\t
           (.+?)
           \t
           ([0-9\.]+)
           \t?
           ([0-9\.]+)?/x;
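A small sketch of the non-greedy variant on a made-up line whose text field itself contains a tab (the sample data is an assumption, not taken from the question):

```perl
use strict;
use warnings;

# Hypothetical sample line: the text field "some<TAB>text" contains a tab.
my $line = "ID1\tsome\ttext\t1.5\t2.5";

my @fields = $line =~ m/([a-zA-Z0-9\-\_\/\.\,]+)
                        \s*\t
                        (.+?)
                        \t
                        ([0-9\.]+)
                        \t?
                        ([0-9\.]+)?/x;

# (.+?) grows one character at a time until the rest of the pattern
# (tab, number, optional tab and number) can match.
for my $field (@fields) {
    (my $shown = $field) =~ s/\t/<TAB>/g;   # make tabs visible
    print "[$shown]\n";
}
# prints:
# [ID1]
# [some<TAB>text]
# [1.5]
# [2.5]
```

One thing to keep in mind: the non-greedy group stops at the first tab that is followed by a digit or dot, so a text field such as "item<TAB>42 units" would still be cut at the wrong place.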

Update: Fixed link.

-- Hofmator


Replies are listed 'Best First'.
Re: Re: Pattern matching with \t by professa (Beadle) on Dec 10, 2001 at 17:55 UTC
The tabs are only used as delimiters between the individual 'cells' of the line, not inside the text parts themselves. I tried both methods suggested here, the `strip` method by Masem and yours, and both work fine! The advantage of the pattern-matching method is that $1-$4 are free of \n's and \r's, which I had to remove by substitution afterwards when using 'strip', but it's just a question of personal preference. ;-) Thanks to everyone here for the advice! Bye, Micha
