selena has asked for the wisdom of the Perl Monks concerning the following question:

I made this change to Text::ParseWords to accommodate an unfortunate input file. Are there any side effects to be worried about? I needed Text::ParseWords to find and return null (empty) fields. I have tested it in a couple of different contexts and haven't found any unexpected horrors yet.
```diff
--- /usr/lib/perl5/5.8.0/Text/ParseWords.pm  2002-09-01 21:14:22.000000000 -0700
+++ /home/selena/perl/Text/ParseWords.pm  2003-10-21 19:00:43.000000000 -0700
@@ -67,6 +67,8 @@
                       (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                               # plus EOL, delimiter, or quote
                       ([\000-\377]*)          # the rest
+                      |                       # --OR--
+                      ($delimiter)
                       /x;                     # extended layout
        return() unless( $quote || length($unquoted) || length($delim));
```
In case you aren't interested in applying the patch, here is the whole regex; the change is at the second --OR-- comment:
```perl
($quote, $quoted, undef, $unquoted, $delim, undef) =
    $line =~ m/^(["'])                  # a $quote
               ((?:\\.|(?!\1)[^\\])*)   # and $quoted text
               \1                       # followed by the same quote
               ([\000-\377]*)           # and the rest
              |                         # --OR--
               ^((?:\\.|[^\\"'])*?)     # an $unquoted text
               (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                        # plus EOL, delimiter, or quote
               ([\000-\377]*)           # the rest
              |                         # --OR--
               ($delimiter)
              /x;                       # extended layout
```
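The unquoted alternative of the unpatched regex already copes with a line that starts with the delimiter: it matches an empty unquoted text followed by the delimiter, and the rest of the line is left for the next pass. A minimal sketch, with the 5.8.0 pattern (no new alternative) wrapped in a hypothetical helper match_piece() and an invented input line:

```perl
use strict;
use warnings;

# The unpatched match from Text::ParseWords::parse_line (perl 5.8.0),
# wrapped in a hypothetical helper for illustration only.
sub match_piece {
    my ($delimiter, $line) = @_;
    my ($quote, $quoted, undef, $unquoted, $delim, undef) =
        $line =~ m/^(["'])                    # a quote character
                   ((?:\\.|(?!\1)[^\\])*)     # and quoted text
                   \1                         # followed by the same quote
                   ([\000-\377]*)             # and the rest
                  |                           # --OR--
                   ^((?:\\.|[^\\"'])*?)       # an unquoted text
                   (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                              # plus EOL, delimiter, or quote
                   ([\000-\377]*)             # the rest
                  /x;
    # $+ is the highest-numbered group that matched, i.e. the remaining text
    return ($unquoted, $delim, $+);
}

# Invented line that begins with a tab, i.e. an empty leading field.
my ($unquoted, $delim, $rest) = match_piece('\t', "\tb");

# Prints: unquoted=[] delim_is_tab=1 rest=[b]
printf "unquoted=[%s] delim_is_tab=%d rest=[%s]\n",
    $unquoted, ($delim eq "\t" ? 1 : 0), $rest;
```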

Edit by tye, mention module in title, link to module

Re: small change - evil consequences?
by etcshadow (Priest) on Oct 22, 2003 at 03:46 UTC
    Hmm... maybe I'm being dumb here, but I think that if this is doing anything at all, it is doing the wrong thing. Let me start nit-picking:
    1. You drop $delimiter into a /x regexp, but elsewhere it is de-/x'ed as (?-x:$delimiter), so if someone passes in a $delimiter containing whitespace, your addition will interpret it differently from the rest of the regex.
    2. You capture your new $delimiter, but you never store the value. Odd: you could just drop the parentheses, or make them (?:$delimiter) or, per the previous point, (?-x:$delimiter).
    3. The other primary sub-regexes are anchored at the beginning of the line, so they can tell what came before the ending sub-regex; your new one is not anchored.
    4. Except, of course, for the /x-or-not-/x question surrounding $delimiter, anything caught by your new regex should already have been caught by the regex for unquoted text (unless, of course, the line had an open quote with no matching close quote).
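    Points 1 and 2 can be checked with a small sketch; the delimiter 'a b' and the x-y-z pattern are invented purely for illustration:

```perl
use strict;
use warnings;

# Point 1: interpolated text is also subject to /x, so the space in this
# hypothetical delimiter is ignored unless it is wrapped in (?-x:...).
my $delimiter = 'a b';
my $plain_x = ('a b' =~ /^$delimiter$/x)       ? 1 : 0;   # behaves like /^ab$/
my $guarded = ('a b' =~ /^(?-x:$delimiter)$/x) ? 1 : 0;   # space matched literally

# Point 2: a list assignment keeps only as many captures as it has slots.
my ($first, $second) = ('x-y-z' =~ /^(\w)-(\w)-(\w)$/);
my @all = ('x-y-z' =~ /^(\w)-(\w)-(\w)$/);

print "plain /x: $plain_x, (?-x:...): $guarded\n";              # plain /x: 0, (?-x:...): 1
print "stored: $first $second, captured: ", scalar(@all), "\n";  # stored: x y, captured: 3
```

    The third capture is matched but silently dropped by the two-slot assignment, just as a seventh group is dropped by the six-slot list in parse_line.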

    Can you give me the calling context (the values of $delimiter and $line)?


    ------------
    :Wq
    Not an editor command: Wq

      OK, let me start with this: I did not completely understand the original regexes, and your explanation helped me there. I changed the $delimiter section as you suggested.

      My goal was to capture all the fields implied by a tab-delimited line, i.e. something that would do the right thing when given this:

      aabel Allyn Abel EL0 20030612182307 10.5.4.166
      abeeman Ali Beeman EL1 ajens@ttsd.k12.or.us 20030902191509 10.8.0.219

      There are actually 8 fields in both lines, based on the tabbing.

      The problem with the previous code (without the '| ($delimiter)' alternative) was that when I specified '\t' as the field separator, the regex did not handle multiple consecutive tabs correctly.

      Ugh. And now that I go off and test this with the original module (because I was going to show you exactly what the problem was), it's working with the original regex. *sigh*
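      For reference, a sketch of the stock parse_line on a hypothetical record with an empty middle field. The first argument is a regex pattern string, so the single-quoted '\t' reaches the regex engine as a tab escape:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record: the two adjacent tabs mark an empty middle field.
my $line   = "aabel\t\tEL0";
my @fields = parse_line('\t', 0, $line);

# Prints: 3 fields: [aabel] [] [EL0]
print scalar(@fields), " fields: ", join(' ', map { "[$_]" } @fields), "\n";
```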

      Perhaps reading Text::ParseWords caused me to clean up some code elsewhere that was causing the issue.

      Thanks for taking the time to read through this.

        Maybe I'm not understanding your problem, but wouldn't a split(/\t+/, $line) get the tab-delimited fields?

        Arjen
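        One caveat on the split suggestion, given the goal of keeping empty fields: /\t+/ treats a run of tabs as a single separator, so empty fields disappear. A single /\t/ keeps empty inner fields, and a negative LIMIT keeps trailing empty fields as well. A sketch with an invented line:

```perl
use strict;
use warnings;

# Invented line: empty 2nd field, plus two empty trailing fields.
my $line = "a\t\tb\t\t";

my @collapsed = split /\t+/, $line;      # a run of tabs counts as one separator
my @kept      = split /\t/,  $line;      # empty inner fields kept, trailing ones dropped
my @all       = split /\t/,  $line, -1;  # negative LIMIT keeps trailing empties too

# Prints: 2 3 5
print scalar(@collapsed), " ", scalar(@kept), " ", scalar(@all), "\n";
```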

Re: small change - evil consequences?
by menolly (Hermit) on Oct 22, 2003 at 21:47 UTC
    I don't see the need for this.
```perl
use Text::ParseWords;

my $str = "a b c";    # the separators in $str are literal tab characters
my @f = split /\t/, $str;
print scalar @f, "\n";
@f = parse_line('\t', 0, $str);
print scalar @f, "\n";
```
    output:
```
[root@localhost root]# perl temp.pl
4
4
```
    Both parse_line and split handle the null field.