small change to Text::ParseWords

selena has asked for the wisdom of the Perl Monks concerning the following question:

I made this change to Text::ParseWords to accommodate an unfortunate input file. Are there any side effects to be worried about? I needed Text::ParseWords to find and return null fields. I have tested it in a couple of different contexts and haven't found any unexpected horrors yet.

--- /usr/lib/perl5/5.8.0/Text/ParseWords.pm     2002-09-01 21:14:22.000000000 -0700
+++ /home/selena/perl/Text/ParseWords.pm        2003-10-21 19:00:43.000000000 -0700
@@ -67,6 +67,8 @@
                      (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                                # plus EOL, delimiter, or quote
                       ([\000-\377]*)          # the rest
+                      |                       # --OR--
+                       ($delimiter)
                      /x;                      # extended layout
        return() unless( $quote || length($unquoted) || length($delim));

In case you aren't interested in applying the patch, here is the whole regex. The change is at the second --OR-- comment:

 ($quote, $quoted, undef, $unquoted, $delim, undef) =
    $line =~ m/^(["'])                 # a $quote
              ((?:\\.|(?!\1)[^\\])*)   # and $quoted text
              \1                       # followed by the same quote
              ([\000-\377]*)           # and the rest
              |                        # --OR--
              ^((?:\\.|[^\\"'])*?)     # an $unquoted text
              (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                       # plus EOL, delimiter, or quote
              ([\000-\377]*)           # the rest
              |                        # --OR--
              ($delimiter)
              /x;                      # extended layout

Edit by tye, mention module in title, link to module

Re: small change - evil consequences? by etcshadow (Priest) on Oct 22, 2003 at 03:46 UTC
Hmm... maybe I'm being dumb here, but I think that if this is doing anything at all, it is doing the wrong thing. Let me start nit-picking:

- You drop $delimiter into a /x regexp, but it is elsewhere de-/x'ed as (?-x:$delimiter), so if someone passes in a $delimiter containing whitespace, it will be interpreted differently by your addition than elsewhere.
- You capture your new $delimiter, but you aren't storing the value? Odd. You could just drop the parentheses, or make them (?:$delimiter) or, as to the previous point, (?-x:$delimiter).
- The other primary sub-regexes are anchored at the beginning of the line, so they can tell what came before the ending sub-regex, but your new one is not.
- Excepting, of course, the /x-or-not-/x question surrounding $delimiter, anything caught by your new regex should have been caught by the regex for unquoted stuff (unless, of course, it had an open quote with no matching close quote).

Can you give me the calling context (the value of $delimiter and $line)?

------------
:Wq
Not an editor command: Wq
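To make the first nit-pick concrete, here is a minimal sketch of how a delimiter containing whitespace behaves inside a /x regex with and without the (?-x:...) wrapper. The single-space delimiter and the sample line are assumptions chosen for illustration; they are not taken from the original post.

```perl
use strict;
use warnings;

# Assumed values for illustration only: a single-space delimiter.
my $delimiter = ' ';
my $line      = 'foo bar';

# Under /x the interpolated space is treated as ignorable layout,
# so this pattern behaves like /^foo()bar\z/ and does not match.
my $bare = $line =~ /^foo ($delimiter) bar\z/x ? 1 : 0;

# (?-x:...) switches extended mode off for the delimiter only,
# so the space is matched literally.
my $wrapped = $line =~ /^foo (?-x:$delimiter) bar\z/x ? 1 : 0;

print "bare: $bare, wrapped: $wrapped\n";
```

With a delimiter such as '\t' the two forms should behave the same, since the escape sequence contains no literal whitespace for /x to discard.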
Re: Re: small change - evil consequences? by selena (Acolyte) on Oct 22, 2003 at 04:34 UTC
OK, let me start with this: I did not completely understand the original regexes, and your explanation helped me there. I changed the $delimiter section as you suggested.

My goal was to capture all fields implied in a tab-delimited line, as in something that would do the right thing when given this:

`aabel Allyn Abel EL0 20030612182307 10.5.4.166`
`abeeman Ali Beeman EL1 ajens@ttsd.k12.or.us 20030902191509 10.8.0.219`

There are actually 8 fields in both lines, based on the tabbing. The problem with the previous code (sans the '| ($delimiter)') was that when I specified '\t' as a field separator, the regex did not recognize multiple tabs correctly.

Ugh. And now that I go off and test this with the original module (because I was going to show you the problem exactly), it's working with the original regex. *sigh* Perhaps reading Text::ParseWords caused me to clean up some code elsewhere that was causing the issue.

Thanks for taking the time to read through this.
Re: Re: Re: small change - evil consequences? by Aragorn (Curate) on Oct 22, 2003 at 07:52 UTC
Maybe I'm not understanding your problem, but wouldn't a `split(/\t+/, $line)` get the tab-delimited fields?

Arjen
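For comparison, a small sketch of how the choice of split pattern and limit affects empty fields. The sample line is an assumption made up for this example (one empty field in the middle, two at the end).

```perl
use strict;
use warnings;

# Assumed sample line: empty second field plus two trailing empty fields.
my $line = "a\t\tb\tc\t\t";

my @collapsed = split /\t+/, $line;      # a run of tabs acts as one separator
my @dropped   = split /\t/,  $line;      # trailing empty fields are removed
my @all       = split /\t/,  $line, -1;  # a negative limit keeps them

printf "%d %d %d\n", scalar @collapsed, scalar @dropped, scalar @all;
```

So `/\t+/` suits data where repeated tabs are just padding, while `/\t/` with a limit of -1 is the form that preserves every null field.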
Re: Re: Re: Re: small change - evil consequences? by etcshadow (Priest) on Oct 22, 2003 at 15:41 UTC
Re: small change - evil consequences? by menolly (Hermit) on Oct 22, 2003 at 21:47 UTC
I don't see the need for this.

`use Text::ParseWords; $str = "a b c"; @f = split /\t/, $str; print scalar @f, "\n"; @f = parse_line('\t', 0, $str); print scalar @f, "\n";`

output:

`[root@localhost root]# perl temp.pl 4 4`

Both `parse_line` and `split` handle the null field.
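A self-contained variant of the same check, written with explicit `\t` escapes so the tabs survive copy and paste. The position of the empty field is an assumption; the variable names are hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords;   # exports parse_line by default

# Assumed input: one empty field between "a" and "b".
my $str = "a\t\tb\tc";

my @by_split = split /\t/, $str;
my @by_parse = parse_line('\t', 0, $str);

print scalar(@by_split), "\n";
print scalar(@by_parse), "\n";
```

Both counts should come out the same for an empty field in the middle of the line. Trailing empty fields are a separate case and are not covered by this sketch.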