WebmastTroy has asked for the wisdom of the Perl Monks concerning the following question:

I've got a tab-delimited data file and I'm having an issue with it.

Ok, in short, this is the problem code:

foreach $line (@file) {
    ($field1value, $field2value, $field3value, $field4value, $field5value,
     $field6value, $field7value, $field8value, $field9value, $field10value) = split(/\t+/, $line);
Using the above code, if, for example, the 7th field in the database is blank, whatever is in the 8th field ends up as the 7th (everything shifts to the left; if 2 fields are blank, you come up 2 fields short at the end, and so on). In the database, the tabs are there, but it appears that my split command groups consecutive ones together.

I'm taking the information from the file and importing it into a MySQL database, and obviously this causes problems.

Any suggestions are greatly appreciated. Thanks.

Replies are listed 'Best First'.
Re: Tab delimited code problem?
by Paladin (Vicar) on Jan 25, 2003 at 06:33 UTC
    Your code groups multiple tabs together because you told it to. /\t+/ says to split on 1 or more tabs. If you want to split on a single tab, then use a single tab: /\t/. Also note that, by default, trailing empty fields are discarded. You can avoid this by using the 3-argument form of split with a negative limit.

    Also, when you have multiple scalars named something like $foo1, $foo2, $foo3, etc, that is usually a sign that you want an array. Something like:

    @values = split /\t/, $line, -1;
      This is a point that deserves more than the cursory mention it got: passing -1 as the number of desired fields to split is important when dealing with the possibility of empty fields. If you don't pass a number of desired fields, and the last fields in the input string are empty, then split will drop them from the output. This is desirable in some cases, but not so in others. At any rate, one should be aware of this.
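
To make the difference concrete, here is a minimal sketch comparing the three forms of split discussed above. The sample record is hypothetical: five fields, of which the third and the fifth are empty.

```perl
use strict;
use warnings;

# Hypothetical record: 5 fields, the 3rd and the 5th are empty.
my $line = "a\tb\t\td\t";

# /\t+/ treats a run of tabs as ONE separator, so the empty 3rd field vanishes.
my @greedy = split /\t+/, $line;

# /\t/ splits on every tab, but trailing empty fields are still dropped.
my @single = split /\t/, $line;

# A negative limit keeps every field, including trailing empty ones.
my @keepall = split /\t/, $line, -1;

print scalar(@greedy),  " fields with /\\t+/\n";           # 3: a, b, d
print scalar(@single),  " fields with /\\t/\n";            # 4: a, b, '', d
print scalar(@keepall), " fields with /\\t/ and -1\n";     # 5: a, b, '', d, ''
```

When the lines come from a file, chomp them first. Otherwise, with a limit of -1, the trailing newline ends up as the content of the last field.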

      Makeshifts last the longest.

Re: Tab delimited code problem?
by Wonko the sane (Curate) on Jan 25, 2003 at 06:30 UTC
    Try limiting your split to just one tab at a time. It's being too greedy.

    split( /\t{1}/, $line )
    That should give you what you want.

    Best Regards,
    Wonko

    Update: doh! of course ya dont need the {1}

Re: Tab delimited code problem?
by atcroft (Abbot) on Jan 25, 2003 at 06:32 UTC

    Try changing /\t+/ to /\t/. The plus (+) causes it to split on 1 or more consecutive instances of \t, rather than on each one individually. I believe that should help.

Re: Tab delimited code problem?
by Aristotle (Chancellor) on Jan 25, 2003 at 23:50 UTC
    If any of your fields is ever quoted to contain a tab as data, you will find the naive split approach will fail for that case. You may want to have a look at Text::CSV_XS if so.
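
As a rough sketch of that suggestion, Text::CSV_XS can be pointed at tabs through its sep_char option. This assumes the module is installed from CPAN and that the file uses double quotes around fields containing tabs; the sample record is hypothetical.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# sep_char switches the separator from the default comma to a tab.
my $tsv = Text::CSV_XS->new({ binary => 1, sep_char => "\t" })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Hypothetical record: the 2nd field is quoted and contains a tab, the 3rd is empty.
my $line = qq{one\t"two\tstill two"\t\tfour};

$tsv->parse($line) or die "Parse failed: " . $tsv->error_diag;
my @fields = $tsv->fields;

# The quoted tab stays inside its field and the empty field keeps its position.
print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

A plain split on /\t/ would cut the quoted field in two and leave the quote characters in the data.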

    Makeshifts last the longest.

Re: Tab delimited code problem?
by vek (Prior) on Jan 25, 2003 at 20:13 UTC
    You've already received some good answers to your question. FWIW, I usually like to use an array when splitting. Makes the code a little easier to read:
    foreach my $line (@file) {
        my @fields = split(/\t/, $line);
        # do stuff with @fields...
    }
    -- vek --
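
If the named variables from the original question are missed, one possible way to keep the names while still splitting into an array is a hash slice. This is only a sketch: the column names and sample lines are made up, and the chomp and the -1 limit are additions to the loop above so that empty trailing fields survive.

```perl
use strict;
use warnings;

# Hypothetical column names; substitute the real ones for your file.
my @columns = qw(id name city phone);

# Hypothetical input lines; the first has an empty 3rd field, the second an empty last field.
my @file = ("1\tTroy\t\t555-0100\n", "2\tAnn\tOslo\t\n");

my @records;
foreach my $line (@file) {
    chomp $line;                           # drop the newline before splitting
    my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields

    my %record;
    @record{@columns} = @fields;           # hash slice: pair each name with its value
    push @records, \%record;
}

# Empty fields come back as empty strings under the right name.
print "city of record 1 is empty\n"  if $records[0]{city}  eq '';
print "phone of record 2 is empty\n" if $records[1]{phone} eq '';
```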