A few points. You will need a chomp($line), which deletes the trailing end-of-line character. Otherwise the trailing \n will wind up at the end of the last token parsed by the split on tabs. A split on whitespace (split ' ' or split /\s+/) doesn't need a "chomp" because \n is itself one of the whitespace characters (space, \t, \n, \r, \f).
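A minimal sketch of the difference, using a made-up two-field line (the field values and variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = "alpha\tbeta\n";    # hypothetical record: two tab-separated fields

# Without chomp, the newline stays glued to the last field.
my @raw = split /\t/, $line;

# With chomp, the last field is clean.
chomp(my $clean = $line);
my @chomped = split /\t/, $clean;

# Splitting on whitespace treats the newline as just another separator.
my @ws = split ' ', $line;

printf "raw last field ends in newline: %s\n",     ($raw[-1]     =~ /\n\z/ ? "yes" : "no");
printf "chomped last field ends in newline: %s\n", ($chomped[-1] =~ /\n\z/ ? "yes" : "no");
printf "whitespace split gives %d fields\n",       scalar @ws;
```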

I don't know if you will need to trim trailing spaces or not. But you should consider the following code...

#!/usr/bin/perl -w
use strict;

my $line = "tok1 \t \t\t tok4\n";
chomp ($line);   #try running without this!

my @x = my ($tok1, $tok2, $tok3, $tok4) = split(/\t/,$line);

foreach my $token (@x)
{
   print "token = $token..\n"; #.. is there to show blanks
}
__END__
prints:
token = tok1 ..
token =  ..
token = ..
token =  tok4..

I don't know what $line =~ s/&+/&/; is meant to do, but I think this should be: chomp($line);. I hope that you've come to see the power of putting multiple variables to the left of the equals sign! In many languages you have to write a bunch of stuff that essentially means something like "thing 3 in the array is a postal code". In Perl, we can just give these variables names straight from the "get-go".

Now we come to the question about "undef" vars resulting from split. You have a lengthy section of lines like $isbn='' unless defined($isbn);.

Run the above code with this line, adding $tok5:

my @x = my ($tok1, $tok2, $tok3, $tok4, $tok5) = split(/\t/,$line);
You will see that you get a runtime warning about an undefined var: "Use of uninitialized value $token in concatenation (.) or string". This happens in the print, and Perl keeps going, which is normally what you would want. You get some info that your database is corrupted and Perl does the best that it can.

The split() will not generate intermediate undefs. If an undef shows up, it will be at the end (i.e. not at position 3 or whatever). In the above, $tok5 is "undef" because we have asked for more variables than the number of things returned by the split().

Let's say that you want to detect "undefs" from the split and do something on your own. Here is one way:

my @x = split(/\t/,$line);
die "I don't have enough stuff..need 5 tokens\n" if @x < 5;
my ($tok1, $tok2, $tok3, $tok4, $tok5) = @x;

We take whatever split() comes up with and assign it to @x. There won't be any "undef" values there. Then we check whether we have enough values to satisfy the $var assignments (@x in scalar context gives the number of elements); if not, do what you want. This is just an example.
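One caveat worth checking against your data: with no limit argument, split drops trailing empty fields, so a record whose last columns are blank comes back short and would fail the count test. Passing -1 as the limit keeps them. A minimal sketch, where parse_record and the expected count of 5 are hypothetical, and where I test for an exact count rather than "fewer than":

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited record and insist on $want fields.
sub parse_record {
    my ($line, $want) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    die "Expected $want fields, got " . scalar(@fields) . "\n"
        if @fields != $want;
    return @fields;
}

# The last two columns are blank, but they are still counted.
my @ok = parse_record("a\tb\tc\t\t\n", 5);
print scalar(@ok), " fields\n";

# A genuinely short record is rejected; eval traps the die.
my @bad = eval { parse_record("a\tb\tc\n", 5) };
print "error: $@" if $@;
```

Whether a short record should die, warn, or be skipped is up to you; the eval at the end is just one way to keep going after a bad line.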

In general, if some field is completely "MIA" in the DB, is it field 3? Field 2? I mean, if we are expecting to get 5 things and only get 4, then who knows what is missing, and dealing with that can be very problematic! But split /\t/ will generate "" (an empty string) for a "blank field", not an undef value.
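A small check of that last point, using a made-up line with a blank second field and no fifth field:

```perl
use strict;
use warnings;

my $line = "tok1\t\ttok3\ttok4";    # field 2 is blank; there is no field 5

my ($tok1, $tok2, $tok3, $tok4, $tok5) = split /\t/, $line;

# A blank field in the middle is a defined, empty string.
print "tok2 is ", (defined $tok2 ? "defined, length " . length($tok2) : "undef"), "\n";

# A variable past the last field returned by split is undef.
print "tok5 is ", (defined $tok5 ? "defined" : "undef"), "\n";
```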

Good luck and happy Perling! A fantastic language.

Update: Perl has an operator that I've never seen in any other language: ||=, as in $varA ||= "some text";. This statement means that if $varA evaluates as *logical* false, then "some text" is assigned to it. In Perl, numeric 0, the string "0", undef, and "" all mean logical false. In some situations this "logical OR assignment" gizmo is a very nice thing, mainly when dealing with undef or empty text strings.
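A short illustration of the logical-OR assignment, alongside the defined-or variant //= (available since Perl 5.10), which only replaces undef and so leaves a legitimate 0 or "" alone. The variable names are made up for the example:

```perl
use strict;
use warnings;

my ($isbn, $count, $qty);    # hypothetical fields; all start out undef
$count = 0;                  # a real, meaningful zero

$isbn  ||= 'unknown';        # undef is false, so the default is assigned
$count ||= 10;               # 0 is false too, so the zero is overwritten
$qty   //= 10;               # undef, so the default is assigned

my $zero = 0;
$zero //= 10;                # defined, so the zero survives

print "isbn=$isbn count=$count qty=$qty zero=$zero\n";
```

If a field can legitimately hold 0 or an empty string, the defined-or form is probably the safer default.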


In reply to Re^3: input tab delimited file by Marshall
in thread input tab delimited file by libmonk
