in reply to Checking the length of all row elements

Ok. let me correct my question.

The requirement is that I need to upload the data from a flat file into table.

Before uploading I need to make sure that none of field elements of each row are more than 255 characters.

If there is a single element greater than 255 character, I need to report that data as improper data.

So only way I can think of in Perl is to get the length of each element and do a check using length function.

Other way would be to upload the data using Oracle's sqlldr and then check the length using a sql query. But not sure which one would be more optimized - Oracle or Perl?

  • Comment on Re: Checking the length of all row elements

Replies are listed 'Best First'.
Re^2: Checking the length of all row elements
by Marshall (Canon) on Jul 24, 2010 at 23:36 UTC
    So you have some delimiter between fields in this flat file and you just want to know whether this particular line of input "meets the 255 char" rule of each field or not?

    Split on the delimiter or regex into an array (the first arg of split is a regex). Then use that array as input to a grep in a scalar context which will give you a value of 0,1,2,3,4... If the value is 0, then there are no fields >255 and this is a "good line".

    while (<IN>) { chomp; # needed if not splitting on whitespace my @fields = split; # default split - splits on whitespace: /\s+/ # otherwise use: # my @fields = split(/regex_here/, $_); if ( grep{length($_) >255}@fields ) { print "BAD LINE: $_\n"; #do what you need here } else { #...its a "good line" here } }
    The above code will be extremely performant. Do what you can do before putting the record into the DB.

    updated: minor formatting tweak

Re^2: Checking the length of all row elements
by oko1 (Deacon) on Jul 24, 2010 at 22:02 UTC

    Given that with the "sqlldr" method you would have to delete the "bad" data after it was discovered to be bad, it seems like you'd be wasting all that effort. Using Perl to validate the data before you load it from the flat file would be trivially easy and would avoid the wasted "round trips".

    # Checks comma-separated elements perl -F, -wane'for (@F){die "Improper data\n" if length($_)>255}' file +name.txt

    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
Re^2: Checking the length of all row elements
by jwkrahn (Abbot) on Jul 25, 2010 at 03:06 UTC

    Assuming that your delimiter is in $delimiter then:

    if ( $row =~ /(?:^|\Q$delimiter\E)[^\Q$delimiter\E]{256,}(?:\Q$delimit +er\E|$)/ ) { print "Field greater than 255 characters in Row:\n\t$row\n"; }
Re^2: Checking the length of all row elements
by moritz (Cardinal) on Jul 24, 2010 at 21:33 UTC

    I guess Oracle's import and checking is faster than if you do it in Perl. But in the end only comparison will show if that's true.

    Perl 6 - links to (nearly) everything that is Perl 6.