paragkalra has asked for the wisdom of the Perl Monks concerning the following question:

I have a requirement where I need to check the length of each row of all the columns of my table.

I was planning to use following algorithm

1. Get the data from the table using DBI and store it in file 2. Split the each row using the delimiter and store it in array 3. Loop the through the array and check the length 4. Repeat this step for all the rows. Thus Parse the file row by row

But to me this might be slow process since we would be processing each and every element of the table.

Is there any better optimized solution.

Replies are listed 'Best First'.
Re: Checking the length of all row elements
by moritz (Cardinal) on Jul 24, 2010 at 19:39 UTC
    Why not let the database do the job? just sum up the string lengths for each column, and then group by that sum.
    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Checking the length of all row elements
by oko1 (Deacon) on Jul 24, 2010 at 20:24 UTC

    I'm assuming that the reason you asked that question here is because you may be working with that data in a Perl script - so I'm going to answer with a Perl snippet (the correct solution, as several people have already pointed out, is still to let the DB do it):

    my @col_names = qw/first_name middle_name last_name address phone_numb +er/; my $total_len = join '+', map { "char_length($_)" } @col_names; my $sql = "SELECT $total_len AS 'Row Length' FROM table ORDER BY $tota +l_len";

    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
Re: Checking the length of all row elements
by ahmad (Hermit) on Jul 24, 2010 at 19:50 UTC

    Why would you do something like that? why do you think you need the length of each row?

    In MySQL you can do it like this

    SELECT ( CHAR_LENGTH( `field1` ) + CHAR_LENGTH( `field2` ) ) AS `result` FROM `table`
Re: Checking the length of all row elements
by jwkrahn (Abbot) on Jul 24, 2010 at 19:40 UTC

    Do you need the total length of the row or the lengths of each individual field in the row?

    If you just need the total length then you don't need to split the row into an array.

Re: Checking the length of all row elements
by paragkalra (Scribe) on Jul 24, 2010 at 21:29 UTC

    Ok. let me correct my question.

    The requirement is that I need to upload the data from a flat file into table.

    Before uploading I need to make sure that none of field elements of each row are more than 255 characters.

    If there is a single element greater than 255 character, I need to report that data as improper data.

    So only way I can think of in Perl is to get the length of each element and do a check using length function.

    Other way would be to upload the data using Oracle's sqlldr and then check the length using a sql query. But not sure which one would be more optimized - Oracle or Perl?

      So you have some delimiter between fields in this flat file and you just want to know whether this particular line of input "meets the 255 char" rule of each field or not?

      Split on the delimiter or regex into an array (the first arg of split is a regex). Then use that array as input to a grep in a scalar context which will give you a value of 0,1,2,3,4... If the value is 0, then there are no fields >255 and this is a "good line".

      while (<IN>) { chomp; # needed if not splitting on whitespace my @fields = split; # default split - splits on whitespace: /\s+/ # otherwise use: # my @fields = split(/regex_here/, $_); if ( grep{length($_) >255}@fields ) { print "BAD LINE: $_\n"; #do what you need here } else { #...its a "good line" here } }
      The above code will be extremely performant. Do what you can do before putting the record into the DB.

      updated: minor formatting tweak

      Given that with the "sqlldr" method you would have to delete the "bad" data after it was discovered to be bad, it seems like you'd be wasting all that effort. Using Perl to validate the data before you load it from the flat file would be trivially easy and would avoid the wasted "round trips".

      # Checks comma-separated elements perl -F, -wane'for (@F){die "Improper data\n" if length($_)>255}' file +name.txt

      --
      "Language shapes the way we think, and determines what we can think about."
      -- B. L. Whorf

      Assuming that your delimiter is in $delimiter then:

      if ( $row =~ /(?:^|\Q$delimiter\E)[^\Q$delimiter\E]{256,}(?:\Q$delimit +er\E|$)/ ) { print "Field greater than 255 characters in Row:\n\t$row\n"; }

      I guess Oracle's import and checking is faster than if you do it in Perl. But in the end only comparison will show if that's true.

      Perl 6 - links to (nearly) everything that is Perl 6.