Ma has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I would like to split a tab-separated text file into valid and invalid records (files). For example:

Name RentDue Rent
xyz 09/30/2013 500
abc 09/30/2013 YYN
del hfhfh 200

In the above sample file, record 1 is good, so it should go to the valid file. Record 2 is invalid because Rent should be numeric, so it should go to the invalid file. Record 3 is also invalid because RentDue is not a valid date, so it should go to the invalid file as well. Can this be accomplished using Perl? Many thanks in advance for any hints. Regards, Ma

Re: Split Tab separated text file into valid and invalid records
by toolic (Bishop) on Sep 23, 2013 at 12:26 UTC
Re: Split Tab separated text file into valid and invalid records
by hdb (Monsignor) on Sep 23, 2013 at 12:29 UTC

    • Use Text::CSV to read your file and to split lines into individual fields.
    • Use regular expressions to test for valid field values.
    • For testing the dates use something like DateTime.
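The three hints above could be combined roughly as follows. This is only a sketch under stated assumptions: Text::CSV and DateTime are CPAN modules that must be installed, the date format is taken to be MM/DD/YYYY as in the sample, and the sub names `is_valid_date` and `is_valid_rent` are made up for illustration. The sample rows from the question are read through an in-memory filehandle so the script needs no input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl
use DateTime;     # CPAN module, not part of core Perl

# True if $text looks like MM/DD/YYYY and names a real calendar date.
sub is_valid_date {
    my ($text) = @_;
    return 0 unless defined $text && $text =~ m{^(\d{2})/(\d{2})/(\d{4})$};
    my ($month, $day, $year) = map { $_ + 0 } ($1, $2, $3);
    # DateTime->new dies on impossible dates such as 02/30/2013.
    return eval { DateTime->new(year => $year, month => $month, day => $day); 1 } ? 1 : 0;
}

# True if $text is a plain non-negative number such as 500 or 500.25.
sub is_valid_rent {
    my ($text) = @_;
    return (defined $text && $text =~ /^\d+(?:\.\d+)?$/) ? 1 : 0;
}

# Sample rows from the question, joined with real tab characters.
my $sample = join "\n",
    "Name\tRentDue\tRent",
    "xyz\t09/30/2013\t500",
    "abc\t09/30/2013\tYYN",
    "del\thfhfh\t200",
    "";

my $csv = Text::CSV->new({ sep_char => "\t", binary => 1, auto_diag => 1 });
open my $in, '<', \$sample or die "Cannot open sample data: $!";
$csv->getline($in);    # skip the header row

my (@valid, @invalid);
while (my $row = $csv->getline($in)) {
    my ($name, $due, $rent) = @$row;
    if (is_valid_date($due) && is_valid_rent($rent)) {
        push @valid, $name;
    }
    else {
        push @invalid, $name;
    }
}
close $in;

print "valid: @valid\n";
print "invalid: @invalid\n";
```

With the sample rows, xyz should end up in the valid list and abc and del in the invalid one. Writing the rows to two files instead of two arrays is a matter of opening two output handles and printing to the matching one.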

      This is very helpful. Could I get some code to start with? I am just starting with Perl and have done this so far:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
      open (FILE, $file);
      while (<FILE>) {
          chomp;
          ($emp, $dt, $account, $baldue, $credit, $netarrears, $locno, $devno,
           $parkbal, $rentnoretro, $maintbal, $legbal, $otherbal, $retrobal,
           $chu, $mk, $buildno, $mo veoutdt, $status, $legal, $legdt) = split("\t");
          print "$dt $account $baldue $credit $netarrears $locno\n";
      }
        Close!

        (Assuming parent post is by OP, Ma): you have a space inside the name of one of your variables, $mo veoutdt. But that's harder than need be to spot, since you failed to follow the guidance at the text-input box, namely -- use code and para tags.

        As to your other code problems: you've (very wisely) invoked strict, but without honoring one of its key strictures, namely declaring variables with my (or our, etc.). The error messages should have pointed you toward the repair, though there were so many that you may have found them overwhelming (go forth and sin no more).
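For illustration, one common way to keep strict happy is to declare the variables in the same statement that fills them. The sketch below is not the poster's code: the sub name `parse_line` and the sample field values are made up, and only three of the fields are shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Splits one tab-separated line into its first three fields.
# Declaring the variables with my right at the assignment satisfies strict.
sub parse_line {
    my ($line) = @_;
    my ($emp, $dt, $account) = split /\t/, $line;
    return ($emp, $dt, $account);
}

# Made-up sample line, for illustration only.
my ($emp, $dt, $account) = parse_line("E100\t09/30/2013\tA-42");
print "$emp | $dt | $account\n";
```

Inside a while loop the same `my (...) = split /\t/;` form gives each iteration fresh variables, so stale values from a previous line cannot leak into a short one.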

        Here's your code, modified to use data rather than a file and truncated for simplicity and clarity:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my ($dt, $account, $baldue);

        while (<DATA>) {
            chomp;
            ($dt, $account, $baldue) = split /\t/;   # shorter, more standard. was: split("\t");
            print "$dt | $account | $baldue \n";
        }

        =head OUTPUT
        4-11-13 273.13 273.13
        20130512 11.17 36.14
        09222013 2,479.00 6,481.16
        =cut

        __DATA__
        4-11-13	273.13	273.13
        20130512	11.17	36.14
        09222013	2,479.00	6,481.16

        Check the documentation of the modules I recommended. There is some sample code there.

        Also, please use code tags to make your posts more readable.

Re: Split Tab separated text file into valid and invalid records
by ww (Archbishop) on Sep 23, 2013 at 17:08 UTC
    An update (as a new post to avoid confusion):

    It took a while but eventually it sank in: Ma and the AM who wrote Split Tab separated text file into valid and invalid records (presumed to be the same person) asked two quite distinct questions. My past node replied to the AM, whilst ignoring the specs in the top node.

    So (JBIC & because I have time-on-my-hands awaiting a particular event), herewith, a solution that satisfies both (I think) in one (didactic) script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # 1055276 (see also my splitTOscalars.pl)
    # OP valid: $dt (AKA 'name') is alpha, datefield using slashes, balancedue field is numeric
    # second post as AM: split problems

    my ($dt, $account, $baldue);

    while (<DATA>) {
        chomp;
        ($dt, $account, $baldue) = split/\t/;   # was: split("\t");
        if ($dt =~ /[A-Z]+/i && $account =~ /^[\d\/]+$/ && $baldue =~ /^[\d.,]+$/) {
            print "\n VALID per OP: \$dt: $dt | \$account: $account | \$baldue: $baldue \n\n";
        }elsif ($dt =~ /[a-z]+/) {
            print "Invalid: $dt -|- $account -|- $baldue \n";
        }else {
            print "\nValid per #1055280 \$dt, \$account, \$baldue: $dt, $account, $baldue\n";
        }
    }

    =head OUTPUT
    Valid per #1055280 $dt, $account, $baldue: 4-11-13, 273.13, 273.13
    Valid per #1055280 $dt, $account, $baldue: 20130512, 11.17, 36.14
    Valid per #1055280 $dt, $account, $baldue: 09222013, 2479.00, 6481.16
     VALID per OP: $dt: xyz | $account: 09/30/2013 | $baldue: 500
    Invalid: abc -|- 09/30/2013 -|- YYN
    Invalid: del -|- hfhfh -|- 200
    =cut

    __DATA__
    4-11-13	273.13	273.13
    20130512	11.17	36.14
    09222013	2479.00	6481.16
    xyz	09/30/2013	500
    abc	09/30/2013	YYN
    del	hfhfh	200
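The script above prints its verdicts to the screen, while the original question asked for two files. A minimal sketch of that last step follows, using only core Perl. Everything named here is an assumption for illustration: the subs `is_valid_record` and `split_file`, the output names valid.txt and invalid.txt, the MM/DD/YYYY date format, and the choice to copy the header row into both outputs. Time::Local's `timelocal` stands in as the date check because it croaks on impossible dates.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timelocal);    # core module

# Name must be alphabetic, RentDue a real MM/DD/YYYY date, Rent numeric.
sub is_valid_record {
    my ($name, $due, $rent) = @_;
    return 0 unless defined $rent;                 # fewer than three fields
    return 0 unless $name =~ /^[A-Za-z]+$/;
    return 0 unless $rent =~ /^\d+(?:\.\d+)?$/;
    return 0 unless $due  =~ m{^(\d{2})/(\d{2})/(\d{4})$};
    my ($month, $day, $year) = ($1, $2, $3);
    # timelocal croaks on impossible dates, so trap that with eval.
    return eval { timelocal(0, 0, 0, $day, $month - 1, $year); 1 } ? 1 : 0;
}

# Reads $in_path and writes each data row to $valid_path or $invalid_path.
sub split_file {
    my ($in_path, $valid_path, $invalid_path) = @_;
    open my $in,      '<', $in_path      or die "Cannot read $in_path: $!";
    open my $valid,   '>', $valid_path   or die "Cannot write $valid_path: $!";
    open my $invalid, '>', $invalid_path or die "Cannot write $invalid_path: $!";

    my $header = <$in>;    # copy the header row to both outputs
    if (defined $header) {
        print {$_} $header for $valid, $invalid;
    }

    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;    # strip Unix or Windows line endings
        my $out = is_valid_record(split /\t/, $line) ? $valid : $invalid;
        print {$out} "$line\n";
    }
    close $_ for $in, $valid, $invalid;
}

# Hypothetical output names; runs only when an input file is given.
split_file($ARGV[0], 'valid.txt', 'invalid.txt') if @ARGV;
```

Run it with the input file as the only argument. Keeping the checks in one sub means a new rule, such as a minimum rent, only touches `is_valid_record`.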
      Thank you very much, this is what I was looking for. Regards, Ma