Hi, I have developed a script that takes a tab-separated CSV file, validates a couple of fields, and then writes the valid and invalid records to separate tab-separated CSV files. The code is pasted below.

The issue is how empty fields should be handled. If a value was never provided (the field is null), the record should not go to the invalid file; it belongs in the valid file. However, if a checked field contains only blank spaces and no value, the record is invalid and should go to the invalid file.

I have made a lot of attempts but am still not able to handle the null and blank-space cases. As it stands, a record simply goes to the invalid file whenever a checked field is null or contains only blank spaces. Any help is greatly appreciated. Last time, I got so much help that I was at least able to write this script.
#!/usr/bin/perl
#
use strict;
use warnings;
use Diagnostics;
use Text::CSV_XS;
#use Regexp::Common::time;

my $file_to_parse         = "AddressDevCHUBldgProj_LM.csv";
my $file_to_parse_valid   = "AddressDevCHUBldgProj_LM_valid.csv";
my $file_to_parse_invalid = "AddressDevCHUBldgProj_LM_invalid.csv";

open(FH, '<', $file_to_parse) or error("cannot open file ($!)");
open(FH1, '>', $file_to_parse_valid) or error("Cannot open file for write($!)");
open(FH2, '>', $file_to_parse_invalid) or error("Cannot open file for write($!)");

while (<FH>) {
    chomp;
    my ($chu, $dev, $development_name, $bldg_no, $bldg_name, $unit_id,
        $proj, $proj_name, $bdsz, $st_no, $street_name, $apt_no, $p_code,
        $un_st, $mkt, $max_rent, $fut_movein, $p, $last_move_in,
        $last_move_out, $pgm, $clt_no, $mgr_cd, $manager,
        $unit_status_dt, $cont_occ_dt, $pending_move_out_dt,
        $unit_status_desc) = split("\t");
    $max_rent =~ s/,//g;
    #print "$dev\t$bldg_no\t$unit_id\t$proj\t$max_rent\t\t$clt_no\t\t\t$last_move_in\t\t$last_move_out\t\t$unit_status_dt\t\t$pending_move_out_dt\n";
    print "dev = $dev\n";
    print "bldg_no = $bldg_no\n";
    print "unit_id = $unit_id\n";
    print "proj = $proj\n";
    print "max_rent = $max_rent\n";
    print "clt_no = $clt_no\n";
    print "last_move_in = $last_move_in\n";
    print "last_move_out = $last_move_out\n";
    print "unit_status_dt = $unit_status_dt\n";
    print "pending_move_out_dt = $pending_move_out_dt\n";
    if (   ($dev =~/^[0-9]+$/)
        && ($bldg_no=~/^[0-9]+$/)
        && ($unit_id=~/^[0-9]+$/)
        && ($proj=~/^[0-9]+$/)
        && ($max_rent=~/^[0-9]+(\.[0-9][0-9]?)$/)
        && ($clt_no=~/^[0-9]+$/)
        && ($last_move_in=~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#)
        && ($last_move_out=~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#)
        && ($unit_status_dt=~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#)
        && ($pending_move_out_dt=~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#) )
    {
        print FH1 "$_",
            "$chu\t$dev\t$development_name\t$bldg_no\t$bldg_name\t$unit_id\t$proj\t",
            "$proj_name\t$bdsz\t$st_no\t$street_name\t$apt_no\t$p_code\t$un_st\t",
            "$mkt\t$max_rent\t$fut_movein\t$p\t$last_move_in\t$last_move_out\t$pgm\t",
            "$clt_no\t$mgr_cd\t$manager\t$unit_status_dt\t$cont_occ_dt\t",
            "$pending_move_out_dt\t$unit_status_desc", "\n";
    } else {
        print FH2 "$_",
            "$chu\t$dev\t$development_name\t$bldg_no\t$bldg_name\t$unit_id\t$proj\t",
            "$proj_name\t$bdsz\t$st_no\t$street_name\t$apt_no\t$p_code\t$un_st\t",
            "$mkt\t$max_rent\t$fut_movein\t$p\t$last_move_in\t$last_move_out\t$pgm\t",
            "$clt_no\t$mgr_cd\t$manager\t$unit_status_dt\t$cont_occ_dt\t",
            "$pending_move_out_dt\t$unit_status_desc", "\n";
    }
}
close (FH);
close (FH1);
close (FH2);
exit;
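
One possible way to express the rule described above (null is valid, spaces-only is invalid) is sketched here. This is only an illustration under stated assumptions, not a drop-in fix: the helper name `field_ok`, the two patterns, and the three-column sample records are all hypothetical, and the real script would apply the same check to each of its validated fields. The sketch also passes a limit of -1 to `split`, because without it Perl discards trailing empty fields and the corresponding variables end up undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): a field passes if it is
# null (undefined or zero-length) or if it matches the given pattern. A field
# holding only blank spaces is non-empty, fails the pattern, and is rejected.
sub field_ok {
    my ($value, $pattern) = @_;
    return 1 if !defined $value || $value eq '';   # never provided: valid
    return $value =~ $pattern ? 1 : 0;             # spaces only or bad data: invalid
}

# Assumed patterns, modelled on the checks in the script above.
my $int_re  = qr/^[0-9]+$/;
my $date_re = qr{^\d+/\d+/\d+$};

# Made-up sample records with three columns: dev, last_move_in, clt_no.
my @lines = (
    "101\t01/15/2010\t7",   # every field filled in
    "101\t\t",              # null fields, including trailing ones
    "101\t   \t7",          # date field holds only spaces
    "10x\t01/15/2010\t7",   # non-numeric dev
);

for my $line (@lines) {
    # Limit -1 keeps trailing empty fields; a plain split would drop them.
    my ($dev, $last_move_in, $clt_no) = split /\t/, $line, -1;

    my $ok = field_ok($dev, $int_re)
          && field_ok($last_move_in, $date_re)
          && field_ok($clt_no, $int_re);

    print $ok ? "valid" : "invalid", "\n";
}
```

Run as written, this prints valid, valid, invalid, invalid for the four sample records. The date check here uses a plain anchored match rather than the `s#...#...#` substitution in the script; that is a swapped-in technique, chosen because the substitution returns false for an empty field and so cannot tell null apart from bad data.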

In reply to Data validation and blank spaces in tab formatted csv file by Ma
