Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello guys, I am trying to parse a file that contains a database record on each line. The format is like this: "a","b","c","d". The problem is that on a couple of records, the last value of the string ("d") contains multiple lines of text. It looks like this:
"a","b","c","d "ddd" "ddd" "\n"
I'm using split, and it stops at the end of the first line. This assigns the trailing "d" information to the following "a", which isn't good. I know I need some kind of if statement, but I'm not versed well enough in Perl to figure it out. Can anyone help? Thanks, Steve

Replies are listed 'Best First'.
Re: Need Help Parsing File
by tachyon (Chancellor) on May 16, 2002 at 20:14 UTC
    It is hard to know if your example is literal or not, i.e. is that a \n newline char or the literal string "\n"? Anyway, in the first case the input record separator is "\n\n" (two newlines), and in the second case it is q/"\n"/. Set the input record separator to one of these values to read in a record (rather than a line) at a time.
    open FILE, $somefile or die $!;
    # set input record separator
    $/ = "\n\n";    # to this
    $/ = q/"\n"/;   # or ?? this
    while(<FILE>){
        ($a,$b,$c,$d) = split ',';
    }
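A minimal, self-contained sketch of the first case (records separated by a blank line) is below. The sample data, the in-memory filehandle, and the variable names are assumptions made for illustration; they are not taken from the poster's file. The split limit of 4 is also an assumption, used so that any commas in the last field stay in that field.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records separated by a blank line.
# The last field of the first record spans two lines.
my $data = "a,b,c,d line one\nd line two\n\ne,f,g,h\n";

# Read from the string as if it were a file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\n\n";                 # read one blank-line-terminated record at a time
    while (my $record = <$fh>) {
        chomp $record;                 # chomp strips the current value of $/, if present
        my @fields = split /,/, $record, 4;   # at most 4 fields
        push @records, \@fields;
    }
}
close $fh;

printf "%d records\n", scalar @records;
print "last field of first record:\n$records[0][3]\n";
```

Localizing `$/` inside a block keeps the changed separator from affecting other reads in the same program.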

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Need Help Parsing File
by ravendarke (Beadle) on May 16, 2002 at 20:02 UTC
    I'm not terribly proficient myself, but I think this will work:
    /test.txt:
    first,line,and,end
    second,line,with,a
    break
    third,line,no,break
    The code:
    #!/usr/bin/perl -w
    use strict;
    my @array;
    my @bigun;
    my $i;
    open (FILE, "/test.txt") or die "Cannot open /test.txt: $!";
    while (<FILE>){
        chomp($_);
        if (/,/){
            #out with the old
            push @bigun, (join ' ',@array) if (@array);
            #and in with the new
            @array=split /,/,$_;
        } else {
            #append the 'un-commaed' line to the last element of the array
            $array[$#array]=$array[$#array].$_;
        }
    }
    push @bigun, (join ' ',@array);
    #loop over the finished records (@bigun), not the fields of the last one
    foreach $i(0..$#bigun){
        print "$bigun[$i]\n";
    }
    And, the output:
    first line and end
    second line with abreak
    third line no break
    It's kind of dirty, but it'll do the job..
    Marty

    update: added use strict;. Mas apologies for that....
Re: Need Help Parsing File
by Super Monkey (Beadle) on May 16, 2002 at 19:22 UTC
    It looks like the multiple 'ddd' lines do not contain commas (delimiters). You could check each line to see if it has commas. If it doesn't, it's not a new record, so it belongs to the previous one. If it does, parse it as a new record. I know this explanation is rudimentary, but so is your example.
Re: Need Help Parsing File
by Anonymous Monk on May 16, 2002 at 20:54 UTC
    Here is the actual data:
    "8061","APAR","IX89806","IBM","","" "8062","APAR","IX89893","IBM","","" "8063","APAR","IX89419","IBM","","" "8064","APAR","IY06694","IBM","","" "8065","Upgrade","httpsrv.95","Bajie","http://www.geocities.com/gzhang +x/websrv/httpsrv.95.zip","" "8066","Hotfix","Temporary Hotfix: dtspcd.tar.gz","HP","ftp://dtspcd:d +tspcd@hprc.external.hp.com/dtspcd.tar.gz","To install this emergency +hotfix, "\n"download the archive and place it in a protected directory. Verif +y the integrity of the archive: "\n" "\n"MD5 Sum: b122f84857f4da65b50d9926201608a1 "\n" "\n"Unpack it, and run 'install_dtspcd x' "\n" "\n"Where 'x' is either: "\n" "\n"dtspcd.10.10 "\n"dtspcd.10.20 "\n"dtspcd.11.00 "\n"dtspcd.11.11 "\n" "\n"The value chosen depends on the system it is being installed on. + 10.24 systems should use dtspcd.10.20. 11.04 systems should use dts +pcd.11.00. "\n"On VVOS (10.24 and 11.04) systems the install_dtscpd should be run + at the SYSTEM access level." "8067","RPM","7.1k i386 update-disk-20011106.img","Red Hat","ftp://upd +ates.redhat.com/7.1/kr/os/images/i386/update-disk-20011106.img","" "8068","RPM","7.1k noarch redhat-release-7.1k-2.noarch.rpm","Red Hat", +"ftp://updates.redhat.com/7.1/kr/os/noarch/redhat-release-7.1k-2.noar +ch.rpm","" "8069","Patch","110286-04","Sun","","" "8070","Patch","110287-04","Sun","",""
Re: Need Help Parsing File
by webadept (Pilgrim) on May 17, 2002 at 07:05 UTC
    In the example data you gave, it looks like the first "cell" of a new record is an ID field of some type. This field appears to have at least 4 digits, so I would work with that. Use a regex to check the first field after your split and see if it contains 4 digits in a row. If it does, then it's a new record. If not, then add that information to the existing record.

    Scan the rest of the records near the end of the file. You shouldn't need to worry too much unless you see a record with only 3 digits in that first ID field.
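The ID check described above might be sketched as follows. This is a variation that tests the start of each raw line instead of the first field after a split, and it assumes every new record begins with a quoted ID of four or more digits followed by a comma. The sample lines are hypothetical, shortened from the data posted in this thread, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample lines, shaped like the data posted in this thread.
# Single quotes keep the "\n" markers as literal backslash-n text.
my @lines = (
    '"8065","Upgrade","httpsrv.95","Bajie","",""',
    '"8066","Hotfix","dtspcd.tar.gz","HP","","To install this hotfix,',
    '"\n"download the archive, then unpack it."',
    '"8067","RPM","update-disk.img","Red Hat","",""',
);

my @records;
for my $line (@lines) {
    if ($line =~ /^"\d{4,}",/ or !@records) {
        # A quoted ID of four or more digits marks the start of a new record.
        # (The very first line always starts a record.)
        push @records, $line;
    }
    else {
        # Anything else is a continuation of the previous record.
        $records[-1] .= "\n" . $line;
    }
}

print scalar(@records), " records\n";
print "$_\n---\n" for @records;
```

Unlike a comma test, this still works when a continuation line contains commas, as "Unpack it, and run" does in the posted data. It would misfire if a continuation line itself happened to start with a quoted four-digit number and a comma.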

    Just a thought as well, this looks like a straight import into something, so don't spend a lot of time making it look pretty if you are just going to throw data into a database and never use the script again. Wam Bam.. as the saying goes.

    webadept

    Every day someone is doing what someone else said is impossible.
Re: Need Help Parsing File
by Anonymous Monk on May 16, 2002 at 19:29 UTC
    You are correct. The following lines don't have commas. I just don't know how to check for them. And if a line has no commas, I would then need to add its data to the end of the previous record. I'm clueless as to how to do this. Thanks