greatshots has asked for the wisdom of the Perl Monks concerning the following question:
Dear monks,
My input file contains three different types of lines, as shown below.
If I split the lines on ',' the results are weird for the first line,
because the first line contains many ','s between double quotes.
How can I make my parsing logic work correctly?
I need your logic for parsing this file.
I can write the program myself.
Thanks a lot,
__DATA__
Submitted,"696,028","50,946","810,590","836,505","13,923,241","13,776,443","14,179,619","14,614,558","14,704,885","14,634,911","15,055,774","15,127,534","14,458,899","14,403,378","14,566,425","14,644,406","14,524,069"
Expired,245,275,273,248,240,295,353,316,371,398,387,352,310,288,405,274,270
Less in,90.12%,90.49%,90.04%,89.55%,90.09%,90.63%,90.37%,90.48%,90.73%,90.59%,90.83%,90.40%,88.82%,90.71%,90.72%,90.69%,91.04%
The output should look like this:
Field1 Field2 field3 field4 field5 ..... Fieldn
Submitted 696,028 50,946 810,590 836,505 ..... blahblah
Expired 245 275 273 248 ......blahblah
Less 90.12% 90.49% 90.04% 89.55% ......blahblah
Re: I would like to find a good logic to parse the data
by friedo (Prior) on May 25, 2007 at 02:55 UTC
Re: I would like to find a good logic to parse the data
by naikonta (Curate) on May 25, 2007 at 13:32 UTC
I appreciate your intent to roll your own, but I suggest using Text::ParseWords instead; it is part of the standard Perl distribution. You can learn the logic from there, or from the other modules suggested by other monks in this thread.
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

while (<DATA>) {
    chomp;
    # Split on commas; keep = 0 drops the quotes around quoted fields.
    my @parts = parse_line(',', 0, $_);
    print join(' ', map { "[$_]" } @parts), "\n";
}
__DATA__
Submitted,"696,028","50,946","810,590","836,505","13,923,241","13,776,443","14,179,619","14,614,558","14,704,885","14,634,911","15,055,774","15,127,534","14,458,899","14,403,378","14,566,425","14,644,406","14,524,069"
Expired,245,275,273,248,240,295,353,316,371,398,387,352,310,288,405,274,270
Less in,90.12%,90.49%,90.04%,89.55%,90.09%,90.63%,90.37%,90.48%,90.73%,90.59%,90.83%,90.40%,88.82%,90.71%,90.72%,90.69%,91.04%
Output:
[Submitted] [696,028] [50,946] [810,590] [836,505] [13,923,241] [13,776,443] [14,179,619] [14,614,558] [14,704,885] [14,634,911] [15,055,774] [15,127,534] [14,458,899] [14,403,378] [14,566,425] [14,644,406] [14,524,069]
[Expired] [245] [275] [273] [248] [240] [295] [353] [316] [371] [398] [387] [352] [310] [288] [405] [274] [270]
[Less in] [90.12%] [90.49%] [90.04%] [89.55%] [90.09%] [90.63%] [90.37%] [90.48%] [90.73%] [90.59%] [90.83%] [90.40%] [88.82%] [90.71%] [90.72%] [90.69%] [91.04%]
Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!
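To get closer to the layout asked for in the question (no brackets, no quotes), the same parse_line call can feed a plain join. This is only a sketch: the helper name csv_to_tsv and the choice of a tab as the output separator are assumptions made for illustration, chosen because a field such as "Less in" contains a space. Note that Text::ParseWords also treats single quotes and backslashes as special, so data containing apostrophes may need different handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: turn one comma-separated line into a
# tab-separated line, dropping the quotes around quoted fields.
sub csv_to_tsv {
    my ($line) = @_;
    # keep = 0 removes the quotes; commas inside quotes survive.
    my @parts = parse_line(',', 0, $line);
    return join("\t", @parts);
}

# Shortened sample lines in the same three shapes as the question.
print csv_to_tsv($_), "\n" for (
    'Submitted,"696,028","50,946"',
    'Expired,245,275',
    'Less in,90.12%,90.49%',
);
```

Swapping the "\t" for another separator is enough to change the layout; the parsing step stays the same.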
Re: I would like to find a good logic to parse the data
by perleager (Pilgrim) on May 25, 2007 at 04:15 UTC
If you can't get a module to do this, perhaps you can use this logic:
Read the input file line by line, and if it's the first line, parse it differently. Otherwise, parse it normally by splitting on commas.
So if it detects the first line, which holds the "Submitted" values, use a separate parsing method to read and print those values.
For example:
my ($junk, $submit_values) = split(/Submitted,\"/, $first_line);
Then you'll be left with:
696,028","50,946","810,590","836,505","13,923,241","13,776,443","14,179,619","14,614,558","14,704,885","14,634,911","15,055,774","15,127,534","14,458,899","14,403,378","14,566,425","14,644,406","14,524,069"
Then you can parse the above by splitting on `","`, after removing the trailing quote:
$submit_values =~ s/"\s*$//;    # drop the closing quote of the last value
my @values = split(/\",\"/, $submit_values);
foreach my $v (@values) {
    print "$v\n";
}
perleager
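If no module at all may be used, the per-line special case above can be generalized so that every line goes through the same routine. The following is a minimal sketch, not a full CSV parser: the function name split_csv_line is made up for illustration, and it assumes quoted fields never contain an escaped or doubled quote. On a malformed line (for example a stray quote in the middle of a field) it simply stops and returns whatever it has collected so far.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one line on commas, treating a
# double-quoted run as a single field and dropping its quotes.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    # Each match is either "quoted text" (captured without the quotes)
    # or a run of characters that are neither commas nor quotes,
    # followed by a comma or the end of the string.
    while ($line =~ /\G(?:"([^"]*)"|([^,"]*))(,|\z)/g) {
        push @fields, defined $1 ? $1 : $2;
        last if $3 eq '';    # reached the end of the line
    }
    return @fields;
}

while (my $line = <DATA>) {
    chomp $line;
    print join("\t", split_csv_line($line)), "\n";
}
__DATA__
Submitted,"696,028","50,946","15,127,534","14,458,899"
Expired,245,275,273,248
Less in,90.12%,90.49%,90.04%,89.55%
```

Because the same routine handles quoted and unquoted fields, there is no need to detect the "Submitted" line separately.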
Re: I would like to find a good logic to parse the data
by greatshots (Pilgrim) on May 25, 2007 at 03:02 UTC
Oops, on our production server I am not allowed to load any modules, and I need to run this parsing script there. Thanks for the idea. I will look into the Text::CSV_XS module and try my best.
I am not allowed to load any modules
Core modules do not have to be installed so you may benefit from Text::Balanced.
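The code below is not bulletproof but should be sufficient to process your data:
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_quotelike);

# Split one line into fields, keeping commas that sit inside quotes.
sub getfields {
    my ($str) = @_;
    my @fields;
    my $field = '';
    while (defined $str && length $str) {
        # Leading whitespace stays part of the current field.
        $field .= $str =~ s/^(\s*)// ? $1 : '';
        my $extracted;
        if ($str =~ /^["']/) {
            # Quoted chunk: take it whole, quotes and commas included.
            ($extracted, $str) = extract_quotelike($str);
            die "Unbalanced quote near: $str\n" unless defined $extracted;
            $field .= $extracted;
        }
        else {
            # Unquoted chunk: the next comma ends the field.
            ($extracted, $str) = split(',', $str, 2);
            push @fields, $field . $extracted;
            $field = '';
        }
    }
    # A line that ends in a quoted field leaves it pending; keep it.
    push @fields, $field if length $field;
    return @fields;
}

while (my $line = <DATA>) {
    chomp($line);
    print "$_\t" foreach ( getfields($line) );
    print "\n";
}
__DATA__
Submitted,"696,028","50,946","15,127,534","14,458,899"
Expired,245,275,273,248
Less in,90.12%,90.49%,90.04%,89.55%
Output is:
Submitted "696,028" "50,946" "15,127,534" "14,458,899"
Expired 245 275 273 248
Less in 90.12% 90.49% 90.04% 89.55%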
# perl Makefile.PL PREFIX=/home/greatshots/perl5
# make test
# make install UNINST=1
Do this for every module you need, and add the resulting paths to your $PERL5LIB environment variable.
Enjoy, Have FUN! H.Merijn
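As an alternative to setting $PERL5LIB, a script can add the private directory itself with the core lib pragma. The sketch below assumes the modules ended up under /home/greatshots/perl5/lib/perl5; that subdirectory is a guess, since the exact layout below the PREFIX depends on the Perl version and configuration, so check where make install actually put the files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed location of the privately installed modules; adjust it to
# wherever "make install" actually placed them under the PREFIX.
use lib '/home/greatshots/perl5/lib/perl5';

# Confirm that the private directory is now searched for modules.
my $found = grep { $_ eq '/home/greatshots/perl5/lib/perl5' } @INC;
print $found
    ? "private lib dir is on \@INC\n"
    : "private lib dir is missing from \@INC\n";
```

Any use statement placed after the use lib line will then search that directory before the system-wide ones.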