in reply to regexp problems

You are solving this problem the wrong way. I normally use Text::CSV_XS for CSV reading and writing. It's dirt easy. In fact, it is so easy that I've just copied in some sample source for you to look at. If you want to go faster, you can start feeding it a filehandle to read from, among other tricks. Those are up to you and you'll have to read the documentation for that.

#!/usr/bin/perl
use Text::CSV_XS;
use strict;
use warnings;

$| = 1;                                     # unbuffered output

my $c = Text::CSV_XS->new;
while (my $line = <>) {
    $c->parse($line);                       # parse one CSV record
    my @fields = $c->fields;
    if (1 < @fields) {
        $line = join("\t", @fields) . "\n"; # re-join on tabs
        $line =~ s/\\//g;                   # strip backslashes
        print STDOUT $line;
    }
    else {
        print STDERR $line;                 # lines that didn't split go to STDERR
    }
}
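
If you do go the filehandle route, it looks roughly like this (just a sketch, assuming the data arrives on STDIN and you want the same tab-joined output as above; tune it against your real files):

#!/usr/bin/perl
# Sketch of the filehandle approach: let Text::CSV_XS read records
# straight from the handle with getline() instead of parse()/fields()
# per line. STDIN and the tab output delimiter are assumptions here.
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 });   # binary => 1 tolerates odd bytes in fields

while (my $row = $csv->getline(\*STDIN)) {      # getline() returns one arrayref per record
    my $line = join("\t", @$row) . "\n";
    $line =~ s/\\//g;                           # same backslash stripping as the sample above
    print $line;
}
$csv->eof or warn "CSV parse stopped before end of input\n";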

Re: Re: regexp problems
by Anonymous Monk on Nov 27, 2002 at 08:22 UTC
    Thank you all for your help :) I greatly appreciate it.

    The input file is around 500 megs. I was given the wrong specs, and didn't know I had to maintain the old delimiter characters within a string.

    The very fact that it's nested just kills the script. I chose Text::ParseWords over Text::CSV because it ships with Perl, so it doesn't require the people who would use the script to grab a module.

    Surprisingly, &parse_line() in Text::ParseWords does -exactly- what I need: it can strip out double quotes and backslashes, and it keeps the commas within the string fields. This means all I really have to do is call &parse_line() and join the fields back together on the new delimiter.
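
    Roughly, that parse_line() approach looks like this (a sketch only, assuming comma-delimited input on STDIN and tab as the new delimiter):

    #!/usr/bin/perl
    # Sketch of the Text::ParseWords approach described above.
    use strict;
    use warnings;
    use Text::ParseWords;

    while (my $line = <STDIN>) {
        chomp $line;
        # $keep = 0 strips surrounding quotes and backslash escapes,
        # while commas inside quoted fields stay part of the field.
        my @fields = parse_line(',', 0, $line);
        print join("\t", @fields), "\n";
    }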

    The code is extremely clean and easy to implement, but it's just too slow.

    As it stands, the script takes a little over an hour to run. If I didn't have to worry about nesting, the script would run in only a couple of minutes.

    The reason I came here is that I have seen some of you guys do some sick, sick, deranged golfing. It's never pretty, but it usually hauls ass :)

    I have learned a couple of tricks here for speeding stuff up over the years, but I still don't hold a candle to most of the pro golfers. I was hoping someone had an idea for boosting the speed.

      In this case you *do* use Text::CSV_XS for the speed. The thing is, the core routines are coded in C and are meant to be fast; that's what the '_XS' part of the name implies. So for your case you ought to go get the module, since this is a speed issue. I didn't reply with that information last time just for kicks. I normally push multi-gigabyte files through it, and it's definitely a help to use the fast module over other things.
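
      If you want to see the difference on your own data before committing, something like this rough Benchmark sketch would do it (the sample line and timing budget are made up; point it at one of your real records):

      #!/usr/bin/perl
      # Hypothetical comparison of the two parsers on a single record.
      use strict;
      use warnings;
      use Benchmark qw(cmpthese);
      use Text::ParseWords;
      use Text::CSV_XS;

      my $line = 'foo,"bar, baz",42';       # made-up sample record
      my $csv  = Text::CSV_XS->new({ binary => 1 });

      cmpthese(-5, {                        # run each for at least 5 CPU seconds
          parsewords => sub { my @f = parse_line(',', 0, $line) },
          csv_xs     => sub { $csv->parse($line); my @f = $csv->fields },
      });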
