Re: Optimizing a script
by qq (Hermit) on Apr 19, 2004 at 12:20 UTC
$/ = qq{"\$"\t""\t""\n};
You've got too much here. It should probably be qq{\$\t\t\n}, or something similar. You've used qq{}, which acts like double quotes and so interpolates, but then wrapped the pieces inside in "" as well, so the interior quotes are left in the separator. Try printing $/ to see what you've actually got. Presumably you are not getting the DEBUG messages, which means it's likely just trying to grab the whole file. Your previous post, incidentally, didn't mention the tabs in the record separator.
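A minimal sketch of that check, using the qq{\$\t\t\n} suggested above (whether your real records end in exactly $<TAB><TAB><NEWLINE> is an assumption you'll need to verify against the data):
$/ = qq{\$\t\t\n};             # no stray double quotes inside the braces

# make the invisible characters visible to see what was actually set:
( my $shown = $/ ) =~ s/\t/<TAB>/g;
$shown =~ s/\n/<NL>/g;
print "separator: $shown\n";   # prints: separator: $<TAB><TAB><NL>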
You should take, as your sample file, the first three entries from your main file. If it works on those, it should work on the rest (or die with an error).
Finally, build your script up slowly. Comment out everything in the while loop. Add print $_;. If that's right, add the my $file ... line back in. Print and check. Repeat. (Poor man's testing!)
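A minimal sketch of that incremental approach, assuming the export_gesamt.asc filename from later in this thread and the separator suggested above:
#!/usr/bin/perl -w
use strict;

$/ = qq{\$\t\t\n};              # adjust to match your real data
open A, '<export_gesamt.asc'
    or die "Failed to open: $!\n";

while ( <A> ) {
    print $_;                                     # step 1: do records split correctly?
    # my $file = ( split /\n/, $_ )[1] or next;   # step 2: uncomment, print $file, check
    # ( $file ) = $file =~ m/"(.*?)"/ or next;    # step 3: and so on, one step at a time
}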
qq
Re: Optimizing a script
by matija (Priest) on Apr 19, 2004 at 12:17 UTC
I think your problem is in the line $/ = qq{"\$"\t""\t""\n};
I looked at the other thread, and what you're using here does not match the data you described there. I have a feeling that your script is trying to find a match for a record separator that doesn't exist, and therefore tries to read the whole file into memory, getting hopelessly bogged down in swapping as a consequence.
Your record separator isn't "$"<TAB>""<TAB>""(newline), is it? I think you need to get rid of all those " characters, for starters...
Hi Matija, everyone, thanks for your help.
I changed the record separator to qq{"\$"\t""\t""};, which solved my first problem. Now the script won't give the files the names I want; it keeps overwriting .csv. I have changed the script a bit to
my $i = 1;
while ( <A> ) {
    my $file = ( split /\n/, $_ )[1] or next;
    ( $file ) = $file =~ m/"(.*?)"/ or next;
    $file = $i;
    open B, "> $file.csv"
        or warn( "Cannot open '$file': $!" ), next;
    print "[DEBUG] '$file': open ok\n";
    chomp $_;
    print B $_;
    close B or warn( "Cannot close '$file': $!" ), next;
    print "[DEBUG] '$file': close ok\n";
    $i++;
}
Which gives the files the names 1, 2, 3, etc. But I would like the filenames to be the name of the company. How can I achieve this? Thanks,
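(A minimal sketch of the same loop without the renumbering: the $file = $i; line is what replaces the company name captured on the line before it, so dropping it leaves that name as the filename. This assumes the quoted field on each record's second line really is the company name.)
while ( <A> ) {
    my $file = ( split /\n/, $_ )[1] or next;   # second line of the record
    ( $file ) = $file =~ m/"(.*?)"/ or next;    # quoted company name
    # no "$file = $i;" here, so the captured name is kept
    open B, "> $file.csv"
        or warn( "Cannot open '$file': $!" ), next;
    chomp $_;
    print B $_;
    close B or warn( "Cannot close '$file': $!" ), next;
}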
Jan
Re: Optimizing a script
by aquarium (Curate) on Apr 19, 2004 at 12:42 UTC
Why are you now delimiting (or trying to split) records on a
$<TAB><TAB><NEWLINE>
I'm not even sure that that's what it's trying to split on, as you've got quotes in the wrong places there: "...$/ = qq{"\$"\t""\t""\n};..."
To cut to the chase: the program was trying to slurp the whole file, since it couldn't find the record delimiter, and it was trying real hard to do so.
Here's a simple solution... please provide the proper sequence of characters for the regex to look for to find the end of each record, if it's not the original "$" (a dollar sign on a line by itself):
my $data;
while ( my $line = <> ) {
    chomp $line;
    if ( $line =~ /^\$$/ ) {    # start of line, a literal dollar sign, then
                                # the end of the line (not the newline
                                # character; the \$ has to be at the end of
                                # $line, which has been chomped)
        close OUTFILE;
        undef $data;
        next;
    }
    if ( !$data ) {
        open( OUTFILE, ">$line" ) or die "Cannot open '$line': $!";
        $data = 1;              # mark the file as open; the first line of
                                # each section is used as its filename
    }
    else {
        print OUTFILE "$line\n";
    }
}
This is nice and simple (procedural-style coding), which you should find easier to understand. Run the script as "perl script.pl <your_data_file".
btw... processing the file this way, one line at a time, is very friendly to your computer's memory: it will only ever hold a single line from your input file in memory. So it doesn't matter whether the input file has 50,000 lines in a particular section you want to separate out, or 5 lines. Doing the "tricky" thing of changing your record delimiter would eat up lots of memory if a data section ran to 50,000 lines. The problem would be worse if you had a malformed record delimiter, e.g. "$\t\n", which is harder to debug, as the program would just fail rather than producing output that's merely not quite what you want.
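A minimal sketch of the contrast, assuming the export_gesamt.asc filename and the record separator used elsewhere in this thread:
#!/usr/bin/perl -w
use strict;

# line at a time: only one line is ever held in memory
open A, '<export_gesamt.asc' or die "Failed to open: $!\n";
my $lines = 0;
$lines++ while <A>;
close A;

# record at a time: each read holds a whole record in memory,
# even if that record runs to 50,000 lines
$/ = qq{"\$"\t""\t""};
open A, '<export_gesamt.asc' or die "Failed to open: $!\n";
my $records = 0;
$records++ while <A>;
close A;

print "$lines lines, $records records\n";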
for( $above_post ) {s/lines/records/g; s/single line/single record/g}
# </pedantic>
Update: Never mind me, I misread and thought this was in reply to the original post, which was processing by records. However, if you do the record processing, it's more state you have to keep up with yourself rather than letting perl handle it for you. Not to mention that reading by lines isn't necessarily going to protect you from malformed input any better (for example, someone sends you a multi-meg file with Mac \cM line endings...).
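A minimal sketch of that failure mode, assuming a classic-Mac file where \cM ("\015") is the only line ending:
#!/usr/bin/perl -w
use strict;

# with the default $/ of "\n", <> would return the whole multi-meg file
# as one "line"; setting $/ to "\015" restores line-by-line reading
local $/ = "\015";
while ( my $line = <> ) {
    chomp $line;               # now strips the trailing \cM
    print "got: $line\n";
}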
Re: Optimizing a script
by ysth (Canon) on Apr 19, 2004 at 15:47 UTC
Obviously your problem is not just performance, but the first two lines of your loop could be combined (untested):
my ($file) = /\n.*?"(.*?)".*?\n/ or next;
(If your data allows, remove the final .*?\n)
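A minimal sketch of that combined match in the context of the loop from earlier in the thread (untested, as noted above):
while ( <A> ) {
    # one match replaces the split-then-match pair:
    my ($file) = /\n.*?"(.*?)".*?\n/ or next;
    open B, "> $file.csv"
        or warn( "Cannot open '$file': $!" ), next;
    chomp $_;
    print B $_;
    close B or warn( "Cannot close '$file': $!" ), next;
}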
Re: Optimizing a script
by thor (Priest) on Apr 19, 2004 at 12:26 UTC
Does it even print out the [DEBUG] messages? If not, then I'd suspect that there's something wrong (i.e. it's not processing your file at all). As far as perl is concerned, there is no difference between a small file and a large one when you're reading a chunk at a time...
Re: Optimizing a script
by Anonymous Monk on Apr 20, 2004 at 22:57 UTC
You may want to consider the following simple solution:
#!/usr/bin/perl -w
use strict;

open A, '<export_gesamt.asc'
    or die "Failed to open: $!\n";   # "or", not "||": "||" binds to the
                                     # filename string, so die would never run
my $line = '$';                      # seed so the first record opens a file
do {
    if ( $line =~ /^\$\s*$/ ) {      # "=~", not "eq"; \s* lets the bare '$' seed match
        chomp( my $file = <A> );     # the next line names the file
        open( OUT, ">$file" )        # note the comma after the handle
            or warn "Failed to create file '$file': $!\n";
        print "Writing $file\n";
    }
    else {
        print OUT $line;
    }
} while ( $line = <A> );