Hello monks. I was at a party last weekend where someone told me Perl was extremely slow and that it is wiser to use the shell. He described a program that he and a coworker (who he claimed knew Perl well) wrote, which ran in 3 seconds in awk and in C but took 3 minutes in Perl. The program parsed a text file of 1 million lines, grabbed the hundredth field of each line (fields were delimited with commas), and returned the sum of the hundredth field over the whole file. As you can imagine, I was very skeptical of this claim. Not having that much experience in Perl, I told one of our senior programmers at work, and he set out to prove him wrong.
My coworker's Perl and C code is below (he is a better Perl programmer than I am). The best we were able to do was to get Perl to run at about 1/20 the speed of C. We are running on Red Hat Linux with Perl v5.8.4 and gcc version 3.2.3. We used the Linux time command to benchmark our results. As you can see from the comments, we tried a couple of different ideas. Are we missing anything that would make our program faster? Thanks for your help.
Erik
#!/usr/bin/perl -w
#use integer;
my $file = 'test_data.dat';
open (IN,$file) or die "Can't open file - $file - $!";
my $data;
print "Reading: [$file]\n";
my $val = 0;
my @arr;
while (<IN>){
    # Add only on a successful match, so a non-matching line
    # cannot contribute a stale or undefined $1
    $val += $1 if /(?:\d+,){99}(\d+),/;
    #print "$1\n";
    #@arr = split(',', $_);
    #$val = $arr[99];
    #print "$val\n\n";
    print STDERR "Working on: [$.]:[$val]\r" unless ($. % 100_000);
}
print STDERR "Final: [$val]\n";
exit;
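A variant of the commented-out split attempt that may be worth timing, offered as a sketch with no claim that it is faster: give split a LIMIT so it stops after the hundredth field instead of splitting the whole line. The helper name hundredth_field is hypothetical, and the sample line is synthetic.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# A LIMIT of 101 makes split return at most 101 pieces: the first
# 100 fields plus one final piece holding the rest of the line.
sub hundredth_field {
    my ($line) = @_;
    return (split /,/, $line, 101)[99];
}

# Synthetic line with 150 numeric fields: 1,2,3,...,150
my $line = join(',', 1 .. 150);
print hundredth_field($line), "\n";    # prints 100
```

In the loop above, this would stand in for the regex line, along the lines of `$val += (split /,/, $_, 101)[99];`.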
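#include <stdio.h>
#include <stdlib.h>   /* exit, atol */
#include <strings.h>  /* index */

int main(void){
    char buf[100000], *ptr, *ptr2;
    FILE *fptr;
    int count = 0, i;
    long total = 0L;

    fptr = fopen("test_data.dat", "r");
    if (!fptr){
        printf ("Can't open test_data.dat\n");
        exit(1);
    }
    while (fgets(buf, sizeof buf, fptr)){
        /* printf("Read: [%s]\n", buf); */
        ptr = buf;
        /* Walk to the 99th comma; stop early if the line has too few fields */
        for (i = 0; i < 99 && ptr; i++){
            ptr++;
            ptr = index(ptr,',');
            /* printf("Read: [%d]:[%s]\n", i, ptr); */
        }
        if (ptr){
            ptr++;                      /* first character of field 100 */
            ptr2 = index(ptr,',');
            if (ptr2) *ptr2 = '\0';     /* terminate field 100 if a comma follows */
            total += atol(ptr);
            /* printf("Read: [%s][%ld]\n", ptr, total); */
        }
        count++;
    }
    fclose(fptr);
    printf ("Read %d records, total: [%ld]\n", count, total);
    return 0;
}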
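For anyone who wants to reproduce the timings, here is a minimal sketch of a generator for input of the shape described above. The real test_data.dat is not shown, so everything about the data is an assumption: 120 fields per line, each a random integer from 0 to 999. More than 100 fields are used because the regex requires a comma after the hundredth field. The helper name make_line is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical generator, not part of the original programs.
# Assumption: every field is a random integer from 0 to 999.
sub make_line {
    my ($fields) = @_;
    return join(',', map { int(rand(1000)) } 1 .. $fields) . "\n";
}

# Row count comes from the command line; defaults to a small sample of 5.
my $rows = @ARGV ? shift @ARGV : 5;
srand(42);    # fixed seed so repeated runs on the same perl give the same data
print make_line(120) for 1 .. $rows;
```

Saved under a name of your choosing (gen.pl here is only a placeholder), it could be run as `perl gen.pl 1000000 > test_data.dat` and the result timed with both programs.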