Re: Simple line parse question
by AnomalousMonk (Archbishop) on Aug 07, 2010 at 01:33 UTC
|
Don't know about speed, but my own preference would be for something along the lines of:
>perl -wMstrict -le
"my $s = 'aa b: CCC DD. eee, ff ggg?';
my $word = qr{ [[:alpha:]]+ }xms;
my @words = $s =~ m{ $word }xmsg;
my $result = join q{}, @words[2,3];
print qq{'$result'};
"
'CCCDD'
This allows better definition and control of what a 'word' is.
Updates:
-
One can also avoid the intermediate @words array as in the OP with the slightly faster
my $result = join q{}, ($s =~ m{ $word }xmsg)[2,3];
-
Improved code example slightly to try to show that naive splitting on whitespace might produce unintended results. Better, IMO, to define and extract the thing itself rather than try to define and eliminate everything you're not interested in.
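A rough illustration of that last point, as a sketch only: the sub names `by_split` and `by_match` are made up for this example and do not come from the thread. Splitting on whitespace keeps any punctuation attached to the field, while matching `[[:alpha:]]+` pulls out only the letters.

```perl
use strict;
use warnings;

# Hypothetical helper: fields 3 and 4 after a plain whitespace split.
sub by_split {
    my ($s) = @_;
    return join q{}, (split ' ', $s)[2, 3];
}

# Hypothetical helper: the 3rd and 4th runs of alphabetic characters.
sub by_match {
    my ($s) = @_;
    return join q{}, ($s =~ m{ [[:alpha:]]+ }xmsg)[2, 3];
}

my $s = 'aa b: CCC DD. eee, ff ggg?';
print by_split($s), "\n";    # CCCDD.  (the trailing period comes along)
print by_match($s), "\n";    # CCCDD
```

Both helpers assume the input has at least four fields; with shorter input the slice yields undef and join will warn.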
| [reply] [d/l] [select] |
Re: Simple line parse question
by GrandFather (Saint) on Aug 07, 2010 at 01:45 UTC
|
Why do you think the solution you have provided is inadequate to the task? Maybe if you tell us something of the bigger problem we can help you find a better higher level solution?
True laziness is hard work
| [reply] |
Re: Simple line parse question
by nvivek (Vicar) on Aug 07, 2010 at 04:28 UTC
|
Yes, you could do it with split and join, but that takes two functions. Instead, you can use a simple regular expression. Try the following code; I have assumed a single space as the delimiter between words in the string.
use strict;
use warnings;
my $string="one two three four five six";
# $1 holds the last repetition of the skipped group; $2 and $3 are words 3 and 4.
if ($string =~ /^(\w+ ){2}(\w+) (\w+)/) {
    print "$2$3\n";    # prints "threefour"
}
| [reply] [d/l] |
Re: Simple line parse question
by Marshall (Canon) on Aug 07, 2010 at 10:26 UTC
|
Can anyone tell me a better/faster way to do this?
To me "better" means more clear. The number one goal of software should be clarity..."hey, is it easy to understand what this code does?"
Performance is usually a secondary goal. However strange as it may be, if your code is clear, you will often achieve high performance.
Search for "benchmark" and you will find ways to measure the performance of version X vs Y.
Your code: join("", (split(" ", $line, 5))[2,3])
is not easy to understand. Do not mistake fewer lines as meaning higher performance.
I think the following is clear and works well. Don't be shy about giving some intermediate variable a name.
#!/usr/bin/perl -w
use strict;
my $input = "a b c d e f g";
my @words = split(/\s+/,$input);
print @words[2,3], "\n";
__END__
Prints:
cd
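For the "benchmark" suggestion above, a minimal sketch using the core Benchmark module might look like the following. The labels and the choice of candidates are assumptions for illustration; the timings depend entirely on the machine and the input, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "a b c d e f g";    # sample input, same shape as above

# Candidate implementations; the labels are arbitrary.
my %candidates = (
    split_limit => sub { join "", (split " ", $line, 5)[2, 3] },
    split_all   => sub { my @w = split /\s+/, $line; join "", @w[2, 3] },
    regex       => sub { join "", $line =~ /^(?:\S+ ){2}(\S+) (\S+)/ },
);

# Run each candidate for at least 1 CPU second, then print a comparison table.
cmpthese(-1, \%candidates);
```

A short one-line string like this may not be representative; benchmarking against lines taken from the real input is more likely to give a meaningful ranking.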
| [reply] [d/l] [select] |
Re: Simple line parse question
by jimmy.pl (Initiate) on Aug 07, 2010 at 11:14 UTC
|
echo "a b c d e f g" | awk '{print $3$4}'
I find it hard to believe that my split&join solution is the fastest perl has to offer to achieve this. This one little line of code in my script is actually turning out to be quite the performance hotspot. So i thought, why not ask here to see if there's a faster way that i'm unaware of. I've already tried the following, but they're all slower than my split&join:
1: ... | perl -ne 'printf("%s%s", (split(" ", $_, 5))[2,3]);'
2: ... | perl -ne 'print /(?:\S+ ){2}(\S+) (\S+)/'
3: ... | perl -ane 'print "$F[2]$F[3]";'
4: I even wrote my own subroutine using index/substr to extract what i need ...
I guess i'm hoping someone will introduce me to a new technique. We can't let the awk'ers have this one so easily can we?
| [reply] [d/l] [select] |
|
jimmy.pl:
We can't let the awk'ers have this one so easily can we?
Keep in mind that awk is a more specialized tool than perl, so it's really not important if awk can do some things faster than perl. It's fine to care about runtime speed, but until a program must be faster, spending time optimizing it is simply a waste of your own time. If you enjoy working overtime, then have at it. But I find it better to spend that time with family, friends, goofing off, etc.
Remember: first make it work. Then make it work correctly. Next, check if it meets requirements. If, and only if, it fails to meet speed requirements, make it faster.
...roboticus
Assembly language: Fun and runs fastest! I haven't had to use it since around 1995.
C/C++: Fun and runs fast! I use it for everything I need to make faster.
Perl: Fun and fastest to write! Fast enough runtime for 95+% of everything I do.
| [reply] |
|
I think that roboticus is "on it"!
From my experience, the coding efficiency of Perl vs C is in the range of 3x-10x:1. Recoding a 5 page C program into a one page Perl program that achieves the same functionality would not be a surprising result.
The Perl program will run at something like <1/3 the speed of the C program, but often (and VERY often), this does not matter at all! Perl OO vs say C++ is a different thing and it has an additional performance penalty.
My only slight "nit" with this would be about assembly. In the past decade, the C "super optimizing" compilers have become so good, that you have to be a real guru at ASM to beat them. It is possible to do for very focused tasks, but it is certainly not easy! Some folks can actually wind up writing slower ASM code than the compiler can do.
| [reply] |
|
Whoa!
This is very "awk_weird"
Give us an input file and an expected result.
| [reply] |
|
xxx@xxx:~/test/perl$ seq 100 1000000 | perl -ne 'print int(rand($_)),"\n"' | xargs -n10 echo > a
xxx@xxx:~/test/perl$ wc -l a
99991 a
xxx@xxx:~/test/perl$ for i in {1..100}; do cat a; done > b
xxx@xxx:~/test/perl$ wc -l b
9999100 b
xxx@xxx:~/test/perl$ cat b | time -p awk '{print $3$4}' > /dev/null
real 8.78
user 7.89
sys 0.38
xxx@xxx:~/test/perl$ cat b | time -p perl -ne 'print join("", (split(" ", $_, 5))[2,3]),"\n";' > /dev/null
real 13.78
user 12.93
sys 0.32
| [reply] [d/l] |
|