Simple line parse question

jimmy.pl has asked for the wisdom of the Perl Monks concerning the following question:

Re: Simple line parse question by AnomalousMonk (Archbishop) on Aug 07, 2010 at 01:33 UTC
Don't know about speed, but my own preference would be for something along the lines of: `>perl -wMstrict -le "my $s = 'aa b: CCC DD. eee, ff ggg?'; my $word = qr{ [[:alpha:]]+ }xms; my @words = $s =~ m{ $word }xmsg; my $result = join q{}, @words[2,3]; print qq{'$result'}; "` which prints `'CCCDD'`. This allows better definition and control of what a 'word' is. Updates: One can also avoid the intermediate `@words` array, as in the OP, with the slightly faster `my $result = join q{}, ($s =~ m{ $word }xmsg)[2,3];` Improved the code example slightly to try to show that naive splitting on whitespace might produce unintended results. Better, IMO, to define and extract the thing itself rather than try to define and eliminate everything you're not interested in.
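To make the point about naive whitespace splitting concrete, here is a minimal sketch that runs both approaches on the same sample string. The variable names `$by_split` and `$by_match` are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

my $s = 'aa b: CCC DD. eee, ff ggg?';

# Whitespace split: punctuation stays attached to the fields.
my $by_split = join q{}, (split ' ', $s)[2, 3];

# Word extraction: only runs of letters are matched.
my $by_match = join q{}, ($s =~ m{ [[:alpha:]]+ }xmsg)[2, 3];

print "split: '$by_split'\n";   # split: 'CCCDD.'
print "match: '$by_match'\n";   # match: 'CCCDD'
```

The trailing period in the split result is the kind of unintended output that defining a 'word' explicitly avoids.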
Re: Simple line parse question by GrandFather (Saint) on Aug 07, 2010 at 01:45 UTC
Why do you think the solution you have provided is inadequate to the task? Maybe if you tell us something of the bigger problem, we can help you find a better, higher-level solution? True laziness is hard work
Re: Simple line parse question by nvivek (Vicar) on Aug 07, 2010 at 04:28 UTC
Yeah, you could do it with split and join, but that needs two functions. Instead, you can use a simple regular expression to achieve it. Check the following code. I have taken the space as the delimiter between the words in the string. `use strict; use warnings; my $string="one two three four five six"; $string=~/^(\w+ ){2}(\w+) (\w+)/; print "$2$3"; # prints: threefour`
Re: Simple line parse question by Marshall (Canon) on Aug 07, 2010 at 10:26 UTC
"Can anyone tell me a better/faster way to do this?" To me, "better" means clearer. The number one goal of software should be clarity: "hey, is it easy to understand what this code does?" Performance is usually a secondary goal. However, strange as it may be, if your code is clear, you will often achieve high performance. Search for "benchmark" and you will find ways to measure the performance of version X vs Y. Your code: `join("", (split(" ", $line, 5))[2,3])` is not easy to understand. Do not mistake fewer lines for higher performance. I think the following is clear and works well. Don't be shy about giving some intermediate variable a name. `#!/usr/bin/perl -w use strict; my $input = "a b c d e f g"; my @words = split(/\s+/,$input); print @words[2,3], "\n"; __END__ Prints: cd`
Re: Simple line parse question by jimmy.pl (Initiate) on Aug 07, 2010 at 11:14 UTC
Thanks for the comments. My goal is basically to achieve similar timing to the following in awk: `echo "a b c d e f g" | awk '{print $3$4}'` I find it hard to believe that my split&join solution is the fastest perl has to offer to achieve this. This one little line of code in my script is actually turning out to be quite the performance hotspot. So i thought, why not ask here to see if there's a faster way that i'm unaware of. I've already tried the following, but they're all slower than my split&join: `1: ... | perl -ne 'printf("%s%s", (split(" ", $_, 5))[2,3]);' 2: ... | perl -ne 'print /(?:\S+ ){2}(\S+) (\S+)/' 3: ... | perl -ane 'print "$F[2]$F[3]";' 4: I even wrote my own subroutine using index/substr to extract what i need ...` I guess i'm hoping someone will introduce me to a new technique. We can't let the awk'ers have this one so easily can we?
Re^2: Simple line parse question by roboticus (Chancellor) on Aug 07, 2010 at 13:03 UTC
jimmy.pl: "We can't let the awk'ers have this one so easily can we?" Keep in mind that awk is a more specialized tool than perl, so it's really not important if awk can do some things faster than perl. It's fine to care about runtime speed, but it can waste *your* time. Until a program must be faster, spending time optimizing it is simply a waste of your own time. If you enjoy working overtime, then have at it. But I find it better to spend that time with family, friends, goofing off, etc. Remember: first make it work. Then make it work correctly. Next, check if it meets requirements. If, and only if, it fails to meet speed requirements, make it faster. ...roboticus Assembly language: Fun and runs fastest! I haven't had to use it since around 1995. C/C++: Fun and runs fast! I use it for everything I need to make faster. Perl: Fun and fastest to write! Fast enough runtime for 95+% of everything I do.
Re^3: Simple line parse question by Marshall (Canon) on Aug 09, 2010 at 10:06 UTC
I think that roboticus is "on it"! From my experience, the coding efficiency of Perl vs C is in the range of 3x-10x:1. Recoding a 5 page C program into a one page Perl program that achieves the same functionality would not be a surprising result. The Perl program will run at something like less than 1/3 the speed of the C program, but often (and VERY often), this does not matter at all! Perl OO vs, say, C++ is a different thing, and it has an additional performance penalty. My only slight "nit" with this would be about assembly. In the past decade, the C "super optimizing" compilers have become so good that you have to be a real guru at ASM to beat them. It is possible to do for very focused tasks, but it is certainly not easy! Some folks can actually wind up writing slower ASM code than the compiler produces.
Re^4: Simple line parse question by roboticus (Chancellor) on Aug 09, 2010 at 13:45 UTC
Re^5: Simple line parse question by Marshall (Canon) on Aug 24, 2010 at 17:37 UTC
Re^2: Simple line parse question by Anonymous Monk on Aug 07, 2010 at 11:46 UTC
"I find it hard to believe that my split&join solution is the fastest perl has to offer to achieve this." Believe, is that Swahili for Benchmark?
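As a rough sketch of what benchmarking the candidates could look like, the core `Benchmark` module's `cmpthese` can time several code references against each other. The hash name `%candidate`, its keys, and the sample line are assumptions made for illustration; the three bodies are adapted from one-liners posted in this thread, and results will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "a b c d e f g\n";

# Candidate implementations; each returns fields 3 and 4 joined together.
my %candidate = (
    split_join => sub { join q{}, (split ' ', $line, 5)[2, 3] },
    regex      => sub { join q{}, $line =~ /(?:\S+ ){2}(\S+) (\S+)/ },
    split_all  => sub { my @f = split ' ', $line; "$f[2]$f[3]" },
);

# Sanity check: every candidate must agree before timing means anything.
for my $name (sort keys %candidate) {
    my $got = $candidate{$name}->();
    die "$name returned '$got'" unless $got eq 'cd';
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%candidate);
```

This measures only the extraction step on an in-memory string, so it says nothing about I/O or interpreter startup, which also contribute to the wall-clock difference against awk.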
Re^2: Simple line parse question by Marshall (Canon) on Aug 07, 2010 at 12:54 UTC
Whoa! This is very "awk_weird". Give us an input file and an expected result.
Re^3: Simple line parse question by jimmy.pl (Initiate) on Aug 07, 2010 at 16:21 UTC
You can generate the input yourself. For example: `xxx@xxx:~/test/perl$ seq 100 1000000 | perl -ne 'print int(rand($_)), "\n"' | xargs -n10 echo > a xxx@xxx:~/test/perl$ wc -l a 99991 a xxx@xxx:~/test/perl$ for i in {1..100}; do cat a; done > b xxx@xxx:~/test/perl$ wc -l b 9999100 b xxx@xxx:~/test/perl$ cat b | time -p awk '{print $3$4}' > /dev/null real 8.78 user 7.89 sys 0.38 xxx@xxx:~/test/perl$ cat b | time -p perl -ne 'print join("", (split(" ", $_, 5))[2,3]),"\n";' > /dev/null real 13.78 user 12.93 sys 0.32`
Re^4: Simple line parse question by Marshall (Canon) on Aug 09, 2010 at 02:24 UTC

