Re: Function call in regex replacement string

I'm not quite sure what needs to be accomplished here, but consider the following code. Note that within a regex you can use a variable in place of a fixed pattern or replacement: s/$find/$replace/;
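A minimal sketch of that point, with a made-up helper and sample string: variables interpolate on both sides of s///, and the /e modifier makes Perl evaluate the replacement as code, which is one way to get a function call into the replacement. The names double_it, $find and $replace are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, only for illustration.
sub double_it { my ($n) = @_; return $n * 2; }

my $find    = qr/Times (\d+)/;   # pattern held in a variable
my $replace = 'Count';           # plain replacement text held in a variable

my $line = 'HAI WORLD Times 21';

# Variables interpolate on both sides of s///.
(my $renamed = $line) =~ s/Times/$replace/;
print "$renamed\n";              # HAI WORLD Count 21

# With /e the replacement is evaluated as Perl code, so it can call a function.
(my $doubled = $line) =~ s/$find/'Times ' . double_it($1)/e;
print "$doubled\n";              # HAI WORLD Times 42
```

Note that without /e the replacement is treated as a double-quoted string, so a function name written there would be inserted literally rather than called.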

In general I think you will discover that:
1) regex is better than pack/unpack, except for specific cases where you know that columns are guaranteed to line up. Even then, consider using a regex to skip columns and a list slice to get what you want. Of course pack/unpack are necessary when editing binary files, but in general they are not the right way to go.

2) indexed variables are almost never necessary in Perl. One magic thing about Perl is that it reduces the probability of "off by one" errors.

3) prefer foreach (@array){..} over any kind of C-style for loop. I write about one C-style for loop per 5K lines of Perl.
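As a rough illustration of points 1 and 3 (the sample record and the choice of columns are invented): a regex, here used through split, skips the whitespace-separated columns, a list slice picks out the wanted ones, and foreach walks the fields without any index variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up record: columns separated by a variable amount of whitespace.
my $record = "alpha   beta gamma      delta  42";

# List slice on the split result: keep only the 2nd and the last column.
my ($second, $last) = (split /\s+/, $record)[1, -1];
print "$second $last\n";         # beta 42

# foreach instead of a C-style for loop: no index, so no off-by-one.
my @fields = split /\s+/, $record;
foreach my $field (@fields) {
    print "field: $field\n";
}
```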

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
   # capture the first digit of the number that ends the line
   my $last_first_digit = ($_ =~ m/(\d)\d*\s*$/)[0];
   # print "$last_first_digit\n";     #for debugging
   print "<exML LOLZ = \"$last_first_digit\"/>\n";
}

__DATA__
HAI WORLD Times 0
HAI WORLD Times 1
HAI WORLD Times 2
HAI WORLD Times 3
HAI WORLD Times 4
HAI WORLD Times 5
HAI WORLD Times 6
HAI WORLD Times 7
HAI WORLD Times 8
HAI WORLD Times 9
HAI WORLD Times 10
HAI WORLD Times 11
HAI WORLD Times 12
HAI WORLD Times 13
HAI WORLD Times 14
HAI WORLD Times 15
HAI WORLD Times 16
HAI WORLD Times 17
HAI WORLD Times 18
HAI WORLD Times 19
HAI WORLD Times 20
HAI WORLD Times 21
HAI WORLD Times 22
HAI WORLD Times 23
HAI WORLD Times 24
HAI WORLD Times 25
==============
this prints:
<exML LOLZ = "0"/>
<exML LOLZ = "1"/>
<exML LOLZ = "2"/>
<exML LOLZ = "3"/>
<exML LOLZ = "4"/>
<exML LOLZ = "5"/>
<exML LOLZ = "6"/>
<exML LOLZ = "7"/>
<exML LOLZ = "8"/>
<exML LOLZ = "9"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "1"/>
<exML LOLZ = "2"/>
<exML LOLZ = "2"/>
<exML LOLZ = "2"/>
<exML LOLZ = "2"/>
<exML LOLZ = "2"/>
<exML LOLZ = "2"/>

I think you just need to get the right regex to feed a simple loop and that will do what you want.

Re^2: Function call in regex replacement string by PoorLuzer (Beadle) on Feb 25, 2009 at 09:34 UTC
I agree with two of your points. I vehemently disagree with the first one: "regex is better than pack/unpack". Regex is almost never better than pack/unpack. Even substr is almost never better than pack/unpack. Please go ahead and disagree, with facts on the table. What would ever make you think that? In fact this post gives me the idea for a discussion I have been wanting to have for a long time: the misuse of things like regex and substr, which makes Perl so much slower at runtime than it really should be.
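One way to put numbers on the table is the core Benchmark module. The sketch below only shows what such a comparison could look like; it is not a measurement. The fixed-width record and its layout are invented, and the ranking may differ with Perl version and data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented fixed-width record: 10-char name, 5-char code, 8-char date.
my $record = sprintf '%-10s%-5s%-8s', 'WIDGET', 'A17', '20090225';

# Three ways to pull out the 5-char code field.
my %extract = (
    unpack => sub { my ($n, $c, $d) = unpack 'A10 A5 A8', $record; $c },
    substr => sub { my $c = substr $record, 10, 5; $c =~ s/\s+$//; $c },
    regex  => sub { my ($c) = $record =~ /^.{10}(\S+)/; $c },
);

# Sanity check that all three approaches extract the same field.
print "$_: ", $extract{$_}->(), "\n" for sort keys %extract;

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%extract);
```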
Re^3: Function call in regex replacement string by Marshall (Canon) on Feb 25, 2009 at 10:23 UTC
Well, like I said, except in cases where you know for sure that you have a fixed column alignment. Virtually all the data that I work with does not have fixed byte alignment. And in some of the files I work with, even if the alignment is "fixed", it shifts when a new release of the other program comes out. There are always trade-offs between efficiency, maintainability, etc.

I've done some testing with the regex engine in Perl 5.10 vs Perl 5.8 and earlier... it's a LOT faster now. I've got one application that does a LOT of I/O and I've been considering using Storable for intermediate steps. This of course uses a byte stream (and pack/unpack) to dump and re-create internal Perl structures. At the end of the day, the final output will be in an ASCII format of some type.

Most performance issues that I've found can be traced to an improper algorithm or just a flawed implementation. Perl allows very sophisticated algorithms to be implemented quickly, and better algorithms can make a big difference! I can write Perl code about 5-10x faster than in C. The code runs at maybe 1/3 the speed of C. So there are trade-offs! I've seen some really bad code here on Monks, and some of it will run just like a "herd of turtles". Sometimes that doesn't matter and sometimes it does! So I guess this is a "your mileage may vary" sort of thing.

Update: Now that I think more about this, misuse of OO techniques is probably a far greater performance hit. The OO performance hit is about 30%. This stuff is great for DB and GUI work, but I've seen some situations where it is just plain goofy.
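The alignment problem mentioned in this reply can be sketched as follows. Both record layouts are invented; the second imitates a new release that widened the first column by two characters. The unpack template is tied to byte offsets and quietly returns a misaligned value, while a whitespace-based regex still finds the field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented records: the same data before and after a column was widened by two.
my $old_layout = 'WIDGET    A17  20090225';
my $new_layout = 'WIDGET      A17  20090225';

foreach my $record ($old_layout, $new_layout) {
    # Fixed offsets: skip 10 bytes, take the next 5 (trailing spaces stripped).
    my ($by_unpack) = unpack 'x10 A5', $record;

    # Regex: skip the first column and its separator, whatever their width.
    my ($by_regex) = $record =~ /^\S+\s+(\S+)/;

    print "unpack: '$by_unpack'  regex: '$by_regex'\n";
}
```

When the columns really are guaranteed to line up, the unpack version is the shorter and more direct of the two, which is the exception already noted above.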
Re^4: Function call in regex replacement string by PoorLuzer (Beadle) on Feb 25, 2009 at 12:13 UTC
Yes, you are completely correct in your scenario. It's much better to use regex when unpack often has to be thrown out because of changing lengths... One question: do you embed your regexes in the code, or do you put them in variables?

I have never written Perl code that runs at 1/3 the speed of similar C code - the values I usually get have a median of around 1/20th. The best I got was 1/10th. But I believe you.

I had also read an article (but now I forget its contents and URI) that showed how we can optimize Perl by giving it hints and making appropriate use of functions... I will post a thread on this soon, asking for more information. Do you have any, for example? (E.g., favouring unpack over substr, etc.)

I frankly believe GUI makes a very bad case for OO (Object Oriented is what I mean), but yes, many things can be done better without OO, especially in cases where a UML diagram was its predecessor.
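On the question of embedding regexes versus keeping them in variables, one common middle ground is to compile the pattern once with qr// and reuse it. A rough sketch, with invented pattern names and sample lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns compiled once with qr// and kept in named variables.
my $trailing_number = qr/(\d+)\s*$/;
my $greeting        = qr/^HAI\s+WORLD/;

my @lines = ('HAI WORLD Times 7', 'HAI WORLD Times 25', 'BAI WORLD Times 3');

my @counts;
foreach my $line (@lines) {
    # A compiled pattern can be used on its own...
    next unless $line =~ $greeting;

    # ...or interpolated into a larger regex.
    my ($count) = $line =~ /Times\s+$trailing_number/;
    push @counts, $count;
}

print "$_\n" for @counts;
```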