steamerboy has asked for the wisdom of the Perl Monks concerning the following question:

I shall make no bones about it - I am very inexperienced with Perl! Anyway, this is what I want to do:
I have a lot of files (2-40).html, and in each file there is a large list of names that looks like:
AAvBB
AAvT
... generally of the form [one or two letters], a 'v', and then [one or two letters]. Each xxvxx name is unique and case-sensitive.
What I want to change this to is:
(some html code 1 AAvBB) (some html code 2 AAvBB)
(some html code 1 AAvT) (some html code 2 AAvT)
...
So for each xxvxx I want to embed it multiple times into a bit of HTML. I have a very large number of these to do, so I assume I can use a regex for ??v?? and keep the match in a variable I can print back into the file. However, I am having a tough time with it.
Any help is most welcome!
Thank you

2005-02-15 Janitored by Arunbear - converted square brackets to entities, to avoid creation of bogus links

Re: A text replacement question
by holli (Abbot) on Feb 15, 2005 at 12:31 UTC
    Try a one-liner.
    perl -pni.bak -e "s/1 (\w{1,2}v\w{1,2})/2 $1/" filename
    and call that for every file.
    Under windows you can say:
    c:\> for %f in (*.html) do perl -pni.bak -e "s/1 (\w{1,2}v\w{1,2})/2 $1/" %f
    I am not sure how to achieve the same under linux.
    Note that %f is a shell loop variable, not a Perl hash.


    holli, /regexed monk/


    Update:
    Silly me. I had in mind that -p just prints $_, but it assumes a loop like -n and also prints each line, like sed.
    Thanks for clarifying, Animator.

    Update:
    Animator got me to read your question more carefully. I think you're more after something like this:
    perl -pi.bak -e 's/\b(\w{1,2}v\w{1,2})\b/<a href="$1.html">$1<\/a>/g' filename
    Added a /g modifier just in case there can be more occurrences in one line.

      Hmm, I'm not sure if that's exactly what he wants. I think he is looking to add it multiple times, which would of course only mean that the replacement part of the s/// changes into, for example, $1$1$1... (or $1 x 3 with the /e modifier).

      Simple: you should be able to use *.html as the filename...

      Small note, why do you use '-pn'? -p: print, -n: don't print... -p should do just fine

        Hmm, the more I've read on this, the more I'm unsure of what the solution would look like. Somewhere I'm expecting to see something that for each
        $1= *v* <v> gives code1 $1 code2 $1 and then so on for each *v* in each file... - (steamerboy)
      Well, the 'names' I mentioned aren't links - they're names for values that get recorded. The code for the checkboxes is like
      (code value=1 name =xvx) (code value=2 name= xvx)... and so on for each checkbox. So the idea is for the code to get any ? v ? and replace it with itself embedded in the code - saving me 500 years of time! Thanks for the help - much appreciated! (steamerboy)
        It would be much easier if you gave real examples of the input and the expected output...
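For what it's worth, here is a minimal sketch of the "embed each name several times" idea described above. The checkbox markup in `checkbox_row` is a hypothetical stand-in (the real HTML was never posted), and the sketch assumes each name sits alone on its own line, as in the original list.

```perl
use strict;
use warnings;

# Hypothetical template: the real HTML around each name is not shown in the thread.
sub checkbox_row {
    my ($name, @values) = @_;
    return join '', map { "<INPUT TYPE=checkbox NAME=$name VALUE=$_>" } @values;
}

# Replace every line that consists solely of a name such as AAvBB or AAvT
# with one generated tag per value. /m lets ^ and $ match at each line,
# /e evaluates the replacement as Perl code.
sub expand_names {
    my ($text, @values) = @_;
    $text =~ s/^([A-Za-z]{1,2}v[A-Za-z]{1,2})$/checkbox_row($1, @values)/gme;
    return $text;
}

print expand_names("AAvBB\nAAvT\n", 1, 2);
```

Anchoring to whole lines is a deliberate choice: without `^` and `$`, ordinary words with a 'v' in the middle could be picked up as names.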
Re: A text replacement question
by tphyahoo (Vicar) on Feb 15, 2005 at 12:59 UTC
    One-liners make me nervous, so here is a way to get at all the HTML files in a directory. It works on Windows with ActiveState and maybe works on Unix as well... hope this helps!
    #use strict;
    #use warnings;
    my ($dirname, $file, %words);
    opendir(DIR, ".") or die "can't opendir $dirname: $!";
    while (defined($file = readdir(DIR))) {
        if ($file =~ /\.html?/) {
            open my $F, $file or die "Could not open $file\n";
            while (<$F>) {
                #do stuff
                s/
                  (?<=                         # Look back-group
                    weak                       # Match the word weak
                    <INPUT TYPE=radio NAME=    # tag
                    (                          # Capturing group: $1 / \1
                      \w{2}\d\w{2}             #
                    )                          # End-capture
                    VALUE=\d>                  # tag
                    (?:                        # Non capturing group
                      <INPUT TYPE=radio NAME=  # tag
                      \1                       # It's possible that this should be $1, as I said, the code is untested.
                      VALUE=\d>
                    ){6}                       # There are 7 input tags, one already matches, so 6 to go.
                  )                            # End look-back
                  (?=                          # Look-ahead-group
                    strong
                  )                            # End look-ahead
                / the text you want to insert /ixg;
            }
            close $F;
        }
    }
    closedir(DIR);
    UPDATE: Integrated holli's fix and animator's regex code from below. (untested!)
      Note that your regex also matches
      • .htmanagerrc
      • foo.htm.old
      • bar.htmx
      Better written as:
      if ($file =~ /\.htm(l)?$/) {


      holli, /regexed monk/
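A quick way to see the difference between the two filename tests is to run both over the names listed above. This is only an illustrative sketch: `index.html` and `page.htm` are made-up examples of names that should match, and `/\.html?$/` is used as an equivalent spelling of the anchored pattern.

```perl
use strict;
use warnings;

my @names = ('index.html', 'page.htm', '.htmanagerrc', 'foo.htm.old', 'bar.htmx');

# Unanchored: succeeds wherever ".htm" appears in the name.
my @loose  = grep { /\.html?/ } @names;

# Anchored with $: the extension has to end the name.
my @strict = grep { /\.html?$/ } @names;

print "loose:  @loose\n";    # all five names
print "strict: @strict\n";   # index.html page.htm
```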
Re: A text replacement question
by steamerboy (Initiate) on Feb 15, 2005 at 15:27 UTC
    Actually, I think another approach would be more suitable. I have just thought this up now - sorry for not posting it originally!
    What I have is code for each instance of ?v? in each file that appears as;

    weak<INPUT TYPE=radio NAME=13v13 VALUE=1><INPUT TYPE=radio NAME=13v13 VALUE=2><INPUT TYPE=radio NAME=13v13 VALUE=3><INPUT TYPE=radio NAME=13v13 VALUE=4><INPUT TYPE=radio NAME=13v13 VALUE=5><INPUT TYPE=radio NAME=13v13 VALUE=6><INPUT TYPE=radio NAME=13v13 VALUE=7>strong

    I want to extend this to eleven check boxes, so effectively I want to replace the above code for each ?v? with the same code covering the range VALUE=0 to VALUE=10. Each instance of this code, for each ?v?, should be replaced with the correct ?v? name in each line.
    - Thank you

      (Update) code at the top of this post does NOT work

      perl -pi.bak -e 's/(?<=weak<INPUT TYPE=radio NAME=(\w{2}\d\w{2}) VALUE=\d>(?:<INPUT TYPE=radio NAME=\1VALUE=\d>){6})(?=strong)/insert($1)/eig; BEGIN { sub insert { my $name = shift; return "value 8, value 9, value 10"; } }' *.html

      Explanation of regex:

      s/
        (?<=                         # Look back-group
          weak                       # Match the word weak
          <INPUT TYPE=radio NAME=    # tag
          (                          # Capturing group: $1 / \1
            \w{2}\d\w{2}             #
          )                          # End-capture
          VALUE=\d>                  # tag
          (?:                        # Non capturing group
            <INPUT TYPE=radio NAME=  # tag
            \1                       # It's possible that this should be $1, as I said, the code is untested.
            VALUE=\d>
          ){6}                       # There are 7 input tags, one already matches, so 6 to go.
        )                            # End look-back
        (?=                          # Look-ahead-group
          strong
        )                            # End look-ahead
      / the text you want to insert /ixg;

      Big note: the code in the explanation will not work properly, because the x-modifier is in use and I did not escape the whitespace!

      Update: after reading the reply of Anonymous Monk, I decided to test it... The variable-length lookbehind error comes from \1. This can be fixed by using (?=\1).{5} instead of \1, but then another error shows up: 'Lookbehind longer than 255 not implemented'. This makes it impossible to use a look-behind...

      So there is only one thing left to do, and that is making a 'real' group of it...

      Then the short (and tested, or at least on simple data) version would become:
      perl -pi.bak -e 's/(weak<INPUT TYPE=radio NAME=(\w{5}) VALUE=\d>(?:<INPUT TYPE=radio NAME=\2 VALUE=\d>){6})(?=strong)/$1 . insert($2)/eig; BEGIN { sub insert { my $name = shift; return "value 8, value 9, value 10"; } }' *.html

      The differences:

      • A capturing group is used,
      • $1 and \1 changed into $2 and \2,
      • $1 . insert($2),
      • The input pattern is corrected (I used \d where it should have been 'v')
      • A space is added between \2 and value (it was missing)
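As an aside, on Perl 5.10 or later the `\K` escape is another way around the look-behind limits, since everything matched before `\K` is kept rather than replaced. The sketch below is untested against the real files; `append_after_row` and its `$extra` argument are illustrative names, not part of the one-liner above.

```perl
use 5.010;   # \K needs Perl 5.10 or later
use strict;
use warnings;

# Insert $extra between the seventh radio button and the word "strong".
# Everything before \K must match but is left untouched by the substitution.
sub append_after_row {
    my ($html, $extra) = @_;
    $html =~ s/weak<INPUT TYPE=radio NAME=(\w+) VALUE=\d>(?:<INPUT TYPE=radio NAME=\1 VALUE=\d>){6}\K(?=strong)/$extra/gi;
    return $html;
}
```

Because the left-hand part is matched normally rather than inside a look-behind, there is no fixed-length restriction, and the backreference `\1` works as usual.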

        Hmm, I get this error: Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=weak<INPUT TYPE=radio NAME=(\w{2}\d\w{2}) VALUE=\d>(?:<INPUT TYPE=radio NAME=\1VALUE=\d>){6})(?=strong) <-- HERE / at -e line 2. What does this mean? Thanks for the help! Great stuff. (steamerboy)
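Since the stated goal is a row running from VALUE=0 to VALUE=10, one option that avoids look-behind altogether is to match the whole existing row, capture the name once, and regenerate the row. This is a minimal sketch under the assumption that every row looks like the sample posted above (same attribute order, no extra whitespace); `radio_row` and `extend_rows` are names invented for the example.

```perl
use strict;
use warnings;

# Build the full row for one name: "weak", eleven radio buttons, "strong".
sub radio_row {
    my ($name) = @_;
    return 'weak'
         . join('', map { "<INPUT TYPE=radio NAME=$name VALUE=$_>" } 0 .. 10)
         . 'strong';
}

# Match an entire existing row (any number of buttons sharing one name)
# and replace it with the regenerated eleven-button row.
sub extend_rows {
    my ($html) = @_;
    $html =~ s{weak<INPUT TYPE=radio NAME=(\w+) VALUE=\d+>(?:<INPUT TYPE=radio NAME=\1 VALUE=\d+>)*strong}
              {radio_row($1)}gie;
    return $html;
}
```

Nothing is looked behind or ahead, so neither the variable-length nor the 255-character limit applies; the trade-off is that the replacement rewrites the whole row instead of only adding the missing buttons.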