In reply to "with better example", in thread "replacing text in specific tags"

Abigail-II is completely right that not supplying the real example in the first place wastes other people's time. Also, please use <code>-tags; they make code and input/output much more readable.
Anyway, the following should work for you:
    #!perl
    use strict;
    use warnings;

    open(IN, "<temp.in") or die "Can't open file temp.in for input: $!\n";
    open(OUT, ">temp.out") or die "Can't open file temp.out for output: $!\n";
    for (<IN>) {
        # move <FTN ID="FNn">...</FTN> into the preceding <X-REF REFID="FNn">...</X-REF>
        s{<X-REF REFID="FN(\d+)">([^<]*)</X-REF>([^<]*)<FTN ID="FN\1">([^<]+)</FTN>}
         {<X-REF REFID="FN$1">$2<FTN>$4</FTN></X-REF>$3};
        print OUT;
    }
    close(IN);
    close(OUT);
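To see what the substitution does on its own, here is a minimal sketch that applies the same regex to a single string instead of a file. The sub name move_footnote and the sample line are hypothetical, made up for illustration; like the loop above, it assumes the <X-REF> and its matching <FTN> sit on the same line with no other tags in between.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same regex as in the script above, applied to one string.
# Hypothetical helper: moves <FTN ID="FNn">text</FTN> inside the
# <X-REF REFID="FNn">...</X-REF> that precedes it on the same line.
sub move_footnote {
    my ($line) = @_;
    $line =~ s{<X-REF REFID="FN(\d+)">([^<]*)</X-REF>([^<]*)<FTN ID="FN\1">([^<]+)</FTN>}
              {<X-REF REFID="FN$1">$2<FTN>$4</FTN></X-REF>$3};
    return $line;
}

# Hypothetical sample line; the real data from the thread is not reproduced here.
my $sample = 'See note<X-REF REFID="FN001">1</X-REF> here. <FTN ID="FN001">A footnote.</FTN>';
print move_footnote($sample), "\n";
```

The FN\1 backreference is what ties the footnote to its reference: if the numbers differ, the regex does not match and the line comes back unchanged.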

Re: Re: with better example
by texuser74 (Monk) on Aug 22, 2003 at 01:15 UTC
    Hi, sorry for not providing a proper example the first time, and thank you very much for your kind support. I get compilation errors when using your code, and since I am a new user, I really don't know how to debug them. However, the following code works fine and gives me the output I need. Just to share it with you, here is the code I am using now:
        open(IN, "<temp.in") || die "\n Can't open file\n $!\n";
        open(OUT, ">temp.out");

        ## Search for FTN
        my %values;
        # read through the input file to get the values
        while (<IN>) {
            if (/<FTN ID="FN(\d{3})">(.+?)<\/FTN>/) {    # no need to quote <, >, ", =
                $values{$1} = $2;
            }
        }
        close(IN);

        open(IN, "<temp.in") || die "\n Can't open file\n $!\n";
        ## replace X-REF
        while (<IN>) {
            s/<FTN.+?<\/FTN>//;    # get rid of existing <FTN> tags - we already have the values from them
            s/(<X-REF REFID="FN(\d{3})">.*)(<\/X-REF>)/$1<FTN>$values{$2}<\/FTN>$3/;    # alter the X-REF line
            print OUT;
        }
        close(IN);
        close(OUT);
    Once again thanks for your support.

    edited by ybiC: Replace opening <pre> tag with <code> as per Monastery convention *and* add closing tag for same

      texuser, your code is correct, and it is great that you found a solution on your own. Still, I have some thoughts to share with you on your code:
      • When checking whether a file was opened successfully, include the file name and possibly the mode in the error message, i.e. instead of writing open(IN, "<temp.in") || die "\n Can't open file\n $!\n"; rather write something like open(IN, '<', "temp.in") or die "Error opening file temp.in for input: $!\n";. This helps you (and the user) fix the error more quickly as the code grows, or when you move to another environment where your read/write access rights differ.
      • You should do this error checking for every file you open, especially for output files: most of the time, when you have the right to write a file, you also have the right to read it, but not necessarily vice versa.
      • You should open a file only when you need it, i.e. open the output file only after you open the input file for the second time.
      • Some RegEx efficiency issues: In the first RegEx, [^<]+ would be faster than .+?. With the lazy .+?, the RegEx engine has to check the rest of the RegEx (well, the next character) before it proceeds to each next possible number of characters, whereas [^<]+ simply consumes characters up to the next <. The second RegEx has a similar issue with .*: you might want to replace it with [^<]+, so that the RegEx engine does not have to waste time on backtracking. But these are only speed considerations, which matter only if you have large or numerous files.
      • You open and read the input file twice. Remember that disk read/write operations are much slower than the execution of a piece of program code. If possible, you should try to read the file only once.
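      Putting these points together, your two-pass approach might look something like the following sketch. It is only a sketch under a few assumptions: footnote numbers are three digits (as in your regexes), footnote texts contain no further tags (needed for [^<]+), the sub names add_footnotes and convert_file are hypothetical, and it uses lexical filehandles (my $in) instead of IN/OUT, which is a style choice rather than a requirement. The file is read only once; both passes then run over the lines held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pure part: takes the input lines, returns the converted lines.
# Assumes footnote numbers are exactly three digits, as in the original.
sub add_footnotes {
    my @lines = @_;    # work on a copy so the caller's lines stay untouched

    # Pass 1 (in memory): remember each footnote's text by its number.
    # [^<]+ stops at the next tag without the lazy .+? bookkeeping.
    my %values;
    for (@lines) {
        if (/<FTN ID="FN(\d{3})">([^<]+)<\/FTN>/) {
            $values{$1} = $2;
        }
    }

    # Pass 2 (in memory): drop the old <FTN> element, then insert the
    # remembered text before the closing </X-REF>. References without a
    # known footnote are left as they are instead of getting an empty <FTN>.
    for (@lines) {
        s/<FTN[^>]*>[^<]*<\/FTN>//;
        s{(<X-REF REFID="FN(\d{3})">[^<]+)(</X-REF>)}
         {exists $values{$2} ? "$1<FTN>$values{$2}</FTN>$3" : "$1$3"}e;
    }
    return @lines;
}

# File part: read the input only once, check every open (with name and
# mode in the message), and open the output only when it is needed.
sub convert_file {
    my ($in_name, $out_name) = @_;

    open(my $in, '<', $in_name)
        or die "Error opening file $in_name for input: $!\n";
    my @lines = <$in>;
    close($in);

    my @converted = add_footnotes(@lines);

    open(my $out, '>', $out_name)
        or die "Error opening file $out_name for output: $!\n";
    print {$out} @converted;
    close($out) or die "Error closing file $out_name: $!\n";
}
```

      You would call it as convert_file('temp.in', 'temp.out');. Holding all lines in memory is fine for files of ordinary size; for very large files you would go back to reading line by line.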
      One last thing: I tested both your code and mine on a file named "temp.in", and both worked. In both cases, the input file consisted of your three input lines and the output file of your three output lines. Considering the above, if you would like to get the program to parse your input file in one go, I'll be happy to help you.
      Oh, yes, and a ++ for you for putting together your own piece of code that works for you.
      Cheers, CombatSquirrel.
        Hi, thanks for your comments. I have one more small question.

        Here is the input data and the output I want:

        <input> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </input> <output> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </output>
        That is, I want to put each <p>...</p> on a single line; I mean, delete the carriage returns only inside <p>...</p>. My following code does the job, but only for the last <p>...</p>. I don't know how to loop over all of them here. Please suggest a solution.
            $infile = $ARGV[0];
            open(IN, '<', "temp.in") || die "\nCan't open temp.in \n";
            open(OUT, '>', "temp.out");
            $/ = "";    # paragraph mode: read one blank-line-separated chunk at a time
            while (<IN>) {
                if ($_ =~ s/(.*)<p>(.*)\<\/p\>(.*)//ms) {
                    $pre = $1;
                    $par = $2;
                    $pos = $3;
                    $par =~ s#\n# #ig;
                    print OUT "$pre<p>$par\<\/p\>$pos";
                }
            }
            close(IN);
            close(OUT);
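        The code above only touches the last paragraph because the first greedy (.*) in s/(.*)<p>(.*)\<\/p\>(.*)//ms grabs everything and then backs off only as far as the last <p> in the record. One possible way to handle every paragraph is a single global, non-greedy substitution, sketched below. It assumes <p> elements are never nested and are written exactly as <p> and </p> without attributes; the sub names join_paragraphs and join_paragraphs_in_file are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Put every <p>...</p> on a single line; text outside <p> is untouched.
# Assumes <p> elements are not nested and carry no attributes.
sub join_paragraphs {
    my ($text) = @_;
    # .*? is non-greedy, so each match stops at the FIRST </p> after its <p>;
    # /s lets . match newlines, /g repeats for every paragraph,
    # /e evaluates the replacement part as Perl code.
    $text =~ s{<p>(.*?)</p>}{
        my $par = $1;
        $par =~ s/\r?\n/ /g;    # each line break (LF or CRLF) becomes one space
        "<p>$par</p>";
    }gse;
    return $text;
}

# Hypothetical file wrapper: slurp the whole file so that a paragraph
# is never split across two reads.
sub join_paragraphs_in_file {
    my ($in_name, $out_name) = @_;
    open(my $in, '<', $in_name)
        or die "Error opening file $in_name for input: $!\n";
    my $text = do { local $/; <$in> };    # undef $/ reads the whole file at once
    close($in);

    open(my $out, '>', $out_name)
        or die "Error opening file $out_name for output: $!\n";
    print {$out} join_paragraphs($text);
    close($out) or die "Error closing file $out_name: $!\n";
}
```

        For example: join_paragraphs_in_file('temp.in', 'temp.out');. Reading the whole file at once, rather than in paragraph mode with $/ = "", also covers a <p> that happens to contain a blank line.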
        Note: please also let me know how to include source code in a post here. Are there any special tags for that? The code formatting often gets messed up when I post.

        edited by ybiC: Reformatted - balanced <code> tags around sample input and code