in reply to Re: replacing text in specific tags
in thread replacing text in specific tags

Hi, here is a better example explaining what I want.

INPUT

This is to test<X-REF REFID="FN001">1</X-REF> some more text. <FTN ID="FN001">Example111</FTN>

This is to test<X-REF REFID="FN002">2</X-REF> some more text. <FTN ID="FN002">Example222</FTN>

This is to test<X-REF REFID="FN003">3</X-REF> some more text. <FTN ID="FN003">Example333</FTN>

EXPECTED OUTPUT

This is to test<X-REF REFID="FN001">1<FTN>Example111</FTN></X-REF> some more text.

This is to test<X-REF REFID="FN002">2<FTN>Example222</FTN></X-REF> some more text.

This is to test<X-REF REFID="FN003">3<FTN>Example333</FTN></X-REF> some more text.

This is the full Perl script I am using now:

open(IN, "<temp.in") || die "\n Can't open file\n $!\n";

open(OUT,">temp.out");

## Search for FTN

my %values;

while(<IN>) {

if(s/\<FTN ID\=\"FN(\d{3})\"\>(.+?)\<\/FTN\>/\<FTN$1\>/) {

$values{$1} = $2;

}

}

## replace X-REF

while(<IN>) {

s/(<X-REF REFID="FN(\d{3})">)(.+?)(<\/XREF>)/$1$values{$2}$3/;

print OUT;

}

close(IN);

close(OUT);

But I don't get any output in temp.out.

In my input file, X-REF occurs first and FTN occurs next, so my script is not working properly.

Re: with better example
by CombatSquirrel (Hermit) on Aug 21, 2003 at 12:09 UTC
    Abigail-II is completely right about wasting other people's time by not supplying the real example in the first place. Secondly, please use <code> tags; they make code and input/output much more readable.
    Anyway, the following should work for you:
    #!perl
    use strict;
    use warnings;

    open(IN, "<temp.in") or die "Can't open file temp.in for input: $!\n";
    open(OUT,">temp.out") or die "Can't open file temp.out for output: $!\n";
    for (<IN>) {
        s{<X-REF REFID="FN(\d+)">([^<]*)</X-REF>([^<]*)<FTN ID="FN\1">([^<]+)</FTN>}
         {<X-REF REFID="FN$1">$2<FTN>$4</FTN></X-REF>$3};
        print OUT;
    }
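For anyone who wants to try this substitution without creating temp.in first, here is a minimal sketch that applies the same pattern to two of the sample lines held in memory. The helper name move_footnote is made up for this illustration and is not part of the original post; the sketch also assumes, as in the sample data, that each FTN sits on the same line as its X-REF.

```perl
#!perl
use strict;
use warnings;

# Sample lines copied from the INPUT section of the question.
my @lines = (
    qq{This is to test<X-REF REFID="FN001">1</X-REF> some more text. <FTN ID="FN001">Example111</FTN>\n},
    qq{This is to test<X-REF REFID="FN002">2</X-REF> some more text. <FTN ID="FN002">Example222</FTN>\n},
);

# move_footnote() is a hypothetical helper name used only in this sketch.
# It moves the footnote body into the X-REF element that carries the same id.
sub move_footnote {
    my ($line) = @_;
    $line =~ s{<X-REF REFID="FN(\d+)">([^<]*)</X-REF>([^<]*)<FTN ID="FN\1">([^<]+)</FTN>}
              {<X-REF REFID="FN$1">$2<FTN>$4</FTN></X-REF>$3};
    return $line;
}

print move_footnote($_) for @lines;
```

The \1 inside the pattern is what ties the FTN id to the REFID, so a footnote with a different number on the same line is left alone.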
      Hi, sorry for not providing a proper example the first time. Thank you very much for your kind support. I get compilation errors when using your code, and since I am a new user I really don't know how to debug them. However, the following code works fine and gives me the required output. Just to share it with you, I have pasted the code I am using now.
      open(IN, "<temp.in") || die "\n Can't open file\n $!\n";
      open(OUT,">temp.out");

      ## Search for FTN
      my %values;
      # read through the input file to get the values
      while(<IN>) {
          if(/<FTN ID="FN(\d{3})">(.+?)<\/FTN>/) { #no need to quote <, >, ", =
              $values{$1} = $2;
          }
      }
      close(IN);

      open(IN, "<temp.in") || die "\n Can't open file\n $!\n";
      ## replace X-REF
      while(<IN> ) {
          s/<FTN.+?<\/FTN>//; # get rid of existing <FTN> tags - we already have the values from them
          s/(<X-REF REFID="FN(\d{3})">.*)(<\/X-REF>)/$1<FTN>$values{$2}<\/FTN>$3/; # alter the X-REF LINE
          print OUT;
      }
      close(IN);
      close(OUT);
      Once again thanks for your support.

      edited by ybiC: Replace opening <pre> tag with <code> as per Monastery convention *and* add closing tag for same

        texuser, while your code is correct and it is great that you now found a solution on your own, I do have some thoughts to share with you on your code:
        • When checking whether a file was opened successfully, you should report the file name and possibly the mode, i.e. instead of writing open(IN, "<temp.in") || die "\n Can't open file\n $!\n"; rather write something like open(IN, '<', "temp.in") or die "Error opening file temp.in for input: $!\n";. This helps you (and the user) fix the error more quickly as the code grows and you possibly migrate to another environment where your read/write access rights differ.
        • You should do this error checking for every file that you open, especially for output files, since most of the time when you have the right to write you also have the right to read, but not necessarily vice versa.
        • You should open a file only when you need it, i.e. open the output file after the second time you open the input file.
        • Some RegEx efficiency issues: In the first RegEx, [^<]+ would be faster than .+?, since the RegEx engine does not have to check the rest of the RegEx (well, the next character) before proceeding to the next possible number of characters. Something similar holds for .* in the second RegEx: you might want to replace it with [^<]+, so that the RegEx engine does not have to waste time on backtracking. But those are only speed considerations, which matter only if you have large or numerous files.
        • You open and read the input file twice. Remember that disk read/write operations are much slower than the execution of a piece of program code. If possible, you should try to read the file only once.
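The points about error checking and opening files late can be sketched roughly as follows. This is only an illustration: it uses lexical filehandles and a scratch directory from File::Temp (neither appears in the posts above) so that it runs without touching a real temp.in.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so the sketch does not touch real files.
my $dir      = tempdir(CLEANUP => 1);
my $in_name  = "$dir/temp.in";
my $out_name = "$dir/temp.out";

# Create a small input file for the demonstration.
open(my $seed, '>', $in_name)
    or die "Error opening file $in_name for output: $!\n";
print {$seed} qq{<FTN ID="FN001">Example111</FTN>\n};
close($seed) or die "Error closing file $in_name: $!\n";

# Three-argument open, with the file name and the mode in the message.
open(my $in, '<', $in_name)
    or die "Error opening file $in_name for input: $!\n";

# Open the output file only now, right before it is needed,
# and check this open as well.
open(my $out, '>', $out_name)
    or die "Error opening file $out_name for output: $!\n";

while (my $line = <$in>) {
    print {$out} $line;
}

close($in);
close($out) or die "Error closing file $out_name: $!\n";
```

Using "or" rather than "||" keeps the check attached to the whole open call even when the parentheses around its arguments are dropped.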
        Last thing: I tested both your code and mine on a file named "temp.in", and both worked. In both cases the input file consisted of your three input lines and the output file of your three output lines. Considering the above, if you would like to get the program to parse your input file in one go, I'll be happy to help you.
        Oh, yes, and a ++ for you for putting together your own piece of code that works for you.
        Cheers, CombatSquirrel.
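One possible way to read the input only once, as suggested above, is to pull all lines into memory, collect the footnotes from the array, and then rewrite the X-REF elements from the same array. This is a sketch under two assumptions that the thread does not state: the file is small enough to hold in memory, and footnote bodies contain no nested tags. An in-memory filehandle stands in for temp.in so the example is self-contained.

```perl
use strict;
use warnings;

# Stand-in for temp.in: an in-memory file holding two of the sample lines.
my $input = <<'END';
This is to test<X-REF REFID="FN001">1</X-REF> some more text. <FTN ID="FN001">Example111</FTN>
This is to test<X-REF REFID="FN002">2</X-REF> some more text. <FTN ID="FN002">Example222</FTN>
END

open(my $in, '<', \$input)
    or die "Error opening in-memory input: $!\n";
my @lines = <$in>;    # the only read of the input
close($in);

# First pass over the array: remember each footnote body and remove
# the <FTN> element (plus the whitespace in front of it) from the line.
my %values;
for my $line (@lines) {
    while ($line =~ s{\s*<FTN ID="FN(\d{3})">([^<]+)</FTN>}{}) {
        $values{$1} = $2;
    }
}

# Second pass over the array: insert the stored footnote into its X-REF.
# An X-REF without a known footnote is left unchanged.
my $output = '';
for my $line (@lines) {
    $line =~ s{(<X-REF REFID="FN(\d{3})">[^<]+)(</X-REF>)}
              {exists $values{$2} ? "$1<FTN>$values{$2}</FTN>$3" : "$1$3"}ge;
    $output .= $line;
}

print $output;
```

Because all footnotes are collected before any X-REF is rewritten, this version does not care whether an FTN appears before or after its X-REF, or on a different line.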
Re: with better example
by Abigail-II (Bishop) on Aug 21, 2003 at 11:12 UTC
    You know, it would really, really help if you post your real problem the first time around. Now I've wasted time answering a question that isn't your problem at all.

    I'm not going to waste more time on you. One last hint: if you have exhausted your input in your first while loop, there isn't more input to read in the second loop.

    Goodbye.

    Abigail
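The hint about exhausted input can be seen in a small sketch like the one below. It uses an in-memory filehandle so that it runs without a real file, and it rewinds with seek; closing and reopening the file, as in the working version posted in this thread, has the same effect.

```perl
use strict;
use warnings;

my $data = "line one\nline two\n";
open(my $in, '<', \$data)
    or die "Error opening in-memory input: $!\n";

# First loop: reads every line and leaves the handle at end of file.
my $first_pass = 0;
while (my $line = <$in>) { $first_pass++ }

# Second loop on the same handle: nothing is left to read,
# so the body never runs.
my $second_pass = 0;
while (my $line = <$in>) { $second_pass++ }

# Rewind to the start before reading again.
seek($in, 0, 0) or die "Cannot rewind: $!\n";
my $third_pass = 0;
while (my $line = <$in>) { $third_pass++ }

print "first=$first_pass second=$second_pass after_seek=$third_pass\n";
close($in);
```

The second counter stays at zero, which is why the original script wrote nothing to temp.out.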