Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to match and delete, for example, this part of the data file:
<?--1-Stock3of1--> <b>This is option Stock 3 of 1</b> </?--1-Stock3of1-->

The code I have, or rather the regular expression in it, also deletes
every other part that starts with <?--1-.
Please help me with the regular expression, or suggest a better way to
find the match, delete the part it found, and rewrite the file.
The code I have is almost OK, but it deletes too much! Thanks again!

Here is a sample of the data file

<?--1-News1-->
<b>This is option News 1 of 1</b>
</?--1-News1-->

<?--1-Weather2of1-->
<b>This is option Weather 2 of 1</b>
</?--1-Weather2of1-->

<?--1-Stock3of1-->
<b>This is option Stock 3 of 1</b>
</?--1-Stock3of1-->

<?--2-Second1of2-->
<b>Option Second 1 of 2</b>
</?--2-Second1of2-->

<?--3-Third1of3-->
<b>Option Third 1 of 3</b>
</?--3-Third1of3-->


And here is the code:
sub del {
    my $filename      = "test.txt";
    my $template_data = "new/" . $filename;
    undef $/;    # Slurp mode
    open(DATA_IN, "$template_data") || print "Can't open output file1: $template_data\n";
    #binmode DATA_IN;
    $_ = <DATA_IN>;
    while (/<\?--([^-]*)-([^-]*)-->(.*?)<\/\?--([^-]*)-([^-]*)-->/sg) {
        $a = $1; $b = $2; $c = $3; $d = $4; $e = $5;
        if ($a eq $location) {
            if ($b eq $obj_name) {
                $c = $page;
                $_ =~ s/<\?--([^-]*)-([^-]*)-->(.*?)<\/\?--([^-]*)-([^-]*)-->//sg;
            }
        }
        $list = $list . "<?--$a-$b-->$c</?--$d-$e-->\n";
    }
    close DATA_IN;
    open(DATA_OUT, ">new/test.txt") or die "$!\n";
    print DATA_OUT "$list\n";
    close DATA_OUT;
}


Re: Regular Expression Help Help
by graff (Chancellor) on Oct 23, 2002 at 01:55 UTC
    The code that you have shown does not say what values are assigned to the variables $location, $obj_name and $page, but I guess these are supposed to be, respectively, "1", "Stock3of1", and some string that is supposed to replace "\n<b>This is option Stock 3 of 1</b>\n" in your test.txt file.

    The odd thing about your code (aside from the lack of decent indentation) is what happens when these conditions are met:

    if ($a eq $location) {
        if ($b eq $obj_name) {
            $c = $page;
            $_ =~ s/<\?--([^-]*)-([^-]*)-->(.*?)<\/\?--([^-]*)-([^-]*)-->//sg;
        }
    }

    Nothing in this substitution refers to the tag that was just matched, and it uses /g, so it erases every tagged block from $_, the very string that the while() loop is matching against. As a result, once these conditions are met, there will be no more iterations of the while() loop (it has the same effect as saying "last;"). So your output file will contain only whatever has been assigned to $list up to this point (including the block of text that matched those two conditions).
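
    The effect described above can be reproduced in isolation. The following is a minimal sketch, not the poster's program: strip_all_blocks is a hypothetical helper name, and the three-block sample is shortened test data made up for illustration. It applies the same unanchored substitution to a small string.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same unanchored s///sg as the original
# code and returns whatever is left of the text.
sub strip_all_blocks {
    my ($text) = @_;
    $text =~ s/<\?--([^-]*)-([^-]*)-->(.*?)<\/\?--([^-]*)-([^-]*)-->//sg;
    return $text;
}

# Shortened, made-up sample in the same tag format as the data file.
my $sample = join "\n",
    '<?--1-News1--> <b>News</b> </?--1-News1-->',
    '<?--1-Stock3of1--> <b>Stock</b> </?--1-Stock3of1-->',
    '<?--2-Second1of2--> <b>Second</b> </?--2-Second1of2-->';

my $left = strip_all_blocks($sample);

# The pattern names no particular tag, so every block is removed and only
# the separating newlines survive.
my $remaining = () = $left =~ /<\?--/g;
print "blocks left: $remaining\n";    # prints "blocks left: 0"
```

    Because the loop's own match is run against $_, emptying $_ this way leaves nothing for the next iteration to find.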

    Was this intentional? If so, it would make more sense to do it this way:

    while (/<\?--([^-]*)-([^-]*)-->(.*?)<\/\?--([^-]*)-([^-]*)-->/sg) {
        ($a,$b,$c,$d,$e) = ($1,$2,$3,$4,$5);
        if ($a eq $location and $b eq $obj_name) {
            $c = $page;
            $page = undef;
        }
        $list = $list . "<?--$a-$b-->$c</?--$d-$e-->\n";
        last if ( not defined( $page ));
    }

    This will do the same thing as your original code.

    Apart from that, it doesn't seem as though you're doing anything bad to the tags that do get written to the output.

    If your intention is to keep the entire file intact, and just replace the one piece of text contained within a particular tag, get rid of the "$_ =~ s/...//sg;"
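
    As an alternative to the loop, the "replace one piece of text" case can be sketched as a single substitution whose pattern names the tag. This is only an illustration under assumptions: replace_block_body is a hypothetical helper, the sample string is made up, and the tag name is assumed to be unique in the text.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the text enclosed by
# <?--LOCATION-NAME--> ... </?--LOCATION-NAME-->, leaving the tags and
# every other block untouched.
sub replace_block_body {
    my ($text, $location, $obj_name, $page) = @_;
    # quotemeta() escapes any regex metacharacters in the tag name.
    my $tag = quotemeta("$location-$obj_name");
    # No /g: only the first block with this tag is changed.
    $text =~ s/(<\?--${tag}-->).*?(<\/\?--${tag}-->)/$1$page$2/s;
    return $text;
}

# Made-up sample data in the same tag format as the data file.
my $data = '<?--1-News1--> <b>News</b> </?--1-News1-->' . "\n"
         . '<?--1-Stock3of1--> <b>Stock</b> </?--1-Stock3of1-->' . "\n";

print replace_block_body($data, '1', 'Stock3of1', ' <b>New text</b> ');
```

    Since the pattern contains the location and object name, blocks with other tags can never match, which is the property the original substitution lacked.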

    If you want to remove the whole tag block (open-tag, close-tag and enclosed text) when you spot a certain tag name, replace the "$c=$page; $_=~s/...//sg;" with "next;"
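
    Removing one whole block can also be done without a loop. The sketch below is a different technique from the "next;" change just described: it uses one targeted substitution. delete_block is a hypothetical helper, the sample data is shortened for illustration, and swallowing the whitespace after the block is an assumption about how the file should look afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: delete the one block whose tag is LOCATION-NAME,
# including its open tag, close tag, enclosed text, and any whitespace
# that directly follows the close tag.
sub delete_block {
    my ($text, $location, $obj_name) = @_;
    # quotemeta() escapes any regex metacharacters in the tag name.
    my $tag = quotemeta("$location-$obj_name");
    $text =~ s/<\?--${tag}-->.*?<\/\?--${tag}-->\s*//s;
    return $text;
}

# Made-up sample data in the same tag format as the data file.
my $blocks = join "\n",
    '<?--1-News1--> <b>News</b> </?--1-News1-->',
    '<?--1-Stock3of1--> <b>Stock</b> </?--1-Stock3of1-->',
    '<?--2-Second1of2--> <b>Second</b> </?--2-Second1of2-->';

# Only the Stock3of1 block disappears; the other "1-" blocks stay.
print delete_block($blocks, '1', 'Stock3of1'), "\n";
```

    The result can then be written back to the file in place of $list.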

Re: Regular Expression Help Help
by Enlil (Parson) on Oct 22, 2002 at 23:42 UTC
    From your code, it seems that you are printing back what you initially matched, and that you want to print every block except the one that matches a certain $obj_name and $location.

    So here is my take on the problem:

    use strict;
    use warnings;

    my $location = "1";
    my $obj_name = "Stock3of1";
    {
        local $/ = "\n\n";
        while ( <DATA> ) {
            if ( /<\?--([^-]*)-([^-]*)-->(.*?)<\/\?--([^-]*)-([^-]*)-->/s ) {
                print unless ( $location eq $1 and $obj_name eq $2);
            }
        }
    }
    __DATA__
    <?--1-News1-->
    <b>This is option News 1 of 1</b>
    </?--1-News1-->

    <?--1-Weather2of1-->
    <b>This is option Weather 2 of 1</b>
    </?--1-Weather2of1-->

    <?--1-Stock3of1-->
    <b>This is option Stock 3 of 1</b>
    </?--1-Stock3of1-->

    <?--2-Second1of2-->
    <b>Option Second 1 of 2</b>
    </?--2-Second1of2-->

    <?--3-Third1of3-->
    <b>Option Third 1 of 3</b>
    </?--3-Third1of3-->

    I believe this does what you wanted.
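
    The script above reads from __DATA__ and prints to STDOUT. Since the question also asks to rewrite the file, here is a hedged sketch of the same record-at-a-time idea applied to files. filter_blocks and its four arguments are hypothetical, the blocks are assumed to be separated by blank lines, and, unlike the script above, records that do not look like a block are copied through rather than dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, skipping the block whose
# tag is LOCATION-NAME. The two paths must differ, because opening the
# output file truncates it before anything has been read.
sub filter_blocks {
    my ($in_path, $out_path, $location, $obj_name) = @_;
    open my $in,  '<', $in_path  or die "Can't read $in_path: $!\n";
    open my $out, '>', $out_path or die "Can't write $out_path: $!\n";
    local $/ = "\n\n";    # one blank-line-separated block per read
    while ( my $block = <$in> ) {
        if ( $block =~ /<\?--([^-]*)-([^-]*)-->/ ) {
            # Skip only the block whose open tag matches both values.
            next if $1 eq $location and $2 eq $obj_name;
        }
        print {$out} $block;
    }
    close $in;
    close $out or die "Can't close $out_path: $!\n";
    return;
}

# Command-line use: script.pl IN_FILE OUT_FILE LOCATION OBJ_NAME
filter_blocks(@ARGV) if @ARGV == 4;
```

    Once the output has been written successfully, it could be renamed over the original file.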

    -Enlil