Using regexen to parse markup is generally tricky and best left to modules such as HTML::TreeBuilder. If you can predict exactly what the markup will look like, you may get away with the following:
    use warnings;
    use strict;

    my $str = <<TXT;
    Nerve cells come in many shapes and sizes, but they all have a number of identifiable parts. A typical nerve cell is shown in <figr n="1">Figure 1</figr>. Like all other cells in the body, it has a nucleus that contains genetic information.<figr n="2">Figure 2</figr>. The cell is covered by a membrane and is filled with a fluid.
    TXT

    $str =~ s/
        <figr\sn="(\d+?)">      # tag including figure number
        (.*?)                   # element contents
        <\/figr>                # close tag
        /
        "<FIGIND NUM=\"$1\" ID=\"FG." .     # replacement tag
        sprintf("%03d", $1) .               # padded number
        "\">$2<\/FIGIND>"                   # remainder of element
        /gmsxe;   # Global, multi-line, ignore newline, ignore whitespace, evaluate

    print $str;
Prints:
    Nerve cells come in many shapes and sizes, but they all have a number of identifiable parts. A typical nerve cell is shown in <FIGIND NUM="1" ID="FG.001">Figure 1</FIGIND>. Like all other cells in the body, it has a nucleus that contains genetic information.<FIGIND NUM="2" ID="FG.002">Figure 2</FIGIND>. The cell is covered by a membrane and is filled with a fluid.
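If the markup is only mostly predictable, the same substitution can be wrapped in a sub and loosened a little. The following is a minimal sketch, not a replacement for a real parser. The sub name figr_to_figind is made up for illustration, and the tolerance for extra whitespace around n="..." is an assumption about what the input might contain. It still assumes double-quoted numeric attributes and no nested figr elements.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite <figr n="N">...</figr> as
# <FIGIND NUM="N" ID="FG.NNN">...</FIGIND>, zero-padding the ID to 3 digits.
sub figr_to_figind {
    my ($text) = @_;
    $text =~ s{
        <figr \s+ n \s* = \s* "(\d+)" \s* >   # open tag, captures the figure number
        (.*?)                                 # element contents, lazy so elements stay separate
        </figr>                               # close tag
    }{
        sprintf '<FIGIND NUM="%s" ID="FG.%03d">%s</FIGIND>', $1, $1, $2
    }gsxe;    # g: all matches, s: dot matches newline, x: extended syntax, e: evaluate
    return $text;
}

print figr_to_figind(
    'See <figr n="1">Figure 1</figr> and <figr  n = "12">Figure 12</figr>.'
), "\n";
# prints: See <FIGIND NUM="1" ID="FG.001">Figure 1</FIGIND> and <FIGIND NUM="12" ID="FG.012">Figure 12</FIGIND>.
```

Using braces as the delimiters avoids having to escape the slash in the close tags, and the lazy (.*?) is what keeps two figures on the same line from being swallowed as one element.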
In reply to Re: Matching a pattern in Regex
by GrandFather
in thread Matching a pattern in Regex
by rsriram