adobepro has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I opened a file and placed it into an array - now before I print to the browser the contents of the array, I want to find two specific lines and remove them.

Here is how I want it set up:
1.) Print to the browser the xml version, dtd + xsl location
2.) Read in the xml file and store it in an array.
3.) Remove the xml version, dtd + xsl location from the xml file contents read in at step 2.
4.) Print out to the browser the xml file data ONLY (without the xml version/xsl/dtd location that was removed in step 3)


That's it. The reason is that the xml data files I'm receiving point to a dtd/xsl location that differs from where those files are actually going to be, so I need to remove that from the xml data file.

The other problem is that we have 4 report types, but right now I'm testing just one. Is there a way to have a regex start at particular characters and end at particular characters, with a wildcard in the middle? I want to find the string and replace it with nothing, e.g.:

@lines =~ s/<!DOCTYPE XMLDATA SYSTEM \"artist_royalty.dtd\">//; #????


It could be artist_royalty.dtd or license_royalty.dtd or mechanical_royalty.dtd - so I'd like a wildcard in place of the report type, with the match ending at .dtd\"> for the search and replace.
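
A minimal sketch of the "wildcard in the middle" idea, assuming the DOCTYPE declaration sits on a single line and the quoted file name always ends in .dtd. The sub name strip_doctype is made up for illustration; [^"]* plays the role of the wildcard and stops at the closing quote.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a single-line DOCTYPE declaration whose
# system identifier ends in ".dtd", whatever the report type is.
sub strip_doctype {
    my ($line) = @_;
    # [^"]* matches any run of characters that are not a double quote,
    # so the match cannot run past the end of the quoted file name.
    # The dot before "dtd" is escaped so it matches a literal dot only.
    $line =~ s/<!DOCTYPE\s+XMLDATA\s+SYSTEM\s+"[^"]*\.dtd"\s*>//;
    return $line;
}

# Usage: the same pattern covers all three report types.
for my $type (qw(artist license mechanical)) {
    my $input = qq{<!DOCTYPE XMLDATA SYSTEM "${type}_royalty.dtd">\n};
    print strip_doctype($input);   # only the trailing newline is left
}
```

Note that a substitution works on one scalar at a time, so it has to be applied to each element of the array rather than to the array itself.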

Here's my code, everything works great EXCEPT for the Regex on the array which has the xml file contents:

print "<?xml version=\"1.0\"?>\n";

# The conditional statement below checks to see what report type the user is requesting
# Then prints out the dtd + xsl for that report type.


if ($doctype eq "m") {
    print "<!DOCTYPE XMLDATA SYSTEM \"http://www.mydotcom.com/DRM/clients/XML_FORM/XML_DTD/mechanical_royalty_rightrax.dtd\">\n";
    print "<?xml-stylesheet type=\"text/xsl\" href=\"http://www.mydotcom.com/DRM/clients/XML_FORM/XSL_templates/M_template_rightrax.xsl\"?>\n";
}
elsif ($doctype eq "a") {
    print "<!DOCTYPE XMLDATA SYSTEM \"http://www.mydotcom.com/DRM/clients/XML_FORM/XML_DTD/artist_royalty_rightrax.dtd\">\n";
    print "<?xml-stylesheet type=\"text/xsl\" href=\"http://www.mydotcom.com/DRM/clients/XML_FORM/XSL_templates/A_template_rightrax.xsl\"?>\n";
}
else {
    print "<!DOCTYPE XMLDATA SYSTEM \"http://www.mydotcom.com/DRM/clients/XML_FORM/XML_DTD/license_royalty_rightrax.dtd\">\n";
    print "<?xml-stylesheet type=\"text/xsl\" href=\"http://www.mydotcom.com/DRM/clients/XML_FORM/XSL_templates/L_template_rightrax.xsl\"?>\n";
}

# The snippet below opens then reads the XML data file from the server
# and prints it to the browser - so that they can view it - and not know
# where the doc originates from.
open (XMLFILE, "/opt/www/docs/$document") || &open_error("http://$server$document");
@lines = <XMLFILE>;
close (XMLFILE);

@lines =~ s/<?xml version=\"1.0\"?>//; #????????????
@lines =~ s/<?xml-stylesheet type=\"text/xsl\" href=\"A_template.xsl\"?>//; #????????????
@lines =~ s/<!DOCTYPE XMLDATA SYSTEM \"artist_royalty.dtd\">//; #????????????

foreach $line (@lines) {
    #chop $line;
    print "$line";
}

Replies are listed 'Best First'.
Re: Regex on a array
by Coyote (Deacon) on Jan 29, 2001 at 10:18 UTC
    Seriously consider using one of the XML modules (XML::Twig is my personal fav) to do this rather than using a regex.

    Use the strict pragma and the -w flag, move your regexen to the foreach loop, and be sure to escape any metacharacters in your regular expressions.

    ----
    Coyote
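
One way the advice above might look in practice: a sketch with strict and warnings enabled, the substitutions moved inside a foreach loop, and metacharacters escaped via \Q...\E. The helper name strip_headers, the @unwanted list, and the sample data are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Literal strings to remove. They are matched through \Q...\E below, which
# escapes ?, . and / so each string is treated as plain text.
my @unwanted = (
    '<?xml version="1.0"?>',
    '<?xml-stylesheet type="text/xsl" href="A_template.xsl"?>',
    '<!DOCTYPE XMLDATA SYSTEM "artist_royalty.dtd">',
);

# Hypothetical helper: return a cleaned copy of the lines passed in.
sub strip_headers {
    my @lines = @_;
    for my $line (@lines) {            # $line aliases each element of the copy
        for my $literal (@unwanted) {
            $line =~ s/\Q$literal\E//; # one scalar at a time, not the array
        }
    }
    return @lines;
}

# Sample data standing in for the contents of the XML file.
my @sample = (
    qq{<?xml version="1.0"?>\n},
    qq{<!DOCTYPE XMLDATA SYSTEM "artist_royalty.dtd">\n},
    qq{<XMLDATA>\n},
);
print strip_headers(@sample);
```

Because only the matched text is removed, the newline of each header line stays behind as an empty line in the output.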

Re: Regex on a array
by jeroenes (Priest) on Jan 29, 2001 at 11:36 UTC
    In general, a regex on an array is best done using grep. Makes your code look better. If you keep your newlines intact, you can do the following. Escape the dot (perlre)!
    print grep{! /<!DOCTYPE XMLDATA SYSTEM \"artist_royalty\.dtd\">/} @lines;
    This will do most of your work.

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)

    Update: $code or die pointed out a stupid omission. Added that damn exclamation mark.

      I think that does the opposite.

      Could be wrong, but don't you need:
      print grep{!/<!DOCTYPE XMLDATA SYSTEM \"artist_royalty\.dtd\">/} @lines;

      $code or die
      Using perl at
      The Spiders Web
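
The grep approach in the reply above can be stretched to cover all three header lines and any report type. This is only a sketch, assuming each declaration sits on its own line; the helper name drop_header_lines is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that are NOT one of the three
# header declarations. Whole lines are dropped, newline included.
sub drop_header_lines {
    return grep {
        !/^\s*<\?xml\s+version=/           # XML declaration
        && !/^\s*<\?xml-stylesheet\b/      # stylesheet processing instruction
        && !/^\s*<!DOCTYPE\s+XMLDATA\b/    # DOCTYPE, whatever the .dtd name
    } @_;
}

# Sample data standing in for the contents of the XML file.
my @sample = (
    qq{<?xml version="1.0"?>\n},
    qq{<?xml-stylesheet type="text/xsl" href="A_template.xsl"?>\n},
    qq{<!DOCTYPE XMLDATA SYSTEM "license_royalty.dtd">\n},
    qq{<XMLDATA>\n},
);
print drop_header_lines(@sample);
```

Unlike a substitution, grep filters out the entire line, so no blank lines are left where the headers used to be.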
Re: Regex on a array
by adobepro (Initiate) on Jan 29, 2001 at 10:18 UTC
    I fixed it!!! Thanks to chromatic!!
    I looked at his reply in the thread about removing html tags, did this, and it works like a charm! THANKS Chrom!!
    foreach $line (@lines) {
        #chop $line;
        $line =~ s/<\?xml\ version=\"1.0\"\?>//igs;
        # next regex - etc...
        # next regex - etc...
        print "$line";
    }