Perl/XML

falco has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Below you'll find a Perl script and some XML. The .pl file reads the .xml file and turns the XML data into an HTML document. The script works, but it works too well: I'm trying to extract the text inside the <![CDATA[ ]]> sections, but instead it retrieves everything (the <![CDATA[ ]]> wrapper as well as the text within it). How can I have the script pull just the text from within the brackets?

XML:

<Feed>
  <artikle>
    <ItemNum />
    <tittel> <![CDATA[ Test tittel ]]> </tittel>
    <ingress> <![CDATA[ test ingres ]]> </ingress>
    <url> <![CDATA[ http://www.test.com/test ]]> </url>
  </artikle>
</Feed>
</TestData>

PERL:

#!/usr/bin/perl

# This script retrieves an XML-formatted news feed document and
# generates an HTML file based on a template file
#
# Format of the input should be:
# <article>Some article
# <title>Some Title</title>
# <ingress>Some text</ingress>
# <url>Some url</url>
# </article>
# 
# Modify vars in the config part to match your environment
#
# asbjorn@linux-directory.com, Oct. 2000
#
###### CONFIG START

# Input format
$article_del = "artikle";
$title_del = "tittel";  # must match the element name used in the feed
$url_del = "url";
$ing_del = "ingress";
# Input format end

$template = "../../xml/test/test.txt";  ### Template HTML file.
$html = "../../xml/test/test.html";  ### Name & path of HTML file to be generated
$weburl = "http://www.test.com/xml/test.xml"; ### URL of document to be retrieved
$maxarticles = 1; ### Max no. of articles to be built. Must match template file
$cgi = 1;         ### Set if the script is to be run from a browser
###### CONFIG END



#### MODULES USED
use HTTP::Request;
use LWP::UserAgent;
use CGI;
#### MODULES END

$q = CGI->new;
print $q->header() if ( $cgi );
print $q->h3("$ver") if ( $cgi );  # note: $ver is never set in this script
print "Fetching $weburl";
print "<br>" if ( $cgi );

$ua = LWP::UserAgent->new;
$request = HTTP::Request->new(GET => "$weburl");
$response = $ua->request($request);

# Use the public HTTP::Response accessors rather than the object's internals
$content = $response->content;

if ( !$response->is_success ) {
  print "Could not contact web server: ", $response->status_line, "\n";
  exit 2;
}
print "Fetched $weburl\n";
print "<br>" if ( $cgi );

open(IN,"$template") || die "Failed to open $template: $!\n";
open(OUT,">$template.tmp") || die "Failed to write to $template.tmp: $!\n";
while(<IN>) {
  print OUT;
}
close(OUT);
close(IN);

print "Generating HTML file:\n";
print "<br>" if ( $cgi );

@art = split m#(<$article_del>|</$article_del>)#, $content;
shift @art;
pop @art;
foreach $art ( @art ) {
  ($title) = $art =~ m#<$title_del>(.*)</$title_del>#;
  ($ingress) = $art =~ m#<$ing_del>(.*)</$ing_del>#;
  ($url) = $art =~ m#<$url_del>(.*)</$url_del>#;
  ($url) = $url =~ m#<a href="(.*)">#;
  $i++ if $title;
  last if ( $i > $maxarticles );
  &writefile($i,"title",$title) if ( $title );
  &writefile($i,"ingress",$ingress) if ( $ingress );
  &writefile($i,"url",$url) if ( $url );
}
rename("$template.tmp","$html") || die "Failed to write $html: $!\n";
print "  ----:  DONE  :----";

sub writefile {
  my $num = shift;
  my $type = shift;
  my $text = shift;
 
  $st = "URL_$num" if ( $type eq "url" );
  $st = "TITTEL_$num" if ( $type eq "title" );
  $st = "INGRESS_$num" if ( $type eq "ingress" );
  open(IN,"$template.tmp") || die "Failed to open $template.tmp: $!\n";
  @in = <IN>;
  close (IN);

  open(OUT,">$template.tmp") || die "Could not write to $template.tmp: $!\n";
  foreach ( @in ) {
    s/$st/$text/g;
    print OUT;
  }
  close(OUT);
}

Edit: 2001-03-03 by neshura


Replies are listed 'Best First'.
Re: Perl/XML by OeufMayo (Curate) on Feb 19, 2001 at 22:51 UTC
First, I see that this is your first post at Perlmonks, so welcome! May you find Laziness, Hubris and Impatience!

Perlmonks can be very picky when it comes to badly formatted posts. Yours, for example, is barely readable for the other monks. You really should read the articles Site How To and Writeup Formatting Tips to learn how to use the <CODE> tags when posting.

More on topic: I see that you're trying to convert XML to HTML. I think the easiest and most reliable way to do it would be to use XML::Parser to parse the XML (obviously). The regular expressions you are using are very likely to break as soon as the source XML includes something you didn't expect or didn't handle, as is the case with your present script.

Here are a couple of nodes that might interest you: On XML parsing and Processing XML with Perl.

Cheers,
OeufMayo
-- PerlMonger::Paris(http => 'paris.pm.org');
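To make the XML::Parser suggestion a bit more concrete, here is a minimal sketch, not a drop-in replacement for the script. It assumes the feed looks like the sample in the question (element names artikle, tittel, ingress, url); the inline $xml string and the variable names %wanted, @articles, $current and $field are made up for illustration. The point is that the parser hands the text inside a CDATA section to the Char handler without the <![CDATA[ ]]> wrapper, so no regex is needed to strip it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Sample input modelled on the feed in the question (assumed structure).
my $xml = <<'XML';
<Feed>
  <artikle>
    <ItemNum />
    <tittel> <![CDATA[ Test tittel ]]> </tittel>
    <ingress> <![CDATA[ test ingres ]]> </ingress>
    <url> <![CDATA[ http://www.test.com/test ]]> </url>
  </artikle>
</Feed>
XML

# Elements whose text we want to collect for each article.
my %wanted = map { $_ => 1 } qw(tittel ingress url);

my @articles;   # one hash ref per <artikle>
my $current;    # article currently being filled in
my $field;      # name of the wanted element we are inside, if any

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag) = @_;
            $current = {} if $tag eq 'artikle';
            $field   = $tag if $wanted{$tag};
        },
        # Character data arrives here with the CDATA wrapper already removed.
        Char => sub {
            my ($expat, $text) = @_;
            $current->{$field} .= $text if $current && defined $field;
        },
        End => sub {
            my ($expat, $tag) = @_;
            undef $field if $wanted{$tag};
            if ($tag eq 'artikle') {
                # Trim the padding around each collected value.
                for (values %$current) { s/^\s+//; s/\s+$//; }
                push @articles, $current;
                undef $current;
            }
        },
    },
);

$parser->parse($xml);

for my $art (@articles) {
    for my $name (qw(tittel ingress url)) {
        print "$name: $art->{$name}\n" if defined $art->{$name};
    }
}
```

In the real script, $xml would be the fetched document, and each collected value could be passed to writefile in place of the regex captures.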
Re: Perl/XML by Trinary (Pilgrim) on Feb 19, 2001 at 22:48 UTC
Auuuugh! =) Read this, please. Your post is totally impossible to read as it stands. =b

Trinary