First of all, I am just starting out in the world of Perl and I am completely in love with this language.
I am working on a simple script that reads a few RSS addresses from a text file and then retrieves the items from those addresses. Most of the feeds provide items correctly, but some throw warnings about wide characters and spit out weird characters between the words.
Investigating the issue, I found that these warnings are usually produced by encoding problems. The RSS feeds are XML encoded in UTF-8, so I decided to use the Encode module with its decode function. This was not a solution because the RAI module performs the decoding internally.
Here is the code:
#!/usr/bin/perl
use warnings;
use strict;
use HTTP::Request;
use LWP::UserAgent;
use XML::RAI;
use Data::Dumper;
use Encode qw( decode );

my $ua = LWP::UserAgent->new;

open IN, "in.txt";
open OUT, ">out.txt";

while (<IN>) {
    chomp;
    my $request = HTTP::Request->new( GET => $_ );
    print "Requesting...\n";
    my $response = $ua->request( $request );
    print " Status: ", $response->status_line, "\n";
    print " Last modified: ", $response->header( 'last-modified' ), "\n";
    print " Etag: ", $response->header( 'etag' ), "\n\n";

    my $response_content = decode('UTF-8',$response->content);
    my $rai = XML::RAI->parse_string( $response->content );
    my $channel = $rai->channel;
    print "Channel:\n";
    print " Title: " . $channel->title . "\n";
    print " Link: " . $channel->link . "\n";
    print " Modified: " . $channel->modified . "\n";
    print " Publisher: " . $channel->publisher . "\n";

    for ( @{$rai->items} ) {
        #my $descriptiond = decode( 'UTF-8', $_->description );
        print OUT "Item:\n";
        print OUT " Title: " . $_->title . "\n";
        print OUT " Link: " . $_->link . "\n";
        print OUT " Description: " . $_->description . "\n";
        print OUT " Created: " . $_->created . "\n";
        print OUT "------------------------------------------------\n\n";
    }

    $request->header( 'If-Modified-Since', $response->header( 'last-modified' ) );
    $request->header( 'If-None-Match', $response->header( 'etag' ) );
}
The feed addresses that return items with problems are:
http://estaticos.elmundo.es/elmundo/rss/portada.xml
http://www.publico.es/rss/
What do I call weird characters? For example:
Title: Las pelĂculas que fuimos... y que seguiremos siendo
Can you give me a hint to solve the problem? Many thanks in advance.
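To make the bytes-versus-characters distinction I mention above concrete, here is a minimal sketch of how I understand decoding and encoded output to work in Perl. It is not taken from my script: the sample string and the variable names `$bytes`, `$chars` and `$buffer` are hypothetical, and an in-memory buffer stands in for out.txt. I have not confirmed that this is what happens inside XML::RAI.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical sample: the UTF-8 bytes for the word with an accented i,
# as they might arrive in an HTTP response body.
my $bytes = "pel\xC3\xADculas";

# decode() turns the raw bytes into a Perl character string.
my $chars = decode( 'UTF-8', $bytes );

# Print through an :encoding(UTF-8) layer so the characters are
# converted back to UTF-8 bytes on output. An in-memory file is used
# here only so the sketch is self-contained; a real script would open
# a file name instead of \$buffer.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer
    or die "Cannot open in-memory file: $!";
print {$out} $chars;
close $out;

printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length($bytes), length($chars), length($buffer);
```

If that understanding is right, the same `'>:encoding(UTF-8)'` mode could be tried on the output handle for out.txt, but I would welcome correction on whether that is the proper fix here.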
In reply to Weird behavior with RAI and RSS. by Falstaff