The following script causes a segmentation fault on Red Hat but runs fine on OS X (see the perl -V output for both below). I'm not sure where to start looking, so any advice would be much appreciated.
I know that when you install HTML::Entities you are asked whether you want it to encode Unicode characters. I have no idea if that's relevant, but while I know it was compiled with that support on the OS X box, I don't know whether it was on the Red Hat box.
    #!/usr/bin/perl

    use strict;
    use warnings;

    use XML::RAI;
    use HTML::Entities;

    my $xml = do { local $/; <DATA> };
    my $r = XML::RAI->parse( $xml );

    foreach ( @{$r->items} ) {
        my $t = $_->title;
        print "$t\n";
        $t = decode_entities($t);
        print "$t\n";
        $t = encode_entities($t);
        print "$t\n";
    }

    __DATA__
    <?xml version="1.0" ?>
    <rss version="0.91">
      <channel>
        <title>Smartmoney.com - Consumer Action</title>
        <link>http://www.smartmoney.com/consumer/?nav=RSS091</link>
        <description>Investing, Saving and Personal Finance</description>
        <language>en-us</language>
        <copyright>Copyright 2004 Smartmoney.com, joint venture of Dow Jones & Co. and Hearst Communications, Inc.</copyright>
        <item>
          <title>The Modern R&eacute;sum&eacute;</title>
          <link>http://www.smartmoney.com/consumer/index.cfm?story=20040505&nav=RSS091</link>
          <description>R&eacute;sum&eacute;s that worked even a few years ago aren't effective today. Here are five essential updates.</description>
        </item>
      </channel>
    </rss>
#### output
    [jollyr@devbox jollyr]$ ./test.pl
    The Modern Résumé
    Wide character in print at ./test.pl line 16, <DATA> line 1.
    The Modern R?sum?
    Malformed UTF-8 character (unexpected end of string) at /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/HTML/Entities.pm line 435, <DATA> line 1.
    Malformed UTF-8 character (unexpected non-continuation byte 0x73, immediately after start byte 0xe9) in substitution iterator at /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/HTML/Entities.pm line 435, <DATA> line 1.
    Segmentation fault
#### broken on this
    > perl -V
    Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
      Platform:
        osname=linux, osvers=2.4.21-9.elsmp, archname=i386-linux-thread-multi
        uname='linux bugs.devel.redhat.com 2.4.21-9.elsmp #1 smp thu jan 8 17:08:56 est 2004 i686 i686 i386 gnulinux '
        config_args='-des -Doptimize=-O2 -g -pipe -march=i386 -mcpu=i686 -Dversion=5.8.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.2 5.8.1 5.8.0'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
        optimize='-O2 -g -pipe -march=i386 -mcpu=i686',
        cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
        ccversion='', gccversion='3.3.2 20031218 (Red Hat Linux 3.3.2-5)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='gcc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
        perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
        libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so
        gnulibc_version='2.3.2'
#### fine on this
    11 ~>perl -V
    Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
      Platform:
        osname=darwin, osvers=7.3.0, archname=darwin-2level
        uname='darwin noras-computer.local 7.3.0 darwin kernel version 7.3.0: fri mar 5 14:22:55 pst 2004; root:xnuxnu-517.3.15.obj~4release_ppc power macintosh powerpc '
        config_args='-des -Dprefix=/opt/local -Dccflags=-I'/opt/local/include' -Dldflags=-L/opt/local/lib -Dvendorprefix=/opt/local'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-I/opt/local/include -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include -I/opt/local/include',
        optimize='-Os',
        cppflags='-no-cpp-precomp -I/opt/local/include -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include -I/opt/local/include'
        ccversion='', gccversion='3.3 20030304 (Apple Computer, Inc. build 1495)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=8, prototype=define
      Linker and Libraries:
        ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='-L/opt/local/lib -L/usr/local/lib'
        libpth=/usr/local/lib /opt/local/lib /usr/lib
        libs=-lgdbm -ldbm -ldl -lm -lc
        perllibs=-ldl -lm -lc
        libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
        gnulibc_version=''
      Dynamic Linking:
        dlsrc=dl_dyld.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
        cccdlflags=' ', lddlflags='-L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib'

    Characteristics of this binary (from libperl):
      Compile-time options: USE_LARGE_FILES
      Built under darwin
      Compiled at Jun 24 2004 19:12:14
      %ENV:
        PERL5LIB="/opt/local/lib/perl5/site_perl/5.8.2/"
      @INC:
        /opt/local/lib/perl5/site_perl/5.8.2/
        /opt/local/lib/perl5/5.8.4/darwin-2level
        /opt/local/lib/perl5/5.8.4
        /opt/local/lib/perl5/site_perl/5.8.4/darwin-2level
        /opt/local/lib/perl5/site_perl/5.8.4
        /opt/local/lib/perl5/site_perl
        /opt/local/lib/perl5/vendor_perl/5.8.4/darwin-2level
        /opt/local/lib/perl5/vendor_perl/5.8.4
        /opt/local/lib/perl5/vendor_perl
        .
thanks, qq
(cross posted to perl-unicode, but no answers yet)
In reply to HTML Entities segfault - malformed utf8 by qq