punkish has asked for the wisdom of the Perl Monks concerning the following question:

I have a puzzling problem --

I have a small subroutine that takes an HTML page and extracts the first n characters from it, not counting the HTML tags. Then it closes any tags left open by the extraction.
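
For readers following along, here is a minimal sketch of that kind of routine. It is not the poster's code, which was never shown: it uses only core Perl and a deliberately naive regex tokenizer (no handling of comments, CDATA, or '>' inside attribute values), and the names truncate_html and %VOID are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical illustration, not the original subroutine.
# Tags that never take a closing tag, so they are not tracked as "open".
my %VOID = map { $_ => 1 } qw(br hr img input meta link);

sub truncate_html {
    my ($html, $limit) = @_;
    my ($out, $seen, @open) = ('', 0);

    # Walk the input as alternating tags and runs of text.
    while ($html =~ /\G(?:(<[^>]*>)|([^<]+))/gc) {
        my ($tag, $text) = ($1, $2);

        if (defined $tag) {
            if ($tag =~ m{^</\s*(\w+)}) {
                # Closing tag: pop it if it matches the innermost open tag.
                my $name = lc $1;
                pop @open if @open && $open[-1] eq $name;
            }
            elsif ($tag =~ m{^<\s*(\w+)}) {
                # Opening tag: remember it unless it is void or self-closing.
                my $name = lc $1;
                push @open, $name unless $VOID{$name} || $tag =~ m{/>$};
            }
            $out .= $tag;    # tags are copied through and never counted
            next;
        }

        # Text: only these characters count toward the limit.
        my $room = $limit - $seen;
        if (length($text) >= $room) {
            $out .= substr($text, 0, $room);
            last;
        }
        $out  .= $text;
        $seen += length $text;
    }

    # Close whatever is still open, innermost first.
    $out .= "</$_>" for reverse @open;
    return $out;
}

print truncate_html('<p><i>We</i> have to test <b>something</b>.</p>', 20), "\n";
# prints: <p><i>We</i> have to test <b>some</b></p>
```

A real parser such as HTML::TokeParser handles far more input than this tokenizer does; the sketch only shows the counting and tag-closing logic.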

I was using HTML::TokeParser version 2.37 and everything was working fine on my laptop with Perl 5.8.8. Well, on Dreamhost, the darn thing started causing segmentation faults.

So, I substituted TokeParser with HTML::TagParser, a much simpler, less ambitious, but pure-Perl module. Again, it works fine on my laptop, but causes segfaults on Dreamhost. Fwiw, Dreamhost is running Perl 5.8.4, which may or may not be the cause (I hope that is not the cause).

TagParser is a pure Perl module, so it should just work, as far as I understand. But, not so... What can I do to solve this?

Update: I just checked and Dreamhost does have HTML::TokeParser version 2.24 installed. Initially I was using their module, but when I got segfaults, I installed my own instance at version 2.37. Still the segfaults.
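
When a system copy and a private copy of a module coexist like this, one thing worth ruling out is that the script is loading a different copy than expected. The following is a small sketch, assuming nothing about the poster's setup; module_info is a hypothetical helper name, and the code is written to run on Perl 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper: report the file a module was loaded from and its
# version. Returns undef (in scalar context) if the module cannot be loaded.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    return { path => $INC{$file}, version => $version };
}

for my $module (qw(HTML::Parser HTML::TokeParser)) {
    my $info = module_info($module);
    if ($info) {
        my $version = defined $info->{version} ? $info->{version} : 'unknown';
        print "$module $version loaded from $info->{path}\n";
    }
    else {
        print "$module is not installed (or failed to load)\n";
    }
}

# Which perl binary and version is actually running this script.
print "perl $] at $^X\n";
```

Run it the same way the failing script is run (same shebang line, same use lib or PERL5LIB settings) so the paths it prints reflect the real search order.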

--

when small people start casting long shadows, it is time to go to bed

Replies are listed 'Best First'.
Re: segmentation fault on HTML::TokeParser
by Your Mother (Archbishop) on Mar 28, 2008 at 06:13 UTC

    I think I was getting segfaults with the same family of modules on OS X about 5 years ago. I never tracked it down, but if it was a known problem it should show up in old test reports or list archives (check by perl version?). Just curious: it sounds like you're doing exactly what HTML::Truncate does, so maybe give it a whirl and see if the problem persists. If it goes away, it might be because your code has a weird recursion or something that your laptop can handle but the shared host can't.

    I also use DreamHost for some things and when they refused to upgrade their WWW::Mechanize when it would not even compile, I started keeping my own Perl local lib for everything. No problems since. You should open a cpan shell, run 'r' and use the list as a guide to upgrade everything that's relevant; not just the parser.

      Good grief! HTML::Truncate indeed claims to do what I am trying to reinvent. I will give it a whirl, and hopefully it won't crash and burn.

      Update: Yes, HTML::Truncate works on Dreamhost without segfaulting. Problem solved for now.
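
      For anyone landing here later, a minimal sketch of that usage, based on my reading of the HTML::Truncate synopsis (the sample string and the limit of 20 are only illustrative, and HTML::Truncate must be installed from CPAN):

      ```perl
      use strict;
      use warnings;
      use HTML::Truncate;

      my $html = '<p><i>We</i> have to test <b>something</b>.</p>';

      my $truncator = HTML::Truncate->new();
      $truncator->chars(20);    # visible characters to keep; markup is not counted

      # truncate() returns the shortened HTML with open tags closed again.
      my $short = $truncator->truncate($html);
      print "$short\n";
      ```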

      Many thanks. Wrt Dreamhost, yes, I do have my own lib, and that is where I installed HTML::TokeParser (after finding their version very old). Nevertheless, segfaults abound.
Re: segmentation fault on HTML::TokeParser
by tachyon-II (Chaplain) on Mar 28, 2008 at 06:14 UTC

    If a pure Perl module is segfaulting as well as one dependent on the C in HTML::Parser, I think you have to assume it is a perl problem, as that is the only common C code. It could, however, be that your code runs on your laptop because it has enough resources (memory) but not on the host because you are in some sort of resource-limited virtual environment. Anyway, the quickest way to test/fix is probably to make a brand new 5.8.8 install in your home directory (which assumes you have shell access) and then just use that:

    #!/home/you/perl5.8.8/bin/perl

    I presume you are running the same data locally as on the server so you can exclude any weirdness in the data (I think this is highly unlikely to be the cause).

      Unfortunately, installing my own Perl on Dreamhost is *not* the quickest way. It is a very long way around. Why so? Because, since "the home directory on Dreamhost is not writeable" (I quote Dreamhost wiki here), the standard Perl installation in the home directory fails on cwd. Again, per the Dreamhost wiki, there is a patch for it created by Michael Schwern (see http://schwern.org/~schwern/src/dreamhost-5.8.8-cwd.patch), but that does become a very long way around for me. I will visit that option only when stumped every other way.

      Wrt data, yes, I am using the same identical data and scripts on both my laptop and the Dreamhost space.

      Thanks.

        We clearly differ in terms of our definition of Dreamhost. Nightmarehost sounds more like it.

        "but that does become a very long way around for me. I will visit that option only when stumped every other way."

        How is applying a patch and installing a local perl the long way round? It should take less than 10 minutes. Anyway, good luck finding your easier solution.