in reply to Charset tornment

I don't see cedillas or circumflexes in your list. Presumably "ibid." and "op. cit." don't trip you up.

Why not just match "word" characters and non-word characters (\w and \W)? Or have I missed something here, like punctuation needing to be part of the class?
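Here is a minimal sketch of the word/non-word idea, written in Python rather than Perl (the `\w+|\W+` alternation reads the same in Perl's regex syntax). The helper `split_runs` and the sample name are hypothetical. Every character lands in exactly one run, so nothing is lost:

```python
import re

# Alternate between runs of word characters (\w+) and runs of
# everything else (\W+), so every character falls into exactly one token.
TOKEN = re.compile(r"\w+|\W+")


def split_runs(name: str) -> list[str]:
    """Split a name into alternating word / non-word runs (hypothetical helper)."""
    return TOKEN.findall(name)


print(split_runs("O'Neil-Smith, J."))
# ['O', "'", 'Neil', '-', 'Smith', ', ', 'J', '.']
```

Joining the runs back together always reproduces the input, which makes it easy to check that no punctuation was silently dropped.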

Re: Re: Charset tornment
by Mysjkin (Initiate) on Jan 28, 2003 at 07:08 UTC
    mattr wrote: >Why not choose "word" characters and non-word characters?
    That is exactly what I would like to do, but as things are set up at the moment, \w only matches A-Za-z0-9_. The reason there are no cedillas or circumflexes in my list is that none have turned up so far in the names I am parsing...
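To show the difference, here is a Python sketch (not the poster's code) comparing an ASCII-only `\w`, the `[A-Za-z0-9_]` setup described above, with a Unicode-aware one. In Perl 5.8 the rough counterpart, as far as I know, is decoding the input into a character string (for instance with the Encode module) so that `\w` matches accented letters; on undecoded byte strings `\w` stays ASCII-only unless `use locale` is in effect. The sample name is made up:

```python
import re

# ASCII-only \w, equivalent to [A-Za-z0-9_].
ASCII_WORD = re.compile(r"\w+", re.ASCII)
# Default for str patterns in Python 3: \w also covers Unicode letters and digits.
UNICODE_WORD = re.compile(r"\w+")

# "François Lefèvre", written with escapes so the accents are precomposed.
name = "Fran\u00e7ois Lef\u00e8vre"

print(ASCII_WORD.findall(name))    # ['Fran', 'ois', 'Lef', 'vre']
print(UNICODE_WORD.findall(name))  # ['François', 'Lefèvre']
```

With the ASCII class, each accented letter acts as a separator and splits the name; with the Unicode class, the names stay whole.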
      Ah. I got the same errors compiling Jcode.pm on Perl 5.8 (in fact they kept printing until I ran out of memory, which wasn't fun). Perl 5.8 has jcode.pl, which to me is obsolete, so I use the object-oriented module instead.

      I *think* the latest version of the module now works, so you could compare the versions and see what changed between them. Jcode.pm has an English man page, so that part should be no trouble; just unpack the tgz and build it, and don't bother installing it.

      Caveat: my memory is hazy, so maybe try compiling both versions. Either way, it's really not a big module at all.

      In the past I have also used ShiftJIS::Regexp to run regexes on Japanese text (not Unicode, just Shift_JIS-encoded Japanese held in 8-bit strings). Its re or match functions might work for you as-is.
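This Python sketch (not ShiftJIS::Regexp itself) shows why a plain byte-level regex can misfire on Shift_JIS. The second byte of a two-byte character can have the same value as an ASCII character. For example, 表 encodes as 0x95 0x5C, and 0x5C is the backslash. As I understand it, avoiding this kind of false match is what ShiftJIS::Regexp is for:

```python
import re

# 表 ("table") encodes in Shift_JIS as 0x95 0x5C; the trailing byte has
# the same value as an ASCII backslash.
text = "表"
raw = text.encode("shift_jis")
print(raw)  # b'\x95\\'

# A byte-level regex looking for a backslash finds a false match
# inside the two-byte character.
print(re.search(rb"\\", raw) is not None)  # True

# Decoding to characters first makes the false match disappear.
print(re.search(r"\\", raw.decode("shift_jis")) is not None)  # False
```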

      As someone else said, check your locale and run some tests. For the record, when I got the same errors I was doing a vanilla install of Red Hat 8.0 with its Perl RPM, and I was also tearing things apart to squeeze it onto a very small old hard disk. So maybe we had the same problem.
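For the locale check, here is a small sketch of the POSIX rule that decides which variable controls character classification. `LC_ALL` beats `LC_CTYPE`, which beats `LANG`, and empty or unset values fall through to the "C" locale. Perl consults the same variables (see perllocale). The function name is hypothetical, and it only resolves the environment; it does not check whether that locale is actually installed:

```python
def effective_ctype(env: dict[str, str]) -> str:
    """Resolve which locale governs LC_CTYPE (hypothetical helper).

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Unset or empty values fall through; the final default is "C".
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


print(effective_ctype({"LANG": "en_US.UTF-8"}))               # en_US.UTF-8
print(effective_ctype({"LANG": "en_US.UTF-8", "LC_ALL": "C"}))  # C
```

On the affected machine, `effective_ctype(dict(os.environ))` (after `import os`) should agree with what the `locale` command reports for LC_CTYPE. Whether that value names a UTF-8 locale is worth noting when chasing charset errors.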