UTF-16:Unrecognised BOM 5b64 at ./test line 11. UTF-32 fails as well.

To figure out what encoding a file uses, see my comments in Re: Converting UTF8 to ANSI and the replies (except the part about File::BOM, which doesn't seem to apply here).

Update:

A hex dump revealed that there are definitely chars encoded above 0x7f

Can you show us?

Re^3: Parsing issue (null bytes?)
by Anonymous Monk on Sep 08, 2017 at 18:03 UTC

    Thanks haukex. It's guessing ascii. I can't really show much of the log without changing the data, but here is a bit:

    0000000: 5b64 6566 6175 6c74 2074 6173 6b2d 3335  [default task-35
    0000010: 5d20 2049 4e46 4f20 7c20 3230 3137 2d30  ]  INFO | 2017-0
    0000020: 392d 3035 2031 313a 3233 3a33 352c 3931  9-05 11:23:35,91

    Hopefully I'm understanding it correctly.

      Yes, that does look like plain ASCII* (please use <code> tags next time, and consider registering an account so you can edit posts).

      So the problem is less likely to be on the input end, but you also haven't shown us any code. I think we need to take a step back here - please see Short, Self-Contained, Correct Example and post the shortest piece of code possible that still reproduces the issue, some short sample input data, the output data corresponding to that input which shows the problem (a hex dump might be best to see the null bytes, or at least the output of Data::Dump), and the output that you actually want to get for that input. More advice in How do I post a question effectively? and I know what I mean. Why don't you?
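If no hex dump tool is at hand, something like the following core-only sketch can produce one for the output data, so that null bytes show up as `00`. Note that `hexdump` is a made-up helper name and the sample string is invented for illustration; it is not the OP's data.

```perl
use strict;
use warnings;

# Format raw bytes as an xxd-style dump: offset, hex pairs, printable text.
# hexdump() is a hypothetical helper, not part of any module.
sub hexdump {
    my ($bytes) = @_;
    my $out = '';
    for (my $off = 0; $off < length $bytes; $off += 16) {
        my $chunk = substr($bytes, $off, 16);
        # Two bytes per group, like xxd's default layout.
        my $hex = join ' ', unpack('(H4)*', $chunk);
        # Replace anything outside printable ASCII with a dot.
        (my $text = $chunk) =~ s/[^\x20-\x7e]/./g;
        $out .= sprintf("%07x: %-39s  %s\n", $off, $hex, $text);
    }
    return $out;
}

# Invented sample containing a null byte.
print hexdump("[default task-35]\0 INFO");
```

Data::Dump's output would serve the same purpose; this version just avoids a non-core dependency.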

      * Update: I missed it earlier, but the AM post is correct that if you have bytes above 0x7f, then it's not plain ASCII. If Encode::Guess is guessing ASCII, and you used my example code, then probably the buffer size was not big enough. When I said "Can you show us?", I actually meant that part of the file which contains bytes above 0x7f.
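To illustrate the buffer size point, here is a minimal sketch, assuming Encode::Guess with its default suspects. It guesses once on a short prefix and once on the whole buffer, then lists the offsets of the bytes above 0x7f. The helper names `guess_name` and `high_byte_offsets` and the sample data are made up for this example; this is not the code from the other thread.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Name of the guessed encoding, or the error string that Encode::Guess
# returns when it cannot decide. Hypothetical helper.
sub guess_name {
    my ($bytes) = @_;
    my $enc = guess_encoding($bytes);
    return ref $enc ? $enc->name : "no guess: $enc";
}

# Offsets of all bytes above 0x7f in a byte string. Hypothetical helper.
sub high_byte_offsets {
    my ($bytes) = @_;
    my @offsets;
    push @offsets, pos($bytes) - 1 while $bytes =~ /[\x80-\xff]/g;
    return @offsets;
}

# Invented sample: 64 ASCII bytes followed by a UTF-8 encoded e-acute.
my $data = ('x' x 64) . "\xc3\xa9\n";

print "first 32 bytes: ", guess_name(substr($data, 0, 32)), "\n";
print "whole buffer:   ", guess_name($data), "\n";
printf "high byte at offset %d\n", $_ for high_byte_offsets($data);
```

If the prefix contains only bytes below 0x80, the guess on it cannot come out as anything but ASCII, no matter what follows later in the file. For a real file, read it with the `:raw` layer and pass the whole contents rather than a fixed-size chunk.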

      A hex dump revealed that there are definitely chars encoded above 0x7f

      It's guessing ascii.

      Both can't be true, so which is it?