in reply to Re^3: unknown encoding
in thread unknown encoding

Hi Marshall

My confusion began when I looked at "perldoc perluniintro" and "perldoc perlunicode". It sounds like values > 255 get wrapped around if ASCII encoding is wrongly assumed. If anyone can straighten me out, that would be appreciated. I should have included that in the original post.

The response from earlier led me to a webpage about various encodings. From that, I see that some data entry from the other organization may accidentally have set their encoding to "CP1252 -- WinLatin1". I happened to see "A0", which seems to apply only to that encoding.

When I get a chance, I will try out the substr and sysread approaches.

Thanks, Jim

Re^5: unknown encoding
by Marshall (Canon) on Oct 31, 2011 at 19:51 UTC
    Perl treats data as single-byte characters (ASCII, in effect) unless otherwise specified. That means a one-to-one mapping: one byte => one character.

    Perl can deal with any character encoding that I know of. But you have to tell it what character encoding is expected, UTF-8, UTF-16 or whatever.
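
    As a minimal sketch of "telling Perl the encoding", the snippet below decodes the same two raw bytes under two different assumptions using the core Encode module. The helper name code_points and the sample bytes are made up for illustration; they are not from the original data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes under a named encoding and
# return the resulting code points as "U+XXXX" strings.
sub code_points {
    my ($encoding, $bytes) = @_;
    my $chars = decode($encoding, $bytes);
    return map { sprintf('U+%04X', ord($_)) } split //, $chars;
}

# Sample bytes (an assumption, not the poster's data): 0x80 and 0xA0.
my $bytes = "\x80\xA0";

print "cp1252:     ", join(' ', code_points('cp1252',     $bytes)), "\n";
print "iso-8859-1: ", join(' ', code_points('iso-8859-1', $bytes)), "\n";
```

    Byte A0 decodes to the same character either way, while 80 does not, so bytes in the 80-9F range are usually better evidence for CP1252 than A0 is. The same encoding name can also be given as an I/O layer when opening a file, e.g. open(my $fh, '<:encoding(cp1252)', $file).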

    My comment about substr() concerned how to optimize the processing to make things run faster when dealing with typical 7-bit ASCII or even an 8-bit encoding. I didn't mean to confuse. It sounds like your friend just wants an answer.

    Let's worry about how to make things run faster if and when that is necessary. Right now, I think that is just an academic exercise, but I'd help you with that if you want.

    For many problems, being optimally efficient is just not necessary! The substr() idea works much like the way the processing would be done in C and will be faster than the split(//), but at the "expense" of more programming effort.
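
    To make the two approaches concrete, here is a rough sketch of both, assuming the job is simply to count bytes above 127 in a line. The function names and the sample line are hypothetical, and no timing claim is made here beyond what is said above.

```perl
use strict;
use warnings;

# Walk the string one character at a time via split(//).
sub count_high_split {
    my ($s) = @_;
    my $n = 0;
    for my $ch (split //, $s) {
        $n++ if ord($ch) > 127;
    }
    return $n;
}

# Same result using substr() with an index, much like a C loop.
sub count_high_substr {
    my ($s) = @_;
    my $n = 0;
    for my $i (0 .. length($s) - 1) {
        $n++ if ord(substr($s, $i, 1)) > 127;
    }
    return $n;
}

# Made-up sample line containing two bytes above 127.
my $line = "abc\xA0def\xE9";
print count_high_split($line), " ", count_high_substr($line), "\n";
```

    Both functions return the same count; if speed ever matters, the core Benchmark module (cmpthese) can compare them on your own data.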

    Perl is a language that can solve problems quickly (in terms of coding efficiency). And it has many features that allow it to run very close to say a 'C' program in terms of performance.

    How fast is fast enough? Well that depends. I have one app that takes 4+ hours on my machine to run. I have another team member who can do it in 56 minutes. Another team member has a new machine on the way that can do it in <40 minutes. How fast is enough? Well, <one hour is "fast enough". 40 minutes vs 56 minutes won't make any difference in this app because it takes us hours+ to "get ready for the next run". 40 minutes vs 4 hours makes a difference because we could get ready for a new run in 2 hours and get two runs done in a day.

    Programming involves trade-offs between how long it takes you to write the code vs how long the code runs and a whole bunch of other factors. Sometimes slower code is better because it is easier to understand and maintain.

    In general, from my experience, the thing to optimize is your ability to write clear, maintainable code. Usually but not always, clear code is fast code, assuming that this "clear code" uses an efficient algorithm. Deciding upon the algorithm is the most important part of writing clear, fast code.

    Hope that this very long post was understandable to you.

    PS: I am working on a new version of this app and it will run in like 20 minutes on my machine (although I have only promised a x4 speed increase) - another programming trick, promise less than you think that you can do (based upon benchmarks)! I am adding a lot of features and this requires hundreds of hours of work. The complexity is x10. If I get all the new features in there and it runs within an hour on my machine, everybody is going to be happy.

      ... another programming trick, promise less than you think that you can do ...

      Programming trick or engineering trick a la Scotty from Star Trek? :)

        Great!

        Actually some managers have spreadsheets like:
        Scotty /=2;
        Dreamer *= 4;
        It is a matter of calibration.
        The best prediction of a software schedule (in my experience) is what has happened before. The very best prediction comes from a delta from the last project. When there are huge technical unknowns and a new team, roll the dice. How to predict the software effort remains a vexing problem with no certain answers.