in reply to Re: Removing everything before the first comma separator on each line of a text file
in thread Removing everything before the first comma separator on each line of a text file

Not that his example has empty first fields, but your regexes will keep the comma of leading empty fields, which IMHO is not correct.
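
To make the point concrete, here is a minimal sketch. The regexes under discussion are not quoted in this post, so the `STRICT` pattern below is only an assumed stand-in for a pattern that requires at least one character before the first comma; `LENIENT` and `strip_first_field` are likewise hypothetical names. It is written in Python, but the same patterns work unchanged in a Perl `s///`.

```python
import re

# Assumed stand-in for the regexes being discussed: requires at least
# one non-comma character before the first comma.
STRICT = re.compile(r"^[^,]+,")

# Variant that also accepts an empty first field (zero characters).
LENIENT = re.compile(r"^[^,]*,")


def strip_first_field(line, pattern):
    """Remove the first field and its comma; return the line unchanged if no match."""
    return pattern.sub("", line, count=1)


if __name__ == "__main__":
    for sample in ["a,b,c", ",b,c"]:
        print(repr(sample),
              "strict:", repr(strip_first_field(sample, STRICT)),
              "lenient:", repr(strip_first_field(sample, LENIENT)))
```

With an empty first field, the strict pattern does not match at all, so the leading comma stays on the line; the lenient one removes it.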


Enjoy, Have FUN! H.Merijn

Re^3: Removing everything before the first comma separator on each line of a text file
by Laurent_R (Canon) on Sep 16, 2014 at 17:45 UTC
    Well, yes, you are right, this would happen, but we don't have such lines in the data sample. A rule of thumb for data munging is to know the data properly, which we cannot do when we are just presented with a short sample in a forum post. There could be many other irregularities in the input data, which would lead to other regexes or other methods; we just don't know.

      You use other rules of thumb than I do, obviously.

      Of course it is (very) important to know your input data before/when you start writing code to read/parse it, but on the other hand, I advocate "defensive programming". The OP never stated that the data example is complete. That implies that you might be right today, but you might be wrong tomorrow.

      I usually follow my rule of thumb: implement the minimal requirements but expect the worst. What to do when something does not strictly match the original example is completely up to the author of the code, but my experience in data munging over the past 25 years is that the data format WILL change.
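
As a rough illustration of "minimal requirements, but expect the worst", the sketch below drops the first field and makes the handling of unexpected lines an explicit choice. The function name `drop_first_field` and the `on_mismatch` parameter are hypothetical, invented here for the example; it is in Python, though the idea is the same in Perl.

```python
def drop_first_field(line, on_mismatch="keep"):
    """Return everything after the first comma.

    An empty first field (",b,c") is handled like any other field.
    A line without any comma is outside the stated requirements, so
    the caller decides: "keep" returns it unchanged, "raise" fails loudly.
    """
    head, sep, rest = line.partition(",")
    if sep:
        return rest
    if on_mismatch == "raise":
        raise ValueError(f"no comma separator in line: {line!r}")
    return line
```

Whether an unexpected line should be kept, skipped, or treated as an error depends on the job at hand; the point is only that the decision is made deliberately and not by accident of the pattern.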


      Enjoy, Have FUN! H.Merijn
        It seems that we have more or less the same experience (except that I have been doing intensive data mining for only 17 or 18 years), and I think we really agree on the background. It is just that a simple solution offered to a specific problem on a Web forum is usually quite different from an actual production program based on a thorough knowledge of the input data.