Re: Re: making a single column out of a two-column text file

Thanks for the advice and for the code. I was also wondering whether the program could decide for itself where the column breaks are, whether they are spaces or tabs as tachyon suggested (or whatever other possibilities exist), then define that as a "column separator" and go from there. I'll now try to figure out a way to make Perl decide if the mask has really found the column separation or not.

Once again in your debt...

--
Allolex


Replies are listed 'Best First'.

Re: Re: Re: making a single column out of a two-column text file
by dbp (Pilgrim) on Feb 27, 2003 at 05:47 UTC

"I'll now try to figure out a way to make Perl decide if the mask has really found the column separation or not."

This task is going to be very input-specific. BrowserUK and I have both shown you ways to calculate the probability that the column break falls at a certain column (although BrowserUK's method is cleaner, more robust, and more fluent Perl than my own). I don't really see how you can "check" this result in a general fashion short of applying some machine learning technique that is likely to be less reliable than the probabilistic approach. That said, knowing something about your input, such as the size of the column break and how many breaks of that size will be found in a line (I'm thinking of the numbers that fall to the right of the rhc here), will let you apply the mask to various inputs with a high likelihood of success.
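To make that concrete, here is a rough sketch of the idea, not the code BrowserUK or I posted earlier. The helper names (blank_fraction, candidate_breaks, split_columns), the threshold, the minimum gap width and the sample lines are all made up for illustration, and it assumes tabs have already been expanded to spaces. It scores each character position by the fraction of lines that are blank there, keeps only interior runs of blank columns at least as wide as the gap you expect, and trusts the mask only when exactly one such run turns up.

```perl
use strict;
use warnings;

# Hypothetical helper: for each character position, return the fraction of
# lines that have whitespace there. Lines shorter than the position count
# as blank. Assumes tabs were expanded to spaces beforehand.
sub blank_fraction {
    my @lines = @_;
    my $n     = @lines;
    my $width = 0;
    for my $line (@lines) {
        $width = length($line) if length($line) > $width;
    }
    my @count = (0) x $width;
    for my $line (@lines) {
        my $padded = $line . ' ' x ($width - length($line));
        for my $i (0 .. $width - 1) {
            $count[$i]++ if substr($padded, $i, 1) =~ /\s/;
        }
    }
    return map { $_ / $n } @count;
}

# Hypothetical helper: return [start, length] for every run of columns whose
# blank fraction is at least $threshold and which is at least $min_gap wide.
# Runs touching the left or right edge are skipped, since those are
# indentation or trailing padding rather than a column break.
sub candidate_breaks {
    my ($fractions, $threshold, $min_gap) = @_;
    my @runs;
    my $start;
    for my $i (0 .. $#$fractions) {
        if ($fractions->[$i] >= $threshold) {
            $start = $i unless defined $start;
        }
        elsif (defined $start) {
            my $len = $i - $start;
            push @runs, [$start, $len] if $start > 0 && $len >= $min_gap;
            undef $start;
        }
    }
    return @runs;
}

# Hypothetical helper: cut every line at the break and return all the
# left-hand cells followed by all the right-hand cells (empty cells dropped).
sub split_columns {
    my ($lines, $start, $len) = @_;
    my (@left, @right);
    for my $line (@$lines) {
        my $l = substr($line, 0, $start);
        my $r = length($line) > $start + $len ? substr($line, $start + $len) : '';
        s/\s+$// for $l, $r;
        push @left,  $l if length $l;
        push @right, $r if length $r;
    }
    return (@left, @right);
}

# Made-up sample input: two text columns separated by at least four spaces.
my @sample = (
    'the quick fox    jumps over',
    'a lazy dog       sleeps in',
    'one more line    of text',
);

my @fractions = blank_fraction(@sample);
my @breaks    = candidate_breaks(\@fractions, 1.0, 2);

if (@breaks == 1) {
    # Exactly one plausible break: treat the mask as trustworthy.
    print "$_\n" for split_columns(\@sample, @{ $breaks[0] });
}
else {
    warn 'found ' . scalar(@breaks) . " candidate breaks; check the input by hand\n";
}
```

The "check" here is nothing more than the count of candidate runs. If your data has numbers to the right of the rhc you would expect two runs rather than one, so the test at the end has to encode what you know about the input; lowering the threshold below 1.0 tolerates the odd line that spills into the gap.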
