in reply to Distant Global Regex Challenge

What if you spin through the file once up front, building a hash? Keys are the important data; values are arrays of the line numbers where that data was found.

Filter out all the hash entries with only a single line number, and you'll be left with the duplicates. Take all those line numbers and sort them.

Second pass through, you can use the list of line numbers to drop the duplicates.
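The two passes above can be sketched like this. This is a minimal illustration operating on an array of lines rather than a filehandle, and it keeps the first copy of each duplicated key; `extract_key()` is a hypothetical stand-in for pulling out the "important data" (here it just uses the whole line).

```perl
use strict;
use warnings;

# Hypothetical key extraction -- the real thing would parse out the
# important fields; here the whole line serves as the key.
sub extract_key { my ($line) = @_; return $line }

# Pass 1: build a hash mapping key => array of (0-based) line numbers,
# then mark every line number after a key's first occurrence for removal.
sub find_duplicate_lines {
    my @lines = @_;
    my %seen;
    push @{ $seen{ extract_key( $lines[$_] ) } }, $_ for 0 .. $#lines;

    my %drop;
    for my $nums ( grep { @$_ > 1 } values %seen ) {
        my ( undef, @dups ) = @$nums;    # numbers are already ascending
        @drop{@dups} = (1) x @dups;
    }
    return %drop;
}

# Pass 2: keep only the lines whose numbers weren't marked.
sub dedup {
    my @lines = @_;
    my %drop  = find_duplicate_lines(@lines);
    return map { $lines[$_] } grep { !$drop{$_} } 0 .. $#lines;
}

my @kept = dedup(qw(A B A C B));    # first copies survive: A B C
```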

Re^2: Distant Global Regex Challenge
by muppetjones (Novice) on Mar 13, 2012 at 21:38 UTC

    Right -- good solution. Like I said, there are easier ways to do it, but I'm curious whether there's a way to do it with a single one-pass global regex.

    (I've actually implemented something similar to what you suggested, just for the time being. At each line, I check the hash for an existing altloc/atom-name combo. If it exists, I skip the line; otherwise I update the hash with the new info. This gives me a single pass, though without the elegance of a regex.)
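    The single-pass variant described above can be sketched as follows. `make_key()` is a hypothetical helper standing in for the altloc/atom-name extraction; here it just joins a line's first two whitespace-separated fields.

```perl
use strict;
use warnings;

# Hypothetical key builder: first two fields stand in for the
# altloc and atom-name columns of the real data.
sub make_key {
    my ($line) = @_;
    my ( $altloc, $atom ) = split ' ', $line;
    return "$altloc|$atom";
}

# Keep a line only the first time its key appears: $seen{...}++
# returns the old count, so it is false exactly once per key.
sub dedup_single_pass {
    my @lines = @_;
    my %seen;
    return grep { !$seen{ make_key($_) }++ } @lines;
}

my @kept = dedup_single_pass( "A CA 1.0", "B CB 2.0", "A CA 3.0" );
# keeps the first two lines; the repeated A/CA line is skipped
```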

    Thanks!