texuser, while your code is correct and it is great that you now found a solution on your own, I do have some thoughts to share with you on your code:
- When checking whether a file was opened successfully, you should report the file name and possibly the mode, i.e. instead of writing open(IN, "<temp.in") || die "\n Can't open file\n $!\n"; rather write something like open(IN, '<', "temp.in") or die "Error opening file temp.in for input: $!\n";. This helps you (and the user) fix the error more quickly as the code grows and you possibly migrate to another environment where your read/write access rights vary.
- You should do this error checking for all files that you open, especially for output files, since most of the time when you have the right to write, you also have the right to read, but not necessarily vice versa.
- You should open a file only when you need it, i.e. open the output file after the second time you open the input file.
- Some RegEx efficiency issues: In the first RegEx, [^<]+ would be faster than .+?, since the RegEx engine does not have to check the rest of the RegEx (well, the next character) before proceeding to the next possible number of characters. Something similar applies to .* in the second RegEx: you might want to replace it with [^<]+ (or [^<]* if an empty match has to be allowed), so that the RegEx engine does not have to waste time on backtracking. But those are only speed considerations, which just matter if you have large or numerous files.
- You open and read the input file twice. Remember that disk read/write operations are much slower than the execution of a piece of program code. If possible, you should try to read the file only once.
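To illustrate the points above, here is a minimal sketch rather than a drop-in replacement. The tag name "item", the sub names and the idea of passing both file names on the command line are assumptions made up for the example; the pattern is a placeholder and not one of your actual RegExes. It reads the input once, reports the file name and mode on every failed open, uses a negated character class instead of .+?, and opens the output file only when there is something to write.

```perl
use strict;
use warnings;

# Placeholder pattern: returns the text between <item> and </item>.
# The tag name "item" is an assumption for this sketch only.
sub extract_items {
    my ($line) = @_;
    # [^<]+ stops at the next '<', so there is little to backtrack over.
    return $line =~ m{<item>([^<]+)</item>}g;
}

# Read the input file exactly once and keep its lines in memory.
sub slurp_lines {
    my ($name) = @_;
    open(my $in, '<', $name)
        or die "Error opening file $name for input: $!\n";
    chomp(my @lines = <$in>);
    close($in) or die "Error closing file $name: $!\n";
    return @lines;
}

# Open the output file only at the point where it is needed.
sub write_lines {
    my ($name, @lines) = @_;
    open(my $out, '>', $name)
        or die "Error opening file $name for output: $!\n";
    print {$out} "$_\n" for @lines;
    close($out) or die "Error closing file $name: $!\n";
    return scalar @lines;    # number of lines written
}

# Hypothetical usage: perl sketch.pl temp.in temp.out
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    my @items = map { extract_items($_) } slurp_lines($in_name);
    write_lines($out_name, @items);
}
```

Keeping all lines in memory is fine for small files; for very large ones you would rather process line by line while both handles are open.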
Last thing: I tested both your code and mine on a file named "temp.in", and both worked. The input file consisted of your three input lines, and in both cases the output file contained your three output lines. Considering the above, if you would like to rework the program so that it parses your input file in one go, I'll be happy to help you.
Oh, yes, and a ++ for you for putting together your own piece of code that works for you.
Cheers,
CombatSquirrel.