monteryjack has asked for the wisdom of the Perl Monks concerning the following question:

I need to create new files from the following input file:

DB20121001@3575 1349124143.382
DB20120928@1234 1348865280.066
DB20120927@1234 1348778775.057
DB20120905@2341 1346877693.517
DB20120911@3575 1347396457.566
....


The first line of the input should produce a file (/3575/DB20121001) with the following text:

<event>
<stream-id>3575</stream-id>
<event-name>DB20121001</event-name>
<primary-event>
<delete-time>Mon Oct 1 8:42:23 PM</delete-time>
</primary-event>
</event>

Here is where I am stuck:
#!/usr/bin/perl
use DateTime;
use IO::File;
use POSIX qw(strftime);

$LOGFILE = "second.txt";
open(LOGFILE) or die("Could not open log file.");

foreach $line (<LOGFILE>) {
    chomp($line);    # remove the newline from $line.
    ($stream, $timedate) = split("\t");
    my $time_t = POSIX::strftime( "%Y-%m-%d %r", localtime($timedate) );
    my ($streamid)   = $stream =~ m!.*@([^_]*)-!;
    my ($streamname) = $stream =~ /(.*)?\@/;
    open (file, ">$streamid/$streamname") || die "file not opened";
    print file "<event>\n";
    print file "<stream-id>$streamid</stream-id>\n";
    print file "<event-name>$streamname</event-name>\n";
    print file "<primary-event>\n";
    print file "<delete-time>$time_t</delete-time>\n";
    print file "</primary-event>\n";
    print file "</event>\n";
    close (file);
}
close (LOGFILE);
I can't figure out why this is not looping through and creating these files.
I have been successful in creating a single file with _all_ the output, but I need to get a file per line.
thanks.

Replies are listed 'Best First'.
Re: new file per line output
by davido (Cardinal) on Dec 31, 2013 at 18:44 UTC

    There's one obvious bug, and a number of style or best practice issues. I'll list the ones I see, including the actual bug:

    In script-line order:

    • use strict; (Absent.)
    • use warnings; (Absent.)
    • use DateTime; (But you never use it in the script)
    • use IO::File; (But you never use it in the script)
    • Your open calls should use the three-argument version as a matter of habit.
    • foreach $line (<LOGFILE>) { (Better to use a while loop; foreach will slurp the file. Also, $line is an undeclared package global, with global scope. Use a lexical (my) instead.)
    • ($stream, $timedate) = split("\t"); (split, in the absence of a parameter supplying a string to split, will split the contents of $_, which isn't what you want. I'm sure you probably intended split /\t/, $line. Also, you should be using lexicals for $stream and $timedate. Also, is your string tab delimited, or would \s be more appropriate?)
    • Have you verified that your pattern matches are successful, or just assuming they are?
    • Are you getting an exception "file not opened", or silent failure?
    • warnings would have warned you that the bareword file could conflict with a future version of Perl. Use upper-case by convention for bareword filehandles, or better, use lexical file handles.

    Of all of these issues, the most significant is your use of split, which is splitting $_ by default instead of $line, as you intend. Consequently, $stream and $timedate don't contain anything useful. Fix this issue, and then put in some error checking to ensure that your pattern matches are successful. That would also have helped you catch the split bug, because with split misbehaving the pattern matches can't possibly succeed either.
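    For instance, here's a minimal, untested sketch of what that might look like (whether the file really is tab-delimited is an assumption, per the point above):

# split the line you actually read, not $_, and notice when it fails
my ( $stream, $timedate ) = split /\t/, $line;    # or /\s+/ if the delimiter isn't a tab
defined $timedate
    or warn "Line $. didn't split into two fields\n";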

    One final note: You might be happy with a module like Text::Template, or minimally, a HERE doc, rather than a bunch of print statements with XML tags interspersed. Update: ...and you probably ought to be protecting your XML output from contamination with illegal characters.
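    If you'd rather not pull in a module for the escaping, even a tiny hand-rolled helper along these lines would do (just a sketch; the xml_escape name is made up for illustration):

# made-up helper: escape the characters XML treats specially; '&' must be done first
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $safe_name = xml_escape($streamname);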


    Dave

      • Have you verified that your pattern matches are successful, or just assuming they are?

      This point is particularly important given that the m!.*@([^_]*)-! regex in monteryjack's posted code requires a '-' (hyphen) to be present in the string for a match, which is nowhere the case in the example data given in the OP.
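      For the sample data shown, something along these lines (only a guess at the intended fields) would at least stand a chance of matching:

# lines look like "DB20121001@3575", so capture both parts and check for failure
my ( $streamname, $streamid ) = $stream =~ /^(.+)\@(\d+)\z/
    or warn "No stream-id found in '$stream'\n";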

Re: new file per line output
by Kenosis (Priest) on Dec 31, 2013 at 19:07 UTC

    You've received excellent scripting suggestions. Still, perhaps the following minor modifications of your script will be helpful:

use strict;
use warnings;
use DateTime;
use POSIX qw(strftime);
use autodie;

open my $logFH, '<', 'second.txt';

while (<$logFH>) {
    my ( $streamname, $streamid, $timedate ) = split /[@\s]/;
    my $time_t = POSIX::strftime( "%Y-%m-%d %r", localtime($timedate) );

    open my $fh, '>', "$streamid/$streamname";

    print $fh <<END;
<event>
<stream-id>$streamid</stream-id>
<event-name>$streamname</event-name>
<primary-event>
<delete-time>$time_t</delete-time>
</primary-event>
</event>
END
}
    • Always: use strict; use warnings;
    • Used a single split
    • Used the three-argument open()
    • Used a here document for printing.
    • A close is absent above, since the currently-opened file will automatically close when a new handle is assigned to $fh

    Edit: Added use autodie; to catch any silent close failures. Thank you, davido.

      I don't mind implicit closes of input filehandles, but without the autodie pragma, the implicit close within a loop of output files could permit silent failure.
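      Without autodie, an explicit check (a sketch, using the variables from the script above) would be something like:

close $fh
    or warn "close failed for '$streamid/$streamname': $!\n";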


      Dave

        Thank you. Have updated the script.

      open my $fh, '>', $streamid/$streamname;

      Open for writing a file that has a name that is the stringized quotient of $streamid divided by $streamname? Surely this is not what monteryjack intends!

        Appreciate you catching this. Accidentally removed the quotes when I updated the script. Fixed!

      Hi Kenosis, by the way, another question: how did you format the here document? By hand?

      One reason why I avoid using SQL/XML here documents is that they are sometimes hard to format.

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        Hi Karl and Happy New Year!

        Yes, by hand--only for readability, since it was just a few tags.

Re: new file per line output
by jellisii2 (Hermit) on Dec 31, 2013 at 20:08 UTC
    And yet no one mentions "Use a proper XML processor"... Let me rectify that: XML::Twig or any of the other wonderful modules for writing XML will be your friend when (not if) the data gets screwy.

      I haven't used XML::Twig for generating XML, and re-skimming its POD I'm not seeing what must be obvious. Can you show me how it might be used to provide XML matching what this creates?

print $fh <<END;
<event>
<stream-id>$streamid</stream-id>
<event-name>$streamname</event-name>
<primary-event>
<delete-time>$time_t</delete-time>
</primary-event>
</event>
END

      It seems to me the OP has control over what he passes through to the XML output. Template systems (even as simple as a HERE doc) seem to fit the bill, but if there's an XML producer that would simplify this further, it would be interesting to see an example of how such a solution looks.


      Dave

        I agree that it's a simple thing to produce without using dedicated XML tools, but the character reservations involved with XML make it potentially dangerous to do so. If you're willing to test, capture, and replace the reserved characters (the minimum you should be handling is &, <, >, and %) in the data you're managing, that's fine, but the modules button all of that up for you nicely.

        Here's how I'd do that. It's more code, granted, but it's always valid. Given the quality of some of the other tools that I've had to use that require XML, this will give me the best shot at not having to mess with it after I have it in production.

use strict;
use warnings;
use XML::Twig;

my $stream_id  = 'stream-id';
my $event_name = 'event-name';
my $time_t     = 'time-t';
my $filename   = 'foo.xml';

my $twig = XML::Twig->new( pretty_print => 'record' );
$twig->parse('<event/>');

my $root = $twig->root();
$root->insert_new_elt( 'stream-id' => $stream_id );
my $event_tag         = $root->insert_new_elt( 'event-name' => $event_name );
my $primary_event_tag = $event_tag->insert_new_elt('primary-event');
$primary_event_tag->insert_new_elt( 'delete-time' => $time_t );

open( my $FH, '>', $filename );
$twig->flush( \*$FH );
close $FH;
Re: new file per line output
by Laurent_R (Canon) on Dec 31, 2013 at 18:45 UTC
    Instead of
    foreach $line (<LOGFILE>)
    try:
    while (my $line = <LOGFILE>)

    There are many other problems in your script, but that's probably why you're not looping as you want on the file lines.

    EDIT: davido said it all and still managed to post one minute before me. ;-)

      The only impact that change has is to stop the "behind the scenes" slurping of the input file. But unless Perl itself is broken, both will iterate over the file just fine.

      EDIT: I probably got a head start. ;)


      Dave

        Well, since I am working daily with files whose sizes run into the dozens of gigabytes, it does make a real difference to me. Slurping a file into memory is usually not an option for me. But, granted, I might have overreacted.