monteryjack has asked for the wisdom of the Perl Monks concerning the following question:

I need to create new files from the following input file:

DB20121001@3575 1349124143.382
DB20120928@1234 1348865280.066
DB20120927@1234 1348778775.057
DB20120905@2341 1346877693.517
DB20120911@3575 1347396457.566
....


The first line of the input should produce a file (/3575/DB20121001) with the following text:

<event>
<stream-id>3575</stream-id>
<event-name>DB20121001</event-name>
<primary-event>
<delete-time>Mon Oct 1 8:42:23 PM</delete-time>
</primary-event>
</event>

Here is where I am stuck:
#!/usr/bin/perl
use DateTime;
use IO::File;
use POSIX qw(strftime);

$LOGFILE = "second.txt";
open(LOGFILE) or die("Could not open log file.");

foreach $line (<LOGFILE>) {
    chomp($line);    # remove the newline from $line.
    ($stream, $timedate) = split("\t");
    my $time_t = POSIX::strftime( "%Y-%m-%d %r", localtime($timedate) );
    my ($streamid)   = $stream =~ m!.*@([^_]*)-!;
    my ($streamname) = $stream =~ /(.*)?\@/;
    open (file, ">$streamid/$streamname") || die "file not opened";
    print file "<event>\n";
    print file "<stream-id>$streamid</stream-id>\n";
    print file "<event-name>$streamname</event-name>\n";
    print file "<primary-event>\n";
    print file "<delete-time>$time_t</delete-time>\n";
    print file "</primary-event>\n";
    print file "</event>\n";
    close (file);
}
close (LOGFILE);
I can't figure out why this is not looping through and creating these files.
I have been successful in creating a single file with _all_ the output, but I need to get a file per line.
thanks.

Replies are listed 'Best First'.
Re: new file per line output
by davido (Cardinal) on Dec 31, 2013 at 18:44 UTC

    There's one obvious bug, and a number of style or best practice issues. I'll list the ones I see, including the actual bug:

    In script-line order:

    • use strict; (Absent.)
    • use warnings; (Absent.)
    • use DateTime; (But you never use it in the script)
    • use IO::File; (But you never use it in the script)
    • Your open calls should use the three-argument version as a matter of habit.
    • foreach $line (<LOGFILE>) { (Better to use a while loop; foreach will slurp the file. Also, $line is an undeclared package global, with global scope. Use a lexical (my) instead.)
    • ($stream, $timedate) = split("\t"); (split, in the absence of a parameter supplying a string to split, will split the contents of $_, which isn't what you want. I'm sure you probably intended split /\t/, $line. Also, you should be using lexicals for $stream and $timedate. Also, is your string tab delimited, or would \s be more appropriate?)
    • Have you verified that your pattern matches are successful, or just assuming they are?
    • Are you getting an exception "file not opened", or silent failure?
    • warnings would have warned you that the bareword file could conflict with a future version of Perl. Use upper-case by convention for bareword filehandles, or better, use lexical file handles.

    Of all of these issues, the most significant is your use of split, which is splitting $_ by default instead of $line, as you intend. Consequently, $stream and $timedate don't contain anything useful. Fix this issue, and then put in some error checking to ensure that your pattern matches are successful. That would also have helped you catch the split bug, because with split misbehaving the pattern matches can't possibly succeed either.
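    For instance, here's a minimal, untested sketch of what that might look like (whether the file really is tab-delimited is an assumption, per the point above):

# split the line you actually read, not $_, and notice when it fails
my ( $stream, $timedate ) = split /\t/, $line;    # or /\s+/ if the delimiter isn't a tab
defined $timedate
    or warn "Line $. didn't split into two fields\n";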

    One final note: You might be happy with a module like Text::Template, or minimally, a HERE doc, rather than a bunch of print statements with XML tags interspersed. Update: ...and you probably ought to be protecting your XML output from contamination with illegal characters.
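    If you'd rather not pull in a module for the escaping, even a tiny hand-rolled helper along these lines would do (just a sketch; the xml_escape name is made up for illustration):

# made-up helper: escape the characters XML treats specially; '&' must be done first
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $safe_name = xml_escape($streamname);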


    Dave

      • Have you verified that your pattern matches are successful, or just assuming they are?

      This point is particularly important given that the m!.*@([^_]*)-! regex in monteryjack's posted code requires a '-' (hyphen) to be present in the string for a match, which is nowhere the case in the example data given in the OP.
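      For the sample data shown, something along these lines (only a guess at the intended fields) would at least stand a chance of matching:

# lines look like "DB20121001@3575", so capture both parts and check for failure
my ( $streamname, $streamid ) = $stream =~ /^(.+)\@(\d+)\z/
    or warn "No stream-id found in '$stream'\n";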

Re: new file per line output
by Kenosis (Priest) on Dec 31, 2013 at 19:07 UTC

    You've received excellent scripting suggestions. Still, perhaps the following minor modifications of your script will be helpful:

use strict;
use warnings;
use DateTime;
use POSIX qw(strftime);
use autodie;

open my $logFH, '<', 'second.txt';

while (<$logFH>) {
    my ( $streamname, $streamid, $timedate ) = split /[@\s]/;
    my $time_t = POSIX::strftime( "%Y-%m-%d %r", localtime($timedate) );

    open my $fh, '>', "$streamid/$streamname";

    print $fh <<END;
<event>
<stream-id>$streamid</stream-id>
<event-name>$streamname</event-name>
<primary-event>
<delete-time>$time_t</delete-time>
</primary-event>
</event>
END
}
    • Always: use strict; use warnings;
    • Used a single split
    • Used the three-argument open()
    • Used a here document for printing.
    • A close is absent above, since the currently-opened file will automatically close when a new handle is assigned to $fh

    Edit: Added use autodie; to catch any silent close failures. Thank you, davido.

      I don't mind implicit closes of input filehandles, but without the autodie pragma, the implicit close within a loop of output files could permit silent failure.
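      Without autodie, an explicit check (a sketch, using the variables from the script above) would be something like:

close $fh
    or warn "close failed for '$streamid/$streamname': $!\n";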


      Dave

        Thank you. Have updated the script.

      open my $fh, '>', $streamid/$streamname;

      Open for writing a file that has a name that is the stringized quotient of $streamid divided by $streamname? Surely this is not what monteryjack intends!

        Appreciate you catching this. Accidentally removed the quotes when I updated the script. Fixed!

      Hi Kenosis, by the way, another question: how did you format the here document? By hand?

      One reason why I avoid using SQL/XML here documents is that they are sometimes hard to format.

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        Hi Karl and Happy New Year!

        Yes, by hand--only for readability, since it was just a few tags.

Re: new file per line output
by jellisii2 (Hermit) on Dec 31, 2013 at 20:08 UTC
    And yet no one mentions "Use a proper XML processor"... Let me rectify that: XML::Twig or any of the other wonderful modules for writing XML will be your friend when (not if) the data gets screwy.

      I haven't used XML::Twig for generating XML, and re-skimming its POD I'm not seeing what must be obvious. Can you show me how it might be used to provide XML matching what this creates?

print $fh <<END;
<event>
<stream-id>$streamid</stream-id>
<event-name>$streamname</event-name>
<primary-event>
<delete-time>$time_t</delete-time>
</primary-event>
</event>
END

      It seems to me the OP has control over what he passes through to the XML output. Template systems (even as simple as a HERE doc) seem to fit the bill, but if there's an XML producer that would simplify this further, it would be interesting to see an example of how such a solution looks.


      Dave

        I agree that it's a simple thing to produce without using dedicated XML tools, but the character reservations involved with XML make it potentially dangerous to do so. If you're willing to test, capture, and replace the reserved characters (the minimum you should be handling is &, <, >, and %) in the data you're managing, that's fine, but the modules button all of that up for you nicely.

        Here's how I'd do that. It's more code, granted, but it's always valid. Given the quality of some of the other tools that I've had to use that require XML, this will give me the best shot at not having to mess with it after I have it in production.

use strict;
use warnings;
use XML::Twig;

my $stream_id  = 'stream-id';
my $event_name = 'event-name';
my $time_t     = 'time-t';
my $filename   = 'foo.xml';

my $twig = XML::Twig->new( pretty_print => 'record' );
$twig->parse('<event/>');

my $root = $twig->root();
$root->insert_new_elt( 'stream-id' => $stream_id );
my $event_tag         = $root->insert_new_elt( 'event-name' => $event_name );
my $primary_event_tag = $event_tag->insert_new_elt('primary-event');
$primary_event_tag->insert_new_elt( 'delete-time' => $time_t );

open( my $FH, '>', $filename );
$twig->flush( \*$FH );
close $FH;
Re: new file per line output
by Laurent_R (Canon) on Dec 31, 2013 at 18:45 UTC
    Instead of
    foreach $line (<LOGFILE>)
    try:
    while (my $line = <LOGFILE>)

    There are many other problems in your script, but that's probably why you're not looping as you want on the file lines.

    EDIT: davido said it all and still managed to post one minute before me. ;-)

      The only impact that change has is to stop the "behind the scenes" slurping of the input file. But unless Perl itself is broken, both will iterate over the file just fine.

      EDIT: I probably got a head start. ;)


      Dave

        Well, since I am working daily with files whose sizes run into the dozens of gigabytes, it does make a real difference to me. Slurping a file into memory is usually not an option for me. But, granted, I might have overreacted.