bowei_99 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to parse some text and put the data into a CSV format. I thought it would be simple, but I still can't figure out what I'm missing. The text contains repeating lines, alternating between username information and last login times, like this:
Username: JANDERSON Owner: JOE ANDERSON
Last Login: 14-JAN-2002 08:14 (interactive), 4-MAR-2002 17:09 (non-interactive)
Username: PBARRETT Owner: PAUL BARRETT
Last Login: 18-JAN-2006 08:14 (interactive), 24-MAR-2005 17:09 (non-interactive)
... and so on. I want the output to look like
JANDERSON,JOE ANDERSON,14-JAN-2002 08:14,4-MAR-2002 17:09
PBARRETT,PAUL BARRETT,18-JAN-2006 08:14,24-MAR-2005 17:09

So I thought the following code would be a good start -

while (<IN>) {
    next if (/^\s*$/);
    chop;
    if (/^Username\:\s+([\w\s]+)\s+Owner\:\s+(.*)$/) {
        $username = $1;
        $desc = $2;
        print "$username,$desc,";
    }
    elsif (s/^(Last Login\:\s*)//) {
        ($login_times{"i"}, $login_times{"ni"}) = split /,/, $_;
        print "$login_times{\"i\"}, $login_times{\"ni\"}\n";
    }
    else {
        die "line invalid - $_, $!\n";
    }
}
close (IN);

But when I run the code, it doesn't print anything for $username or $desc, even though I know it captures $1 and $2 correctly, as I put in debug statements, showing what they were.

Anybody know what I'm missing here?

Replies are listed 'Best First'.
Re: Parsing Text into CSV
by BrowserUk (Patriarch) on Jan 19, 2006 at 07:39 UTC

    This works for the sample supplied.

    #! perl -slw
    use strict;

    while( <DATA> ) {
        $_ .= <DATA>;
        print join',', m[
            Username: \s+ (\S+) \s+ Owner: \s+ ([^\n]+ ) \n
            Last\sLogin: \s+ (\S+\s\S+) [^,]+, \s+ (\S+\s\S+)
        ]x;
    }
    __DATA__
    Username: JANDERSON Owner: JOE ANDERSON
    Last Login: 14-JAN-2002 08:14 (interactive), 4-MAR-2002 17:09 (non-interactive)
    Username: PBARRETT Owner: PAUL BARRETT
    Last Login: 18-JAN-2006 08:14 (interactive), 24-MAR-2005 17:09 (non-interactive)

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Parsing Text into CSV
by McDarren (Abbot) on Jan 19, 2006 at 07:17 UTC
    The following code will give you the desired output:
    #!/usr/bin/perl -w
    use strict;

    my %users;
    my $username;
    while (<DATA>) {
        next if (/^\s*$/);
        chomp;
        if (/^Username\:\s+([A-Z]+)\s+Owner\:\s+(.*)$/) {
            $username = $1;
            $users{$username}{realname} = $2;
        }
        elsif (s/^(Last Login\:\s*)//) {
            ($users{$username}{i}, $users{$username}{ni}) = split /,/, $_;
        }
        else {
            die "line invalid - $_, $!\n";
        }
    }

    foreach my $user (keys %users) {
        print "$user,$users{$user}{realname},$users{$user}{i},$users{$user}{ni}\n";
    }
    __DATA__
    Username: JANDERSON Owner: JOE ANDERSON
    Last Login: 14-JAN-2002 08:14 (interactive), 4-MAR-2002 17:09 (non-interactive)
    Username: PBARRETT Owner: PAUL BARRETT
    Last Login: 18-JAN-2006 08:14 (interactive), 24-MAR-2005 17:09 (non-interactive)

    Comments:

    • Your main problem is that you weren't correctly assigning to the hash. Instead of $login_times{ni} you need something like $login_times{$user}{ni}
    • There is no need to quote your hash elements if they don't contain whitespace (it just makes the code more confusing)
    • I re-named your hash as %users, so it makes more sense
    • Your username pattern match was capturing all the whitespace following the username. If you can assume that all usernames will be uppercase letters only, then you could re-write that as /^Username\:\s+([A-Z]+)\s+Owner\:\s+(.*)$/
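
    To make the last point concrete, here is a minimal sketch comparing the original character class with a tighter one. It assumes the real input has several spaces between the username and "Owner:" (the sample line here is made up for illustration), and first_capture is a hypothetical helper, not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line with five spaces before "Owner:".
my $line = 'Username: JANDERSON' . (' ' x 5) . 'Owner: JOE ANDERSON';

# Returns the first capture of $pattern against $text, or undef on no match.
sub first_capture {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? $1 : undef;
}

# [\w\s]+ is greedy and also matches whitespace, so it keeps every space
# except the single one that the following \s+ needs.
my $greedy = first_capture(qr/^Username:\s+([\w\s]+)\s+Owner:/, $line);

# \S+ stops at the first whitespace character.
my $tight = first_capture(qr/^Username:\s+(\S+)\s+Owner:/, $line);

# Brackets make any trailing whitespace visible.
print "greedy: [$greedy]\n";
print "tight:  [$tight]\n";
```

    With this input the first capture keeps four trailing spaces and the second keeps none. Unlike [A-Z]+, \S+ also accepts usernames containing digits or underscores, which may or may not be what you want.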

    I should also note that the above code is quite fragile, insofar as it assigns the $username when it reads the first line, and then assumes that the username is the same when it gets to the second line. There are probably much better ways to go about this, which no doubt some other monks will point out :)

    Cheers,
    Darren :)
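
    One possible way to reduce the fragility mentioned above is to complain when a "Last Login" line turns up without a preceding "Username" line (or vice versa), and to keep the records in an array so the output follows the input order; keys %users returns the usernames in no particular order. This is only a sketch under the assumption that the two line types strictly alternate; parse_records is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse alternating "Username"/"Last Login" lines into a list of hashrefs,
# preserving input order. Dies if the two line types do not pair up.
sub parse_records {
    my @lines = @_;
    my (@records, $current);
    for my $line (@lines) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^Username:\s+(\S+)\s+Owner:\s+(.*?)\s*$/) {
            die "no Last Login line for $current->{user}\n" if $current;
            $current = { user => $1, owner => $2 };
        }
        elsif ($line =~ /^Last Login:\s*(.*?)\s*\(interactive\),\s*(.*?)\s*\(non-interactive\)/) {
            die "Last Login line without a preceding Username line\n" unless $current;
            @{$current}{qw(i ni)} = ($1, $2);
            push @records, $current;
            undef $current;    # the next Username line starts a new record
        }
        else {
            die "line invalid - $line\n";
        }
    }
    die "no Last Login line for $current->{user}\n" if $current;
    return @records;
}

my @sample = (
    'Username: JANDERSON Owner: JOE ANDERSON',
    'Last Login: 14-JAN-2002 08:14 (interactive), 4-MAR-2002 17:09 (non-interactive)',
    'Username: PBARRETT Owner: PAUL BARRETT',
    'Last Login: 18-JAN-2006 08:14 (interactive), 24-MAR-2005 17:09 (non-interactive)',
);

# One CSV line per record, in the order the records were read.
print join(',', @{$_}{qw(user owner i ni)}), "\n" for parse_records(@sample);
```

    Matching the "(interactive)" and "(non-interactive)" markers directly also drops them from the captured times, which split /,/ alone does not do.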

Re: Parsing Text into CSV
by radiantmatrix (Parson) on Jan 19, 2006 at 15:37 UTC

    I think you're making the surrounding code too complicated:

    while (<IN>) {
        next if /^\s*$/;
        chop;
        /^Username\:\s+ (.*?) \s+ Owner\:\s+ (.*?)\s*$/x
            and do { print "$1,$2,"; next };
        /^Last Login:\s+(\d+.*?\:\d+).*,\s*(\d+.*?\:\d+)/
            and print "$1,$2\n";
    }
    Tested to work on your sample data.
    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
      Oh, actually, I forgot - the code also has to deal with lines like
      Last Login: (none) (interactive), (none) (non-interactive)
      Last Login: (none) (interactive), 12-SEP-1997 11:33 (non-interactive)
      Last Login: 11-JUL-2002 18:08 (interactive), (none) (non-interactive)
      I figured I'd add to radiant.matrix's code like
      /^Last Login:\s+.*,\s*(\d+.*?\:\d+)/ and print "$1,$2\n";
      but it's giving the error 'Unrecognized escape \d passed through'. I figure adding a condition to BrowserUk's regexp would work, too...

        And what output would you want in those cases? The text '(none)' where the date should be?

        #! perl -slw
        use strict;

        while( <DATA> ) {
            $_ .= <DATA>;
            print join',', m[
                Username: \s+ (\S+) \s+ Owner: \s+ ([^\n]+ ) \n
                Last\sLogin: \s+ ( .+ ) \s \( [^,]+, \s+ (.+) \s\(
            ]x;
        }
        __DATA__
        Username: JANDERSON Owner: JOE ANDERSON
        Last Login: 14-JAN-2002 08:14 (interactive), 4-MAR-2002 17:09 (non-interactive)
        Username: PBARRETT Owner: PAUL BARRETT
        Last Login: 18-JAN-2006 08:14 (interactive), 24-MAR-2005 17:09 (non-interactive)
        Username: LARRY Owner: LARRY FINE
        Last Login: (none) (interactive), (none) (non-interactive)
        Username: MOE Owner: HARRY MOSES HOWARD
        Last Login: 11-JUL-2002 18:08 (interactive), (none) (non-interactive)
        Username: CURLY Owner: JEROME HOWARD
        Last Login: (none) (interactive), 12-SEP-1997 11:33 (non-interactive)

        Gives

        P:\test>junk
        JANDERSON,JOE ANDERSON,14-JAN-2002 08:14,4-MAR-2002 17:09
        PBARRETT,PAUL BARRETT,18-JAN-2006 08:14,24-MAR-2005 17:09
        LARRY,LARRY FINE,(none),(none)
        MOE,HARRY MOSES HOWARD,11-JUL-2002 18:08,(none)
        CURLY,JEROME HOWARD,(none),12-SEP-1997 11:33
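
        For anyone who prefers the line-at-a-time style used earlier in this thread, the "(none)" case can probably be covered with an alternation as well. The sketch below assumes a timestamp always looks like DD-MON-YYYY HH:MM and that "(none)" is the only other value; login_times and $stamp are hypothetical names. As for the warning: Perl reports 'Unrecognized escape \d passed through' when \d appears inside a double-quoted string rather than a regex, so it may be that the pattern ended up in quotes somewhere, though that is a guess without seeing the full failing code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Either a date and time such as "14-JAN-2002 08:14" or the literal "(none)".
# qr// wraps the pattern in its own group, so the alternation stays contained
# when it is interpolated below.
my $stamp = qr/\(none\)|\d+-[A-Z]+-\d+\s+\d+:\d+/;

# Returns (interactive, non-interactive) for a "Last Login" line, or an
# empty list if the line does not match.
sub login_times {
    my ($line) = @_;
    return $line =~ /^Last Login:\s+($stamp)\s+\(interactive\),\s*($stamp)\s+\(non-interactive\)/
        ? ($1, $2)
        : ();
}

for my $line (
    'Last Login: (none) (interactive), (none) (non-interactive)',
    'Last Login: (none) (interactive), 12-SEP-1997 11:33 (non-interactive)',
    'Last Login: 11-JUL-2002 18:08 (interactive), (none) (non-interactive)',
) {
    print join(',', login_times($line)), "\n";
}
```

        Note that the pattern has two capture groups, one per login time, so both $1 and $2 are defined whenever the line matches.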

Re: Parsing Text into CSV
by arto (Novice) on Jan 20, 2006 at 10:01 UTC
    #! perl
    $/ = "Username: ";
    my $re = q#^(\w+)\s+Owner:\s+(.+)\s+Last Login:\s*(.+?)\s*\(interactive\),\s*(.+?)\s*\(non-interactive\).*$#;
    while (<>) {
        if (m|$re|ms) {
            printf "%s,%s,%s,%s\n",$1,$2,$3,$4;
        }
    }
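
    The approach above relies on the input record separator $/. As a rough illustration of what each read returns once $/ is set to "Username: ", here is a sketch that reads the sample data from an in-memory filehandle instead of <>. split_records is a hypothetical helper, and the pattern is a slight variation on the one above (precompiled with qr// and using a non-greedy owner capture).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = <<'END';
Username: JANDERSON Owner: JOE ANDERSON
Last Login: 14-JAN-2002 08:14 (interactive), 4-MAR-2002 17:09 (non-interactive)
Username: PBARRETT Owner: PAUL BARRETT
Last Login: 18-JAN-2006 08:14 (interactive), 24-MAR-2005 17:09 (non-interactive)
END

# Split $input into records, using "Username: " as the record separator.
sub split_records {
    my ($input) = @_;
    local $/ = "Username: ";       # restored automatically when the sub returns
    open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;             # chomp strips a trailing $/ string, if present
        push @records, $record if length $record;   # the first read is empty
    }
    close $fh;
    return @records;
}

# /s lets . match the newline between the two lines of a record.
my $re = qr/^(\w+)\s+Owner:\s+(.+?)\s+Last Login:\s*(.+?)\s*\(interactive\),\s*(.+?)\s*\(non-interactive\)/s;

for my $record (split_records($text)) {
    print join(',', $1, $2, $3, $4), "\n" if $record =~ $re;
}
```

    Because each record holds both lines of a user's entry, there is no state to carry from one read to the next.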