My solution is really just two lines of substitution code. Here they are:
$mydata =~ s/([\w\s]+)\s([\w\d]+)\s(0000.*)/$1,$2,$3/g;
$mydata =~ s/(:\d{2})\s(0000)/$1,$2/g;
And if you're not easily overwhelmed by lots of documentation, here's the whole program with setup code, comments, and output:
#!/usr/bin/perl
# NODE34853.pl
# Assumptions: There are potentially any number of parts of a user's name.
# For example, "Bill Clinton" might be a user's name, but
# "William Jefferson Clinton the Liar" might also be his name.
# The username is next, which is always one word long. It might also have numbers
# in it, such as Bill69.
# The data is currently in a single scalar (as though you read it from a flat file).
# It's also not clear what granularity you want the data to have. I'm assuming
# that you want the user's name, his username, and the individual chunks of login
# data. Do you also want to split up the login data? Your post didn't say.
#
# Knowing what you want to do with this data afterwards would also help. If you want to
# load this into a SQL database, then you'd probably want to do this a bit differently.
# But, if your goal is just to comma-delimit the file so you can load it into
# a spreadsheet, then this oughta do the trick.
#
# This solution is really just a two-line program with lots of comments and some
# stuff to set up the environment and print the results.
# I hope it helps.
# --Mark
#
# This line just sets up the scalar variable you want to parse.
# I'm assuming you have other methods of doing this (reading from
# CSV, etc.)
$mydata = <<ENDDATA;
Bob Smith bsmith 00001234567 01/01/1986 00:00:00
Mary Ann Doe mdoe 00001234568 01/01/1986 00:00:01 00001234563 01/01/1986 00:00:02 00001234563 01/01/1986 00:00:03
Gilligan Q Smith gsmith 00001234569 01/01/1986 00:00:01 00001234569 01/01/1986 00:00:02
ENDDATA
# The purpose of this regex is just to split out the user's NAME,
# USERNAME, and associated DATA. We're leaving the guts of the DATA alone for now.
$mydata =~ s/([\w\s]+)\s([\w\d]+)\s(0000.*)/$1,$2,$3/g;
#MyData temporarily looks like this:
#Bob Smith,bsmith,00001234567 01/01/1986 00:00:00
#Mary Ann Doe,mdoe,00001234568 01/01/1986 00:00:01 00001234563 01/01/1986 00:00:02 00001234563 01/01/1986 00:00:03
#Gilligan Q Smith,gsmith,00001234569 01/01/1986 00:00:01 00001234569 01/01/1986 00:00:02
# Now, let's split up the DATA parts by looking for the space between the :00 and 0000.
# The 0000 has to be captured too, so that $2 puts it back after the comma.
$mydata =~ s/(:\d{2})\s(0000)/$1,$2/g;
print "All done. MyData now looks like this\n$mydata\n\n";
#Bob Smith,bsmith,00001234567 01/01/1986 00:00:00
#Mary Ann Doe,mdoe,00001234568 01/01/1986 00:00:01,00001234563 01/01/1986 00:00:02,00001234563 01/01/1986 00:00:03
#Gilligan Q Smith,gsmith,00001234569 01/01/1986 00:00:01,00001234569 01/01/1986 00:00:02
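If it turns out you do want the login data split up further (say, for loading into a SQL database), here is a minimal sketch of one way to go about it. It is not part of the two-line solution above. It assumes every login chunk looks like "0000-prefixed id, date, time", and the hash keys (name, username, id, date, time) and the sample lines are just names I made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records in the same layout as the data above.
my @lines = (
    'Bob Smith bsmith 00001234567 01/01/1986 00:00:00',
    'Mary Ann Doe mdoe 00001234568 01/01/1986 00:00:01 00001234563 01/01/1986 00:00:02',
);

my @records;
for my $line (@lines) {
    # Collect every "id date time" login chunk on the line.
    my @logins;
    while ($line =~ m{(0000\d+)\s+(\d\d/\d\d/\d{4})\s+(\d\d:\d\d:\d\d)}g) {
        push @logins, { id => $1, date => $2, time => $3 };
    }

    # Everything before the first login chunk is "name parts ... username".
    my ($head) = $line =~ /^(.*?)\s+0000\d/;
    next unless defined $head;    # skip lines that don't fit the assumed layout

    my @words    = split ' ', $head;
    my $username = pop @words;          # the last word is the username
    my $name     = join ' ', @words;    # whatever is left is the person's name

    push @records, { name => $name, username => $username, logins => \@logins };
}

# Print one comma-delimited row per login, repeating the name and username.
for my $r (@records) {
    for my $login (@{ $r->{logins} }) {
        print join(',', $r->{name}, $r->{username},
                        $login->{id}, $login->{date}, $login->{time}), "\n";
    }
}
```

Emitting one row per login repeats the name and username, which is usually what you want for a database table. For a spreadsheet, the one-row-per-user output of the two-line version is probably easier to read.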
I hope this helps. Let us know.
--Mark