goosefairy has asked for the wisdom of the Perl Monks concerning the following question:

I have a series of ASCII files in a directory with the following format:

<SENDER> Sender Name
<TO> To list
<FROM> User Name
<MESSAGE>
Text text text text text...

The Message part will go on for several lines and could have punctuation, spaces and special characters which would need to be preserved. On our pages the message always gets displayed with <pre> tags.

<MESSAGE> is always the last field and the actual message always starts on the next line.

My ultimate goal is to put this information into a MySQL database table where the column names correspond to the tags (the <TEXT> part) and the data is what comes after each tag.

I can get the information in all but the <MESSAGE> field. Following is my code:

#!/usr/bin/perl -w

$dirname = "2003/";
#-----------------------------------------------
sub field_found(@_) {
    my $line = shift;
    my $fld = shift;
    my $val = shift;
    my $pos = index($line,$fld);
    if($pos == 0){ # found field
        my $flen = length $fld;
        my $llen = length $line;
        $$val = substr($line,$flen,$llen);
    } # found field
}
#
opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    open(INPUT, $dirname . $file) or die;
    while($line=<INPUT>) {
        chomp($line);
        field_found($line,"<SENDER>",\$sender);
        field_found($line,"<TO>",\$to);
        field_found($line,"<FROM>",\$from);
        field_found($line,"<MESSAGE>",\$message);
        @array = ("$sender","$to","$from","$message");
    }
    close(INPUT);
    open INPUT, ">2003/clean/$file.clean" or die;
    # this here just to check array contents
    print INPUT "Sender: $array[0]\n";
    print INPUT "To: $array[1]\n";
    print INPUT "From: $array[2]\n";
    print INPUT "Message: $array[3]\n";
    close(INPUT);
}
closedir(DIR);

How in the world do I go about this? Am I totally off-track?

Thanks for any help or just pointing in the right direction.

goosefairy

Re: is array the best way to handle this?
by Ovid (Cardinal) on Sep 08, 2003 at 19:10 UTC

    You mentioned a "series" of ASCII files, so I assume that we have one message per file. If so, how about the following? I'm using a hash, but only for convenience. Also, note that this is simply a rewriting of your inner while loop. I didn't add the rest as I was going for clarity.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %message;
    while (defined (my $line = <DATA>)) {
        chomp $line;
        my ($field,$value) = $line =~ /^<([A-Z]+)>\s*(.*)/;
        $message{$field} = $value;
        # assumes message is last item in file and doesn't
        # start on same line as MESSAGE key
        if ('MESSAGE' eq $field) {
            $message{$field} = do { local $/; <DATA> };
        }
    }
    print Dumper \%message;

    __DATA__
    <SENDER> Sender Name
    <TO> To list
    <FROM> User Name
    <MESSAGE>
    Text text text tex
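
    One way to fold the loop above back into a per-file routine is sketched below. The sub name parse_message is hypothetical (it is not from the original post), and the sketch assumes one message per file with <MESSAGE> as the last tag. It reads from any filehandle, so it can be tried on an in-memory string before being pointed at the real files.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # parse_message is a hypothetical helper name, not part of the original post.
    # Assumes one message per file, with <MESSAGE> as the last tag.
    sub parse_message {
        my ($fh) = @_;
        my %message;
        while (defined(my $line = <$fh>)) {
            chomp $line;
            # skip anything that does not look like "<TAG> value"
            my ($field, $value) = $line =~ /^<([A-Z]+)>\s*(.*)/ or next;
            $message{$field} = $value;
            if ($field eq 'MESSAGE') {
                # slurp the rest of the file as the message body
                local $/;
                my $body = <$fh>;
                $message{$field} = defined $body ? $body : '';
                last;
            }
        }
        return \%message;
    }

    # try it on an in-memory filehandle
    my $sample = "<SENDER> Sender Name\n<TO> To list\n<FROM> User Name\n<MESSAGE>\nText text\n  more text\n";
    open(my $fh, '<', \$sample) or die "can't open in-memory file: $!";
    my $msg = parse_message($fh);
    close($fh);
    print "$_: $msg->{$_}\n" for sort keys %$msg;
    ```

    In the directory loop, the same sub could be called with a filehandle opened on each file, and the returned hash reference pushed onto a list or handed to whatever does the database insert.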

    Cheers,
    Ovid

    New address of my CGI Course.

      Thanks, this worked beautifully. One more question. This is an example of the text I get in the MESSAGE field:

      ^M
      ^M
      =================================================^M
      DT: AUGUST 7, 2003 ^M
      TO: ALL STATIONS / PROGRAM AND NEWS DIRECTORS ^M

      How in the world do I get rid of the ^M characters? They are carriage returns from something (I have no idea what). I have tried every regex replace I can think of but nada.

        Those are Windows carriage returns. s/\r//g; is the regex you're looking for.
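
        A minimal sketch of applying that substitution, assuming the message body has already been read into a scalar. The variable names and sample text are illustrative only. tr/\r//d does the same job without a regex.

        ```perl
        use strict;
        use warnings;

        # $message stands in for the slurped <MESSAGE> body; the name is illustrative
        my $message = "\r\n\r\n=====\r\nDT: AUGUST 7, 2003 \r\nTO: ALL STATIONS \r\n";

        # remove every carriage return, leaving the newlines alone
        (my $clean = $message) =~ s/\r//g;

        # equivalent, using transliteration with the delete flag
        (my $clean_tr = $message) =~ tr/\r//d;

        print $clean;
        ```

        Both forms work on a copy, so the original string stays available if it is needed for comparison.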

        ------
        We are the carpenters and bricklayers of the Information Age.

        The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

        Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: is array the best way to handle this?
by chromatic (Archbishop) on Sep 08, 2003 at 19:09 UTC

    I'd use a hash, because you refer to sections of the record by name. Something like the following might work for reading in records, though it's untested.

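    my @messages;

    # loop through directory
    {
        local ($/, *INPUT);
        open(INPUT, $dirname . $file) or die;
        # split returns an empty leading field before the first tag;
        # discard it so the remaining tag/value pairs line up as a hash
        my (undef, %record) = split(/<(SENDER|TO|FROM|MESSAGE)>/, <INPUT>);
        push @messages, \%record;
    }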
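
    The split approach can be tried on a string without touching the directory. This is only a sketch, assuming each tag appears once and <MESSAGE> comes last; the sample data and the whitespace trimming are additions for illustration. With a capturing group in the pattern, split returns the tag names along with the text between them, and because the data starts with a tag, the first element returned is an empty string that has to be discarded before the list is assigned to a hash.

    ```perl
    use strict;
    use warnings;

    my $text = "<SENDER> Sender Name\n<TO> To list\n<FROM> User Name\n<MESSAGE>\nText text\ntext text\n";

    # the capturing group keeps the tag names in the returned list;
    # the leading empty field (text before the first tag) is thrown away
    my (undef, %record) = split /<(SENDER|TO|FROM|MESSAGE)>/, $text;

    # trim the single-line fields; strip only the newline after <MESSAGE>
    for my $key (qw(SENDER TO FROM)) {
        $record{$key} =~ s/^\s+|\s+$//g;
    }
    $record{MESSAGE} =~ s/^\n//;

    print "$_ => [$record{$_}]\n" for qw(SENDER TO FROM MESSAGE);
    ```

    One caveat: if a message body happens to contain one of the tags literally (for example the text "<TO>"), split will cut there as well, so the line-by-line approach is safer for untrusted input.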