Happy New Year and many more! Here's my story. I'm working on a script that will process huge text files (forwarded spam) and pull certain elements that I need to work with in a database.
#!/usr/bin/perl -w
### sting ###

### Table Setup ###
# CREATE TABLE subject_info
# (id int not null auto_increment, plaintif varchar(100) not null, subject longtext not null,
# date varchar(50) not null, primary key (id));

use warnings;
use strict;
use DBI;

#file variables and flags#
my $flag = '0';
my $spamcop_email = shift || 'c:\frodo\mail\spamcop_email.txt';
my $text_output = shift || 'c:\frodo\output\text_output.txt';

#regex parameters and variables#
my $from = 'From: ';
my $full_subject = 'Subject: ';
my $date = 'Date: ';
my @buffer = ('','');

#initialize arrays#
open (SCEMAIL, "$spamcop_email") || die "Can't open $spamcop_email";
my @spamcop_email_array=<SCEMAIL>;
close (SCEMAIL);
open (TEXTOUT, ">$text_output") || die "Can't open $text_output";

#loop and fill @buffer#
foreach (@spamcop_email_array){
    if(s/.*$from//){
        $buffer[0] = $_;
        $flag = 0;
    }
    if(s/.*$full_subject//){
        $buffer[1] = $_;
        $flag = 0;
    }
    if(s/.*$date//){
        $buffer[2] = $_;
        $flag = 0;
        print TEXTOUT @buffer;
    }
}
close TEXTOUT;

#DBI Connect and Insertion#
my $dbh = DBI->connect("DBI:mysql:database=SpamCopBot; host=localhost", "amearse", "tttttt", {'RaiseError' => 1});
my $sth =$dbh->prepare("INSERT INTO subject_info (plaintif, subject, date) VALUES ('$buffer[0]', '$buffer[1]', '$buffer[2]')");
$sth->execute();
$sth->finish();
$dbh->disconnect();
As you can see, the three elements are printed to both the text file and the database. I put the flags there to reject dupes, but I know that they currently do nothing. The first problem occurs in the text output. Here is a snippet to show you what I'm talking about.
52387348@reports.spamcop.net
Wednesday, January 02, 2002 10:52 PM
52432604@reports.spamcop.net
[SpamCop (http://web1.customoffers.com/unsubscribe.asp?emid=1008&email=x) id:52387348] 4HourWireless Special of the Month - Signal Booster
Wednesday, January 02, 2002 11:28 PM
52496384@reports.spamcop.net
[SpamCop (http://web1.customoffers.com/unsubscribe.asp?emid=1008&email=x) id:52432604] 4HourWireless Special of the Month - Signal Booster
Thursday, January 03, 2002 12:20 AM
52553913@reports.spamcop.net
[SpamCop (http://web1.customoffers.com/unsubscribe.asp?emid=1009&email=x) id:52496384] AWARD CONFIRMATION
Thursday, January 03, 2002 01:04 AM
Notice how the first result is missing its subject line? It has actually been bumped down into the next result, which is a real problem for the validity of the output. What causes this? I have checked my data sources and they are fine; all the necessary info is there. It seems strange to me: I am testing this on 20 emails, so I should see 60 lines of text output, but I only get 59, with the last subject line dropped and replaced by the one above it.

The next problem is in the database entry. When I check the results, the database has acquired only one entry, though I had expected 20.
| id | plaintif                     | subject                                                                                                                  | date                                |
| 6  | 52531044@reports.spamcop.net | [SpamCop (http://web1.customoffers.com/unsubscribe.asp?emid=1032&email=x) id:52531044] Membership Confirmation for G G | Thursday, January 03, 2002 12:48 AM |
This entry holds the correct three elements from the last of the twenty emails. I'm a bit lost; could you please sound off on any possible solutions to clean up my output and database entries?

Bests,
amearse
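One possible reading of both symptoms, offered as a sketch rather than a confirmed diagnosis: in the text output each printed subject carries the id of the previous record's sender (id:52387348 appears under 52432604@reports.spamcop.net), which would be consistent with the Subject: line arriving after the Date: line in each message, while the script prints as soon as it sees Date:. The single database row would follow from the INSERT running once, after the loop has finished. The sketch below assumes that ordering. The helpers parse_records and insert_records are hypothetical names, the sample lines are invented, and $dbh is assumed to be an already-connected DBI handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group header lines into complete records.
# A record is emitted only once From, Subject and Date have all been
# seen, so the order of the three lines within a message does not matter.
sub parse_records {
    my @lines = @_;
    my (@records, %current);
    for my $line (@lines) {
        chomp(my $text = $line);
        # Leftmost match wins, so a Subject that mentions "From: " is safe.
        if ($text =~ /(From|Subject|Date): (.*)/) {
            $current{$1} = $2;
        }
        my $seen = grep { exists $current{$_} } qw(From Subject Date);
        if ($seen == 3) {
            push @records, [ @current{qw(From Subject Date)} ];
            %current = ();    # start collecting the next message
        }
    }
    return @records;
}

# Hypothetical helper: one INSERT per record, using placeholders so that
# quotes inside a subject line cannot break the SQL statement.
# $dbh is assumed to be a connected DBI database handle.
sub insert_records {
    my ($dbh, @records) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO subject_info (plaintif, subject, date) VALUES (?, ?, ?)'
    );
    $sth->execute(@$_) for @records;
    return scalar @records;
}

# Invented sample input with Date: ahead of Subject: (the assumed order).
my @sample = (
    'From: a@example.invalid',
    'Date: Wednesday 10:52 PM',
    'Subject: first offer',
    'From: b@example.invalid',
    'Date: Wednesday 11:28 PM',
    'Subject: second offer',
);

# Print one line per complete record: sender | subject | date.
print join(' | ', @$_), "\n" for parse_records(@sample);
```

With a real handle from DBI->connect, calling insert_records($dbh, parse_records(@spamcop_email_array)) would attempt one row per email instead of a single row built from whatever @buffer holds after the loop.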

In reply to The contents of a misguided array. by amearse
