Alligator has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, Greetings!!

I posted a wrong email data, sorry for that. The following is the actual.

I get an email everyday with an attachment that indicate the following activity:

1) Authenticated Admin logins
2) Authenticated User logins
3) Authenticated Failed logins
The email attachment indicate the activity for the previous day.

The email attachment comes as HTML (with an image and data in tables) and the following data in the attachment need to be parsed:

1) Peripheral IP Address
2) Data
3) Time
4) Source
There will be two scripts

i) First script to parse the incoming data and insert the data into DB.
ii) Second script to aggregate the data once a week and send an email to a specified list. (This will be invoked by a cron)

Once we have the data, further the following operations need to be done.

1) Insert the parsed values into Oracle/MySQL DB.
2) Generate a aggregated report on every Sunday, that includes all the authentication information (i.e, Authenticated Admin logins, user logins and failed logins). The reports should be generated based on a Sunday-Saturday weekly period
The email attachment looks like the following:
|------------------------------------------------------------| | GIF image Firewall Reports ->(also a gif) | | -------------------- | + | Scheduled report for Peripheral at IP | | address 192.168.100.1 (David - 00027u) | | | |------------------------------------------------------------| | | | Authenticated Admin Logins for 2004-11-01 | | | | Time Source | | | | 09:10:58 192.168.100.10 | | 10:12:21 192.168.100.25 | | | | | |* Report producted for timezone: Pacific Time(US & Canada) | |* Report generated based on data on 11/01/2004 15:27:54 UTC | + |-------------------------------------------------------------

The weekly generated report should look as following:
|--------------------------------------------------------------------- +-------| | GIF image | | | + | | | Authentication Report | + | | |--------------------------------------------------------------------- +-------| | Successful Admin Logins (11/15/04 - 11/21/04) + | |--------------------------------------------------------------------- +-------| |Peripheral IP Address Date Time Source + | | 192.168.100.1 (David - 00027u) 11/04/2004 09:10:54 192.168.100 +.10 | | 11/04/2004 10:19:34 192.168.100.12 | |--------------------------------------------------------------------- +-------| | Failed Logins (11/15/04 - 11/21/04) + | |--------------------------------------------------------------------- +-------| |Peripheral IP Address Date Time Source + | | 192.168.100.1 (David - 00027u) 11/04/2004 09:05:50 192.168.100 +.10 | | 11/04/2004 15:19:34 192.168.100.12 | |--------------------------------------------------------------------- +-------| | Successful User Logins (11/15/04 - 11/21/04) + | |--------------------------------------------------------------------- +-------| |Peripheral IP Address Date Time Source + | | No Data No Data No Data No Data | |--------------------------------------------------------------------- +--------

Note:
1)Peripheral IP address comes in the header of the the email attachment.
2) Date: Date format should be in the format of month/day/year. If no data occurs for an entire week for a particular section, then "No Data" should appear in this column for the section with no data.
3)Time: Time should be in the format hour:minute:second. Hour field shoule be 0 to 23, minute is 0 to 59 and second should be 0 to 50. If no data occurs for an entire week for a particular section, then "No Data" should appear in this column for the section with no data.
4) Source: This column reflects the source IP found in each report. There many be multiple IP addreses listed in the authentication report. If no data occurs for an entire week for a particular section, then "No Data" should appear in this column for the section with no data.

I've come up with the following table design:

Admin_logins
admin_id auto_increment
peripheral_ip
date
time
source

failed_logins
failed_id auto_increment
peripheral_ip
date
time
source

successful_logins
s_id auto_increment
peripheral_ip
date
time
source

aggregate_logins
a_id auto_increment
admin_id FK to admin_logins (admin_id) table
failed_id FK to failed_logins (failed_id) table
s_id FK to successful_logins (s_id) table

Please let me know if the above table design is sufficient or any changes to be made.

Please let me know your inputs about building the scripts based on the above inputs or if possible put some sample code which will help me complete this task faster

Thanks for the help in advance.

Replies are listed 'Best First'.
Re: Parsing email and returning the key/values in two hashes
by olivierp (Hermit) on Dec 21, 2004 at 12:37 UTC
    This should get you started.
    It's up to you to figure out why I am not putting things in hash(es).
    HTH
    --
    Olivier
      Hi Oliver

      I was looking for something similar, that your example did.
      Thanks indeed for the help.

      Cheers
Re: Parsing email and returning the key/values in two hashes
by Random_Walk (Prior) on Dec 21, 2004 at 12:22 UTC

    Can you post the solution you have tried ? You did have a go didn't you ? Did you have problems with split or with your while loop ?

    I think I can see where your problem with split is comming from, perhaps you should use unpack or substr as it looks like fixed width columns

    Cheers,
    R.

      Hi,
      I indeed used 'split' operator, but didn't know how to pack each column value.

        Hi 'gator,

        If you have a line that is fixed width columns you can unpack it based on the widths quite easily. This gets around all the probs of split and fields that can contain a variable amount of whitespace (I suspect your category field easily could)

        # it goes a little like this while (<DATA>) { my ($Name, $Host, $Port, $Rest)=unpack('A15A10A10A*', $_); # now do what you will with them }

        the A tells it to unpack ASCII, the numbers are the number of characters in each column, the A* just grabs the rest 'cos I was too lazy to count all your column widths

        Festive Cheers,
        R.

Re: Parsing email, inserting into database and creating an aggregation report.
by bart (Canon) on Dec 23, 2004 at 09:51 UTC
    You've got a pretty big task ahead of you, alligator. I won't do your work for you, but I can give you a few tips for your outline.
    • You have to get the HTML page out of the email attachment. I'd recommend to use a MIME parsing tool, such as MIME-Tools
    • You'll have to extract the data out of the HTML file, and build a perl data structure (or insert directly into database). HTML::TokeParser::Simple is, IMHO, the most handy approach to tackle that problem.
    • You can use Data::Dumper (which comes with perl) to inspect what your built data structure looks like.
    • Use DBI to access the database, both inserting data into the database, and for doing the queries. BE forewarned, learning about DBI can take quite a long time... I recommend reading the DBI book. In addition, I like DBIx://Simple, which can reduce the code you need to write to make queries considerably.
    • To convert your data back into a report, you can try using blocks consisting of here-docs. Or, you can look into a templating mechanism, like Template-Toolkit (home page, tutorial), or the simpler Text::Template, or even HTML::Template, if you're planning on just outputting HTML.
    There, I think that covers the basics, I think.