Parsing e-mail and building HTML

rograndom has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have some understanding of Perl: I can look at code and pretty much figure out what it does, but I do not have the ability to write my own programs yet. Here is my problem: every few days I receive an e-mail from a client with a job listing. The information in the e-mail (job title, location, description, etc.) has to be put into a new HTML document and assigned a sequential number (jobXX.html). Next, the title, location and contact information have to be put into two separate files, one sorted by state and one sorted by job category, along with a link back to the HTML file with the rest of the information.

The information must also go in the right place within each file. In the sorted-by-state file, a job located in New York must be listed with the other New York jobs (in an HTML table). The same goes for the sorted-by-job file: all "sales" jobs must be together. I've heard that Perl is very good for text processing, and this seems like a good job for it. I know it would be better to set up a database instead of static HTML files, but that is not possible right now. If someone would like to point me in the right direction, that would be great.

Thanks,

andy j.

Re: Parsing e-mail and building HTML by c-era (Curate) on Jul 05, 2000 at 20:00 UTC
A database would be good, but you can put it all in a file and use "|" to delimit the fields (it is very unlikely that "|" will be found in a text file). I would then write a script that reads the file, puts the values in an array and uses a foreach loop to create the HTML.

    # Read the records from the file
    open(FILE, "</myfile") or die "Cannot open /myfile: $!";
    flock(FILE, 1);    # 1 = shared lock (LOCK_SH)
    while (<FILE>) {
        chomp;
        ($tmp_state, $tmp_job) = split /\|/, $_;
        push @state, $tmp_state;
        push @job, $tmp_job;
    }
    close FILE;

    # sort @job or @state how you would like

    # Print html header
    foreach $tmp_job (@job) {
        # Print html with $tmp_job as the value
    }
    # Print html footer

That should be a basic outline for your code.

PS: A good way to learn how to write a program is to write all the steps you need to do in comments. Then you can write the code that does what each comment says.

    #open file
    #lock file
    #read user input
    #write user input to file
    #close file

    #open file (">>" appends, so existing records are kept)
    open(FILE, ">>/myfile") or die "Cannot open /myfile: $!";
    #lock file
    flock(FILE, 2);    # 2 = exclusive lock (LOCK_EX)
    #read user input
    $input = <STDIN>;
    #write user input to file
    print FILE $input;
    #close file
    close FILE;
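As a rough sketch of the outline above, here is one way the grouping step might look. The field order `state|title|contact|page` and the sub names `parse_record` and `jobs_by_state_html` are assumptions made for illustration, not something from the original post. It also does not HTML-escape the values, which real data would need.

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption): state|title|contact|page
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my ($state, $title, $contact, $page) = split /\|/, $line, 4;
    return { state => $state, title => $title, contact => $contact, page => $page };
}

# Group the records by state and build one HTML table per state,
# with the states in alphabetical order.
sub jobs_by_state_html {
    my (@lines) = @_;
    my %by_state;
    for my $line (@lines) {
        my $job = parse_record($line);
        push @{ $by_state{ $job->{state} } }, $job;
    }
    my $html = '';
    for my $state (sort keys %by_state) {
        $html .= "<h2>$state</h2>\n<table>\n";
        for my $job (@{ $by_state{$state} }) {
            $html .= qq{<tr><td><a href="$job->{page}">$job->{title}</a></td>}
                   . qq{<td>$job->{contact}</td></tr>\n};
        }
        $html .= "</table>\n";
    }
    return $html;
}

print jobs_by_state_html(
    'New York|Sales Manager|jobs@example.com|job01.html',
    'Ohio|Accountant|hr@example.com|job02.html',
    'New York|Sales Rep|jobs@example.com|job03.html',
);
```

Keying a hash on the state keeps all New York jobs together no matter what order the records arrive in. The same sub would work for the job-category page if the category were the first field instead.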
RE: Re: Parsing e-mail and building HTML by athomason (Curate) on Jul 06, 2000 at 00:38 UTC
"it is very unlikely "|" will be found in a text file"

Relying on unlikelihoods is a dangerous programming practice, especially when you have a concern for security. You'd be much better off escaping the string or validating the input to guarantee your delimiter can't be found within your data. However, this can be a significant pain, so a good compromise is to use a multi-character delimiter instead of a single character, say '|||'. This is a minimal change which does a lot more to protect your data.
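For the escaping route, a minimal sketch might look like the following. The sub names `join_fields` and `split_fields` are made up for this example, and backslash is an arbitrary choice of escape character. Newlines inside a field are not handled here.

```perl
use strict;
use warnings;

# Join fields with "|", putting a backslash in front of every
# backslash or pipe that occurs inside a field.
sub join_fields {
    my @escaped;
    for my $field (@_) {
        (my $copy = $field) =~ s/([\\|])/\\$1/g;
        push @escaped, $copy;
    }
    return join '|', @escaped;
}

# Split a line on unescaped "|" and undo the escaping.
sub split_fields {
    my ($line) = @_;
    my @fields  = ('');
    my $escaped = 0;
    for my $char (split //, $line) {
        if ($escaped) {               # character after a backslash: take it literally
            $fields[-1] .= $char;
            $escaped = 0;
        }
        elsif ($char eq '\\') {       # start of an escape sequence
            $escaped = 1;
        }
        elsif ($char eq '|') {        # unescaped delimiter: start a new field
            push @fields, '';
        }
        else {
            $fields[-1] .= $char;
        }
    }
    return @fields;
}

my $line = join_fields('Sales | Marketing', 'New York');
print "$line\n";
print join(' / ', split_fields($line)), "\n";
```

With this in place the data may contain the delimiter freely, at the cost of no longer being able to use a plain `split /\|/` when reading the file back.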
RE: Re: Parsing e-mail and building HTML by mrmick (Curate) on Jul 05, 2000 at 20:59 UTC
I wish I had saved one of today's votes (oh well, tomorrow is another day). In addition to c-era's reply, I would like to add that you could also write a Perl program to read your customers' e-mail messages and put the data into the appropriate text file(s), thus saving you considerable effort each time new data arrives. This is assuming, of course, that there is a somewhat consistent template or format in which you receive your data.
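To illustrate what reading a consistent template could look like, here is a small sketch. The message layout is purely an assumption (a block of "Label: value" lines, a blank line, then a free-text description), since the thread never shows the real format. The sub name `parse_job_email` is likewise hypothetical, and it expects the message body with the mail headers already removed.

```perl
use strict;
use warnings;

# Assumed body layout: "Label: value" lines, a blank line, then the description.
sub parse_job_email {
    my ($body) = @_;
    my ($fields_part, $description) = split /\n\s*\n/, $body, 2;
    $fields_part = '' unless defined $fields_part;

    my %job;
    for my $line (split /\n/, $fields_part) {
        # Skip anything that does not look like "Label: value"
        next unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $job{ lc $1 } = $2;    # lower-case the label so lookups are uniform
    }
    $job{description} = defined $description ? $description : '';
    return \%job;
}

my $job = parse_job_email(<<'END');
Title: Sales Manager
Location: Albany, New York
Category: Sales
Contact: jobs@example.com

Manage a regional sales team.
END

print "$_: $job->{$_}\n" for sort keys %$job;
```

If the real e-mails use a different layout, only the regular expression and the split on the blank line should need to change.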
RE: RE: Re: Parsing e-mail and building HTML by rograndom (Initiate) on Jul 05, 2000 at 23:23 UTC
"In addition to c-era's reply, I would like to add that you could also write a Perl program to read your customers' email messages and put the data into the appropriate text file(s), thus saving you considerable effort each time new data arrives. This is assuming, of course, that there is a somewhat consistent template or format in which you receive your data."

Yes, this is also what I would like to do. Could I just read the e-mail and start processing from there without putting it into a text file first? Another thing that I didn't make clear above was that the two main files (sorted by state and by job) are already built, and the new information will be added as a row in a pre-existing table. Perhaps I could post URLs of sample pages and a sample e-mail (which comes from a database, so they're all the same).

andy j.
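Adding a row to a table that already exists could be sketched like this. It assumes each state's rows end with a marker comment such as `<!-- end New York -->` in the existing page. That marker and the sub name `add_row` are inventions for this example, so the real pages would need such comments added once by hand.

```perl
use strict;
use warnings;

# Insert $row just before the marker comment that closes $state's rows.
# Returns the new HTML, or nothing if the marker is not found.
sub add_row {
    my ($html, $state, $row) = @_;
    my $marker = "<!-- end $state -->";
    my $pos = index($html, $marker);
    return if $pos < 0;
    substr($html, $pos, 0, "$row\n");    # zero-length replace = insert
    return $html;
}

my $page = <<'END';
<table>
<tr><td><a href="job01.html">Sales Manager</a></td></tr>
<!-- end New York -->
</table>
END

my $updated = add_row($page, 'New York',
    '<tr><td><a href="job02.html">Sales Rep</a></td></tr>');
print defined $updated ? $updated : "No section for that state\n";
```

As for skipping the intermediate text file: the whole message can be read in one go with `my $message = do { local $/; <STDIN> };` when the script is run with the e-mail piped to its standard input, and then handed straight to the parsing code.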
RE: RE: RE: Re: Parsing e-mail and building HTML by mrmick (Curate) on Jul 06, 2000 at 04:05 UTC
Re: Parsing e-mail and building HTML by davorg (Chancellor) on Jul 05, 2000 at 23:08 UTC
For extracting the data from the HTML files, you should be looking at `HTML::Parser` or one of its subclasses (probably `HTML::TokeParser`). If you want most of the flexibility of a database but without the overheads, why not store the data in a set of CSV files and use `DBD::CSV` to access it?

-- <http://www.dave.org.uk>
European Perl Conference - Sept 22/24 2000, ICA, London <http://www.yapc.org/Europe/>
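A short sketch of the `HTML::TokeParser` approach, pulling the text out of every table cell in a page. The sub name `table_cells` and the sample page are made up for illustration. `HTML::TokeParser` comes from the HTML-Parser distribution on CPAN rather than with Perl itself, so the example loads it at run time and reports a failure instead of refusing to compile.

```perl
use strict;
use warnings;

# Return the text of every <td> cell in an HTML string, in document order.
sub table_cells {
    my ($html) = @_;
    require HTML::TokeParser;    # CPAN module, not part of core Perl
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my @cells;
    while ($parser->get_tag('td')) {
        # Text up to the closing </td>, with inner tags such as <a> skipped
        push @cells, $parser->get_trimmed_text('/td');
    }
    return @cells;
}

my $page = <<'END';
<table>
<tr><td><a href="job01.html">Sales Manager</a></td><td>Albany, New York</td></tr>
</table>
END

my @cells = eval { table_cells($page) };
if ($@) {
    warn "Could not run the example (is HTML::TokeParser installed?): $@";
}
else {
    print "$_\n" for @cells;
}
```

This is handy for reading the existing pages back in, for instance to rebuild the state and category listings from the jobXX.html files.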