
I know this has to be so simple that I am almost embarrassed to ask, but I have been unable to find the answer anywhere (maybe I just don't know what to look for).

My company receives orders by e-mail from our customer's system. Each message is one order. I need to pull the messages apart and put the data into a database. The part I am having trouble with is pulling them apart and defining the fields (I can do the database work; I just can't get the data into nice chunks). These messages have a very standard format. Here is a snippet showing some of the different formatting issues I have; if you can show me how to deal with these, I will be able to mix and match to handle the whole message:

field1 name:
data1
data1
data1
field2 name:
data2
feild3 name: data3
**a full line of asterisks**

unimportant text

***field4 name: data4
***field5 name: data5

**a full line of asterisks**
unimportant text
field6 name: data6

field7 name: data7
field8 name: data8

location 1
field9 name: data9
field10 name: data10

location 2
field11 name: data11
field12 name: data12
etc...

In case it does not display correctly here, the field names are flush left and the data is aligned some distance to the right. A field name can be more than one word, but every field name ends with a colon. Each field is on a separate line, but not every line is a field: there are blank lines and lines of asterisks that make the message easier for a human to read. Not all fields are filled in. Some field names are duplicated: in the example above, field9 matches field11 (e.g. "state:") and field10 matches field12 (e.g. "city:"). The order is not always the same, and the set of fields changes (some messages have some fields, others have other fields). The amount of data (number of lines) in field1 varies; it is a list of addresses, which I don't care about anyway. As you can see, field2's data is on the line below the field name; apart from field1, it is the only field like this.
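A minimal Perl sketch of the line rules just described, assuming that decorative lines contain nothing but asterisks and that a field name is everything before the first colon (after any leading asterisks, as in field4 and field5). `classify_line` is a hypothetical helper name; it sorts each line into something to skip, a field with its (possibly empty) data, or a plain text line such as a field1 address line, "unimportant text", or "location 1".

```perl
use strict;
use warnings;

# Classify one line of an order e-mail.
# Assumptions (not confirmed by the message format): decorative lines consist
# only of asterisks, and a field name is everything before the FIRST colon,
# after any leading asterisks.
sub classify_line {
    my ($line) = @_;
    return ['skip'] if $line =~ /^\s*$/;        # blank line
    return ['skip'] if $line =~ /^\s*\*+\s*$/;  # full line of asterisks
    if ($line =~ /^\**\s*([^:]+?)\s*:\s*(.*?)\s*$/) {
        return ['field', $1, $2];               # "name: data" (data may be empty)
    }
    return ['text', $line];                     # continuation data or other text
}

for my $line ("field3 name:      data3", "***field4 name: data4",
              "*" x 40, "", "data1", "Subject:") {
    my $r = classify_line($line);
    print join('|', @$r), "\n";
}
```

A field whose data comes back empty (field1, or field2/Subject) is the signal that its data is on the following text line(s).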

I would like to be able to reference the data by its field name. That way, if the order of the fields changes, I can still refer to the same name, and when new fields are added I am all set.
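Referencing data by name is what a Perl hash is for. The sketch below (hypothetical sub name `parse_simple`) stores each single-line field under its name, so lookups keep working when the order changes or new fields appear. It deliberately ignores multi-line fields and duplicate names; a later duplicate simply overwrites an earlier one.

```perl
use strict;
use warnings;

# Parse "name: data" lines into a hash keyed by field name.
# Lines without a colon (blank, asterisks, "unimportant text") are skipped.
sub parse_simple {
    my @lines = @_;
    my %field;
    for my $line (@lines) {
        next unless $line =~ /^\**\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $field{$1} = $2;    # later duplicates overwrite earlier values
    }
    return \%field;
}

my $order = parse_simple(
    "field7 name:    data7",
    "field6 name:    data6",
    "unimportant text",
);
print "$order->{'field6 name'}\n";   # prints "data6"
```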

That's it. It seems like it should be relatively simple. Anything to the left of a colon is a field name, and the data is on that same line, some distance to the right. The only exception (apart from the field1 address list) is field2, whose data is on the line below. (The good news is that this field always has the same name, so if I see the word "Subject:" I can pull the data from the line below it.)
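One way to express "anything to the left of the colon is the name", sketched under the assumption that the Subject line has nothing after its colon and its whole value is on the next line: `split` with a limit of 2 splits only at the first colon, so any colons inside the data survive. `parse_with_subject` is a hypothetical name.

```perl
use strict;
use warnings;

# Split each line at its first colon and special-case "Subject:", whose data
# is assumed to sit entirely on the following line.
sub parse_with_subject {
    my @lines = @_;
    my %field;
    for (my $i = 0; $i < @lines; $i++) {
        my ($name, $data) = split /:/, $lines[$i], 2;
        next unless defined $data;              # no colon: not a field line
        $name =~ s/^[\s*]+|\s+$//g;             # drop leading asterisks/spaces
        $data =~ s/^\s+|\s+$//g;
        if ($name eq 'Subject' && $i + 1 < @lines) {
            $data = $lines[++$i];               # consume the next line as data
            $data =~ s/^\s+|\s+$//g;
        }
        $field{$name} = $data;
    }
    return \%field;
}

my $f = parse_with_subject(
    "Subject:",
    "    data2",
    "field3 name:    data3",
);
print "$f->{Subject}\n";          # data2
print "$f->{'field3 name'}\n";    # data3
```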

Questions:
How do I do this?
How can I handle the duplicated field names?
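For duplicated names, one option is to keep every value in an array per name (a hash of arrays), so nothing is overwritten and the order of appearance is preserved. With the sample layout, `state` element 0 would be the location 1 value and element 1 the location 2 value. `parse_all_values` is a hypothetical name, and the values below are the placeholders from the example.

```perl
use strict;
use warnings;

# Keep every value for a field name, in order of appearance, by pushing onto
# an array reference instead of assigning a scalar.
sub parse_all_values {
    my %field;
    for my $line (@_) {
        next unless $line =~ /^\**\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        push @{ $field{$1} }, $2;   # autovivifies the array on first use
    }
    return \%field;
}

my $f = parse_all_values(
    "location 1", "state:  data9",  "city:   data10",
    "location 2", "state:  data11", "city:   data12",
);
print scalar @{ $f->{state} }, "\n";   # 2
print "$f->{city}[1]\n";               # data12
```

If the "location N" headings are reliable, another option is to remember the most recent heading and build keys such as "location 2/state"; that depends on the exact heading format.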

The Perl Nubie

In reply to Probably very simple (for those in the know) by Anonymous Monk
