If that sample of data really represents the contents of the file you need to use as input, then I'd have to say:

The term "flat file" refers to a plain text file that represents a simple two-dimensional table: each line of the file is one row of the table, and contains one or more fields (columns of the table); if there are two or more fields per line (row), these are separated by some sort of clear and unambiguous character or string (tab, comma, colon or whatever), provided that this cannot occur as part of a field value. This property makes it "machine readable".

A CSV file is a particular type of flat file, where the fields on a line are separated by commas, but a field may contain a comma character as part of its content if the field is surrounded by quotation marks. (And that means that if the field content includes quotation characters, these must be escaped -- usually by doubling each embedded quote, though some variants use a backslash instead.) The use of quotes and escapes keeps the data machine readable.
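If you ever do get a CSV version of the data, the Text::CSV module from CPAN handles those quoting rules for you. A quick sketch (the sample line here is invented):

    use Text::CSV;

    my $csv = Text::CSV->new( { binary => 1 } )
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

    # a quoted field may contain commas; embedded quotes are doubled
    my $line = q{1001,"Smith, John","He said ""done"""};
    $csv->parse( $line ) or die "Could not parse line";
    print join( ' | ', $csv->fields() ), "\n";   # 1001 | Smith, John | He said "done"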

What you have shown in your post is not machine readable -- it's a format intended for humans, who have the ability to "see" and understand structural boundaries, by "simply" understanding the meanings of labels, values and patterns of whitespace.

If you do not have access to some other form of this data that would be easier for a machine to read, you're going to have a tough job handling the data that you've shown. But it is doable.

If the spacing of columns is consistent, you can categorize the lines based on their type of content, as flagged by some distinctive word token common to all lines of a given type. For example, the particular lines that match the following regular expressions each contain a specific set of fields, and have a specific pattern of spacing between the fields (and we can only hope that the spacing is consistent across the 7000 records):

    /^Request Number:/
    /^Status:/
    /^Affected User:/
    ...
    /\d+\s+minutes\s+\d+\s+seconds/   # (update: had forgotten 3rd "\s+")
You'll probably end up using "substr()" or "unpack()", with different numeric parameters for each type of line, in order to pull out the field names and values for the line -- e.g. (guessing at the column widths):
    my %record;
    if ( /^Request Number:/ ) {
        my @flds = ();
        push @flds, substr( $_, 0, 40 );
        push @flds, substr( $_, 40, 26 );
        push @flds, substr( $_, 66 );     # rest of line
        for my $fldname ( 'Request Number', 'Parent', 'Priority' ) {
            ( my $val = shift @flds ) =~ s/^$fldname\s*:\s*//;
            $val =~ s/\s*$//;             # update: remove leading/trailing whitespace...
            $record{$fldname} = $val;     # which means this could be "" -- not a problem
        }
    }
    # and likewise for each other type of line
When you get to the "fields" that involve multi-line content, you'll need to be more clever, possibly keeping a "state" variable to guide the parsing: as you see the start of a (potentially) multi-line field, set $state to the name of that field, and append subsequent lines to that field value until you see evidence of a new field starting.

Good luck -- especially with getting a more readable version of the data, so you won't have to do all this hard labor.

Update: Since you mentioned you are new at this, I should add one more point about the strategy in the code snippet above. The idea there is to fill a hash (called "%record") with a set of field values. Once you get to a point in your file where a new record begins, you need to save the current contents of %record somewhere -- i.e. your Access or other database -- then proceed with parsing the next record (reusing the same space/elements in the hash).
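In code, that might look roughly like this -- assuming "Request Number:" only ever appears at the start of a record, and that store_record() is a routine you write yourself to do the actual database insert:

    my %record;
    while (<>) {
        if ( /^Request Number:/ and %record ) {
            store_record( \%record );    # save the record that just ended...
            %record = ();                # ...and start collecting a fresh one
        }
        # ... parse the current line and fill %record as shown above ...
    }
    store_record( \%record ) if %record;     # don't lose the final record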

One thing that could simplify this for you is if you can determine a specific string that always occurs at record boundaries and nowhere else. Set "$/" (INPUT_RECORD_SEPARATOR) equal to this string, and you can structure your code as follows:

$/ = "end-of-record-string"; # whatever that may be while (<>) { # this now reads one whole record into $_ my @lines = split( /\n/ ); my %record while ( @lines ) { $_ = shift @lines; # proceed with parsing record fields as above... if ( /^Request Number:/ ) { ... } elsif ... } store_to_database( \%record ); }
