Well, unfortunately, your first Perl project is quite a big one. I also gather from your questions that you have very little prior programming experience, which I'm sure makes this all the more difficult for you.
The purpose of PerlMonks is to help you learn Perl. I'm not seeing any attempts at code of your own along with your questions.
I wrote a parser to get you started because it uses several techniques that are way beyond "Class 101, homework #1". I figured that you didn't have a chance at even getting started without substantial help. So I got you started with a big jump. Note that huck also contributed some code for you.
At this point, I would expect you to be spending a considerable number of hours, even full days, trying to understand how the parser works. Learning a new language, especially a large language like Perl, is difficult even for experienced programmers. Complex data structures are an advanced topic, but one that you need to learn more about. Consider Perl Data Structures and Data Types and Variables in the Monks' Tutorials section.
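To give you a feel for the kind of structure involved, here is a minimal sketch of what a "list of table records" can look like in Perl. This is just a guess at the shape; the actual field names in the parser may differ:

<c>
use strict;
use warnings;

# Hypothetical illustration: each parsed table is a hash reference,
# and @results collects them. The field names here are made up.
my @results;

push @results, {
    name  => '2017 Position log :Fp379 place: cal time: 23:01:45',
    start => 69,
    end   => 82,
};

# Later, walk the structure:
foreach my $table (@results) {
    print "TABLE: $table->{name}\n";
    print " Record_Start: $table->{start}\n";
    print " Record_End: $table->{end}\n";
}
</c>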
My code parses each and every table it encounters. In Version2, a decision is made in finish_current_table() about whether or not to keep the table that has just been parsed as part of @results. finish_current_table() could be modified to print the results right away instead of saving them to @results. My code concentrated the "dump the results" work into a section of code at the end of the input file, but it doesn't have to be that way, and usually it isn't done that way. As a general principle, don't save things that you can dump/print/save to file/get rid of right away. I was thinking of a "multi-table record" when I wrote the code, but didn't put the decision logic in for that. There are many ways to do this; a sketch of the print-right-away idea follows.
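Here is a minimal sketch of that idea, assuming finish_current_table() receives a hash reference for the table just parsed (the field names are placeholders, not the actual ones in the parser):

<c>
# Hypothetical rework of finish_current_table(): print each finished
# table immediately instead of accumulating it in @results.
sub finish_current_table {
    my ($table) = @_;    # assumed: hashref describing the parsed table

    return unless $table->{name} =~ /Fp379/;  # example keep/discard decision

    print "TABLE: $table->{name}\n";
    print " Record_Start: $table->{start}\n";
    print " Record_End: $table->{end}\n";
    # nothing is pushed onto @results, so memory stays flat
    # no matter how big the input file is
}
</c>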
You could have a flag variable that starts outputting tables when "Fp379" is seen in the first line of the name, and stops after a record with a "blank" name is seen.
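That might look something like this sketch (again, $table->{name} and friends are assumed field names):

<c>
# Hypothetical start/stop flag for grabbing a multi-table record.
my $printing = 0;

sub finish_current_table {
    my ($table) = @_;

    $printing = 1 if $table->{name} =~ /Fp379/;  # "Fp379" seen: start output
    if ($printing) {
        print "TABLE: $table->{name}\n";
        print " Record_Start: $table->{start}\n";
        print " Record_End: $table->{end}\n";
    }
    $printing = 0 if $table->{name} !~ /\S/;     # blank name: stop after this one
}
</c>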
As far as performance goes, if you modify the code so that @results doesn't become "huge", or don't use @results at all, the code should run much faster than 5 minutes, even with 1.5 GB of input. The speed should be about the rate at which 1.5 GB of lines can be read from the disk; the processing done on each line is very minimal. If you are saving all results to @results, then this could take a "long time" due to virtual memory concerns and disk swapping.

One small output glitch to fix: a table with a blank name currently prints like this:

<c>
TABLE: 2017 Position log :Fp379 place: cal time: 23:01:45
 Record_Start: 69
 ....
TABLE: language data: time= 24hrs
 Record_Start: 83
 Record_End: 90
 ....
TABLE:  Record_Start: 91    #<<<<- wrong/misleading
 Record_End: 95
 ....
</c>

Modify the print of $name to have a \n if it's blank, so that you get:

<c>
TABLE: 2017 Position log :Fp379 place: cal time: 23:01:45
 Record_Start: 69
 ....
TABLE: language data: time= 24hrs
 Record_Start: 83
 Record_End: 90
 ....
TABLE:
 Record_Start: 91
 Record_End: 95
 ....
</c>
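A hedged guess at what that one-line change looks like (the actual print statement in the parser may differ):

<c>
# Hypothetical fix: assumes a non-blank $name already ends with "\n"
# as read from the input, while a blank $name does not.
print "TABLE: $name";
print "\n" unless $name =~ /\S/;   # a blank name still gets its own line
</c>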
I don't know why 5 minutes is a problem, even if that is what it takes to process every single table in this huge file. As a laugh, I remember one client who complained about a program that took 6 hours to run on my laptop. They were pretty upset about this amount of time (albeit it took much less on their server). I asked: how many times per year do you run this program? Answer: 4. Has the program ever "made a documented mistake"; are there any bug reports outstanding? Answer: No. You can imagine how the rest of the discussion went... The execution time just didn't matter.
In this case, I suspect 5 minutes or even 6 hours to process the entire file is just fine. The idea should be to modify the decision making about the tables so that "once is enough": if you need 1,000 table records, get them all in one program run instead of running the program 1,000 times. Some more sophisticated decision making in finish_current_table() is what's needed.
In reply to Re^11: Parsing .txt into arrays
by Marshall
in thread Parsing .txt into arrays
by Fshah