Right now I'm working on a text parser that converts a PDF of table metadata (because that is the only way to view it outside of the stupid proprietary software) to a tab-separated flat file for documentation purposes. I've hit a stumbling block that I'm having trouble getting around, though.

@meta is a 3-dimensional array of pages, lines, and fields. The parsing and recombining of lines works as desired; the problem happens when the inner loop reaches the end of a page. Once it gets there, new elements keep getting added to the end of @{$meta[$page]}, so the inner for loop never terminates as it should.

Here is the relevant section of problem code:

    open OUT, ">${new_file}.meta" or die "DIAF OUT: $!";
    for (my $page = 0; $page < scalar @meta; ++$page) {
        for (my $line = 0; $line < scalar @{$meta[$page]}; ++$line) {
            print "$page:$line/",scalar @{$meta[$page]},"\n";
            while ($meta[$page]->[$line+1]->[0] and !$meta[$page]->[$line+1]->[1] and !$meta[$page]->[$line+1]->[2]) {
                print "while loop #1\n";
                $meta[$page]->[$line]->[0] .= " $meta[$page]->[$line+1]->[0]";
                del($meta[$page],$line+1);
            }
            while (!$meta[$page]->[$line+1]->[1] and $meta[$page]->[$line+1]->[2] and $line+1 < scalar @{$meta[$page]}) {
                print "while loop #2\n";
                $meta[$page]->[$line]->[0] .= " $meta[$page]->[$line+1]->[0]" if ($meta[$page]->[$line+1]->[0]);
                $meta[$page]->[$line]->[2] .= " $meta[$page]->[$line+1]->[2]";
                del($meta[$page],$line+1);
            }
            if (!$meta[$page]->[$line+1]) {
                print "last if check\n";
                while (!$meta[$page+1]->[0]->[1] and $meta[$page+1]->[0]->[2]) {
                    print "while loop #3\n";
                    $meta[$page]->[$line]->[0] .= $meta[$page+1]->[0]->[0] if ($meta[$page+1]->[0]->[0]);
                    $meta[$page]->[$line]->[2] .= $meta[$page+1]->[0]->[2];
                    del($meta[$page+1],0);
                }
            }
            print OUT "$meta[$page]->[$line]->[0]\t$meta[$page]->[$line]->[1]\t$meta[$page]->[$line]->[2]\n";
        }
    }
    close OUT;

    sub del {
        my $rArr = shift;
        my $ele = shift;
        my $last = scalar @$rArr - 1;
        if ($ele > $last) {
            warn "Invalid element removal attempted: $ele > $last\n";
            return 0;
        }
        for my $num ($ele..$last-1) {
            $rArr->[$num] = $rArr->[$num+1];
        }
        pop @$rArr;
        return 1;
    }

This produces the following output (it continues indefinitely; cut here at $line = 21):

    0:0/36
    0:1/36
    0:2/36
    0:3/36
    ... blah blah blah ...
    0:13/17
    0:14/17
    while loop #1
    while loop #1
    0:15/16
    0:16/17
    0:17/18
    0:18/19
    0:19/20
    0:20/21
    0:21/22

I'm pretty stumped. The output shows that the while loops aren't being entered, and that is the only place where @meta is modified. Any ideas?
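
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the while conditions read $meta[$page]->[$line+1]->[0], and dereferencing a nonexistent element that way makes Perl autovivify it as an empty array ref, even when the loop body is never entered. That would match the count growing by one per iteration in the output above. The snippet below reproduces the effect on a tiny array; @page, @safe, @sizes, and the 5-iteration cap are hypothetical names and values chosen for the demo, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one page of @meta: two lines of three fields each.
my @page = ( [ 'name', 'type', 'desc' ], [ 'more', '', '' ] );

my @sizes;
# The "$line < 5" cap exists only so the demo terminates.
for (my $line = 0; $line < @page && $line < 5; ++$line) {
    # Merely reading a field of the next line dereferences $page[$line+1].
    # If that element does not exist yet, Perl creates it as [] on the spot.
    my $peek = $page[$line + 1]->[0];
    push @sizes, scalar @page;
}
print "sizes seen while peeking: @sizes\n";

# Guarded variant: check the index before dereferencing, so nothing is created.
my @safe = ( [ 'name', 'type', 'desc' ], [ 'more', '', '' ] );
for (my $line = 0; $line < @safe; ++$line) {
    my $next = $line + 1 < @safe ? $safe[$line + 1] : undef;
    my $peek = $next ? $next->[0] : undef;
}
print "size after guarded loop: ", scalar @safe, "\n";
```

If this is indeed the cause, putting the bounds test ($line+1 < scalar @{$meta[$page]}) first in each while condition should stop the growth, since `and` short-circuits before the dereference. The same consideration would apply to the $meta[$page+1] lookups on the last page.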


In reply to Undesired array growth by apok
