Hello,

First, please forgive the length of this post and any formatting problems; I am a newbie and am still becoming familiar with the forum.

I was wondering if anyone might be able to help me.

I am trying to write code that ultimately will do some complex searching.

The problem I am stuck with is this:

I have two types of files: the first is a single file (example 1), while the second consists of about 250 files (example 2).

Example 1 is about 1173 pages of lines like the following:

abaci, U, ae 1 b ax 0 s ay 0, 100, 0
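For reference, the comma-and-digit splitting of a line like this can be sketched in a few lines of Perl. This is a minimal sketch, assuming that fields are separated by a comma with optional whitespace on either side and that every single digit in the third field marks a syllable boundary; `parse_dict_line` is a hypothetical helper name. One `\s*,\s*` pattern covers the comma variants that the code below lists one by one.

```perl
use strict;
use warnings;

# Hypothetical helper: split one example-1 (dictionary) line into its
# comma-separated fields, then split the third field on the digits.
sub parse_dict_line {
    my ($line) = @_;
    chomp $line;
    # A comma with optional spaces/tabs around it: covers ", ", " ,", ",\t", ...
    my @fields = split /\s*,\s*/, $line;
    # Assumption: each single digit separates two syllables.
    my @syllables = split /\s*\d\s*/, $fields[2];
    return (\@fields, \@syllables);
}

my ($fields, $syll) = parse_dict_line("abaci, U, ae 1 b ax 0 s ay 0, 100, 0\n");
print join('|', @$fields), "\n";   # abaci|U|ae 1 b ax 0 s ay 0|100|0
print join('|', @$syll), "\n";     # ae|b ax|s ay
```

`split` drops trailing empty fields, so the final `0` does not leave an empty syllable at the end.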

Each of the example 2 files is 20+ pages long, with lines that look something like:

47.307796 122 <EXT-I've>; U; U

or

47.530873 122 lived; l ih v d; l ah v d
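A line of this shape can be taken apart with a limited `split` followed by a split on semicolons. A minimal sketch, assuming every data line starts with a number (the time stamp), then the 122, then three semicolon-separated fields; `parse_word_line` is hypothetical, and so is the assumption that a tagged word such as `<EXT-I've>` should be reduced to the bare word.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one example-2 line into
# (stamp, extra, orth, canon, spoke). Returns an empty list otherwise.
sub parse_word_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^\s*\d/;              # assumption: data lines start with a number
    # split ' ' skips leading whitespace; the limit of 3 keeps everything
    # after the second field together in $rest.
    my ($stamp, $extra, $rest) = split ' ', $line, 3;
    return unless defined $rest;
    my ($orth, $canon, $spoke) = split /\s*;\s*/, $rest;
    return unless defined $orth;
    # Assumption: markers like <EXT-I've> should become the bare word.
    $orth =~ s/^<[A-Z]+-(.*)>$/$1/;
    $spoke =~ s/\s+$// if defined $spoke;         # drop trailing whitespace
    return ($stamp, $extra, $orth, $canon, $spoke);
}

for my $line ("47.307796 122 <EXT-I've>; U; U\n",
              "47.530873 122 lived; l ih v d; l ah v d\n") {
    print join("\t", parse_word_line($line)), "\n";
}
```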

Currently I am trying to get Perl to read all of the lines in the first file, split them, and print one column. I am also trying to do the same for the other 250+ files, using globbing. When I run either of these in isolation, it works perfectly and generates exactly the results I need. However, when I combine the code (see below), I get results such as:

a) Only as much output from the first file as there is in the other example. For example, if I begin the code with example 1 and test with only 2 files from example 2 (i.e., about 40 pages of example 2), I get either the first 40 pages of example 1, or a single line of it (the first or the last) repeated over 40 pages. The reverse happens when I swap the order of the two code sections.
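One thing worth ruling out when two pieces of file-reading code interfere with each other is shared global state: bareword filehandles such as `A`, `B` and `C` and undeclared variables are package globals, so anything one loop leaves behind is still visible to the next. A minimal sketch of the glob loop, assuming each file can be read on its own, with one lexical filehandle per file and `my` variables scoped to the loop. `orth_column` is a hypothetical name, `bsd_glob` from File::Glob is used in place of `<s*.words>` so the demo's temporary directory path is globbed safely, and the two demo files only stand in for the real `s*.words` files.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: collect the word column from every matching file.
sub orth_column {
    my ($pattern) = @_;
    my @words;
    for my $file (bsd_glob($pattern)) {           # sorted alphabetically by default
        # A fresh lexical handle per file, closed before the next file.
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            # Third field up to the first ';' (e.g. "lived"); other lines are skipped.
            push @words, $1 if $line =~ /^\s*\S+\s+\S+\s+([^;]+?)\s*;/;
        }
        close $fh;
    }
    return @words;
}

# Demo: two throwaway files standing in for the real s*.words files.
my $dir = tempdir(CLEANUP => 1);
my %demo = (
    's01.words' => "47.307796 122 <EXT-I've>; U; U\n",
    's02.words' => "47.530873 122 lived; l ih v d; l ah v d\n",
);
for my $name (sort keys %demo) {
    my $path = File::Spec->catfile($dir, $name);
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $demo{$name};
    close $out;
}
print "$_\n" for orth_column(File::Spec->catfile($dir, 's*.words'));
```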

I have tried (not shown in my code below) things such as rearranging the loops, changing the foreach loop over the glob to a while loop, moving the close statements, moving "}" around, switching which file is processed first, opening all files in the glob (including example 1) and then trying a sort of conditional split, and altering the filehandles and $line variables to make them more distinct. Nothing I do produces better output.
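Rather than interleaving the two reads, a common structure for this kind of task is two separate passes: read example 1 completely into a hash, then loop over the example 2 lines and look each word up. A single line repeated many times can be a sign that a filehandle was read inside another loop: after the first pass it sits at end-of-file, the inner loop no longer runs, and a variable keeps its last value. A minimal sketch, assuming the eventual search is a lookup of each example 2 word in the example 1 dictionary and that dictionary headwords are lowercase; both subroutine names are hypothetical, in-memory filehandles stand in for the real files, and the `abaci` word line is made up so that one lookup succeeds.

```perl
use strict;
use warnings;

# Pass 1 (hypothetical helper): read the whole dictionary once into a hash.
sub load_dictionary {
    my ($fh) = @_;
    my %dict;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\s*,\s*/, $line;
        next unless @fields >= 3;
        $dict{ $fields[0] } = $fields[2];          # headword => third column
    }
    return \%dict;
}

# Pass 2 (hypothetical helper): look up each example-2 word in that hash.
sub lookup_words {
    my ($dict, $fh) = @_;
    my @results;
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, undef, $rest) = split ' ', $line, 3;
        next unless defined $rest;
        my ($orth) = split /\s*;\s*/, $rest;
        next unless defined $orth;
        my $entry = $dict->{ lc $orth };           # assumption: lowercase headwords
        push @results, join "\t", $orth, defined $entry ? $entry : 'NOT FOUND';
    }
    return @results;
}

my $dict_text  = "abaci, U, ae 1 b ax 0 s ay 0, 100, 0\n";
my $words_text = "47.530873 122 lived; l ih v d; l ah v d\n"
               . "47.600000 122 abaci; U; U\n";    # made-up line for the demo
open my $dict_fh,  '<', \$dict_text  or die $!;
open my $words_fh, '<', \$words_text or die $!;
my $dict = load_dictionary($dict_fh);              # read once, reused for every file
print "$_\n" for lookup_words($dict, $words_fh);
```

With real files, the hash would be built once before the glob loop and passed to `lookup_words` for each file.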

So if anyone has thoughts on why this is misprocessing the information I need, I would sincerely appreciate them.

The code is:

#Open file matching ex.1;
open (C, "<dic.txt") || die "dictionary";
#open file to write to;
open (B, ">>all.txt") || die "output";
#Making a loop of all lines in example 1 file;
while ($line2 = <C>) {
    #Getting rid of the newline;
    chomp $line2;
    #Split all lines;
    @firstgrouping = split(/, |,\s|,\t|,|\s,|\t,| ,/, $line2);
    #splitting the lines in $firstgrouping[2] by the numbers so that text before and after number are different indexed scalars;
    @actualsyll = split(/\d |\d\s|\d\t|\d|\s\d| \d|\t\d|\t\d\t|\s\d\s| \d /, $firstgrouping[2]);
    #Printing the new version of @firstgrouping[2];
    print B "@actualsyll\n";
}
close C;
#Loop gets all files matching ex. 2 opens them;
foreach $file (<s*.words>) {
    open (A, "<$file") || die "files";
    #open (B, ">>awe.txt") || die "output";
    #Making a loop of all lines in each file;
    while ($line1 = <A>) {
        #There are headers with information I do not need so this essentially cuts them out;
        $line1 =~ s/^ |^ |^\s\s\s|\s{3,4}//;
        #Chomping of the newline;
        chomp;
        #Making a loop of all lines in all files from ex. 2 without their headers;
        foreach ($line1 =~ /^\d/g) {
            #Splitting the files into the numbers to the first space, the 122, the word minus extra markers, the chopped up word before the ";, the final chopped up word;
            if ($line1 =~ /\d\s\w|\d\s{1,2}\d|\d\s\s\d|\d \d|;\s\w/gi) {
                $line1 =~ s/\s| |\s\s| |\s{2}/\t/g;
                $line1 =~ s/\t\t|\t{2}/\t/;
                ($stamp,$extra,$orth,$a,$b,$c,$d,$e,$f,$g, $h,$i,$j,$k,$l, $m, $n,$o,$p,$q,$r,$s,$t,$u,$v,$w,$x,$y,$z) = split(/ <|>;|\t/, $line1);
                #splitting all of the information after the first ";" into 2 scalars;
                $split = "$a $b $c $d $e $f $g $h $i $j $k $l $m $n $o $p $q $r $s $t $u $v $w $x $y $z";
                ($canon,$spoke) = split(/; /, $split);
                #Getting rid of some additional extraneous material (i.e., unwanted spaces...);
                $orth =~ s/;//;
                $spoke =~ s/\s{1,}$|\t{1,}$//g;
                #Making an array that will bind everything together (mostly to aid in later coding not yet created);
                @general = ($file, $stamp,$extra,$orth,$canon, $spoke, $syll);
            }
            #combining all of the $orth's into a loop;
            foreach ($general[3]) {
                #Making each column into its own array;
                push(@array0, $general[0]);
                push(@array1, $general[1]);
                push(@array2, $general[2]);
                push(@array3, $general[3]);
                push(@array4, $general[4]);
                push(@array5, $general[5]);
            }
        }
    }
}
close A;
#Making a loop of each array created above;
foreach (@array0,@array1,@array2,@array3,@array4,@array5) {
    #Removing each element one at a time for later (not yet created) conditional searching of each array element;
    @shift0 = shift @array0;
    @shift1 = shift @array1;
    @shift2 = shift @array2;
    @shift3 = shift @array3;
    @shift4 = shift @array4;
    @shift5 = shift @array5;
    #Prints out the $orth word of each line on its own line (used mostly as a debugger right now);
    print B "@shift3\n";
}
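In the last loop, `foreach (@array0, ..., @array5)` runs once for every element of all six arrays together, while also shifting elements off those same arrays; perlsyn warns that `foreach` "will get very confused" if elements of an array in its list are added or removed inside the loop body. Two smaller points in the same code: `chomp;` works on `$_`, not on `$line1`, and `$syll` is never assigned in the second loop. A minimal sketch of an alternative, keeping each parsed line together as one hash instead of six parallel arrays; the records are built by hand from the two example lines, and the file names and the `grep` condition are hypothetical.

```perl
use strict;
use warnings;

# One hash per parsed line, so the fields of a line stay together.
# Inside the parsing loop this would be: push @records, { file => $file, ... };
# The file names are hypothetical stand-ins for the s*.words files.
my @records = (
    { file => 's01.words', stamp => '47.307796', extra => '122',
      orth => "<EXT-I've>", canon => 'U', spoke => 'U' },
    { file => 's02.words', stamp => '47.530873', extra => '122',
      orth => 'lived', canon => 'l ih v d', spoke => 'l ah v d' },
);

# One iteration per line; nothing is shifted out from under the loop.
for my $rec (@records) {
    print "$rec->{orth}\n";
}

# Fields from the same line remain available together for a later
# conditional search (hypothetical condition: canonical eq spoken form).
my @same = grep { $_->{canon} eq $_->{spoke} } @records;
print "$_->{file}: $_->{orth}\n" for @same;
```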

Many thanks,
Napa


In reply to Misprocessed Read From Files? by Napa