Hello,

First, please forgive the length and any formatting problems; I am a newbie and am just becoming familiar with the forum.

I was wondering if anyone might be able to help me.

I am trying to write code that ultimately will do some complex searching.

The problem I am stuck with is this:

I have two types of files: the first is a single file (example 1), while the second is a set of about 250 files (example 2).

Example 1 is about 1173 pages worth of the following types of lines:

abaci, U, ae 1 b ax 0 s ay 0, 100, 0

All of the example 2 files contain somewhere around 20+ pages and look something like:

47.307796 122 <EXT-I've>; U; U

47.530873 122 lived; l ih v d; l ah v d
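
To make the two formats concrete, here is a minimal sketch (not my actual code) of how I read each sample line as fields. The sub names parse_dic_line and parse_words_line are made up for illustration, and the field meanings for example 2 (timestamp, code, word, two transcriptions) are my assumptions about the data.

```perl
use strict;
use warnings;

# Hypothetical helper: split an example 1 line on commas (allowing
# whitespace on either side) and return the fields as a list.
sub parse_dic_line {
    my ($line) = @_;
    chomp $line;
    return split /\s*,\s*/, $line;
}

# Hypothetical helper: split an example 2 line on ";" into the leading
# part and the two transcriptions, then split the leading part on
# whitespace into timestamp, code, and word.
sub parse_words_line {
    my ($line) = @_;
    chomp $line;
    my ($head, $canon, $spoke) = split /\s*;\s*/, $line, 3;
    my ($stamp, $code, $orth)  = split ' ', $head, 3;
    return ($stamp, $code, $orth, $canon, $spoke);
}

my @dic = parse_dic_line("abaci, U, ae 1 b ax 0 s ay 0, 100, 0\n");
print "$dic[2]\n";             # the third column of the dictionary line

my @words = parse_words_line("47.530873 122 lived; l ih v d; l ah v d\n");
print join('|', @words), "\n"; # the five fields, separated by "|"
```

The column I want from example 1 is the third one ($dic[2] here); from example 2 it is the word itself.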

Currently I am trying to get Perl to read all of the lines in the first file, split them, and print one column. I am also trying to do the same, using globbing, for the other 250+ files. When I do either of these in isolation, it works perfectly and generates exactly the results I need. However, when I combine the code (see below) I run into results such as:

a) Printing only as much of the first file as matches the length of the other example. For example, if I begin the code with example 1 and test with only 2 files from ex. 2 (i.e., about 40 pages of ex. 2), I get either the first 40 pages of ex. 1, or one line of the file (the first or the last) repeated over 40 pages. The reverse happens when I flip the order of the code.
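
In outline, the combined flow I am aiming for looks roughly like the sketch below. This is a simplified illustration, not the script I am running: print_column is a hypothetical helper, the header lines in the example 2 files are ignored here, and the file names are the ones from my real code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from my real script): read $path line by line,
# split each line on the pattern $sep, and print field number $col to $out.
sub print_column {
    my ($path, $sep, $col, $out) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split $sep, $line;
        print {$out} "$fields[$col]\n" if defined $fields[$col];
    }
    close $in;
    return;
}

# Only run against the real data when it is present in the current directory.
if (-e 'dic.txt') {
    open my $out, '>>', 'all.txt' or die "Cannot open all.txt: $!";
    # Third comma-separated column of the single example 1 file.
    print_column('dic.txt', qr/\s*,\s*/, 2, $out);
    # Everything before the first ";" in each example 2 file.
    print_column($_, qr/\s*;\s*/, 0, $out) for glob 's*.words';
    close $out;
}
```

The point of the sketch is only that each file should be read in full, one after another, with all output going to the same file.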

I have tried (not shown in my code below) things such as switching the loops around, changing the foreach loop over the glob to a while loop, moving the close statements, moving "}" around, switching which file is processed first, opening all files in the glob (including ex. 1) and then trying a sort of conditional split, and altering the filehandles and $line variables to make them more distinct. Nothing I have tried produces better output.

Thus, if anyone has thoughts on why this is misprocessing the information I need, I would sincerely appreciate it.

The code is:

#Open file matching ex.1;
open (C, "<dic.txt") || die "dictionary";
#open file to write to;
open (B, ">>all.txt") || die "output";
#Making a loop of all lines in example 1 file;
while ($line2 = <C>) {
#Getting rid of the newline;
chomp $line2;
#Split all lines;
@firstgrouping = split(/, |,\s|,\t|,|\s,|\t,| ,/, $line2);
#splitting the lines in $firstgrouping[2] by the numbers so that text before and after number are different indexed scalars;
@actualsyll = split(/\d |\d\s|\d\t|\d|\s\d| \d|\t\d|\t\d\t|\s\d\s| \d /, $firstgrouping[2]);
#Printing the new version of @firstgrouping[2];
print B "@actualsyll\n";
}
close C;

#Loop gets all files matching ex. 2 opens them;
foreach $file (<s*.words>) {
open (A, "<$file") || die "files";
#open (B, ">>awe.txt") || die "output";
#Making a loop of all lines in each file;
while ($line1 = <A>) {
#There are headers with information I do not need so this essentially cuts them out;
$line1 =~ s/^   |^    |^\s\s\s|\s{3,4}//;
#Chomping of the newline;
chomp;
#Making a loop of all lines in all files from ex. 2 without their headers;
foreach ($line1 =~ /^\d/g) {
#Splitting the files into the numbers to the first space, the 122, the word minus extra markers, the chopped up word before the ";, the final chopped up word;
if ($line1 =~ /\d\s\w|\d\s{1,2}\d|\d\s\s\d|\d  \d|;\s\w/gi) {
$line1 =~ s/\s| |\s\s|  |\s{2}/\t/g;
$line1 =~ s/\t\t|\t{2}/\t/;
($stamp,$extra,$orth,$a,$b,$c,$d,$e,$f,$g, $h,$i,$j,$k,$l, $m, $n,$o,$p,$q,$r,$s,$t,$u,$v,$w,$x,$y,$z) = split(/ <|>;|\t/, $line1);
#splitting all of the information after the first ";" into 2 scalars;
$split = "$a $b $c $d $e $f $g $h $i $j $k $l $m $n $o $p $q $r $s $t $u $v $w $x $y $z";
($canon,$spoke) = split(/; /, $split); 
#Getting rid of some additional extraneous material (i.e., unwanted spaces...);
$orth =~ s/;//;
$spoke =~ s/\s{1,}$|\t{1,}$//g;
#Making an array that will bind everything together (mostly to aid in later coding not yet created);

@general = ($file, $stamp,$extra,$orth,$canon, $spoke, $syll);
}
#combining all of the $orth's into a loop;
foreach ($general[3]) {
#Making each column into its own array;
push(@array0, $general[0]);
push(@array1, $general[1]);
push(@array2, $general[2]);
push(@array3, $general[3]);
push(@array4, $general[4]);
push(@array5, $general[5]);
}}}}
close A;

#Making a loop of each array created above;
foreach (@array0,@array1,@array2,@array3,@array4,@array5) {
#Removing each element one at a time for later (not yet created) conditional searching of each array element;
@shift0 = shift @array0;
@shift1 = shift @array1;
@shift2 = shift @array2;
@shift3 = shift @array3;
@shift4 = shift @array4;
@shift5 = shift @array5;
#Prints out the $orth word of each line on its own line (used mostly as a debugger right now);


print B "@shift3\n";

}

Many thanks,
Napa

In reply to Misprocessed Read From Files? by Napa
