editholla has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

This is my third day using Perl, and I have limited coding background. I am trying to pull all five-character strings that begin with "4" and end with a letter from a group of text files in a directory. I have managed to build an array of these text files and print them all, but I am struggling to find a way to pull out the data I want (e.g. 4099A from the text file shown below). I am interested in any help, whether it builds on my current code or starts from scratch. It would also be great if there were a way to omit any repeated strings.

Thank You!

Code:
use strict;
use warnings;

my $way = "/tester/SPECS/7nm/TestRules/avatar/";
my @rit;

opendir( my $rid, $way );
while ( my $entry = readdir $rid ) {
    next unless -f $way . '/' . $entry;
    next if $entry eq '.' or $entry eq '..';
    push @rit, $entry;
}
closedir $rid;

foreach (@rit) {
    my $cat = "/tester/SPECS/7nm/TestRules/avatar/$_";
    open (WAY , $cat) or die("Can't open $cat");
    my @lines = <WAY>;
    print @lines, "\n";
}

Example of a Text File:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! GENERIC TEST PLAN (RULES) FOR !!!!!!!!!!!!!!!!!!!!!!! nCaps
!
! CREATED ON 06/15/15
!
! CHANGES:
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!RECIPE NAME LOTTYPE PNP #WAFERS SITEMAP PRIORITY
!
! PS Level
ILTPSPR25C080 POR 4099A 25 A300_20B 1
ILTPSPR25C080 SPLIT 4099A 125 A300_20B 1
!
ILTPSPR25N080 POR 4060A 25 A300_20B 1
!ILTPSPR25N080 POR 4060B 25 A300_20B 1

Replies are listed 'Best First'.
Re: Pulling 5 Character Strings out of an Array of Text Files
by 1nickt (Canon) on Dec 30, 2015 at 17:49 UTC

    Hello,

    You should read your file one line at a time using while with the diamond operator. You should search for text with a regular expression, and capture a match with parentheses.

    #!/usr/bin/perl
    use strict;
    use warnings;

    while ( my $line = <DATA> ) {
        print "Line number $. matches with $1\n" if $line =~ m/(\d{4}A)/;
    }

    __DATA__
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !
    ! GENERIC TEST PLAN (RULES) FOR !!!!!!!!!!!!!!!!!!!!!!! nCaps
    !
    ! CREATED ON 06/15/15
    !
    ! CHANGES:
    !
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !RECIPE NAME LOTTYPE PNP #WAFERS SITEMAP PRIORITY
    !
    ! PS Level
    ILTPSPR25C080 POR 4099A 25 A300_20B 1
    ILTPSPR25C080 SPLIT 4099A 125 A300_20B 1
    !
    ILTPSPR25N080 POR 4060A 25 A300_20B 1
    !ILTPSPR25N080 POR 4060B 25 A300_20B 1

    Output:
    Line number 13 matches with 4099A
    Line number 14 matches with 4099A
    Line number 16 matches with 4060A
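
    The pattern above only matches four digits followed by a literal A. If the goal is every five-character string that starts with "4" and ends with any letter, one possible sketch is below. The sub name extract_codes is made up for illustration, and it assumes the three middle characters are letters or digits, which the question does not actually state. The /g flag in list context returns every capture on the line, and \b keeps longer tokens from matching partially.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return every five-character token in $line that
    # starts with "4" and ends with a letter.
    # Assumption: the three middle characters are letters or digits.
    sub extract_codes {
        my ($line) = @_;
        # In list context, a /g match with one capture group returns all captures.
        return $line =~ /\b(4[A-Za-z0-9]{3}[A-Za-z])\b/g;
    }

    my @sample = (
        'ILTPSPR25C080 POR 4099A 25 A300_20B 1',
        '!ILTPSPR25N080 POR 4060B 25 A300_20B 1',
    );

    # Print the codes found on each sample line, space separated.
    print join( ' ', extract_codes($_) ), "\n" for @sample;
    ```

    Note that this also picks up 4060B from the line that starts with "!". If those lines are comments in your files, you would probably want to skip them before matching.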

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Pulling 5 Character Strings out of an Array of Text Files
by stevieb (Canon) on Dec 30, 2015 at 17:49 UTC

    Welcome to the Monastery, editholla!

    You don't specify clearly where the letter at the end of your string is coming from, so in this example, I've just hardcoded it in.

    I've made a few small changes, most notably using the three-argument form of open, and outputting the actual error message that was set if there is one ($!). I also changed from using a bareword file handle to a lexical one.

    After opening each file found, we loop through it line by line. If the regex capture finds anything fitting the criteria, we store the captured string as a hash key, which eliminates duplicates.

    use strict;
    use warnings;

    my $way = "test";
    my (@rit, $rid);

    opendir $rid, $way;
    while ( my $entry = readdir $rid ) {
        next unless -f $way . '/' . $entry;
        next if $entry eq '.' or $entry eq '..';
        push @rit, $entry;
    }
    closedir $rid;

    my %data;

    for (@rit){
        my $cat = "$way/$_";
        open my $fh, '<', $cat or die "Can't open $cat: $!";
        while (my $line = <$fh>){
            chomp $line;
            if ($line =~ /(4\w{3}A)/){
                $data{$1}++;
            }
        }
    }

    for my $k (keys %data){
        print "$k\n";
    }

    Output:

    4060A
    4099A
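
    One thing to keep in mind: keys returns hash keys in no guaranteed order, so the order of the output can change between runs. Since the %data values already hold how many times each string was seen, a small variation can sort the keys and report the counts as well. This is only a sketch: the sub name count_codes is made up for illustration, and it takes lines directly instead of reading files so it can be run on its own.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: tally each captured string across a list of lines,
    # the same way the %data hash is filled in the code above.
    sub count_codes {
        my @lines = @_;
        my %count;
        for my $line (@lines) {
            # Same pattern as above: "4", three word characters, then a literal A.
            $count{$1}++ if $line =~ /(4\w{3}A)/;
        }
        return \%count;
    }

    my $count = count_codes(
        'ILTPSPR25C080 POR 4099A 25 A300_20B 1',
        'ILTPSPR25C080 SPLIT 4099A 125 A300_20B 1',
        'ILTPSPR25N080 POR 4060A 25 A300_20B 1',
    );

    # Sorting the keys makes the output order repeatable.
    printf "%s seen %d time(s)\n", $_, $count->{$_} for sort keys %$count;
    ```

    With the three sample lines this should report 4060A once and 4099A twice, always in that order.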
      I don't get any errors running your code, but I also don't get any output. I set $way to my path but made no other changes. Let me know if you have any insight. Thank you!