PerlMonks  

Re^3: REGEX omit dashes - simple but ...

by kennethk (Abbot)
on Apr 04, 2016 at 17:14 UTC


in reply to Re^2: REGEX omit dashes - simple but ...
in thread REGEX omit dashes - simple but ...

There isn't a strong difference between strings and numbers in Perl. See Context tutorial, in particular More flavors of scalars, and/or http://stuff.mit.edu/iap/perl/slides/context_numeric.html.
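
One place where the string/number distinction still matters is the choice of comparison operator. The sketch below is only an illustration (the helper names same_as_number and same_as_string are made up for this example): == forces numeric context, eq forces string context, and values with leading zeros behave differently under each.

```perl
use strict;
use warnings;

# Hypothetical helpers, named for illustration only.
sub same_as_number { my ($x, $y) = @_; return $x == $y ? 1 : 0 }   # numeric context
sub same_as_string { my ($x, $y) = @_; return $x eq $y ? 1 : 0 }   # string context

# Numeric comparison ignores the leading zeros ...
print "numeric: ", same_as_number("0017", "17"), "\n";
# ... while string comparison treats them as significant characters.
print "string:  ", same_as_string("0017", "17"), "\n";
```

So an identifier with leading zeros is probably best compared with eq if the zeros are meant to be significant.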

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Replies are listed 'Best First'.
Re^4: REGEX omit dashes - simple but ...
by wrkrbeee (Scribe) on Apr 04, 2016 at 17:17 UTC

    Thanks guys, here's the revised statement, along with the result:
    if($line=~m/^\s*ACCESSION\s*NUMBER:\s*/m){$access_num=$1; $access_num =~ tr/-//d;}
    Result is the error message stating "Use of uninitialized value ...." ?? Thanks!!!

      You forgot to add your capture group of (\d*). However, that won't really help, as that won't capture your - (dashes), or any numbers after it.

      Why don't you show us a few lines of example data you're trying to match?

      Also, Use of uninitialized... is not an error, it's a warning. It's most likely saying that $1 is uninitialized (because you didn't capture anything).

      You are getting an uninitialized warning because you are trying to change the content of $access_num, to which you've assigned $1, but there were no parentheses in your first regular expression. Maybe you mean something like:
      if ($line =~ s/^\s*ACCESSION\s*NUMBER:\s*//) { $line =~ tr/-//d; }
      or possibly
      if ($line =~ m/^\s*ACCESSION\s*NUMBER:\s*([-\d]+)$/m) { $access_num = $1; $access_num =~ tr/-//d; }
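
For anyone who wants to try the capture-then-delete idea in isolation, here is a minimal sketch. The sub name accession_digits and the sample line are assumptions made for illustration; the poster's actual file layout is not shown in the thread. It captures the dashed number with parentheses so that $1 is defined, then uses tr with the d modifier to delete the dashes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the accession number without dashes,
# or undef when the line is not an ACCESSION NUMBER line.
sub accession_digits {
    my ($line) = @_;
    if ($line =~ m/^\s*ACCESSION\s*NUMBER:\s*([-\d]+)/) {
        my $access_num = $1;      # $1 is set only because of the parentheses
        $access_num =~ tr/-//d;   # the d modifier deletes the dashes
        return $access_num;
    }
    return;                       # no match: the caller gets undef
}

# Assumed input line; the real files may be laid out differently.
my $sample = "ACCESSION NUMBER:\t\t0001144204-09-017358\n";
my $digits = accession_digits($sample);
print defined $digits ? "$digits\n" : "no match\n";
```

Note that tr/-// without the d modifier only counts the dashes and leaves the string unchanged, which is why the d matters here.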


Re^4: REGEX omit dashes - simple but ...
by wrkrbeee (Scribe) on Apr 04, 2016 at 17:29 UTC

    Examples of the input data:
    0001144204-09-017358
    0001144204-10-065610
    0001042167-15-000175
    0000053669-16-000051
    Thanks!

      If you get the strings as you describe them, you can use them as numbers. In Perl you don't have to call a function to convert a string to a number: if the string looks like a number, you can just use it as one. Here I added 10 to the "string" to show that feature. Of course, once "$string" is used as a "number", leading zeroes are suppressed unless you use something like printf to add them back in the printout. A common idiom to suppress leading zeroes is $number_string += 0;

      #!/usr/bin/perl
      use warnings;
      use strict;

      my @input = qw/0001144204-09-017358 0001144204-10-065610
                     0001042167-15-000175 0000053669-16-000051/;

      foreach my $string (@input)
      {
          $string =~ tr/-//d;
          print "string = $string\n";
          print "string +10 as number: ", $string + 10, "\n";
      }
      __END__
      prints:
      string = 000114420409017358
      string +10 as number: 114420409017368
      string = 000114420410065610
      string +10 as number: 114420410065620
      string = 000104216715000175
      string +10 as number: 104216715000185
      string = 000005366916000051
      string +10 as number: 5366916000061

      Update: I ran this on Win XP, 32 bit.
      Normally 2,147,483,647 would be the maximum 32-bit integer, but Perl 5.22 still produced 104,216,715,000,185 from the addition, because Perl switches to a floating-point value when a result does not fit in a native integer.

      Those are the strings you are transforming, but it looks like you are struggling with extracting them from your lines. What do your literal lines look like?
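
A caution on treating these IDs as numbers: a double-precision float holds integers exactly only up to 2**53 (about 9.0e15), so an 18-digit ID that does not start with zeros could lose digits on a perl built without 64-bit integers. If the value is only an identifier, keeping it as a string sidesteps that and also preserves the leading zeros. A minimal sketch, with strip_dashes as a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: remove the dashes but keep the result a string.
sub strip_dashes {
    my ($s) = @_;
    (my $copy = $s) =~ tr/-//d;   # work on a copy so the caller's value is untouched
    return $copy;
}

my $id = strip_dashes("0001144204-09-017358");
print "as string: $id\n";                 # leading zeros are preserved
print "length:    ", length($id), "\n";   # still all 18 digits
print "as number: ", $id + 0, "\n";       # numeric context drops the leading zeros
```

Whether the numeric form is ever needed depends on what is done with the value later; for lookups and output the string form is usually enough.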


        Not completely sure what 'literal lines' should mean to me, but ....

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use File::stat;
        use lib "c:/strawberry/perl/site/lib";

        #This program will extract the header information in 10K and 10Q filings
        #as well as file sizes.

        #Specify the directory containing the files that you want to read;
        #my $files_dir = 'C:\Rick Francis\Data\SEC Filings\Filing Doc';
        my $files_dir = 'E:\research\audit fee models\filings\Test';

        #Specify the directory containing the results/output;
        #my $write_dir = 'C:\Rick Francis\Data\SEC Filings\Header Data\Revised\DataTest.txt';
        my $write_dir = 'E:\research\audit fee models\filings\filenames\filenames.txt';

        #Open the directory containing the files you plan to read;
        opendir(my $dir_handle, $files_dir) or die "Can't open directory $!";

        #Initialize file counter variable;
        my $file_count = 0;

        #Loop for reading each file in the input directory;
        while (my $filename = readdir($dir_handle)) {
            next unless -f $files_dir.'/'.$filename;
            print "Processing $filename\n";

            #Initialize the variable names.
            my $line_count=0;
            my $access_num=-99;
            my $cik=-99;
            my $form_type="";
            my $form="";
            my $report_date=-99;
            my $file_date=-99;
            my $name="";
            #my $sic=-99;
            #my $sic1=-99;
            my $file_name="";
            my $htm="";
            my $url="";
            my $slash='/';

            #Open the input file;
            open my $FH_IN, '<',$files_dir.'/'.$filename or die "Can't open $filename";

            #Within the file loop, read each line of the current file;
            while (my $line = <$FH_IN>) {
                next unless -f $files_dir.'/'.$filename;
                if ($line_count > 500000) { last;}

                #The following steps obtain basic data from various lines in the file;
                if($line=~m/^\s*ACCESSION\s*NUMBER:\s*/m){$access_num=$1; $access_num =~ tr/-//d;}
