allison has asked for the wisdom of the Perl Monks concerning the following question:

For a given file:

Sample:
    PP: (899040) Jane Smith
Output:
    PP: 899040 Jane Smith
    my $filename = "filename";
    open my $file, '<', $filename;
    @fileinput = <$file>;
    close($file);
    foreach $line (@fileinput) {
        my $test = ($line);
        if ($test =~ s/\(|\)//g) {
            print $test;
        }
    }

It removes the parentheses, but it also removes the parentheses around non-digits. When I tried to use the code

if($test=~s/\([\d.+]|[\d.+]\)//g)
the output appeared to be PP: 8945; it didn't get the 6 digits number.

How do I get the numeric values without the parentheses and add them, with the tag 'ID:', to the file to display?

Replies are listed 'Best First'.
Re: Extracting the number in a file and get the value and output to the other file
by GrandFather (Saint) on Mar 20, 2011 at 21:29 UTC

    I find it often helps to write small test scripts to figure out how to solve a specific problem. They often look something like:

        use strict;
        use warnings;

        while (<DATA>) {
            next if ! /PP: \((\d+)\)/;
            print "Matched $1\n";
        }

        __DATA__
        PP: (899040) Jane Smith

    where the __DATA__ section contains a selection of different lines to exercise the match.
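
The same capture can feed the 'ID:' tag asked about in the question. The following is only a sketch: `id_tag` is a hypothetical helper name, and it assumes the tag should simply prefix the captured number, since the question does not show the exact output format wanted.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns "ID: <digits>" for a
# line such as "PP: (899040) Jane Smith", or undef when "PP:" is not
# followed by a parenthesised number.
sub id_tag {
    my ($line) = @_;
    return $line =~ /PP: \((\d+)\)/ ? "ID: $1" : undef;
}

my @samples = (
    'PP: (899040) Jane Smith',
    'PP: 899040 Jane (Smith)',    # number is not in parentheses: skipped
);

for my $line (@samples) {
    my $tag = id_tag($line);
    print "$tag\n" if defined $tag;    # prints: ID: 899040
}
```

Returning undef for non-matching lines keeps the decision about what to do with them (skip, warn, pass through) in the calling loop.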

    True laziness is hard work
Re: Extracting the number in a file and get the value and output to the other file
by toolic (Bishop) on Mar 20, 2011 at 22:48 UTC
    If all you're trying to do is delete all parentheses from a line, you can use a character class (perlre):
        use warnings;
        use strict;

        while (<DATA>) {
            s/[()]//g;
            print;
        }

        __DATA__
        PP: (899040) Jane Smith

    prints:

    PP: 899040 Jane Smith

    Update: similarly, you can use tr:

    tr/()//d;
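
Both forms above delete every parenthesis on the line. If the parentheses around non-digits should survive, as the question suggests, one possible variation is to capture the digits and substitute them back without their parentheses. This is a sketch only; `strip_numeric_parens` is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strips parentheses only where they wrap a run of
# digits, leaving any other parentheses on the line untouched.
sub strip_numeric_parens {
    my ($line) = @_;
    (my $copy = $line) =~ s/\((\d+)\)/$1/g;    # work on a copy
    return $copy;
}

print strip_numeric_parens('PP: (899040) Jane (Smith)'), "\n";
# prints: PP: 899040 Jane (Smith)
```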
Re: Extracting the number in a file and get the value and output to the other file
by ww (Archbishop) on Mar 20, 2011 at 21:57 UTC

    Nit: the "8945" in your last code block, "the output...6 digits number" doesn't seem to bear any direct relationship to the data sample you provided. It's not crucial to our understanding -- in this case! -- but generally, you do need to be careful to cut and paste accurately to avoid sending us off on wild goose chases.

    More substantially, the regex, $test=~s/\([\d.+]|[\d.+]\)//g won't do what you appear to expect. Inside square brackets, "." and "+" are literal characters rather than a wildcard and a quantifier, so [\d.+] matches exactly one character: a digit, a dot, or a plus sign. Each alternative therefore removes a parenthesis together with the single digit next to it. For help on that -- in other words, to see why you lost two of your digits -- reread about character classes and quantifiers.

    And, my suggestion would be that you'll learn more by doing so before reading the below:
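
As a small experiment in the spirit of that advice (a sketch, not the poster's own code), the pieces the attempted pattern matches can be listed directly and compared with a pattern whose quantifier sits outside the character class. The variable names are chosen here for illustration.

```perl
use strict;
use warnings;

my $sample = 'PP: (899040) Jane Smith';

# Inside [...] the '.' and '+' are literal, so [\d.+] is ONE character.
# In list context, m//g returns every piece the pattern matches.
my @pieces = $sample =~ /(\([\d.+]|[\d.+]\))/g;
print "removed: @pieces\n";    # removed: (8 0)

# With the quantifier outside the class, the whole number is matched.
my ($whole) = $sample =~ /(\(\d+\))/;
print "matched: $whole\n";     # matched: (899040)
```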

Re: Extracting the number in a file and get the value and output to the other file
by Marshall (Canon) on Mar 21, 2011 at 00:54 UTC
    You only gave one sample input line and then showed some code that admittedly doesn't do what you want - that's of course why you are asking a question. I would suggest that you show a number of test cases demonstrating: a) what you want and b) what your code actually does. One example is usually just not enough to describe the entire behavior that you want.

    Combining split() and a regex is often very powerful, and can result in a regex that is easier to write and understand. The code below essentially says: delete the parens (if any) around the 2nd thing in the line, where "things" are sequences of characters separated by spaces. This may or may not be all that you need, but without more than one test case, I have no way to determine that.

    Also, there can be reasons why it is desirable to "slurp" all of the data in a file into a memory-resident array, as in @fileinput=<$file>;, but in general when processing files line by line this is not a good idea, as it wastes memory and is slower than: read line, process line, read next line, etc. Also, when doing a file operation like "open", check the status of the open - it is very possible that an "open()" can fail: file does not exist, wrong permissions, etc.
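
A minimal sketch of that read-process-read loop with a checked open follows. The temporary file, the `$count` variable, and the substitution are assumptions made here so the example is self-contained; a real script would open its own input file instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway input file so the sketch can run on its own.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "PP: (899040) Jane Smith\n";
close $tmp_fh or die "Cannot close '$filename': $!";

# Check open() and report the system's reason ($!) if it fails.
open my $in, '<', $filename or die "Cannot open '$filename': $!";

my $count = 0;
while (my $line = <$in>) {       # only one line is held in memory at a time
    $line =~ s/\((\d+)\)/$1/;    # drop parentheses around the number only
    print $line;                 # prints: PP: 899040 Jane Smith
    $count++;
}
close $in;
```

Using the low-precedence `or` (rather than `||`) after a parenthesis-free open call means the die applies to the result of open, not just to its last argument.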

    As GrandFather points out, DATA is an already open file handle that can read data from within your Perl code as shown below. This is extremely useful for testing.

        #!/usr/bin/perl -w
        use strict;

        while (<DATA>) {
            my ($id, $num, $rest) = split(/\s+/, $_, 3);
            $num =~ tr/()//d;    # deletes parens in num field
            print "$id $num $rest";
        }

        # prints:
        # PP: 899040 Jane Smith
        # PP: 899040 Jane (Smith)
        # PP: 899040 Jane (Smith)

        __DATA__
        PP: (899040) Jane Smith
        PP: 899040 Jane (Smith)
        PP: (899040) Jane (Smith)