in reply to Re^7: Easiest way to filter a file based on user input (updated)
in thread Easiest way to filter a file based on user input

Hi there, I've since downloaded and installed the Regexp::Common module. I've used it in my script as seen below.

When I run the script below and enter -3, I expect it to remove every line that begins with 'None' or with a number greater than -3, leaving only the lines whose number is less than or equal to -3.

Here is an example of my data:
>hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1 FORWARD
-4.4 6 .. 17 xxxxxxxxxxGTGAC CAGT ATGC ACT +G AAGATGAGGTTTGTG
-0.9 5 .. 18 xxxxxxxxxxxGTGA CCAGT ATGC ACT +GA AGATGAGGTTTGTGG
None 1 .. 20 xxxxxxxxxxxxxxx GTGACCAGTATGCACT +GAAG ATGAGGTTTGTGGAC
None 2 .. 21 xxxxxxxxxxxxxxG TGACCAGTATGCACTG +AAGA TGAGGTTTGTGGACC
None 6 .. 25 xxxxxxxxxxGTGAC CAGTATGCACTGAAGA +TGAG GTTTGTGGACCATGT
-2.3 5 .. 26 xxxxxxxxxxxGTGA C CAGTATGCACTGAAGA +TGAG G TTTGTGGACCATGTG
-3.2 4 .. 27 xxxxxxxxxxxxGTG AC CAGTATGCACTGAAGA +TGAG GT TTGTGGACCATGTGT
-1.9 3 .. 28 xxxxxxxxxxxxxGT GAC CAGTATGCACTGAAGA +TGAG GTT TGTGGACCATGTGTT

If I typed -3 I should be left with:

>hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1 FORWARD
-4.4 6 .. 17 xxxxxxxxxxGTGAC CAGT ATGC ACT +G AAGATGAGGTTTGTG
-3.2 4 .. 27 xxxxxxxxxxxxGTG AC CAGTATGCACTGAAGA +TGAG GT TTGTGGACCATGTGT

So far it only filters out the 'None' lines. Shouldn't $RE{num}{real}{-places=>2} capture real & irrational numbers?

The script:
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common qw /number/;

print "Enter limit: ";
chomp( my $limit = <STDIN> );
$limit = abs($limit);

open my $IN, '<', "xt_spacer_results.hairpin" or die $!;
open my $SIFTED, '>', "new_xt_spacer_results.hairpin" or die $!;

while (<$IN>){
    next if /^None/;
    next if /^($RE{num}{real}{-places=>2})/ && $1 > $limit;
    print $SIFTED $_;
}
close $IN;
close $SIFTED;

Re^9: Easiest way to filter a file based on user input
by haukex (Archbishop) on Jul 16, 2017 at 09:29 UTC

    I added the line $limit = abs($limit); (see abs) because I wasn't sure of your original specification, as I asked in my post. Also, note that -places=N is documented as: "the number is assumed to have exactly N places after the radix point" and even goes on to show an example: "$RE{num}{real}{-places=>2} # matches 123.45 or -0.12", and your input isn't in that format. Take some time to look into the documentation and then try removing the line with abs, as well as "{-places=>2}" from the regex.
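The point about -places can be checked directly. The following is a minimal sketch, assuming Regexp::Common is installed; the variable names ($two_places, $any_places) and the choice of test strings are illustrative and not part of the original script. Both patterns are anchored so that the whole string has to be a number.

```perl
use strict;
use warnings;
use Regexp::Common qw/number/;

# Anchor both patterns so the whole string has to be a number.
my $two_places = qr/^$RE{num}{real}{-places=>2}$/;
my $any_places = qr/^$RE{num}{real}$/;

# Illustrative inputs: the documented example plus two values from this thread.
for my $str ( "-0.12", "-4.4", "-3.2" ) {
    printf "%-6s with -places=>2: %-3s  without: %s\n",
        $str,
        ( $str =~ $two_places ? "yes" : "no" ),
        ( $str =~ $any_places ? "yes" : "no" );
}
```

Values such as -4.4 have only one digit after the radix point, so as whole strings they only match once "{-places=>2}" is removed.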

      Oh okay, apologies for the buffoonery on my part.

      The script seems to be working fine now. I added another next line (next if /^(\s\s-\d)/ && $1 > $limit;) because without it, it doesn't recognise regular Real numbers like -2, -5 etc.

      The script:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Regexp::Common qw /number/;

      print "Enter limit: ";
      chomp( my $limit = <STDIN> );
      #$limit = abs($limit);

      open my $IN, '<', "xt_spacer_results.hairpin" or die $!;
      open my $SIFTED, '>', "new_xt_spacer_results.hairpin" or die $!;

      while (<$IN>){
          next if /^None/;
          next if /^($RE{num}{real})/ && $1 > $limit;
          next if /^(\s\s-\d)/ && $1 > $limit;
          print $SIFTED $_;
      }
      close $IN;
      close $SIFTED;

      Haukex, you are a legend, thanks.

        it doesn't recognise regular Real numbers like -2, -5

        It does; here's a way you can test that (see e.g. How to ask better questions using Test::More and sample data):

        use warnings;
        use strict;
        use Test::More;
        use Regexp::Common qw/number/;

        like "-2", qr/^$RE{num}{real}$/;
        like "-5", qr/^$RE{num}{real}$/;
        like " -5", qr/^$RE{num}{real}$/;

        done_testing;

        __END__
        ok 1
        ok 2
        not ok 3
        #   Failed test at ...
        1..3
        # Looks like you failed 1 test of 3.

        As you can see, the problem isn't that it doesn't match integers, it's the whitespace at the beginning of the line. Try changing

        next if /^($RE{num}{real})/ && $1 > $limit;
        next if /^(\s\s-\d)/ && $1 > $limit;

        to

        next if /^\s*($RE{num}{real})/ && $1 > $limit;

        Where \s* means "zero or more whitespace characters" (perlretut).
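To see the suggested change working end to end without touching any files, here is a minimal sketch, assuming Regexp::Common is installed. The sift() helper is a hypothetical name introduced only for this illustration, the sample lines are shortened to their first columns, and the last two lines (with leading whitespace and integer scores) are made up to exercise the \s* case discussed above.

```perl
use strict;
use warnings;
use Regexp::Common qw/number/;

# Hypothetical helper: return the lines that survive the filter.
sub sift {
    my ( $limit, @lines ) = @_;
    my @kept;
    for (@lines) {
        next if /^None/;                                   # drop 'None' records
        next if /^\s*($RE{num}{real})/ && $1 > $limit;     # drop scores above the limit
        push @kept, $_;
    }
    return @kept;
}

# Shortened sample lines from the thread; the last two are hypothetical.
my @sample = (
    '>hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1 FORWARD',
    '-4.4 6 .. 17',
    '-0.9 5 .. 18',
    'None 1 .. 20',
    '-2.3 5 .. 26',
    '-3.2 4 .. 27',
    '-1.9 3 .. 28',
    '  -2 0 .. 0',    # hypothetical: integer score with leading whitespace
    '  -5 0 .. 0',    # hypothetical: integer score with leading whitespace
);

print "$_\n" for sift( -3, @sample );
```

With a limit of -3 this should keep the header line and the -4.4, -3.2 and -5 lines. The single \s* pattern covers both the lines that start directly with a number and the ones padded with whitespace, so the separate /^(\s\s-\d)/ check is no longer needed.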