Regex problem

Win has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I'm having problems getting a Perl regex to pick out the figures in the following Excel spreadsheet record (saved as a text file).

00CEFA0001    0.973694291    0.013140314    0    0    0    0    0    0
+    0    0    0.003278308    0    0    0    0    0    0    0    0    
+0    0    0    0    0    0    0    0    0    0    0    0    0    0   
+ 0    0    0    0    0    0.006569697    0    0    0.00331739    0   
+ 0    0    0

My efforts have been along the lines of:

 if ($line =~ /^0.{9}(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[
+0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20
+})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9
+]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(
+\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1
+,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[
+0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20
+})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9
+]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(
+\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1
+,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})(\t[0-9]{1,20})/)  
+{

Replies are listed 'Best First'.
Re: Regex problem by ikegami (Patriarch) on Oct 05, 2005 at 16:05 UTC
Maybe the following would be more useful: `my @fields = split("\t", $_, -1);` Then you can perform checks on individual fields, if you want. It'll be more readable and maintainable.
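A minimal sketch of the split approach, in case it helps. The sub name `split_record` and the sample record are made up for illustration; the point is only that the `-1` limit keeps trailing empty fields, which `split` would otherwise drop.

```perl
use strict;
use warnings;

# Split one tab-separated record into its fields.
# The -1 limit keeps trailing empty fields instead of discarding them.
sub split_record {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;
}

# Hypothetical sample record: an ID, three numbers, two empty trailing cells.
my $line   = "00CEFA0001\t0.973694291\t0.013140314\t0\t\t\n";
my @fields = split_record($line);

# The first field is the record ID; the rest are the values.
my $id = shift @fields;
print "id: $id, ", scalar(@fields), " value fields\n";
```

Without the `-1`, the two empty cells at the end of the sample would not show up in the field count, which matters if you want to verify that every record has the same number of columns.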
Re: Regex problem by ides (Deacon) on Oct 05, 2005 at 16:13 UTC
You might also want to look into the following CPAN modules: Text::CSV_XS and Spreadsheet::ParseExcel. Both of them should make your life easier.
Frank Wiles <frank@wiles.org> http://www.wiles.org
Re: Regex problem by Perl Mouse (Chaplain) on Oct 05, 2005 at 16:10 UTC
You are not clear about what you want, nor what your problems are, but to me it looks like you're checking whether the line starts with a 0 and then, from character 10 onwards, whether it contains only digits and tabs, with no more than 20 digits in a row and no two tabs in a row. Can't you just do something like: `if ($line =~ /^0/) { my $end = substr ($line, 10); unless ($end =~ /[^0-9\t]/ || $end =~ /\t\t/ || $end =~ /[0-9]{21}/) { .... } }`
`Perl --((8:>*`
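The three negative checks above could be wrapped in a sub along these lines. This is only a sketch: the name `looks_valid` and the sample lines are invented, and the length guard is an addition so that `substr` is never called past the end of a short line. Note that, as written, the character check rejects decimal points, so records containing values like 0.5 fail.

```perl
use strict;
use warnings;

# True if the line starts with 0 and everything from character 10 onwards
# is digits and single tabs only, with runs of at most 20 digits.
sub looks_valid {
    my ($line) = @_;
    chomp $line;
    return 0 unless $line =~ /^0/;
    return 0 if length($line) < 10;       # guard added for short lines
    my $end = substr($line, 10);
    return 0 if $end =~ /[^0-9\t]/;       # something other than digit or tab
    return 0 if $end =~ /\t\t/;           # two tabs in a row (empty field)
    return 0 if $end =~ /[0-9]{21}/;      # more than 20 digits in a row
    return 1;
}

# Hypothetical samples: integers only, then one with a decimal point.
for my $sample ("00CEFA0001\t12\t0\t7", "00CEFA0001\t0.5\t0") {
    my $verdict = looks_valid($sample) ? "valid" : "rejected";
    print "$verdict\n";
}
```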
Re: Regex problem by Skeeve (Parson) on Oct 05, 2005 at 16:27 UTC
`if ($line =~ /^0.{9}(\t\d{1,20}){49}/) {` will do the same as your lengthy regex, except that it doesn't capture all the individual numbers. OTOH, our REs won't match the line at all, because the numbers in your data contain decimal points. I think this one will serve the purpose better: `if (/^0[0-9a-fA-F]{9}(?:\t[\d\.]{1,20}){49}/) {` Okay: this will also match (e.g.) IP addresses, as it doesn't take into account that a number may contain only one decimal point, but maybe that's okay. You're the only one who can tell.
`$\=~s;s.;q^|D9JYJ^^qq^\//\\\///^;ex;print`
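If the single-decimal-point restriction does matter, one possible variation is to describe a number once with `qr//` and reuse it. This is a sketch under assumptions: the number format (digits, optionally one decimal point followed by more digits) is a guess at what the spreadsheet holds, and `record_regex` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Assumed number format: digits, optionally one decimal point and more digits.
my $number = qr/\d+(?:\.\d+)?/;

# Build a regex for an ID (0 plus nine hex digits) followed by exactly
# $count tab-separated numbers and nothing else.
sub record_regex {
    my ($count) = @_;
    return qr/^0[0-9a-fA-F]{9}(?:\t$number){$count}$/;
}

my $re = record_regex(3);

# Hypothetical samples: a well-formed record, then one with an IP-like field.
for my $sample ("00CEFA0001\t0.973694291\t0\t0.013140314",
                "00CEFA0001\t1.2.3.4\t0\t0") {
    my $verdict = $sample =~ $re ? "match" : "no match";
    print "$verdict\n";
}
```

Passing the field count as a parameter keeps the pattern usable if the number of columns turns out to differ from what the original regex assumed.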
Re: Regex problem by bioMan (Beadle) on Oct 05, 2005 at 16:43 UTC
I see there's a lot of repetition. `if ($line =~ /^0.{9} (\t[0-9]{1,20}) etc. etc. etc.` Why not slurp the file into a scalar (Perl Slurp-eaze), split it at the tabs, and send the data to an array? `my @excelData = split /\t/, $slurpedExcelFile;`
Update - Oops! As pointed out by ikegami, the split statement should read: `my @excelData = split "\t", $slurpedExcelFile;`
Update - Aaaaaaaa! A little testing shows both "\t" and /\t/ will work with split.
If you need to check the individual values, create a simpler regex and apply it to each array item. `my @numbers = grep /# your regex/, @excelData;`
Mike "I need more cow bell!"
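A sketch of the split-then-grep idea, with two assumptions flagged: it reads line by line rather than slurping (splitting a whole file on tabs alone would glue the last field of one line to the first field of the next), and the "simpler regex" is a guess at a plain decimal number. The sub name `bad_fields` and the two-record input are made up.

```perl
use strict;
use warnings;

# Return the value fields of one record that do NOT look like plain
# decimal numbers (assumed format: digits, optional ".digits").
sub bad_fields {
    my ($line) = @_;
    chomp $line;
    my ($id, @values) = split /\t/, $line, -1;
    return grep { !/^\d+(?:\.\d+)?$/ } @values;
}

# Hypothetical two-record input, read from an in-memory filehandle.
my $data = "00CEFA0001\t0.973694291\t0\t0.5\n"
         . "00CEFA0002\t0\toops\t0.1\n";
open my $fh, '<', \$data or die "cannot open in-memory data: $!";
while (my $line = <$fh>) {
    my @bad = bad_fields($line);
    print "line $.: ", (@bad ? "bad fields: @bad" : "ok"), "\n";
}
close $fh;
```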