I'm not sure this answers your question, but I've ignored every field except the one you gave examples for, on the surmise that that's your problem area:

#!/usr/bin/perl -w
use 5.018;
use strict;

#1149716

=head

I would like to extract a piece of data from one field that has multiple
fields in it. The original field is a long description that usually
contains a #F123456, #123456, #123-F123456, #123-123456, or #12AB-123456
in it. This data floats around from left to right and there should be
whitespace before the #. Also, the end of the data is either whitespace,
or the end of the field.

=cut

my @data = (
    "TRAY HINGED PLSTC 20 CAV #F32473",
    "BOX HSC,35-3/4X17-1/4 X 50-1/2 SIMULATOR TALL BOX",
    "PAD, FOAM, 24 X 24 X 1/4 #16193 112 SHEETS PER ROLL, ORDER IN FULL ROLLS",
    "PKG LIST,ASST ARM,RAD,300 #F37784",
    "PAD, TOP CAP RE17-30048 #F30121 CORRUGATED ASSEMBLY, 22-7/8 X 21-1/8 X 4-3/4",
    "foo bar #379460 best F11",
    "F1234 SIMULATION",
);

for my $data (@data) {
    # say "\t|$data|\n\n";
    chomp $data;
    if ( $data =~ /\n/ ) {
        $data =~ s/\n//g;
    }
    if ( $data =~ /(^.* #[A-Z]*\d+.*$)/m ) {
        say "\n\$data matches regex\n";
        $data =~ s/ +/ /g;    # clean up excess spaces
        say "$data \n";
    } else {
        say "\n\t The data, $data, does NOT MATCH\n";
    }
}

The regular expression may look obscure, so here's an explanation:

C:perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr/(^.* #[A-Z]*\d+.*$)/)->explain();"
The regular expression:

(?-imsx:(^.* #[A-Z]*\d+.*$))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
          # NB: I did NOT need the parens as there's no use of the capture
          # My bad, but harmless except for shoving bits & bytes around
          # when they didn't need to be disturbed.
----------------------------------------------------------------------
    ^                        the beginning of the string
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
     #                       ' #'
----------------------------------------------------------------------
    [A-Z]*                   any character of: 'A' to 'Z' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

And the output is thus:

C:1149716.pl

$data matches regex

TRAY HINGED PLSTC 20 CAV #F32473

     The data, BOX HSC,35-3/4X17-1/4 X 50-1/2 SIMULATOR TALL BOX, does NOT MATCH

$data matches regex

PAD, FOAM, 24 X 24 X 1/4 #16193 112 SHEETS PER ROLL, ORDER IN FULL ROLLS

$data matches regex

PKG LIST,ASST ARM,RAD,300 #F37784

$data matches regex

PAD, TOP CAP RE17-30048 #F30121 CORRUGATED ASSEMBLY, 22-7/8 X 21-1/8 X 4-3/4

$data matches regex

foo bar #379460 best F11

     The data, F1234 SIMULATION, does NOT MATCH
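
The regex above only tests whether a description contains something that looks like an ID; it doesn't pull the ID out. If extraction is what you're after, here is a minimal sketch, assuming the five shapes you listed (#F123456, #123456, #123-F123456, #123-123456, #12AB-123456) are the only ones that occur. The sub name `extract_id` is hypothetical, and the pattern is my guess at your format, not a spec:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper (not part of the original script): returns the ID
# that follows '#', or undef when the description holds none.
# Assumed ID shape: optional "digits/capitals-" prefix, optional 'F', digits.
sub extract_id {
    my ($desc) = @_;
    # (?<!\S) : the '#' sits at the start of the field or after whitespace
    # (?!\S)  : the ID ends at whitespace or at the end of the field
    return $desc =~ /(?<!\S)#((?:[0-9A-Z]+-)?F?\d+)(?!\S)/ ? $1 : undef;
}

my @samples = (
    "TRAY HINGED PLSTC 20 CAV #F32473",
    "BOX HSC,35-3/4X17-1/4 X 50-1/2 SIMULATOR TALL BOX",
    "PAD, FOAM, 24 X 24 X 1/4 #16193 112 SHEETS PER ROLL",
    "made-up description #123-F123456 more text",    # invented sample
    "made-up description #12AB-123456",              # invented sample
    "F1234 SIMULATION",
);

for my $desc (@samples) {
    my $id = extract_id($desc);
    say defined $id ? $id : "(no ID found)";
}
```

The two lookarounds stand in for your "whitespace before the #" and "whitespace or end of field" rules without swallowing any of the neighbouring text, so `$1` holds just the ID. Note that neither the leading `^.*` nor the trailing `.*$` from my original pattern is needed here.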


HTH. You'll often get better answers if you trim your code to the few (<20) lines that demonstrate only the problem you want addressed. I see you also want advice on the rest of the code you supplied, but I don't have time to create jumbled CSV that would let me assess its efficiency and/or clarity.


In reply to Re: Extract data from CSV field. by ww
in thread Extract data from CSV field. by JobC
