garbage777 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file that contains thousands of very long lines like the single line below:
SITENAME AIRLIKEONE CATEGORY GENERAL; CODENAME AIRLIKEONE 01000 02000; ORIGIN 14000 14000; SIZE 5600sqft; SYSTEMETRY 4x 2y; SITE ASIA; SPIN S DIRECTION OUTPUT; SPORT COLORMAPLAYER M1; PHYSICALDIMENSIONS 4.850 14395 5.020 111045; END StaircaseArea 101; END S SPIN CO DIRECTION OUTPUT; SPORT COLORMAPLAYER M1; PHYSICALDIMENSIONS 5.390 14920 5.450 111315; PHYSICALDIMENSIONS 5.390 14390 5.450 14535; PHYSICALDIMENSIONS 5.450 14390 5.550 111315; END AntennaDiffArea 14101; END CO SPIN CI DIRECTION INPUT; SPORT COLORMAPLAYER M1; PHYSICALDIMENSIONS 387265 14795 387550 14990; END AntennaGateArea 140516; END CI SPIN B DIRECTION INPUT; SPORT COLORMAPLAYER M1; PHYSICALDIMENSIONS 14155 14610 14350 14825; END AntennaGateArea 140534; END B SPIN A DIRECTION INPUT; SPORT COLORMAPLAYER M1; PHYSICALDIMENSIONS 111630 14650 111990 14790; END AntennaGateArea 140492; END A SPIN VSS DIRECTION INOUT; USE GROUND; SHAPE ABUTMENT; SPORT COLORMAPLAYER M1; PHYSICALDIMENSIONS 14000 -0.070 14310 14070; PHYSICALDIMENSIONS 14310 -0.070 14470 14340; PHYSICALDIMENSIONS 14470 -0.070 111650 14070; PHYSICALDIMENSIONS 111650 -0.070 111750 14360; PHYSICALDIMENSIONS 111750 -0.070 387930 14070; PHYSICALDIMENSIONS 387930 -0.070 4.100 14155; PHYSICALDIMENSIONS 4.100 -0.070 5.125 14070; PHYSICALDIMENSIONS 5.125 -0.070 5.275 14545; PHYSICALDIMENSIONS 5.275 -0.070 5.600 14070; END END VSS SPIN VDD DIRECTION INOUT; USE POWDER; SHAPE ABUTMENT; SPORT COLORMAPLAYER M1; PHYSICALDIMENSIONS 14000 111530 14305 111670; PHYSICALDIMENSIONS 14305 111365 14445 111670; PHYSICALDIMENSIONS 14445 111530 111765 111670; PHYSICALDIMENSIONS 111765 111445 111925 111670; PHYSICALDIMENSIONS 111925 111530 387490 111670; PHYSICALDIMENSIONS 387490 111440 387650 111670; PHYSICALDIMENSIONS 387650 111530 5.125 111670; PHYSICALDIMENSIONS 5.125 111315 5.285 111670; PHYSICALDIMENSIONS 5.285 111530 5.600 111670; END END VDD BOUNDARYWALL COLORMAPLAYER M1; PHYSICALDIMENSIONS 5.285 14645 5.355 14805; PHYSICALDIMENSIONS 5.195 14645 5.285 111225; PHYSICALDIMENSIONS 4.970 111135 5.195 111225; PHYSICALDIMENSIONS 4.880 111135 4.970 111440; PHYSICALDIMENSIONS 387835 111350 4.880 111440; PHYSICALDIMENSIONS 4.660 14365 4.750 111245; PHYSICALDIMENSIONS 4.560 14365 4.660 14535; PHYSICALDIMENSIONS 4.620 111085 4.660 111245; PHYSICALDIMENSIONS 4.440 14630 4.570 14790; PHYSICALDIMENSIONS 4.470 14245 4.560 14535; PHYSICALDIMENSIONS 387840 14245 4.470 14335; PHYSICALDIMENSIONS 4.380 14630 4.440 111125; PHYSICALDIMENSIONS 4.290 14445 4.380 111125; PHYSICALDIMENSIONS 4.020 14445 4.290 14535; PHYSICALDIMENSIONS 4.105 14645 4.195 111260; PHYSICALDIMENSIONS 387920 14645 4.105 14735; PHYSICALDIMENSIONS 387925 111100 4.105 111260; PHYSICALDIMENSIONS 387740 14850 4.015 111010; PHYSICALDIMENSIONS 387830 14425 387920 14735; PHYSICALDIMENSIONS 387750 14160 387840 14335; PHYSICALDIMENSIONS 387745 111260 387835 111440; PHYSICALDIMENSIONS 387660 14425 387830 14515; PHYSICALDIMENSIONS 243445 14160 387750 14250; PHYSICALDIMENSIONS 387135 111260 387745 111350; PHYSICALDIMENSIONS 387650 14615 387740 111170; PHYSICALDIMENSIONS 387570 14340 387660 14515; PHYSICALDIMENSIONS 387435 14615 387650 14705; PHYSICALDIMENSIONS 387225 111080 387650 111170; PHYSICALDIMENSIONS 243695 14340 387570 14430; PHYSICALDIMENSIONS 387275 14530 387435 14705; PHYSICALDIMENSIONS 387135 14525 387175 14620; PHYSICALDIMENSIONS 387045 14525 387135 111405; PHYSICALDIMENSIONS 387015 14525 387045 14620; PHYSICALDIMENSIONS 243755 111315 387045 111405; PHYSICALDIMENSIONS 243910 14705 243955 111225; PHYSICALDIMENSIONS 243865 14520 243910 111225; PHYSICALDIMENSIONS 243785 14520 243865 14795; PHYSICALDIMENSIONS 243655 111135 243865 111225; PHYSICALDIMENSIONS 243695 14885 243775 111045; PHYSICALDIMENSIONS 243605 14340 243695 111045; PHYSICALDIMENSIONS 243495 111135 243655 111355; PHYSICALDIMENSIONS 243395 14955 243605 111045; PHYSICALDIMENSIONS 243170 14675 243515 14835; PHYSICALDIMENSIONS 111480 111265 243495 111355; PHYSICALDIMENSIONS 243285 14160 243445 14565; PHYSICALDIMENSIONS 243260 14955 243395 111175; PHYSICALDIMENSIONS 111940 14160 243285 14250; PHYSICALDIMENSIONS 111360 111085 243260 111175; PHYSICALDIMENSIONS 243080 14415 243170 14995; PHYSICALDIMENSIONS 243040 14415 243080 14575; PHYSICALDIMENSIONS 111540 14895 243080 14995; PHYSICALDIMENSIONS 111850 14160 111940 14550; PHYSICALDIMENSIONS 111555 14460 111850 14550; PHYSICALDIMENSIONS 111465 14210 111555 14550; PHYSICALDIMENSIONS 111450 14750 111540 14995; PHYSICALDIMENSIONS 111390 111265 111480 111440; PHYSICALDIMENSIONS 14900 14210 111465 14300; PHYSICALDIMENSIONS 111370 14750 111450 14840; PHYSICALDIMENSIONS 14625 111350 111390 111440; PHYSICALDIMENSIONS 111280 14425 111370 14840; PHYSICALDIMENSIONS 111270 14945 111360 111175; PHYSICALDIMENSIONS 111190 14945 111270 111035; PHYSICALDIMENSIONS 111100 14415 111190 111035; PHYSICALDIMENSIONS 14810 111150 111180 111260; PHYSICALDIMENSIONS 14990 14415 111100 14615; PHYSICALDIMENSIONS 14900 14705 111010 111045; PHYSICALDIMENSIONS 14810 14210 14900 14795; PHYSICALDIMENSIONS 14720 14910 14810 111260; PHYSICALDIMENSIONS 14630 14375 14720 111045; PHYSICALDIMENSIONS 14540 111185 14625 111440; PHYSICALDIMENSIONS 14535 14430 14540 111440; PHYSICALDIMENSIONS 14450 14430 14535 111275; PHYSICALDIMENSIONS 14210 14430 14450 14520; PHYSICALDIMENSIONS 14210 111185 14450 111275; PHYSICALDIMENSIONS 14050 14210 14210 14520; PHYSICALDIMENSIONS 14050 14915 14210 111275; COLORMAPLAYER LVT; PHYSICALDIMENSIONS 14000 14000 5.600 111600; END END AIRLIKEONE

The above data is all on one line, beginning at SITENAME AIRLIKEONE and ending at END AIRLIKEONE.
From this data, I need to extract everything from SPORT to END every time this pattern occurs.
How do I do this with regular expressions?
J.

Replies are listed 'Best First'.
Re: Regular Expression help
by toolic (Bishop) on Nov 05, 2009 at 19:46 UTC
    use strict;
    use warnings;

    while (<DATA>) {
        while (/(\bSPORT\b.*?\bEND\b)/g) {
            print "$1\n";
        }
    }
    __DATA__
    your long line here
    perlrequick, perlretut
      Which "SPORT" and which "END"?
      Won't the greedy flag give him everything between the first SPORT and the last END as a single string?
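        It's not a 'greedy' flag. If you mean the /g modifier, that stands for 'global' (keep matching through the string), not 'greedy'. Regular expression quantifiers are greedy by default, so you don't need a flag for that. The proposed solution actually turns off the greedy behavior by using a non-greedy quantifier: .*?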
        What happened when you ran the code I posted using the OP's data?
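To make the greedy versus non-greedy point concrete, here is a minimal sketch. The sample line is a short made-up stand-in with the same shape as the OP's data (not the real line), and the names `$line`, `@greedy`, and `@lazy` are illustrative only.

```perl
use strict;
use warnings;

# Short made-up sample shaped like the OP's data (an assumption, not the real line).
my $line = 'SPIN A; SPORT M1; DIM 1 2; END AreaA 1; END A SPIN B; SPORT M1; DIM 3 4; END AreaB 2; END B';

# Greedy .* runs from the first SPORT to the last END on the line.
my @greedy = $line =~ /(\bSPORT\b.*\bEND\b)/g;

# Non-greedy .*? stops at the nearest END after each SPORT.
my @lazy = $line =~ /(\bSPORT\b.*?\bEND\b)/g;

print scalar(@greedy), " greedy match(es)\n";
print "  $_\n" for @greedy;
print scalar(@lazy), " non-greedy match(es)\n";
print "  $_\n" for @lazy;
```

In list context with /g, the match returns every captured block, so the two arrays can be compared directly: the greedy pattern should yield one long string, the non-greedy one a separate string per SPORT.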
Re: Regular Expression help
by Anonymous Monk on Nov 06, 2009 at 08:12 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;

    Main(@ARGV);
    exit(0);

    sub Main {
        my $data = ParseC( \*DATA );
        ProcessC($data);
    }

    sub ProcessC {
        my $Data = shift;
        use Data::Dumper;
        print Data::Dumper->new( [$Data] )->Indent(1)->Dump, "\n";
        die "do something interesting\n";
    } ## end sub ProcessC

    sub ParseC {
        my $Fh = shift;
        my @Data;
        local $/ = '; ';
        while (<$Fh>) {

            #~ warn "($_)";
            chomp;

            #~ print "($_) $$Fh chunk $.\n";
            if (/(^\S+)\s(.*)/) {
                push @Data, [ $1, $2 ];
            }
        } ## end while (<$Fh>)
        return \@Data;
    } ## end sub ParseC
    __DATA__
    (the long line from the question, SITENAME AIRLIKEONE ... END END AIRLIKEONE)
Re: Regular Expression help
by Anonymous Monk on Nov 06, 2009 at 03:24 UTC
    Hi toolic,
    I also want to capture the data before SPORT and after END
    J.
      Which SPORT and which END?!
        Hi 7stud,
        Every occurrence of SPORT and END.
        J.
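One possible sketch for this follow-up, assuming "the data before SPORT and after END" means the text lying between the SPORT ... END blocks: split on the same non-greedy pattern, wrapped in capturing parentheses so that split returns the blocks as well as the text around them. The sample line is again a short made-up stand-in, and `@pieces` is an illustrative name.

```perl
use strict;
use warnings;

# Short made-up sample shaped like the OP's data (an assumption, not the real line).
my $line = 'SPIN A; SPORT M1; DIM 1 2; END AreaA 1; END A SPIN B; SPORT M1; DIM 3 4; END AreaB 2; END B';

# A capturing group in the split pattern keeps the separators in the result,
# so even indexes hold the surrounding text and odd indexes hold the blocks.
my @pieces = split /(\bSPORT\b.*?\bEND\b)/, $line;

for my $i ( 0 .. $#pieces ) {
    my $kind = $i % 2 ? 'block  ' : 'outside';
    print "$kind: [$pieces[$i]]\n";
}
```

Joining `@pieces` back together reproduces the original line, so nothing is lost; whether this is the split the OP had in mind depends on the answer to "which SPORT and which END".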