in reply to Getting data from a file (Operons and Genes).

It's not at all clear what the bigger picture is, but if you want to subsequently look up the genes associated with a particular operon, then generating a hash is more useful. Consider:

use strict;
use warnings;

my %operons;

while (<DATA>) {
    chomp;
    next unless /^(\S+) \s+\S+\s+\S+\s+ (\S+)$/x;

    my ($operon, $targets) = ($1, $2);
    my @genes = split /,/, $targets;

    $operons{$operon} ||= {};

    for (@genes) {
        my ($subOperon, $gene) = /([^|]*)\|(.*)/;
        $operons{$operon}{$subOperon} = $gene;
    }
}

for my $operon (sort keys %operons) {
    print "$operon: ",
        join (', ', map {"$_ ($operons{$operon}{$_})"} sort keys %{$operons{$operon}}),
        "\n";
}

__DATA__
malS 1 forward malS|b3571,
malT 1 forward malT|b3418,
malXY 2 forward malX|b1621,malY|b1622,
malZ 1 forward malZ|b0403,
manA 1 forward manA|b1613,
manXYZ 3 forward manX|b1817,manY|b1818,manZ|b1819,
map-glnD-dapD 3 reverse dapD|b0166,glnD|b0167,map|b0168,
marC 1 reverse marC|b1529,
marRAB 3 forward marA|b1531,marB|b1532,marR|b1530,
mbhA 1 forward mbhA|,
mcrA 1 forward mcrA|b1159,
mcrBC 2 reverse mcrB|b4346,mcrC|b4345,
mdaB 1 forward mdaB|b3028,
mdh 1 reverse mdh|b3236,
mdlAB 2 forward mdlA|b0448,mdlB|b0449,
mdoB 1 reverse mdoB|b4359,
mdoC 1 reverse mdoC|b1047,
mdoD 1 forward mdoD|b1424,
mdoGH 2 forward mdoG|b1048,mdoH|b1049,
mdtABCD-baeSR 6 forward baeR|b2079,baeS|b2078,mdtA|b2074,mdtB|b2075,mdtC|b2076,mdtD|b2077,
mdtEF 2 forward mdtE|b3513,mdtF|b3514,
mdtG 1 reverse mdtG|b1053,
mdtH 1 reverse mdtH|b1065,
mdtJI 2 reverse mdtI|b1599,mdtJ|b1600,
mdtK 1 forward mdtK|b1663,
mdtL 1 forward mdtL|b3710,
mdtM-yjiN 2 reverse mdtM|b4337,yjiN|b4336,
mdtNOP 3 reverse mdtN|b4082,mdtO|b4081,mdtP|b4080,
mdtQ 1 reverse mdtQ|b2138,
melAB 2 forward melA|b4119,melB|b4120,
melR 1 reverse melR|b4118,
menA 1 reverse menA|b3930,
menFD-yfbB-menBCE 6 reverse menB|b2262,menC|b2261,menD|b2264,menE|b2260,menF|b2265,yfbB|b2263,
metA 1 forward metA|b4013,
metBL 2 forward metB|b3939,metL|b3940,
metC 1 forward metC|b3008,
metE 1 forward metE|b3829,
metF 1 forward metF|b3941,
metG 1 forward metG|b2114,
metH 1 forward metH|b4019,
metJ 1 reverse metJ|b3938,
metK 1 forward metK|b2942,
metNIQ 3 reverse metI|b0198,metN|b0199,metQ|b0197,
metR 1 reverse metR|b3828,
metT-leuW-glnUW-metU-glnVX 7 reverse glnU|b0670,glnV|b0665,glnW|b0668,glnX|b0664,leuW|b0672,metT|b0673,metU|b0666,
metY-yhbC-nusA-infB-rbfA-truB-rpsO-pnp 8 reverse infB|b3168,metY|b3171,nusA|b3169,pnp|b3164,rbfA|b3167,rpsO|b3165,truB|b3166,yhbC|b3170,
metZWV 3 forward metV|b2816,metW|b2815,metZ|b2814,
mfd 1 reverse mfd|b1114,
mglBAC 3 reverse mglA|b2149,mglB|b2150,mglC|b2148,
mgrB 1 reverse mgrB|b1826,
mgsA 1 reverse mgsA|b0963,
mgtA 1 forward mgtA|b4242,

Prints:

malS: malS (b3571)
malT: malT (b3418)
malXY: malX (b1621), malY (b1622)
malZ: malZ (b0403)
manA: manA (b1613)
manXYZ: manX (b1817), manY (b1818), manZ (b1819)
map-glnD-dapD: dapD (b0166), glnD (b0167), map (b0168)
marC: marC (b1529)
marRAB: marA (b1531), marB (b1532), marR (b1530)
mbhA: mbhA ()
mcrA: mcrA (b1159)
mcrBC: mcrB (b4346), mcrC (b4345)
mdaB: mdaB (b3028)
mdh: mdh (b3236)
mdlAB: mdlA (b0448), mdlB (b0449)
mdoB: mdoB (b4359)
mdoC: mdoC (b1047)
mdoD: mdoD (b1424)
mdoGH: mdoG (b1048), mdoH (b1049)
mdtABCD-baeSR: baeR (b2079), baeS (b2078), mdtA (b2074), mdtB (b2075), mdtC (b2076), mdtD (b2077)
mdtEF: mdtE (b3513), mdtF (b3514)
mdtG: mdtG (b1053)
mdtH: mdtH (b1065)
mdtJI: mdtI (b1599), mdtJ (b1600)
mdtK: mdtK (b1663)
mdtL: mdtL (b3710)
mdtM-yjiN: mdtM (b4337), yjiN (b4336)
mdtNOP: mdtN (b4082), mdtO (b4081), mdtP (b4080)
mdtQ: mdtQ (b2138)
melAB: melA (b4119), melB (b4120)
melR: melR (b4118)
menA: menA (b3930)
menFD-yfbB-menBCE: menB (b2262), menC (b2261), menD (b2264), menE (b2260), menF (b2265), yfbB (b2263)
metA: metA (b4013)
metBL: metB (b3939), metL (b3940)
metC: metC (b3008)
metE: metE (b3829)
metF: metF (b3941)
metG: metG (b2114)
metH: metH (b4019)
metJ: metJ (b3938)
metK: metK (b2942)
metNIQ: metI (b0198), metN (b0199), metQ (b0197)
metR: metR (b3828)
metT-leuW-glnUW-metU-glnVX: glnU (b0670), glnV (b0665), glnW (b0668), glnX (b0664), leuW (b0672), metT (b0673), metU (b0666)
metY-yhbC-nusA-infB-rbfA-truB-rpsO-pnp: infB (b3168), metY (b3171), nusA (b3169), pnp (b3164), rbfA (b3167), rpsO (b3165), truB (b3166), yhbC (b3170)
metZWV: metV (b2816), metW (b2815), metZ (b2814)
mfd: mfd (b1114)
mglBAC: mglA (b2149), mglB (b2150), mglC (b2148)
mgrB: mgrB (b1826)
mgsA: mgsA (b0963)
mgtA: mgtA (b4242)
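
To illustrate the kind of lookup a hash makes easy, here is a minimal sketch. It assumes the same operon => { gene => b-number } layout as %operons above, filled with three entries copied from the sample data. The helper names genes_for and operon_index are made up for this example and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample in the same shape the script builds:
# operon name => { gene name => b-number }.
my %operons = (
    malXY  => { malX => 'b1621', malY => 'b1622' },
    manXYZ => { manX => 'b1817', manY => 'b1818', manZ => 'b1819' },
    mbhA   => { mbhA => '' },
);

# Forward lookup: sorted gene names for one operon (empty list if unknown).
sub genes_for {
    my ($operons, $operon) = @_;
    my @genes = sort keys %{ $operons->{$operon} || {} };
    return @genes;
}

# Reverse index: gene name => operon name, built once for repeated lookups.
sub operon_index {
    my ($operons) = @_;
    my %operon_of;
    for my $operon (keys %$operons) {
        $operon_of{$_} = $operon for keys %{ $operons->{$operon} };
    }
    return \%operon_of;
}

my $operon_of = operon_index(\%operons);

print join(', ', genes_for(\%operons, 'manXYZ')), "\n";
print "manY belongs to $operon_of->{manY}\n";
```

With the three sample entries this prints "manX, manY, manZ" and then "manY belongs to manXYZ". The reverse index is only worth building if you need to go from gene to operon more than once; for a single query a loop over the hash would do.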

DWIM is Perl's answer to Gödel

Re^2: Getting data from a file (Operons and Genes).
by chrisantha (Initiate) on Jun 15, 2007 at 08:44 UTC
    Thanks, that is precisely what I want, except when I run it, it does not get past this line:

    next unless /^(\S+) \s+\S+\s+\S+\s+ (\S+)$/;

    The lines don't match this, and so it skips the loop. This is very strange, considering you were able to get the output. Might it be a problem with how I sent the data to this website? Did the data format get changed?

    Yours, Chrisantha
Re^2: Getting data from a file (Operons and Genes).
by chrisantha (Initiate) on Jun 15, 2007 at 08:53 UTC
    Sorry, actually it works nicely. I was saving my file as RTF, not txt. I like the way you've done the next unless; I wouldn't have thought of doing it that way.

    Yours, Chrisantha
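
    For anyone who hits the same symptom (every line silently skipped), a small sanity check can help. The sketch below is an illustration only: it relies on RTF files starting with the signature {\rtf, and it counts the lines that "next unless" would otherwise drop without comment. The names looks_like_rtf and parse_record, and the two sample lines, are invented for this example.

```perl
use strict;
use warnings;

# True if the first line of the input carries the RTF signature "{\rtf".
sub looks_like_rtf {
    my ($first_line) = @_;
    return 0 unless defined $first_line;
    return $first_line =~ /^\{\\rtf/ ? 1 : 0;
}

# Same pattern as the script in the parent post.
# Returns (operon, gene list) on a match, or an empty list otherwise.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^(\S+) \s+\S+\s+\S+\s+ (\S+)$/x;
    return ($1, $2);
}

# Hypothetical input: one RTF header line followed by one well-formed record.
my @lines = ('{\rtf1\ansi\deff0', 'malXY 2 forward malX|b1621,malY|b1622,');

warn "input looks like RTF, not plain text\n" if looks_like_rtf($lines[0]);

my $skipped = 0;
for my $line (@lines) {
    my ($operon, $targets) = parse_record($line);
    unless (defined $operon) {
        $skipped++;    # count it rather than dropping it silently
        next;
    }
    print "$operon => $targets\n";
}
warn "skipped $skipped line(s) that did not match\n" if $skipped;
```

    Reporting the skipped count keeps the convenience of "next unless" while making a wholesale mismatch, such as a file saved in the wrong format, obvious right away.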