http://qs1969.pair.com?node_id=285872

bioinformatics has asked for the wisdom of the Perl Monks concerning the following question:

Hello Friends!!!
What is the best way to pattern match an unknown pattern? Allow me to explain... I have a file that contains a series of data values (microarray probe sets, to be specific) that I need to sort through. Technically, there should be 11 "probes" for each target (ex. 154115_at=target name), but some targets have fewer. Since the probes for a target all share the target name, I need the program to take the target name from the first line and compare it to successive lines until one doesn't match. (The matching data needs to be further parsed and put on one tab-delimited line, but I know how to do that.) When a mismatch occurs, the target name from the mismatched line needs to become the new pattern to compare against. I'm familiar with pattern matching. However, I don't know how to designate an "unknown" pattern in Perl, and I can't go and write 22,000 some-odd patterns by hand :-). A sample input file:
>probe:MOE430A:1415670_at(target name):549:177; Interrogation_Position=2436; Antisense; GGCTGATCACATCCAAAAAGTCATG(probe sequence)
>probe:MOE430A:1415670_at:549:177; Interrogation_Position=2513; Antisense; GAGGAAACGTTCACCCTGTCTACTA
>probe:MOE430A:1415670_at:467:433; Interrogation_Position=2521; Antisense; GTTCACCCTGTCTACTATCAAGACA
>probe:MOE430A:1415670_at:254:643; Interrogation_Position=2533; Antisense; TACTATCAAGACACTCGAAGAGGCT
>probe:MOE430A:1415670_at:54:269; Interrogation_Position=2556; Antisense; CTGTGGGCAATATTGTGAAGTTCCT
>probe:MOE430A:1415670_at:405:339; Interrogation_Position=2583; Antisense; GAATGCATCCTTGTGAGAGGTCAGA
>probe:MOE430A:1415670_at:60:395; Interrogation_Position=2597; Antisense; GAGAGGTCAGACAAAGTGCCAGAAA
>probe:MOE430A:1415670_at:284:165; Interrogation_Position=2619; Antisense; AAAACAAGAACACCCACACGCTGCT
>probe:MOE430A:1415670_at:622:145; Interrogation_Position=2634; Antisense; ACACGCTGCTGCTAGCTGGAGTATT
>probe:MOE430A:1415670_at:291:661; Interrogation_Position=2804; Antisense; TATCTTGTCCAACACTACGTCGAAG
>probe:MOE430A:1415670_at:146:701; Interrogation_Position=2956; Antisense; TTGTCACCATGCCTGCAAGGAGAGA
>probe:MOE430A:1415671_at:116:525; Interrogation_Position=1156; Antisense; GGAACAGGAATGTCGCAACATCGTA
>probe:MOE430A:1415671_at:655:137; Interrogation_Position=1173; Antisense; ACATCGTATGGATTGCTGAGTGCAT
>probe:MOE430A:1415671_at:398:139; Interrogation_Position=1232; Antisense;
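
To make the idea concrete, here is a minimal sketch of one way the "unknown pattern" could be handled: instead of writing a pattern per target, capture the target name from each line with a single regex and compare it to the previous one with a plain string comparison. It assumes one record per line, the target name as the third colon-separated field, and the sequence as the fourth semicolon-separated field. The sub name group_probes and the file name probes.txt are hypothetical, and skipping records with no sequence is my own assumption.

```perl
use strict;
use warnings;

# Group consecutive probe records by target name.
# Takes a list of record lines; returns one tab-delimited string per
# target: the target name followed by each probe sequence.
sub group_probes {
    my @lines = @_;
    my @rows;       # finished output rows
    my $current;    # target name of the group being built
    my @seqs;       # sequences collected for $current

    for my $line (@lines) {
        # Capture the target name (third colon-separated field, stopping
        # at any annotation in parentheses) and the sequence (start of the
        # fourth semicolon-separated field). Lines that do not fit, such as
        # a record with no sequence, are skipped (an assumption).
        next unless $line =~
            /^>probe:[^:]+:([^:(\s]+)[^;]*;[^;]*;[^;]*;\s*([ACGT]+)/;
        my ($target, $seq) = ($1, $2);

        # A different target name closes the current group.
        if (defined $current && $target ne $current) {
            push @rows, join("\t", $current, @seqs);
            @seqs = ();
        }
        $current = $target;
        push @seqs, $seq;
    }

    # Flush the last group, if any record matched at all.
    push @rows, join("\t", $current, @seqs) if defined $current;
    return @rows;
}

# Example use with a file name on the command line, e.g.
#   perl group_probes.pl probes.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    print "$_\n" for group_probes(@lines);
}
```

Because the comparison is done with ne on the captured name, the number of distinct targets does not matter, and targets with fewer than 11 probes simply produce shorter rows.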
Any help is most appreciated!
Bioinformatics