seek_m has asked for the wisdom of the Perl Monks concerning the following question:
Hi friends,
I would like to read a text file, extract the lines of text between two identical patterns, and put them into a new file.
For example, I want to get all the data between each "start" tag, as shown below.
Sample input:
start
this is a example line
start
this is a example 1 line
start
this is a example 2 line
start
this is a example 3 line
start
Sample output:
this is a example 1 line
inside a text file.
this is a example 2 line
inside a text file 2.
and so on.
Thanks in advance for your help.
Update:
closed...no content
Re: extraction of text between 2 similar patterns in a text file
by marinersk (Priest) on May 16, 2011 at 15:40 UTC
You're looking to write a basic file splitter. So what have you tried thus far? We can help you when you get stuck on Perl, but if what you're stuck on is software design, then you should probably look for softwaredesignmonks.org.
(In other words, we won't do your homework for you, but if you're stuck doing your homework, we can help you get unstuck. "Help you" is not synonymous with "do your work for you".)
For example: You need to read in one file and write a bunch of others, right? So, surely you've written code that reads the file in, right? Show us that code.
Update: Also, please advise whether you would like comments on code style, or just code functionality. I won't waste your time if you don't want to hear my opinion on "better" ways to do something you wrote which is, in its own right, functional for your needs.
by seek_m (Initiate) on May 16, 2011 at 15:58 UTC
I am a novice to programming and need strong basic input. Thanks,
by roboticus (Chancellor) on May 16, 2011 at 16:29 UTC
Since you don't use indentation, your code is difficult to read. So the first thing I'd suggest is reformatting it so you can more easily see the structure.
I'd next suggest using "use strict;" and "use warnings;" to help find possible problems (such as closing file handles you don't actually have open).
Then I'd look at the bits you don't understand, and figure out what they mean. For example, what do you want the if statement on line 29 (if(/(\d\_)+a1/.../(\d\_)+a1/)) to do? Since it governs when you actually emit files, it would be a good place to start.
Finally, you may want to read about the perl debugger, as executing code under the debugger can be very helpful both for fixing your program and for understanding how the code operates.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
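To see why that condition deserves a close look, here is a minimal sketch (not the original poster's code) of how Perl's three-dot range operator behaves when both ends of the range use the same pattern. The literal /^start$/ pattern, the sample lines, and the helper name lines_in_range are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Return the lines for which the three-dot range (flip-flop) operator is true
# when both ends of the range use the same pattern.
# NOTE: the pattern /^start$/ and the sub name are assumptions for this sketch.
sub lines_in_range {
    my @lines = @_;
    my @kept;
    for (@lines) {
        # Becomes true on a line matching "start" and stays true through the
        # NEXT line that matches "start". With three dots, the closing pattern
        # is not tested on the line that opened the range.
        push @kept, $_ if /^start$/ ... /^start$/;
    }
    return @kept;
}

my @sample = ('start', 'one', 'start', 'two', 'start', 'three', 'start');
print join(',', lines_in_range(@sample)), "\n";
```

This prints start,one,start,start,three,start. The second "start" closes the range it did not open, so "two" falls outside any range: with identical patterns on both sides, every other block is skipped, and the delimiter lines themselves are included.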
by Anonymous Monk on May 17, 2011 at 15:37 UTC
by roboticus (Chancellor) on May 17, 2011 at 21:38 UTC
by marinersk (Priest) on May 16, 2011 at 17:40 UTC
Excellent; some code to work with. Okay, some basic coding structure issues first (and a point or two on style; I'm sorry, I can't help myself). Then we address your issue.
With your code structure repaired, use strict; at your side, and the behavior of your regular expression confirmed with the test script, see if you can make the code work. And we'll all be here if you get stuck again. Just show us what you tried. The reformatted script:
Re: extraction of text between 2 similar patterns in a text file
by Anonymous Monk on May 16, 2011 at 15:22 UTC
Pretend the text is in Chinese (or something else you can't understand), all the writing is on notepads and that you're so drunk you can't remember what you did 10 seconds ago. Replace "notepad" with "file" and the algorithm should now be easy to translate into perl.
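One way the "notepad" idea might translate, as a rough sketch only: read one line at a time, start a fresh notepad whenever a delimiter line appears, and otherwise copy the line onto the current notepad. The sub name split_blocks, the in-memory sample text, and the choice to collect blocks in arrays (rather than write files straight away) are assumptions for illustration; the "start" delimiter comes from the sample input.

```perl
use strict;
use warnings;

# Group the lines read from a filehandle into blocks, beginning a new block
# at every delimiter line. Only the current line is ever examined, so nothing
# has to be "remembered" beyond which block is currently open.
# NOTE: split_blocks is a hypothetical helper name for this sketch.
sub split_blocks {
    my ($fh, $delim) = @_;
    my @blocks;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq $delim) {
            push @blocks, [];          # pick up a fresh "notepad"
            next;
        }
        next unless @blocks;           # ignore text before the first delimiter
        push @{ $blocks[-1] }, $line;  # copy the line onto the current notepad
    }
    return grep { @$_ } @blocks;       # drop notepads that stayed empty
}

# Sample data read through an in-memory filehandle (assumed content).
my $text = "start\nline 1\nmore 1\nstart\nline 2\nstart\n";
open my $in, '<', \$text or die "Cannot open in-memory file: $!";
my @blocks = split_blocks($in, 'start');
close $in;

for my $i (0 .. $#blocks) {
    print "block ", $i + 1, ": @{ $blocks[$i] }\n";
}
```

To produce one file per block, the final loop could open an output file for each index and print the block's lines to it instead of to the screen; the file naming scheme is left open here because the question does not specify one.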
Re: extraction of text between 2 similar patterns in a text file
by ww (Archbishop) on May 16, 2011 at 17:26 UTC
Then, please clarify what your sample output is telling us: on first reading I thought you wanted one output file with all the material except the "start" lines... but, on rereading, the phrase "inside a text file 2. and so on" (belatedly) caught my eye, making me wonder if you wanted each instance of the boldfaced material to go in a separate file (and no, I didn't bother to try to answer that question by studying the code you posted: it's not easily readable and less than readily comprehensible for reasons outlined by others).
But the Perl slogan "There is more than one way to do it" (aka TIMTOWTDI or TIMTOADY, etc.) is definitely a truism for your problem, however one is supposed to understand your question. And there are more.
This site, including Super Search and Tutorials, can help you get started with Perl (yes, I suspect the reply code is cargo-culted from somewhere without much understanding). So too can numerous on-line tutorials (for example, those at http://learn.perl.org/, but search this site a bit to see which are well-regarded and which are trash) and, finally, being very much "old-school," I also suggest ( tada!! ) "books" such as Learning Perl.
Addendum: Your title, if carefully chosen, suggests you are thinking of the "this is a example \d line" as the "similar patterns," perhaps because those are what you want to capture. But for your purposes, think instead of what you want to discard: instances of "stop" (with or without blank lines preceding). It's easy enough to write a regex to deal with the "this is..." lines -- matching on any digit and capturing work just fine (when the regex is constructed correctly) -- but there's a lot less opportunity for error (in your example data) to match on the four letter word, "stop."
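A minimal sketch of the "match what you want to discard" idea from the addendum, under the first reading of the question (one output containing everything except the delimiter lines). The sub name drop_delimiters is made up for illustration, the delimiter is passed as a parameter because the sample input uses "start" while the addendum says "stop", and dropping blank lines as well is an assumption about what "with or without blank lines preceding" implies.

```perl
use strict;
use warnings;

# Keep every line except delimiter lines and blank lines.
# NOTE: drop_delimiters is a hypothetical helper; \Q...\E quotes the delimiter
# so it is matched literally rather than treated as a regex.
sub drop_delimiters {
    my ($delim, @lines) = @_;
    return grep { !/^\s*\Q$delim\E\s*$/ && /\S/ } @lines;
}

my @kept = drop_delimiters(
    'start',
    'start', 'this is a example 1 line', '',
    'start', 'this is a example 2 line',
);
print "$_\n" for @kept;
```

Matching the short, fixed delimiter leaves the data lines untouched whatever they contain, which is why it is less error-prone here than writing a pattern for the "this is a example ..." lines themselves.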