zivtaltul has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a new Perl user and I need help. I have a .txt file that contains many sections (which are NOT all in one format), separated by a certain sign, say %. I need to create a database that, for each of these sections, extracts:
1. an e-mail address that is mentioned somewhere (at a random position) in the section;
2. the number of words in the section;
3. the number of letters/characters in the section;
4. a number in the section that is likely to follow a certain string of letters.
Can someone tell me how to do this? Thanks.

Re: extracting specific information from a txt file
by BioLion (Curate) on Aug 24, 2010 at 09:30 UTC

    You'll find people are a lot more willing to help you if you show a few things:

    1. Code you have got so far
    2. (an example of the) input you used
    3. Error messages you got (make sure you are using strict and warnings; if you are a beginner, diagnostics can be helpful)
    4. Output you got (or output you want)
    These things show that you have made at least some effort, so people will be much more likely to make an effort to help you. All of this is explained in How do I post a question effectively?

    Other than that, there is a lot of information on the web, so make sure you have searched thoroughly (e.g. perlio for reading files, perlre for searching for the information, etc.).

    Update: changed the link to point to a more positive place (thanks to wfsp).

Re: extracting specific information from a txt file
by cdarke (Prior) on Aug 24, 2010 at 09:46 UTC
    For reading sections it is often useful to set $/ (or $INPUT_RECORD_SEPARATOR if you use English;) to the delimiter text.

    See perlvar.
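A minimal sketch of cdarke's suggestion, assuming the sections are separated by a single `%` character. `local` limits the change to `$/` to the enclosing sub, and `chomp` removes whatever `$/` currently holds, so it strips the trailing `%` from each record. The sub name `read_sections` is hypothetical, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Read all %-separated sections from a filehandle (hypothetical helper).
sub read_sections {
    my ($fh) = @_;
    local $/ = '%';                    # record separator is '%' instead of "\n"
    my @sections;
    while (my $section = <$fh>) {
        chomp $section;                # chomp removes the current $/, i.e. the '%'
        next unless $section =~ /\S/;  # skip empty sections
        push @sections, $section;
    }
    return @sections;
}

# Demo with an in-memory "file"; use open(my $fh, '<', $path) for a real one.
my $text = "first section\n%second\nsection\n%third";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @sections = read_sections($fh);
print scalar(@sections), " sections\n";   # prints "3 sections"
```

Note that the last section has no trailing `%`, so `chomp` leaves it unchanged.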
Re: extracting specific information from a txt file
by apl (Monsignor) on Aug 24, 2010 at 12:22 UTC
    Write a small program that opens the text file, reads it line by line, and prints out each line read.
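A minimal sketch of this first step, assuming the file name is passed on the command line. `print_lines` is a hypothetical name; having it take a filehandle rather than a file name makes it easy to try out on an in-memory string.

```perl
use strict;
use warnings;

# Print each line read from $fh and return how many lines were read.
sub print_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {   # while (my $x = <$fh>) is implicitly checked with defined()
        print $line;             # $line still ends in "\n" (except possibly the last one)
        $count++;
    }
    return $count;
}

# Usage: perl print_lines.pl yourfile.txt
if (@ARGV) {
    my $path = $ARGV[0];
    open my $in, '<', $path or die "Cannot open '$path': $!";
    print_lines($in);
    close $in;
}
```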

    Then modify it to count the number of characters in each line.
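One way to do the counting, assuming the trailing newline should not be counted and that "letters" means the ASCII letters A-Z and a-z (accented letters would need a Unicode-aware regex instead). Since the question asks for letters/characters, the sketch returns both. `count_chars` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return (characters, letters) for one line; the newline is not counted.
sub count_chars {
    my ($line) = @_;                      # a copy, so chomp does not touch the caller's string
    chomp $line;
    my $chars   = length $line;           # every character, including spaces and digits
    my $letters = ($line =~ tr/A-Za-z//); # tr/// with an empty replacement just counts
    return ($chars, $letters);
}

my ($chars, $letters) = count_chars("Hi there, 42!\n");
print "$chars characters, $letters letters\n";   # prints "13 characters, 7 letters"
```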

    Then modify it to count the number of words in each line.
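A sketch of the word count, assuming a "word" is any run of non-whitespace characters, so `e-mail` or `foo@bar.com` counts as one word. The pattern `' '` is special in `split`: it splits on runs of whitespace and ignores leading whitespace. `count_words` is a hypothetical name.

```perl
use strict;
use warnings;

# Count whitespace-separated words in a string.
sub count_words {
    my ($text) = @_;
    my @words = split ' ', $text;   # awk-style split: runs of whitespace, no leading empty field
    return scalar @words;
}

print count_words("  one two\tthree\nfour  "), "\n";   # prints 4
```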

    Then modify it to find a string containing @.
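A first, rough version of this step: split on whitespace and keep the tokens that contain `@`. As the demo shows, punctuation stuck to an address (the trailing `.`) comes along too, which is why apl suggests generalizing the search next. `tokens_with_at` is a hypothetical name.

```perl
use strict;
use warnings;

# Return every whitespace-delimited token that contains an '@'.
sub tokens_with_at {
    my ($text) = @_;
    return grep { /\@/ } split ' ', $text;
}

# Single quotes keep Perl from trying to interpolate @example as an array.
print join(', ', tokens_with_at('mail me at bob@example.com or alice@test.org.')), "\n";
```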

    Then modify it to generalize the search to look for an e-mail address (text@text.tla).
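A sketch with a deliberately simple pattern of the form name@domain.tld. It is an assumption that this is good enough for free text; it does not accept every address that RFC 5322 allows. For strict validation rather than extraction, CPAN has modules such as Email::Valid. `find_email` is a hypothetical name.

```perl
use strict;
use warnings;

# Simplified address pattern: name@domain.tld (not a full RFC 5322 matcher).
my $EMAIL_RE = qr/[A-Za-z0-9._%+-]+\@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

# Return the first e-mail address found in $text, or undef if there is none.
sub find_email {
    my ($text) = @_;
    return $text =~ /($EMAIL_RE)/ ? $1 : undef;
}

my $email = find_email('Contact: jane.doe@example.co.uk (office)');
print defined $email ? "$email\n" : "no address\n";   # prints jane.doe@example.co.uk
```

Because the domain part must end in at least two letters, a sentence-final period is not included: `a@b.org.` yields `a@b.org`.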

    Then modify it to look for a number after the e-mail address.
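A sketch of this step: a `/g` match in scalar context leaves `pos()` just past the first address, and `\G` then anchors the next match there, so only digits after the address are considered. The e-mail pattern is a simplified assumption, and `number_after_email` is a hypothetical name.

```perl
use strict;
use warnings;

# Simplified address pattern: name@domain.tld (not a full RFC 5322 matcher).
my $EMAIL_RE = qr/[A-Za-z0-9._%+-]+\@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

# Return the first run of digits after the first e-mail address in $text,
# or undef if there is no address or no number after it.
sub number_after_email {
    my ($text) = @_;
    # First match sets pos($text) past the address; \G continues from there.
    if ($text =~ /$EMAIL_RE/g && $text =~ /\G\D*(\d+)/g) {
        return $1;
    }
    return undef;
}

my $n = number_after_email('Reply to sam@example.com, ticket 4711, room 12');
print defined $n ? "$n\n" : "no number\n";   # prints 4711
```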

    Then modify it to read blocks of text separated by % (rather than by line).
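Putting the steps together, a minimal sketch that reads %-separated sections and collects the four values from the original question. Assumptions: sections are separated by a single `%`; words are whitespace-separated tokens; letters are ASCII A-Z/a-z; the e-mail pattern is simplified; and `Ref` is a hypothetical stand-in for the "certain string of letters" that precedes the number. The sub name `summarize_sections` is hypothetical too. Each section becomes a hash ref, printed here as a tab-separated line that could then be loaded into a database.

```perl
use strict;
use warnings;

my $EMAIL_RE = qr/[A-Za-z0-9._%+-]+\@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
my $LABEL    = 'Ref';   # hypothetical: the string of letters that precedes the number

# Build one record (hash ref) per %-separated section read from $fh.
sub summarize_sections {
    my ($fh) = @_;
    local $/ = '%';                       # read one section at a time
    my @records;
    while (my $section = <$fh>) {
        chomp $section;                   # drop the trailing '%'
        next unless $section =~ /\S/;     # ignore empty sections
        my ($email)  = $section =~ /($EMAIL_RE)/;
        my ($number) = $section =~ /\b\Q$LABEL\E\W*(\d+)/;
        my @words    = split ' ', $section;
        push @records, {
            email   => $email,            # undef if the section has no address
            words   => scalar @words,
            chars   => length $section,   # includes spaces and newlines
            letters => ($section =~ tr/A-Za-z//),
            number  => $number,           # undef if $LABEL is not followed by a number
        };
    }
    return @records;
}

# Demo with in-memory data; use open(my $fh, '<', $path) for a real file.
my $data = "Ask jo\@example.org about Ref: 123\n%Nothing useful here\n%";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
for my $r (summarize_sections($fh)) {
    printf "%s\t%d\t%d\t%d\t%s\n",
        $r->{email} // '-', $r->{words}, $r->{chars}, $r->{letters}, $r->{number} // '-';
}
```

Whether the character count should include newlines, and whether the number should follow a label or the e-mail address, are choices you will want to adapt to your actual data.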

    Show us the code that presents a problem to you, and we'll be glad to help.