gatogato has asked for the wisdom of the Perl Monks concerning the following question:

Hello again and many thanks for the suggestions.

I'm new to everything, and would appreciate some help in formatting text. Given this (this is just a sample out of 100 lines), I should be using regex:

This is basically a bilingual dictionary entry, where under column X we should see the word in English, under Y in Portuguese, and under Z whether this is a noun, verb, etc. In bold type I'm showing the words that should only be added to the columns, disregard the rest. Not looking for someone to do my job, just some help on how to continue.

This is what I was trying:

    @entry[0..2]= split /\d+/, $_;
    print "$entry[0]\n";
    $_=~/\[(n+)\]/;
    print "$1\n";
    @trans[0..2]= split /:(\s\s)/, $_;
    print "$trans[0]\n";

    <b>ignoramus</b> 16800 "behavior - man <b>[n]</b>: <b>ignorante</b> [m]; <b>néscio</b> [m]" "behavior - woman [n]: <b>ignorante</b> [f]; <b>néscia</b> [f]"

    <b>ignition</b> 16795 general <b>[n]</b>: <b>ignicion</b> [f] internal-combustion engine [n]: <b>ignicion2</b> [f]

I should get this, i.e. the lines above split into only three columns, and only this info should be displayed:

    X          Y          Z
    ignoramus  ignorante  n
    ignoramus  ignorante  n
    ignoramus  nescio     n
    ignoramus  nescia     n

    ignition   ignicion   n
    ignition   ignicion2  n

Many thanks for any suggestions. And as you can see, I am new in programming, I'm a linguist.

Re: making a list-beginner
by AnomalousMonk (Archbishop) on May 06, 2009 at 23:51 UTC
    I am not sure just what you are trying to achieve, and it is always nice to see some code from a poster to indicate a willingness to make an effort to learn.

    However, here are a couple of suggestions that may make your post clearer:
    • An HTML end tag begins with a forward slash, not a back slash, e.g.,  <code> ... </code>
    • Within code tags, any HTML tag is rendered literally, e.g., the tag  <p> will appear exactly that way.
    Please see Writeup Formatting Tips and Markup in the Monastery and How do I post a question effectively? for more info.
Re: formatting text
by Utilitarian (Vicar) on May 07, 2009 at 08:53 UTC
    Hi Gato, could you resend your input, code and expected output? The HTML tags make it difficult to tell which tags are actually in the input. As I see it, you wish to extract the first word, the first word after a colon, and the character contained in the first [] from each record. The following, run for each line of the file, should extract the terms you are looking for.
    print "$1\t$3\t$2\n" if m/^(\w+)[^[]+\[(.)\][^:]*:\s*(\w+)/;
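    The one-liner above targets only the first translation in each record. If every translation should get its own row, as in the expected output in the question, something along these lines might be a starting point. It is only a sketch under several assumptions: the real input has no <b> tags, the headword is a single word followed by a numeric id, each sense is introduced by a tag such as [n] followed by a colon, and each translation is a single word followed by a gender tag such as [m] or [f]. The name parse_entry is made up for this example.

```perl
use strict;
use warnings;
use utf8;                                  # sample data below contains accented letters
binmode STDOUT, ':encoding(UTF-8)';

# Turn one dictionary line into [headword, translation, part-of-speech] rows.
sub parse_entry {
    my ($line) = @_;

    # Assumed layout: headword, numeric id, then the senses.
    my ($head, $rest) = $line =~ /^(\w+)\s+\d+\s+(.*)$/ or return;

    my ($pos, @rows);
    # Scan left to right. "[n]:" opens a sense and sets the part of speech;
    # "word [m]" (a tag NOT followed by a colon) is a translation.
    while ($rest =~ /\[(\w+)\]:|(\w+)\s*\[\w+\](?!:)/g) {
        if (defined $1) {
            $pos = $1;
        }
        elsif (defined $pos) {
            push @rows, [$head, $2, $pos];
        }
    }
    return @rows;
}

# Sample lines taken from the question, with the bold markup removed.
my @sample = (
    'ignoramus 16800 "behavior - man [n]: ignorante [m]; néscio [m]" "behavior - woman [n]: ignorante [f]; néscia [f]"',
    'ignition 16795 general [n]: ignicion [f] internal-combustion engine [n]: ignicion2 [f]',
);

for my $line (@sample) {
    print join("\t", @$_), "\n" for parse_entry($line);
}
```

    Rows come out in the order the translations appear in each line, so the ordering may differ slightly from the table in the question. When reading from a real file, the handle would need to be opened with the file's actual encoding so that \w matches accented letters such as é.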