Guigou has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am very new to Perl coding. I want to achieve a simple task: extracting the start and end times of sentences. Everything is stored in a file that looks like this:

<Turn speaker="spk2" startTime="6.97" endTime="8.536">

I had a meeting with a specialist who gave me a Perl code that looks like this:

C:\_DATA_Guillaume_Nassau>cat C1_A1_1a.trs | perl -ne "print \"$1 $2 $2\n\" if (/Turn.*speaker=(.*)\s+.*=(.*)\s+.*=(.*)/)" > toto.xxx

That should produce a file with all the information I need, but it doesn't work: I get "semicolon seem to be missing at line 1". Can anyone see the problem? This worked perfectly on this person's computer; he copied and pasted it into an email to me. I installed ActivePerl, which he was using, and now I get this error...

Replies are listed 'Best First'.
Re: Help please?
by QM (Parson) on May 30, 2014 at 10:03 UTC
    Welcome to the monastery.

    For posting, use <code>code tags</code> around your code.

    Useless use of cat: Put the filename after all other perl arguments, such as:

    perl -ne "blah" filename

    You probably meant $3 instead of the duplicate $2.

    .* is greedy, and will tend to eat too much. Given your input example, you might use \S+ to only capture non-whitespace, and .*? to be non-greedy elsewhere. Or, perhaps constrain your regex more with /Turn.*?speaker=(\S+)\s+startTime=(\S+)\s+endTime=(\S+)/
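
    A minimal sketch of what "eating too much" can look like. On the sample line as posted, both patterns happen to capture the same three values, so the second test line below adds an extra attribute (mode="planned") that is purely hypothetical and not taken from the question's file. The sub names are also just for illustration.

    ```perl
    use strict;
    use warnings;

    # Greedy pattern from the question: every .* grabs as much as it can.
    sub greedy_fields {
        my ($line) = @_;
        return $line =~ /Turn.*speaker=(.*)\s+.*=(.*)\s+.*=(.*)/;
    }

    # Constrained pattern: explicit attribute names, \S+ stops at whitespace.
    sub strict_fields {
        my ($line) = @_;
        return $line =~ /Turn.*?speaker=(\S+)\s+startTime=(\S+)\s+endTime=(\S+)/;
    }

    # Sample line from the question.
    my $sample   = '<Turn speaker="spk2" startTime="6.97" endTime="8.536">';
    # HYPOTHETICAL line: same tag with one extra attribute appended.
    my $extended = '<Turn speaker="spk2" startTime="6.97" endTime="8.536" mode="planned">';

    for my $line ($sample, $extended) {
        print "line:   $line\n";
        print "greedy: [", join('] [', greedy_fields($line)), "]\n";
        print "strict: [", join('] [', strict_fields($line)), "]\n\n";
    }
    ```

    With the extra attribute, the greedy pattern puts speaker and startTime together in $1 and shifts the other values along, while the constrained pattern still returns one value per group. Note that \S+ keeps the surrounding quotes (and a trailing > when nothing follows endTime).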

    On Windoze/DOS, quoting is an issue, but you don't seem to have a problem so far. I prefer to use qq// and q// for quoting in perl oneliners. For example:

    perl -ne "print qq/$1 $2 $3\n/ if (/Turn.*?speaker=(\S+)\s+startTime=(\S+)\s+endTime=(\S+)/)" input_file > toto.xxx

    To check the compile without running, add -c as a perl option:

    perl -c -ne "print qq/$1 $2 $3\n/ if (/Turn.*?speaker=(\S+)\s+startTime=(\S+)\s+endTime=(\S+)/)"
    -e syntax OK
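
    If the cmd.exe quoting keeps causing trouble, another option is to put the code in a small script file, so that no shell quoting is involved at all. This is a sketch only: the file name turns.pl is made up, and matching the values inside the double quotes (instead of using \S+) is my assumption based on the single sample line.

    ```perl
    # turns.pl (hypothetical name): print speaker, start and end time per <Turn> line.
    use strict;
    use warnings;

    # Return (speaker, startTime, endTime) for a <Turn ...> line, or an empty list.
    # ASSUMPTION: attribute values are always double-quoted, as in the sample line.
    sub turn_fields {
        my ($line) = @_;
        return $line =~ /<Turn\b.*?\bspeaker="([^"]*)"\s+startTime="([^"]*)"\s+endTime="([^"]*)"/;
    }

    # Only process input when at least one file name was given on the command line.
    if (@ARGV) {
        while (my $line = <>) {
            my @fields = turn_fields($line) or next;   # skip lines without a match
            print "@fields\n";                         # fields joined by a space
        }
    }
    ```

    perl -c turns.pl checks the syntax in the same way as above, and perl turns.pl C1_A1_1a.trs > toto.xxx runs it. Because the quotes are matched outside the capture groups, the sample line would come out as spk2 6.97 8.536.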

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      So I tried:

      perl -ne "print qq/$1 $2 $3\n/ if (/Turn.*?speaker=(\S+)\s+startTime=(+\S+)\s+endTime=(\S+)/)" input_file > toto.xxx

      And it tells me:

      c:\>perl -ne "print qq/$1 $2 $3\n/ if (/Turn.*?speaker=(\S+)\s+startTime=(+\S+)\s+endTime=(\S+)/)" C1_A1_1a.trs > toto.xxx
      Quantifier follows nothing in regex; marked by <-- HERE in m/Turn.*?speaker=(\S+)\s+startTime=(+ <-- HERE \S+)\s+endTime=(\S+)/ at -e line 1.

      The file toto.xxx is created, but it is empty. I barely understand what I am doing, but I am making an effort to.

        I observe that your copy/paste skills and/or typing skills have come up short. There's a stray + in (+\S+) (i.e., the first one.) The error message even points it out for you :)
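
        A small sketch that reproduces the message outside the one-liner, in case it helps to see the two versions side by side. The sub name regex_error is made up for this example; it compiles a pattern given as a string and hands back Perl's own error text if compilation fails.

        ```perl
        use strict;
        use warnings;

        # Compile a pattern given as a string.
        # Returns '' on success, or Perl's error message on failure.
        sub regex_error {
            my ($pattern) = @_;
            my $re = eval { qr/$pattern/ };
            return defined $re ? '' : $@;
        }

        my $pasted   = 'startTime=(+\S+)';   # with the stray + right after the (
        my $intended = 'startTime=(\S+)';    # without it

        print "pasted:   ", regex_error($pasted)   || "compiles\n";
        print "intended: ", regex_error($intended) || "compiles\n";
        ```

        A + means "one or more of the previous item", and directly after an opening parenthesis there is no previous item, hence "Quantifier follows nothing".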

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

        How did the + get after the (?
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Help please?
by choroba (Cardinal) on May 30, 2014 at 09:49 UTC
    Did you copy&paste from the e-mail correctly? I fear you didn't, as the second $2 should've been $3. Isn't a backslash or quote missing somewhere?

    Also, your prompt seems MSWin-like. Are you sure cat works on MSWin?

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Sir, I am sure of absolutely nothing, haha. I copied it correctly, since I used copy and paste, so I guess that shouldn't add any errors? And the line was like that when that guy used it... I am not at all sure that it works on MS-DOS either... that might be the problem...
        Outlook sometimes changes dash to emdash or somesuch. Go back and retype the dash, or perhaps retype the whole thing from scratch.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of