harshashende has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl. I want to remove the first word from each row of my file and save the rest of the data in another file. For this I am using the substr function, e.g.:

i/p : 1234 Run test

want o/p :Run test

but getting output : R u n t e s t

I referred to the post 'Perl Regex with Extra Spaces' and included the code given there in my code, but I am still getting the same error. I have searched a lot but have not found anything.

Please help with this. Thanks in advance.


Replies are listed 'Best First'.
Re: extra spaces between the characters in string
by davido (Cardinal) on Jun 08, 2011 at 07:27 UTC

    You said you want to remove the first word, but in your example of input and desired output it seems you're removing the leading numbers. Is that what you really meant to say?

    I don't really think that substr is the right tool for the job... at least not when Perl provides such powerful regexp tools.

        use strict;
        use warnings;

        # Strip leading digits; print only the lines where something was removed.
        s/^\d+// && print while <>;

    Invoke it like this:

    myscript infile.txt > outfile.txt

    Or as a Perl one-liner:

    perl -p -e 's/^\d+//' infile.txt > outfile.txt

    Of course it's always advisable to skip the redirection on the first run through the script so that you can see on screen if your output is going to be what you want.


    Dave
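
If the goal is to drop the first word whatever it contains, rather than only leading digits, the pattern can be widened. The following is a minimal sketch, assuming words are separated by spaces or tabs; the helper name `strip_first_word` is hypothetical and does not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the first whitespace-delimited word and the
# spaces or tabs after it. Only horizontal whitespace is consumed, so the
# trailing newline of a one-word line survives.
sub strip_first_word {
    my ($line) = @_;
    $line =~ s/^[ \t]*\S+[ \t]*//;
    return $line;
}

my @out = map { strip_first_word($_) } ("1234 Run test\n", "abc  Stop test\n");
print @out;
```

The same substitution could be used in the one-liner form shown above by swapping it in for `s/^\d+//`.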

Re: extra spaces between the characters in string
by moritz (Cardinal) on Jun 08, 2011 at 07:30 UTC
Re: extra spaces between the characters in string
by ikegami (Patriarch) on Jun 08, 2011 at 07:42 UTC
    Sounds like UTF-16 to me. Use
    open(my $fh, '<:encoding(UTF-16le)', $qfn)
    to open your file.
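
Some background on the symptom: in UTF-16 every ASCII character is stored as two bytes, one of them a NUL byte, and those NUL bytes often show up as gaps when the file is read as plain bytes. The sketch below shows the suggested `open` in context. It assumes the input really is UTF-16LE without a byte order mark; the sample lines and the temporary file are made up so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up sample data: write a small UTF-16LE file to read back.
my ($tmp, $qfn) = tempfile(UNLINK => 1);
close $tmp;
open(my $out, '>:raw:encoding(UTF-16LE)', $qfn) or die "Can't write $qfn: $!";
print {$out} "1234 Run test\n5678 Stop test\n";
close $out;

# Read through the decoding layer so each line arrives as characters,
# not as raw bytes interleaved with NULs.
open(my $fh, '<:raw:encoding(UTF-16LE)', $qfn) or die "Can't open $qfn: $!";
my @rest;
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/^\S+\s+//;    # drop the first word and the whitespace after it
    push @rest, $line;
}
close $fh;

print "$_\n" for @rest;
```

If the file starts with a byte order mark, `encoding(UTF-16)` may be the better choice, since that variant reads the mark to determine the byte order, whereas `UTF-16LE` would leave it in the first line as a character.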

      Thanks Moritz and ikegami.

      I don't know about UTF-16 encoding; I will have a look into this.

Re: extra spaces between the characters in string
by locked_user sundialsvc4 (Abbot) on Jun 08, 2011 at 12:46 UTC

    Believe it or not, even in a problem that looks like it could have a solution using substr(), regular expressions are usually a superior solution. So much time has been poured into that code, both in terms of ability and of overall efficiency, that it wins race after race after race. (I know, it’s counter-intuitive. It had to be proven to me, too.)

      Believe it or not,

      Not. Show us one example where a regex comes even close to matching, much less beating, substr.

      $s = '1234567890'x10;;
      cmpthese -1, {
          a=>q[ my $x = substr $s, $_, 10 for 0 .. 99],
          b=>q[ my($x) = m[^.{$_}(.{10})] for 0 .. 99]
      };;
             Rate     b     a
      b  1113/s    --  -97%
      a 42708/s 3738%    --

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Problems that *could* be done with substr, but are not trivial as in your example. Also, "superior" ne "purely faster".

        If the task requires loops and ifs around the substr, the regex is surely faster to write and debug. Execution speed would depend on how much you need to do and how fancy you need to be. The OP's problem, for example, is nearly trivial with a regex, but you'd have to pay me money to write it as a set of substr() calls.

        "Usually" is certainly debatable, and I expect that it strongly depends on your environment. Personally, my code has a ratio of maybe 1:99 substr:regex. Filtering to what could plausibly be done with a set of substr, I'd guess it could be brought up to about 50/50, but the code would be horrendously brittle and scary. And I'm pretty sure my implementation of the search and matching would not be as fast as the regex engine.

        If you are always dealing with fixed width field data, substr becomes more useful and common.
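
To illustrate the fixed-width case, here is a minimal sketch. The layout is an assumption made for the example (a 4-character ID, one separator column, then the description); it is not something the OP stated about the actual file.

```perl
use strict;
use warnings;

# Assumed fixed-width layout: columns 0-3 hold an ID, column 4 is a
# separator, and the rest of the line is the description.
my $record = "1234 Run test";

my $id   = substr $record, 0, 4;    # the first four characters
my $desc = substr $record, 5;       # everything from column 5 onward

# The same split expressed as a single unpack template:
# A4 = four characters, x = skip one, A* = the remainder.
my ($id2, $desc2) = unpack 'A4 x A*', $record;

print "$id|$desc\n";
print "$id2|$desc2\n";
```

When the first field can vary in width, as "first word" implies, these offsets no longer hold and a regex is the simpler tool.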