kepler has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to eliminate from a string, $text, the following pattern type (usually at the end of the string):

A 2007A&A...474..653V C 2002yCat.2237....0D A [1 90] 2007A&A...474..653V F 2007A&A...474

etc...

I think it's best to use the position match and then create a substr.

But this is not going very well...

Any suggestions? Kindly appreciated...

Kepler

Re: String replacement
by afoken (Chancellor) on Sep 13, 2015 at 16:35 UTC

    But this is not going very well...

    Any suggestions?

    1. Show your code, wrapped in <code></code>
    2. Show your input, wrapped in <code></code>
    3. Show the actual output, wrapped in <code></code>
    4. Show the expected output, wrapped in <code></code>
    5. Explain what goes wrong.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: String replacement
by Laurent_R (Canon) on Sep 13, 2015 at 20:39 UTC
    I think it's best to use the position match and then create a substr.
    Why? If you can use index (or rindex) to find the right position for substr, then why not? But with your data you will most probably need a regular expression to find what you want to delete, so you might as well use the s/// substitution operator to delete it in one step.

    But that's sort of theoretical; I really can't say much without seeing samples of your input data and desired output.
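
    To make the comparison concrete, here is a minimal sketch of both approaches. The leading sentence in $text, the $marker string, and the assumed shape of the junk (space, one letter, space, four-digit year, then non-space characters) are all invented for illustration; only the trailing reference codes come from the question, and the real data may well need a different pattern.

```perl
use strict;
use warnings;

# Hypothetical input: the leading sentence is made up for illustration;
# only the trailing reference codes are taken from the question.
my $text = 'Some description of the object. A 2007A&A...474..653V C 2002yCat.2237....0D';

# Approach 1: find a fixed marker with index, then cut with substr.
# This only works when the unwanted part always starts with a known literal.
my $marker    = ' A 2007';                      # assumed marker, not from the thread
my $pos       = index($text, $marker);
my $by_substr = $pos >= 0 ? substr($text, 0, $pos) : $text;

# Approach 2: describe the unwanted part with a pattern and delete it with s///.
# Assumed shape: space, one letter, space, 4-digit year, run of non-space chars.
(my $by_regex = $text) =~ s/ [A-Za-z] \d{4}\S+//g;

print "$by_substr\n";
print "$by_regex\n";
```

    With this particular input both variables end up holding just the leading sentence, but the s/// version does not depend on knowing the first reference code in advance.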

      Hi, I think the following regex/substitution is working...

      $loop_variable =~ s/ ?[A-Za-z]{1} {1}(\[(.*)\])? ?\d+(.*)?\.+(.*)?//g;

      It took quite a bit of work, though...

        Yes, maybe it works, but I can't comment much on it without having seen the data, which you haven't shown yet.

        One thing, though: if you need an atom or a sub-pattern once and only once, then you don't need a quantifier such as {1}; that's what the regex engine does by default anyway. So this slightly simpler substitution should do the same thing:

        $loop_variable =~ s/ ?[A-Za-z] (\[(.*)\])? ?\d+(.*)?\.+(.*)?//g;
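
        A quick way to check that claim is to run both substitutions on the same string and compare the results. This is only a sketch: the variable names are made up, and the sample is the one line quoted in the question, which may not be representative of the real input.

```perl
use strict;
use warnings;

# Sample line copied from the question; the real input may look different.
my $sample = 'A 2007A&A...474..653V C 2002yCat.2237....0D A [1 90] 2007A&A...474..653V F 2007A&A...474';

# Original substitution, with the explicit {1} quantifiers.
(my $with_quantifier = $sample) =~ s/ ?[A-Za-z]{1} {1}(\[(.*)\])? ?\d+(.*)?\.+(.*)?//g;

# Simplified substitution, without {1}.
(my $without = $sample) =~ s/ ?[A-Za-z] (\[(.*)\])? ?\d+(.*)?\.+(.*)?//g;

print $with_quantifier eq $without ? "same result\n" : "different result\n";
```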
        I believe the rest of the regex could probably be improved. For example, something like .* is rarely a good idea (.*? is often better), and (\[(.*)\])? is probably better written as (\[(.*?)\])? or as (\[[^]]*\])? (untested). That's just one example.

        But I would need to see the data and to know what exactly you find significant in the strings you are trying to match (what are the invariants that you are looking for and what are the variable parts) before I could really give more informed advice.
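
        To illustrate the point about .* made above, here is a small sketch comparing the greedy, non-greedy, and negated-class forms. The string in $s is invented for illustration (two bracketed groups on one line) and is not taken from the original poster's data.

```perl
use strict;
use warnings;

# Hypothetical string with two bracketed groups, invented for illustration.
my $s = 'A [1 90] 2007A&A B [2 5] 2002yCat';

# Greedy: .* runs on to the LAST closing bracket in the string.
my ($greedy) = $s =~ /\[(.*)\]/;

# Non-greedy: .*? stops at the FIRST closing bracket.
my ($lazy) = $s =~ /\[(.*?)\]/;

# Negated class: [^]]* cannot cross a closing bracket at all.
my ($negated) = $s =~ /\[([^]]*)\]/;

print "greedy:  $greedy\n";    # 1 90] 2007A&A B [2 5
print "lazy:    $lazy\n";      # 1 90
print "negated: $negated\n";   # 1 90
```

        The greedy form swallows everything between the first opening and the last closing bracket, which is rarely what is wanted when a line can contain several bracketed groups.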