Simple Substitution

winter67uk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Simple Substitution by dragonchild (Archbishop) on Jan 13, 2005 at 17:54 UTC
There are several options for your regex. You could capture the character before the apostrophe and add it back in your replacement:
`s/([^?])'/$1\'\n/g`
You could use a negative lookbehind:
`s/(?<!\?)'/\'\n/g`
Also, instead of -ne, I'd use -pe and get rid of the print statement. I'd also look at making it `-pi.bak -e` to do in-place editing with a backup file.

Being right does not endow the right to be rude; politeness costs nothing. Being unknowing is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
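A minimal sketch of dragonchild's two options side by side, run over the sample string from holli's reply. The helper names `break_with_capture` and `break_with_lookbehind` are hypothetical, and the replacement writes `'` directly (in a Perl replacement, `\'` and `'` produce the same character). On this data both forms should give identical output; the capture form differs only in needing a real character before the apostrophe.

```perl
use strict;
use warnings;

# Option 1 (hypothetical helper): capture the character before the
# apostrophe and put it back in the replacement.
sub break_with_capture {
    my ($text) = @_;
    $text =~ s/([^?])'/$1'\n/g;
    return $text;
}

# Option 2 (hypothetical helper): a negative lookbehind checks the
# preceding character without consuming it.
sub break_with_lookbehind {
    my ($text) = @_;
    $text =~ s/(?<!\?)'/'\n/g;
    return $text;
}

# Sample data taken from holli's reply.
my $sample = "AAA+XXXX+1234++here?'s some text+eol1'"
           . "BBB+XXXX+1234++here?'s some text+eol2'"
           . "CCC+XXXX+1234++here?'s some text+eol3'";

print break_with_capture($sample);
print "---\n";
print break_with_lookbehind($sample);
```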
Re^2: Simple Substitution by winter67uk (Initiate) on Jan 14, 2005 at 10:42 UTC
Dragonchild (and all who followed) - thank you. This line did the trick for me:
`perl -pe "s/([^?])'/$1'\n/g" input.txt`
Things you folks taught me:
- grouping in a regex (see perlrequick)
- the special variable $1 (ditto)
- the -p switch
- the -i switch
- check code before posting. I had two errors in mine (an extra double quote and an unnecessary escape).

Cheers - Winter
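The -p switch is documented in perlrun as wrapping your code in a loop that reads each input line into `$_`, runs the code, and prints `$_` from a `continue` block. A minimal sketch of that loop, assuming a hypothetical helper `run_like_dash_p` and an in-memory filehandle in place of a real file (the -i backup-and-rewrite step is not modeled here). It also shows a side effect of the plain `$1'\n` replacement: an apostrophe that already ends a line gets a second newline, leaving a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper approximating `perl -pe 'CODE'`: read each line
# into $_, apply CODE, then print $_ (perlrun gives the exact wrapper).
sub run_like_dash_p {
    my ($input) = @_;
    my $output = '';
    open my $in,  '<', \$input  or die "open input: $!";
    open my $out, '>', \$output or die "open output: $!";
    local $_;
    while (<$in>) {
        s/([^?])'/$1'\n/g;    # the code passed with -e
    } continue {
        print {$out} $_;      # -p prints $_ after every line
    }
    close $out;
    return $output;
}

# The second line ends in an apostrophe, so a blank line appears.
print run_like_dash_p("one+eol1'two+eol2'\nthree?'s\n");
```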
Re^2: Simple Substitution by winter67uk (Initiate) on Jan 18, 2005 at 11:22 UTC
Final update: this is what I used in the end. It parses a string in a file. The string uses an apostrophe as its line terminator, and apostrophes preceded by the escape character "?" are ignored. It is clever enough not to split the same file again if it has already been processed.
`perl -p -i.bak -e "s/([^?])'([^\n])/$1'\n$2/g" input.txt`
- perl: invokes the Perl interpreter.
- -p: the p switch is 'assume loop like -n but print line also'.
- -i.bak: the i switch is edit in place; .bak is the extension of the backup file.
- -e: the e switch is 'one line of program'.
- "...": use double quotes on Windows.
- s/.../.../g: substitute. Match the first part and replace it with the second. g means global (every match on the line, not just the first).
- (...)...(...): groups. Whatever matches inside each pair of parentheses ends up in $1, $2, etc.
- `([^?])'([^\n])`: matches any apostrophe that is neither preceded by the escape character "?" nor followed by a newline "\n"; three characters in total.
- $1'\n$2: replaces each match with the value of group one, followed by an apostrophe, a newline (\n) and the value of group two; four characters in total.
- input.txt: the file to parse.
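A minimal sketch checking the final substitution, wrapped in a hypothetical helper `final_version`, against a shortened, made-up line in the style of the thread's data. Because `([^\n])` refuses an apostrophe that is already followed by a newline, a second pass over this output changes nothing. The sketch also tries a swapped-in lookaround form, `s/(?<!\?)'(?!\n)/'\n/g` (hypothetical helper `lookaround_version`), which is not from the thread: since the final version consumes the character after each apostrophe, two apostrophes separated by a single character (`a'b'c`) get only the first one broken in one pass, while the lookaround form consumes nothing but the apostrophe.

```perl
use strict;
use warnings;

# winter67uk's final substitution, wrapped in a hypothetical helper.
sub final_version {
    my ($line) = @_;
    $line =~ s/([^?])'([^\n])/$1'\n$2/g;
    return $line;
}

# Hypothetical variant using lookarounds: (?<!\?) and (?!\n) test the
# neighbouring characters without consuming them.
sub lookaround_version {
    my ($line) = @_;
    $line =~ s/(?<!\?)'(?!\n)/'\n/g;
    return $line;
}

# Shortened, made-up line in the style of the thread's data.
my $line  = "AAA+eol1'BBB?'s text+eol2'CCC+eol3'\n";
my $once  = final_version($line);
my $twice = final_version($once);
print $once;
print "second pass changed nothing\n" if $once eq $twice;

# One-character segments: the final version's match for the first
# apostrophe swallows the 'b' the second apostrophe would need.
print final_version("a'b'c\n");
print lookaround_version("a'b'c\n");
```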
Re: Simple Substitution by TedYoung (Deacon) on Jan 13, 2005 at 17:54 UTC
Well, the version you have is very close. Try:
`s/([^?]')/$1\n/g;`
The $1 is replaced by the contents of the first group. Groups are designated by () in the match portion. So it matches one non-? character and one ', and replaces them with what it found plus a \n. This may be good enough, but keep in mind that it won't work if a ' is at the beginning of the string. A more complete solution would be:
`s/(?<!\?)'/'\n/g;`
Note that this is untested, but should work! :-)

Ted Young

`($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)`
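A minimal sketch of why the position of the lookbehind matters, on a made-up test string that starts with an apostrophe and contains one escaped `?'`. A lookbehind inspects the character just before the point where it is written: placed after the `'`, it inspects the apostrophe itself (never a `?`), so every apostrophe is broken; placed before the `'`, it inspects the preceding character. The capture version is included to show the leading-apostrophe case mentioned above.

```perl
use strict;
use warnings;

my $text = "'start?'s middle'end";

# Lookbehind written after the apostrophe: it looks at the apostrophe
# itself, which is never '?', so even the escaped ?' is broken.
(my $after = $text) =~ s/'(?<!\?)/'\n/g;

# Lookbehind written before the apostrophe: it looks at the preceding
# character, so the escaped ?' is left alone.
(my $before = $text) =~ s/(?<!\?)'/'\n/g;

# Capture version: needs a character before the apostrophe, so the
# apostrophe at the very start is missed.
(my $capture = $text) =~ s/([^?]')/$1\n/g;

print "$after\n---\n$before\n---\n$capture\n";
```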
Re: Simple Substitution by friedo (Prior) on Jan 13, 2005 at 17:58 UTC
I would use a capture group, like this: `s/([^?]\')/$1\n/g;`
Re: Simple Substitution by ww (Archbishop) on Jan 13, 2005 at 18:38 UTC
If your sample data is representative, the line terminator appears to be `eol[0-9]'` (or maybe `\+eol\d+'`). Looking at it from that angle, might it be easier to substitute on the (apparently unambiguous) `\+eol\d'`?

Update: belated example:
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $foo = <DATA>;
$foo =~ s/\+eol\d'/\n/g;   # replace each "+eolN'" terminator with a newline
print $foo;
print "\n\n Done\n";

__DATA__
AAA+XXXX+1234++here?'s some text+eol1'BBB+XXXX+1234++here?'s some text+eol2'CCC+XXXX+1234++here?'s some text+eol3'etc.
```
```
# OUTPUT:
#
# AAA+XXXX+1234++here?'s some text
# BBB+XXXX+1234++here?'s some text
# CCC+XXXX+1234++here?'s some text
# etc.
#
# Done
```
The following notion is mine, and may not be wise (CORRECTIVE comments welcome!): it's usually worthwhile to match against the largest possible chunk of data, to minimize ambiguity for the regex.
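A minimal sketch of ww's "match the whole terminator" idea using `split` instead of `s///`, with the `\d+` form he mentions. The record counter `eol12` below is a hypothetical change to the thread's sample data, added only to show why a single `\d` would miss a multi-digit terminator.

```perl
use strict;
use warnings;

# The thread's sample data, with a hypothetical two-digit counter
# (eol12) to show why \d+ is safer than a single \d.
my $data = "AAA+XXXX+1234++here?'s some text+eol1'"
         . "BBB+XXXX+1234++here?'s some text+eol12'"
         . "CCC+XXXX+1234++here?'s some text+eol3'etc.";

# Split on the whole terminator rather than on a bare apostrophe,
# so the escaped ?' inside each record is never a candidate.
my @records = split /\+eol\d+'/, $data;
print "$_\n" for @records;

# A single \d does not match the two-digit terminator.
my $single_digit_hits = () = $data =~ /\+eol\d'/g;
print "single-digit matches: $single_digit_hits\n";
```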
Re: Simple Substitution by holli (Abbot) on Jan 13, 2005 at 17:55 UTC
```perl
$_ = "AAA+XXXX+1234++here?'s some text+eol1'BBB+XXXX+1234++here?'s some text+eol2'CCC+XXXX+1234++here?'s some text+eol3'";
s/(?<!\?)'/\n/g;
print;
```
holli, regexed monk
Re: Simple Substitution by ambrus (Abbot) on Jan 13, 2005 at 20:22 UTC
I think you might be having a shell quoting problem. The backslashes are swallowed by the shell if this is under sh. Try this (`\47` is the octal escape for an apostrophe, so no literal ' is needed inside the single quotes):
`perl -wne 's/([^?])\47/$1\47\n/g; print' filename`
This solution won't work if you have two consecutive (unescaped) apostrophes, or if the file has no newlines and is too long to read into memory.
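A minimal sketch of the two points in ambrus's reply: `\47` is the same character as `'`, and a `[^?]` prefix cannot handle two apostrophes in a row. The helper `break_lines` is hypothetical and keeps the character before the apostrophe by capturing it as `$1` and writing it back.

```perl
use strict;
use warnings;

# \47 is the octal escape for an apostrophe (character 39), so a
# one-liner inside single shell quotes never needs a literal '.
print "\\47 is an apostrophe\n" if "\47" eq "'";

# Hypothetical helper: the substitution with the preceding character
# captured in $1 and written back.
sub break_lines {
    my ($line) = @_;
    $line =~ s/([^?])\47/$1\47\n/g;
    return $line;
}

# Two apostrophes in a row: the second would need the first as its
# [^?] character, but the first match has already consumed it.
print break_lines("one''two\n");
```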
Re^2: Simple Substitution by winter67uk (Initiate) on Jan 14, 2005 at 10:56 UTC
Thanks ambrus, well done for spotting the unnecessary double quote after the substitution - s/.../.../g". The Windows command environment requires double quotes, but I mistakenly included an extra one in the code I posted.