Removing character(s) from the middle of a string

A swapping-ground for Regular Expression syntax

Post by pm771 » Fri May 13, 2011 10:05 pm

Hi

I'm using BRU 2.7.0.4

Let's say we have file names with "-"s in the middle. Something like this:
aaa-bbb
1234--ddd
eee---wwww

We want to remove the "-"s using Regular Expressions (and not the Remove (5) panel).

The following works:
Match: ([^-]*)(-+)(.*)
Replace: \1\3


But the following does not:

Match: (.*)(-+)(.*)
Replace: \1\3


Is it the intended behavior or a bug?

TIA, Eugene
pm771
 
Posts: 8
Joined: Tue Sep 02, 2008 10:58 pm

Re: Removing character(s) from the middle of a string

Post by Stefan » Sat May 14, 2011 8:35 am

Try
RegEx(1)
Match: (.*)(-+)(.*)
Replace: \1#\3


and you will see that your regex is greedy.
This means the first "(.*)" eats everything up to the last dash, leaving only that one dash for "(-+)". That's normal.

Test it
INPUT:
Test-1.txt
Test--2Zwei.txt
Test---3Drei.txt
Test----4Vier.txt
OUTPUT:
Test#1.txt
Test-#2Zwei.txt
Test--#3Drei.txt
Test---#4Vier.txt

You see that the first (.*) matches everything up to the last dash, so (-+) only gets that single dash.
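
If you want to reproduce the test outside BRU, here is a minimal sketch in Python. It assumes Python's re module treats these patterns the same way BRU's regex engine does, and the helper name mark is just made up for illustration.

```python
import re

# Assumption: Python's re stands in for BRU's regex engine here.
names = ["Test-1.txt", "Test--2Zwei.txt", "Test---3Drei.txt", "Test----4Vier.txt"]

# Greedy first group: it grabs as much as it can and only gives back
# the single dash that (-+) needs in order to match at all.
greedy = re.compile(r"(.*)(-+)(.*)")

def mark(name):
    """Rebuild the name as group 1 + '#' + group 3 (hypothetical helper)."""
    m = greedy.fullmatch(name)
    return m.expand(r"\1#\3") if m else name

marked = [mark(n) for n in names]
for original, renamed in zip(names, marked):
    print(original, "->", renamed)
```

The "#" shows where group 2 was, and every dash left of it ended up in group 1.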


So we try this
RegEx(1)
Match: (.*?)(-+)(.*)
Replace: \1\3

or
RegEx(1)
Match: (.*?)-+(.*)
Replace: \1\2


and you will see it works as intended:
OUTPUT:
Test1.txt
Test2Zwei.txt
Test3Drei.txt
Test4Vier.txt

The "?" switches the quantifier to non-greedy (lazy), so the first expression ".*?" stops at the first occurrence of the next expression "-+".
You just have to learn how regex works ;-)
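
The same check for the lazy version, again as a rough Python sketch (assuming Python's re behaves like BRU's engine for this pattern; strip_dashes is a made-up helper name):

```python
import re

# Assumption: Python's re stands in for BRU's regex engine here.
names = ["Test-1.txt", "Test--2Zwei.txt", "Test---3Drei.txt", "Test----4Vier.txt"]

# Lazy first group: it takes as little as possible, so it stops right
# before the first dash and (-+) can swallow the whole run of dashes.
lazy = re.compile(r"(.*?)(-+)(.*)")

def strip_dashes(name):
    """Rebuild the name as group 1 + group 3 (hypothetical helper)."""
    m = lazy.fullmatch(name)
    return m.expand(r"\1\3") if m else name

cleaned = [strip_dashes(n) for n in names]
for original, renamed in zip(names, cleaned):
    print(original, "->", renamed)
```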

Another logical regex would be
RegEx(1)
Match: (.*[^-])(-+)(.*)
Replace: \1\3

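which means: any characters ending in a non-dash, followed by one or more dashes. Because group 1 has to end in a non-dash, "-+" gets the whole run of dashes.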

So you were already on the right track with
Match: ([^-]*)(-+)(.*)
Replace: \1\3
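
A small Python sketch to compare these two patterns, assuming Python's re matches them the way BRU does. The name "a-b--c" is a made-up extra case, not one of the files above; it shows that the two patterns pick different runs of dashes when a name has more than one.

```python
import re

# Assumption: Python's re stands in for BRU's regex engine here.
names = ["Test-1.txt", "Test--2Zwei.txt", "Test---3Drei.txt", "Test----4Vier.txt"]

patterns = {
    "negated class first": r"([^-]*)(-+)(.*)",
    "non-dash before dashes": r"(.*[^-])(-+)(.*)",
}

def strip_dashes(pattern, name):
    """Rebuild the name as group 1 + group 3 (hypothetical helper)."""
    m = re.fullmatch(pattern, name)
    return m.expand(r"\1\3") if m else name

# Both patterns clean up the test names in the same way.
results = {label: [strip_dashes(p, n) for n in names] for label, p in patterns.items()}

# With two runs of dashes they differ: the first pattern removes the
# first run, the second (greedy) pattern removes the last run.
two_runs = {label: strip_dashes(p, "a-b--c") for label, p in patterns.items()}

for label in patterns:
    print(label, results[label], two_runs[label])
```

Either way only one run of dashes is removed per pass.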


HTH? :D
Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Re: Removing character(s) from the middle of a string

Post by pm771 » Mon May 16, 2011 4:19 pm

Stefan,

Thank you very much for the response.

Stefan wrote:The "?" switches the quantifier to non-greedy (lazy), so the first expression ".*?" stops at the first occurrence of the next expression "-+".
You just have to learn how regex works


I didn't know about the greedy / lazy behavior of the * and + operators.

Your explanation and http://www.regular-expressions.info/repeat.html clarified it for me.

Eugene

Re: Removing character(s) from the middle of a string

Post by Stefan » Mon May 16, 2011 8:09 pm

Thanks for the feedback.

Now at last I have taken the time to read the page you pointed to,
and I learned something new: <[^>]+>, "An Alternative to Laziness".

I have read such things many times, but I hope this time I have got it ;-) (Learning by doing is the best one can do.)
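
For anyone who wants to try that one, a quick Python sketch. The sample text is my own made-up string, and I am assuming Python's re behaves like other common engines here. It compares the greedy pattern, the lazy pattern, and the negated character class.

```python
import re

# Hypothetical sample text, not taken from the thread.
text = "<b>bold</b>"

# Greedy: .+ runs to the end and backtracks only to the LAST ">".
greedy = re.findall(r"<.+>", text)

# Lazy: .+? stops at the first ">" it can reach.
lazy = re.findall(r"<.+?>", text)

# Alternative to laziness: [^>]+ can never run past a ">",
# so no lazy quantifier is needed.
negated = re.findall(r"<[^>]+>", text)

print(greedy)
print(lazy)
print(negated)
```

It is the same trick as ([^-]*) in the dash example: say what the group must not contain instead of making it lazy.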

