Regex to remove duplicate characters

Post by BuckyKatt » Thu Aug 07, 2014 4:59 am

Hello.

I'm actually pretty good at writing regexes, and I've had this work in other contexts before, but I can't seem to get it working with BRC.

I am trying to get rid of repeated parentheses and square brackets...

Hello (Dave) [[345], did you ____ (((redacted)). => Hello (Dave) [345], did you ____ (redacted).

I don't know how many times the parentheses or brackets might be repeated.

To JUST fix the opening square brackets, I tried:

brc32 /regexp:"(.*)\[+(.*):\1[\2" /execute

I assumed this didn't work because the expression was greedy, so I tried adding the lazy modifier ?:

brc32 /regexp:"(.*)\[+?(.*):\1[\2" /execute

I then figured the + was not working, so I tried {1,} instead:

brc32 /regexp:"(.*)\[{1,}(.*):\1[\2" /execute

Next I tried a variant...

brc32 /regexp:"(.*)\[+?([^\[].*):\1[\2" /execute

None of them produced the right result. I'm stumped.
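
A minimal sketch of what the four attempts do in a conventional backtracking regex engine. Python's `re` module is used as a stand-in here, and whether BRC's engine behaves the same way (or supports the lazy `.*?`) is an assumption. The helper `rename` is hypothetical and only imitates the `\1[\2` replacement.

```python
import re

name = "Hello (Dave) [[345]"

# The four patterns from the post above.
attempts = [
    r"(.*)\[+(.*)",
    r"(.*)\[+?(.*)",
    r"(.*)\[{1,}(.*)",
    r"(.*)\[+?([^\[].*)",
]

def rename(pattern: str, text: str) -> str:
    """Match the whole name and rebuild it as group 1, '[', group 2."""
    match = re.fullmatch(pattern, text)
    return f"{match.group(1)}[{match.group(2)}" if match else text

# The greedy leading (.*) keeps every "[" except the last one,
# so each attempt rebuilds exactly the name it started with.
for pattern in attempts:
    print(pattern, "->", rename(pattern, name))

# Making the leading group lazy lets \[+ consume the whole run.
fixed = rename(r"(.*?)\[+(.*)", name)
print(fixed)  # Hello (Dave) [345]
```

In this sketch the quantifier on `\[` is not the culprit: the leading `(.*)` has already taken all but one bracket before `\[+` gets a turn.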

Thanks in advance.

BKNJ
BuckyKatt
 
Posts: 4
Joined: Thu Aug 07, 2014 4:02 am

Re: Regex to remove duplicate characters

Post by Stefan » Fri Aug 08, 2014 10:13 pm

 

I would rather use a VBScript for such tasks.

With BRU you have to match every part of the whole file name anyway, so all of your file names would need to have the same parts and the same doubled characters in order to match.
 
Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Re: Regex to remove duplicate characters

Post by BuckyKatt » Sat Aug 09, 2014 6:04 am

I did match the entire filename... hence the (.*) start and ([^\[].*) end. The problem is that "+" appears to not be matching "1 or more" \[.

Normally, I do my bulk renaming with Perl... but since these renames are occurring on W32, I wanted a batch file solution. I haven't written anything in VB/VBScript in 15 years... and I like regexes... they save me tens of lines of code.

BKNJ
BuckyKatt
 
Posts: 4
Joined: Thu Aug 07, 2014 4:02 am

Re: Regex to remove duplicate characters

Post by Stefan » Sat Aug 09, 2014 9:10 am

>>I did match the entire filename...
Yes, but that would probably match only one file name. Since other file names have the problem at a different position, you can't rename more than one file at once?

>>The problem is that "+" appears to not be matching "1 or more" \[.
As you may know, + applies to the last expression on its left.
So it may be that \[+ sees only the [ as the expression and not the escaped pair \[.
So I would try (\[)+
I'm not sure whether this works, as it depends on the regex engine used and its implementation.

>>Normally, I do my bulk renaming with Perl... but since these renames are occurring on W32, I wanted a batch file solution.
Then try the Win32 equivalent, PowerShell.
As with VBS, I would loop over the file name one character at a time and compare the current character with the previous one stored in a temp variable.
Also, its regex engine is more powerful, as you can search and replace parts of a file name instead of having to match the whole name.
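
The character-by-character loop described above could look roughly like this. Python is swapped in for VBScript/PowerShell purely for illustration, and the function name `collapse_repeats` and its `targets` parameter are hypothetical.

```python
def collapse_repeats(name: str, targets: str = "()[]") -> str:
    """Drop a character when it repeats the one before it and is in targets."""
    result = []
    previous = ""  # the "temp variable" holding the last character seen
    for char in name:
        if not (char == previous and char in targets):
            result.append(char)
        previous = char
    return "".join(result)

print(collapse_repeats("Hello (Dave) [[345], did you ____ (((redacted))."))
# Hello (Dave) [345], did you ____ (redacted).
```

Restricting the check to `targets` leaves other legitimate repeats, such as the run of underscores, untouched.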

I think you can do that with BRC too, by running a search and replace several times that swaps two identical characters for one of them.
Find: ((
Repl: (

Find: ))
Repl: )
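
A sketch of the "run it many times" idea, written in Python rather than as BRC invocations because the exact BRC switches for a plain find/replace are not given in this thread. The helper `replace_until_stable` is hypothetical.

```python
def replace_until_stable(name: str, find: str, repl: str) -> str:
    """Repeat a plain find/replace until the name stops changing.

    Assumes repl does not itself contain find, otherwise the loop never ends.
    """
    while find in name:
        name = name.replace(find, repl)
    return name

name = "did you ____ (((redacted))"
for find, repl in [("((", "("), ("))", ")"), ("[[", "["), ("]]", "]")]:
    name = replace_until_stable(name, find, repl)
print(name)  # did you ____ (redacted)
```

Each pass shortens a run, so a run of any length ends up as a single character after a few passes.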


Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Re: Regex to remove duplicate characters

Post by BuckyKatt » Sun Aug 10, 2014 3:03 am

Stefan wrote:
>>I did match the entire filename...
Yes, but that would probably match only one file name. Since other file names have the problem at a different position, you can't rename more than one file at once?


The pre- and post- searches take care of position... (.*) and ([^\[].*)... so these would match on:

aaa [[[bbb]]] ccc ddd eee fff.
or
aaa bbb ccc ddd eee [[[fff]]].

Granted, it would get confused on

aaa [[[bbb]]] ccc ddd [[[eee]]] fff.

Because it has two sets of ['s to match... but in the filenames I have, I don't expect this to happen too often.
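
For the case with several runs in one name, a partial-match substitution with a back-reference avoids the position problem altogether. This is a sketch in Python's `re`, which allows replacing part of a name. BRC requires matching the whole name, so this is not a BRC recipe, and `dedupe_brackets` is a hypothetical name.

```python
import re

# One bracket or parenthesis, followed by one or more copies of itself.
REPEATED_BRACKET = re.compile(r"([()\[\]])\1+")

def dedupe_brackets(name: str) -> str:
    """Collapse every run of a repeated bracket to a single character."""
    return REPEATED_BRACKET.sub(r"\1", name)

print(dedupe_brackets("aaa [[[bbb]]] ccc ddd [[[eee]]] fff."))
# aaa [bbb] ccc ddd [eee] fff.
```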

Stefan wrote:
>>The problem is that "+" appears to not be matching "1 or more" \[.
As you may know, + applies to the last expression on its left.
So it may be that \[+ sees only the [ as the expression and not the escaped pair \[.
So I would try (\[)+
I'm not sure whether this works, as it depends on the regex engine used and its implementation.


To the best of my knowledge, the backslash/escape should trump pretty much everything else in order of operations for a regex.

Adding an extra pair of parentheses didn't help.
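
A quick check of that precedence claim, again using Python's `re` as a stand-in (BRC's engine may differ): the quantifier applies to the escaped bracket as one unit, with or without the extra group.

```python
import re

# Both forms quantify the escaped bracket as a single unit.
plain = re.fullmatch(r"\[+", "[[[")
grouped = re.fullmatch(r"(\[)+", "[[[")
print(plain is not None, grouped is not None)  # True True

# In context, the grouped form still leaves only one bracket for the
# quantifier, because the leading (.*) has already taken the others.
in_context = re.fullmatch(r"(.*)(\[)+(.*)", "aaa [[[bbb")
print(repr(in_context.group(1)))  # 'aaa [['
```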

Stefan wrote:
>>Normally, I do my bulk renaming with Perl... but since these renames are occurring on W32, I wanted a batch file solution.
Then try the Win32 equivalent, PowerShell.
As with VBS, I would loop over the file name one character at a time and compare the current character with the previous one stored in a temp variable.
Also, its regex engine is more powerful, as you can search and replace parts of a file name instead of having to match the whole name.

I think you can do that with BRC too, by running a search and replace several times that swaps two identical characters for one of them.
Find: ((
Repl: (

Find: ))
Repl: )



I don't want to loop or use temp variables... that's why I'm Regexing. ;-)

If BRC can't handle this stuff, I guess I'll write a Perl one-liner to do it.

BKNJ
BuckyKatt
 
Posts: 4
Joined: Thu Aug 07, 2014 4:02 am

