whitelist of allowed characters

Post any Bulk Rename Utility support requirements here. Open to all registered users.

Postby bryn » Sun Mar 25, 2012 11:00 pm

Is there any way to create a whitelist of allowed characters? I am trying to strip out any character that is not A-Z, a-z, 0-9, hyphen, or parentheses. Everything else I want to remove altogether. I tried creating character translations and built a list of about 30 items, but I keep finding new characters that I did not think of. I also find that a couple of characters revert to the plain English letter by themselves after a save, close, and reopen of BRU. For example, ?=L works fine when I first enter it, but once I save and close BRU, going back in shows it is now just L=L. So I am hoping to simply remove any characters that don't fit the list above.

Thank you.

EDIT: I noticed the forums don't like some of these special characters either. The one I used in the example that got replaced with a question mark was the L with a slash through it.
bryn
 
Posts: 3
Joined: Sat Mar 24, 2012 2:38 pm

Re: whitelist of allowed characters

Postby bryn » Mon Mar 26, 2012 2:59 am

I think regex might be the way to do this, but I can't figure out how to use it within BRU. From what I have read online, I am assuming it will be something along these lines:

^[0-9A-Za-z()-]*$

Not sure whether the parentheses or the hyphen need to be escaped, or how that breaks down into the match and replace fields within BRU. Any suggestions? Thanks!
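
On the escaping question: inside a character class, parentheses are literal, and a hyphen is literal when it is the first or last character in the class, so neither needs a backslash. The sketch below checks this with Python's re module, not with BRU itself, so it only illustrates the pattern syntax. The function names is_clean and strip_disallowed are made up for the example.

```python
import re

# Inside [...] the parentheses are literal, and a trailing hyphen is literal too.
ALLOWED = re.compile(r"[0-9A-Za-z()-]*")
# The negated class matches every character that is NOT on the whitelist.
DISALLOWED = re.compile(r"[^0-9A-Za-z()-]")

def is_clean(name: str) -> bool:
    """True when the whole name consists of whitelisted characters only."""
    return ALLOWED.fullmatch(name) is not None

def strip_disallowed(name: str) -> str:
    """Remove every non-whitelisted character (a global replace)."""
    return DISALLOWED.sub("", name)

print(is_clean("Report(2)-v1"))               # whitelisted characters only
print(strip_disallowed("Łódź (2012)-final!"))  # accented letters, space and ! go
```

Note that this relies on a global replace, which removes every offending character in one go.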
bryn
 
Posts: 3
Joined: Sat Mar 24, 2012 2:38 pm

Re: whitelist of allowed characters

Postby Jane » Fri Mar 30, 2012 3:26 pm

You should be able to clean up the filenames, but BRU unfortunately doesn't make it as easy as it could be due to the way the PCRE regex is utilized.
Normally you could give the regex match a list of the characters you DON'T want, and have it replace these globally with nothing, leaving only the characters you want. Unfortunately global replace (replace all instances rather than stopping after the first one) isn't implemented.
Also, BRU has a quirk where you have to capture all parts of the filename you want to retain. In other words, if you match a non-standard character and tell it to replace with nothing, a normal regex leaves the rest of the string alone, but BRU discards it; hence the captures.
A couple of things would shorten the regex required:
\w is a shortcut for A-Za-z0-9_
\W is a shortcut for everything that is NOT A-Za-z0-9_
So a simple regex would be something like:
(\w*)\W+(.*)
Replace:
\1\2
Because BRU doesn't replace globally, you will have to remove the unwanted characters one group at a time.
The regex means:
(\w*) - match any wanted characters at the start of the filename, up to the first unwanted one, and capture them in \1
\W+ - match one or more unwanted characters
(.*) - capture whatever follows the unwanted group in \2
Replacing with \1\2 puts back what surrounded the first group of unwanted characters.
Each time you run a rename with this, it removes another group of unwanted characters, until the regex no longer matches and all filenames are clean.

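The use of \w is just to show the technique; you can widen the set of wanted characters with a character class in square brackets. Note that \W also matches the hyphen and parentheses, so to keep those, use [A-Za-z0-9()-] in place of \w and [^A-Za-z0-9()-] in place of \W.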
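
To see how the repeated passes play out, here is a minimal sketch in Python that imitates the behaviour described above: each pass removes only the first group of unwanted characters and rebuilds the name from the two captures. It uses Python's re module rather than BRU, the whitelist from the original question in place of \w, and the helper names one_pass and clean are invented for the illustration.

```python
import re

# Whitelist from the original question: letters, digits, parentheses, hyphen.
WANTED = r"0-9A-Za-z()-"
# Same shape as (\w*)\W+(.*), with the shortcuts swapped for explicit classes.
PASS = re.compile(rf"([{WANTED}]*)[^{WANTED}]+(.*)", re.DOTALL)

def one_pass(name: str) -> str:
    """Imitate one rename: drop the first unwanted group, keep \\1 and \\2."""
    m = PASS.match(name)
    if m is None:
        return name  # nothing unwanted left, so the name is unchanged
    return m.group(1) + m.group(2)

def clean(name: str) -> tuple[str, int]:
    """Repeat one_pass until the name stops changing; return name and pass count."""
    passes = 0
    while True:
        renamed = one_pass(name)
        if renamed == name:
            return name, passes
        name = renamed
        passes += 1

print(one_pass("a!b@@c#"))  # only the first group ("!") is removed
print(clean("a!b@@c#"))     # three groups, so three passes are needed
```

A name with three separate groups of unwanted characters needs three passes, which matches the "one group at a time" behaviour.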
Jane
 
Posts: 24
Joined: Sat Aug 05, 2006 1:20 am

