Removing characters


Postby taconsta » Wed Apr 29, 2009 12:27 pm

Hello there!

Some applications generate filenames with valid, although strange characters like ~!@#$%^&|
Please note that these are valid characters for a filename under Windows. However, I dislike (actually fear) having such characters in the filename.
Question 1: How can I remove them with BRU and RegEx (please see below)?
Question 2: Is there a way to add a shortcut to the context menu, like "Comb unwanted characters", to do the job automatically?

More detail:
I am aware of the "Remove characters" and "Remove symbols" options. The problem with both is that they (a) remove rather than replace, and (b) also remove the underscore character, which I like to use as a word separator. So, a file titled
Journal_Name | Page Title

might become something like
JournalNamePage Title


I've tried with RegEx and it does not work.
The following:
Match: #
Replace: _
when applied to File@%&#.ext leads to _.ext
Same if Match is [@#]

However, in gVim, the command
:s/[!@#]/_/g
changes Record~!@#$%^.wav to Record~___$%^.wav as intended (note there are THREE underscores).
(according to the BRU help on RegEx [abc] matches any character between the brackets)

HOWEVER, I am aware that the example presented here is not really a solution, as Record~!@#sometext$%^.wav would not have the $%^ characters removed.
I've also looked in the forum, but did not find any appropriate answer. Thank you!
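For illustration, the "replace everything unwanted, wherever it appears" idea can be sketched outside BRU with a negated character class. This is a minimal Python sketch, not BRU syntax; the whitelist of kept characters and the helper name `comb` are assumptions chosen for the example.

```python
import re

# Hypothetical whitelist: letters, digits, underscore, hyphen, space and dot are kept.
# The leading ^ inside the brackets negates the class, so it matches everything else.
UNWANTED = re.compile(r"[^A-Za-z0-9_\- .]")


def comb(name: str, replacement: str = "_") -> str:
    """Replace every character outside the whitelist, one for one."""
    return UNWANTED.sub(replacement, name)


print(comb("Record~!@#sometext$%^.wav"))      # Record____sometext___.wav
print(comb("Journal_Name | Page Title.txt"))  # Journal_Name _ Page Title.txt
```

Because the class lists what to keep rather than what to drop, characters on both sides of "sometext" are handled, and the underscore survives.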
taconsta
 
Posts: 5
Joined: Wed Apr 29, 2009 12:05 pm

Re: Removing characters

Postby Admin » Wed Apr 29, 2009 1:00 pm

The easiest way to do this I think is using Character Translations in the Options menu.

Character Translations allows you to enter a specific character or sequence of characters, and have that translated into a different character or sequence of characters. So for example, you could specify that you always want a $ sign to be converted into the word DOLLAR.

There are three ways to enter the replacement data:
1. As a character, e.g. A
2. As a hex value, e.g. 0F
3. As a decimal value, e.g. 065

Separate the FROM and the TO conversions with an equals sign. If you wish to actually convert an equals sign to/from something else then you can specify the hex or decimal value for the equals sign in your rules.

Bulk Rename Utility identifies the type of value entered by its length. So if your value is one character long, then it's a direct character; two characters long, and it's a hex value; three characters long, and it's a decimal value.

In the following examples, every example is converting a capital "A" to a capital "B"

· A=B (direct expression of the characters to convert)
· 41=42 (two characters long, therefore hex values)
· 065=066 (three characters long, therefore decimal values)
· A=066 (using a mixture of the above)
· 41=066 (using a mixture of the above)

If you wish to convert several characters, then you can separate the values by commas. So the following example will convert ABC to DEF:

· 41,066,C=D,E,070

If you wish to actually convert a comma sign to/from something else then you can specify the hex or decimal value for the comma sign in your rules.
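
To make the length rule concrete, here is a rough Python sketch of how rules in this format could be interpreted. It only models the description above and is not BRU's actual implementation; the function names are invented for the example.

```python
from typing import List, Tuple


def parse_token(token: str) -> str:
    """Decode one value by its length: 1 = literal, 2 = hex, 3 = decimal."""
    if len(token) == 1:
        return token
    if len(token) == 2:
        return chr(int(token, 16))
    if len(token) == 3:
        return chr(int(token, 10))
    raise ValueError(f"unsupported value length: {token!r}")


def parse_rule(rule: str) -> Tuple[str, str]:
    """Split 'FROM=TO' and decode the comma-separated values on each side."""
    source, target = rule.split("=", 1)
    decoded_from = "".join(parse_token(t) for t in source.split(","))
    decoded_to = "".join(parse_token(t) for t in target.split(","))
    return decoded_from, decoded_to


def translate(name: str, rules: List[str]) -> str:
    """Apply each rule in order as a plain substring replacement."""
    for rule in rules:
        decoded_from, decoded_to = parse_rule(rule)
        name = name.replace(decoded_from, decoded_to)
    return name


print(parse_rule("41,066,C=D,E,070"))          # ('ABC', 'DEF')
print(translate("File@#.ext", ["@=_", "#=_"]))  # File__.ext
print(translate("File@#.ext", ["@,#=_"]))       # File_.ext
```

In this model, a comma-separated rule describes one sequence, so `@,#=_` only fires on the exact text `@#` and yields a single underscore, while one rule per character replaces each character on its own.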
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Removing characters

Postby taconsta » Wed Apr 29, 2009 3:18 pm

I would never have come up with this idea! Sure, it's not exactly the perfect way (i.e. I might want it sometimes, but not always), but it fits my purpose well: as I said, I have the principle of not having those characters in filenames.
It turns out that I have to put each character on a separate line, or else I get only one underscore and only the exact sequence defined there is translated.
Thanks!
taconsta
 
Posts: 5
Joined: Wed Apr 29, 2009 12:05 pm

Re: Removing characters

Postby taconsta » Mon May 04, 2009 2:13 pm

taconsta wrote: I have the principle of not having those characters in filenames.
It turns out that I have to have each character on a separate line, or else I get only one underscore and only the sequence defined there is valid.
Thanks!


Errr... I just noticed: the character translations seem to get lost after quitting the program. Is this by design, is it a bug, or does it try to write to a location where "mere users" (i.e. limited users) don't have write permission?
taconsta
 
Posts: 5
Joined: Wed Apr 29, 2009 12:05 pm

Re: Removing characters

Postby GMA » Mon May 04, 2009 11:01 pm

Hi, taconsta:
I don't think that's a bug, I think that's by design. Maybe you should have all the lines saved in a TXT, and then paste them in the "Character Translation" dialog every time you need to use them. Not the best solution, but it will save you some time.
Cheers,

Gabriel.
GMA
 
Posts: 91
Joined: Sun Dec 02, 2007 1:30 pm
Location: Argentina

Re: Removing characters

Postby taconsta » Wed May 06, 2009 5:17 pm

GMA wrote:Hi, taconsta:
I don't think that's a bug, I think that's by design.


Yeah, there's some reason in it (like not understanding why some characters get translated or, even worse, having them changed and not noticing it), but I am not ecstatic about this feature :)
taconsta
 
Posts: 5
Joined: Wed Apr 29, 2009 12:05 pm

Re: Removing characters

Postby jesop7911 » Thu Jun 17, 2010 6:18 am

Hi, I am a complete novice at this type of thing, and I am not technical and just don't understand, so I need help. I have a database of 14000 names, and all but a few have a suffix of - and then a number. I have downloaded BRU but have no idea what to do next to remove the - and the number. Can someone help please? Otherwise I see myself spending hours trying but getting nowhere fast. Thanks
jesop7911
 
Posts: 1
Joined: Thu Jun 17, 2010 6:13 am

Re: Removing characters

Postby Stefan » Thu Jun 17, 2010 10:16 am

Hi jesop7911, welcome.

jesop7911 wrote: Hi, I am a complete novice at this type of thing, and I am not technical and just don't understand, so I need help.
I have a database of 14000 names, and all but a few have a suffix of - and then a number.
I have downloaded BRU but have no idea what to do next to remove the - and the number.
Can someone help please? Otherwise I see myself spending hours trying but getting nowhere fast.
Thanks


Like "file name - 12345.ext" ?

Try this RegEx rule:

RegEx(1)
Match: (.+) - \d+
Replace: \1

Explanation:
(.+) ===> find one or more of any character and capture them as group 1
' - \d+' => find blank, dash, blank, followed by one or more digits

Replace the match with what was captured in group 1, using the meta char '\1'
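
The same rule can be tried outside BRU before touching real files. The Python sketch below assumes the rule is applied to the name without its extension and must match the whole name; that is an assumption about how to model it, not a statement of BRU's internals, and the function name is made up for the example.

```python
import os
import re

# Same pattern as the Match field above.
RULE = re.compile(r"(.+) - \d+")


def strip_number_suffix(filename: str) -> str:
    """Drop a trailing ' - <digits>' from the name part, keeping the extension."""
    stem, ext = os.path.splitext(filename)
    match = RULE.fullmatch(stem)
    if match is None:
        return filename  # names that do not fit the pattern are left alone
    return match.expand(r"\1") + ext


print(strip_number_suffix("file name - 12345.ext"))  # file name.ext
print(strip_number_suffix("notes.txt"))              # notes.txt
print(strip_number_suffix("a - 1 - 22.txt"))         # a - 1.txt
```

The last call shows the greedy `(.+)` at work: only the final "- number" part is removed.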


-------------------

General notes/ Disclaimer

Hope this helps ? :D
If yes, please help two others too. And consider a donation to the tool's author.


Please note:

* Usually I do only a few tests on this issue!
* So please test my solution with some test files first, before you destroy your data.
* Select one or more files in the Name column to see what the New Name will be.

RegEx is a pattern-matching solution, so all your files have to fit the same pattern.
If they do not, you have to separate them and run some more actions against them.

To find your own solution, you have to split your file names/strings into parts in your mind,
following the rules of the regular expression syntax; see the help file that comes with your application.
(Please note that there are several flavors of RE engines, different implementations in apps,
and even different ways of doing or thinking, so your experience may differ from my explanation.)
Once you have split your string into parts, you can decide which parts to use in the replacement by grouping the pattern
in (...) parentheses, which you can refer to later with "\1" or "$1", and which parts to drop or modify.

* It's always a good idea to provide all possible file name patterns in question.
* That gives the supporter a chance to do it right ;)
* If your real file names don't fit your example pattern, my solution may fail.

* Don't type the ' ' or " " quotes from my explanation. They are only for clarification.
* '?' after a quantifier such as * or + means non-greedy: stop at the first possible match instead of the last.
* The (...) parentheses are used to "group" what is found by the RegEx inside the ()'s,
to reuse this capture later in the replacement by using \1 or $1.
* Instead of ~ signs, if used in my explanations, type a space/blank.
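
A short Python illustration of the greedy/non-greedy and grouping notes above. The sample name is made up, and Python's `re` module stands in for whichever RE flavor your application uses, so details may differ.

```python
import re

name = "Artist - Album - Track"

# Greedy: (.+) grabs as much as it can and stops at the LAST ' - '.
greedy = re.match(r"(.+) - ", name).group(1)

# Non-greedy: (.+?) grabs as little as it can and stops at the FIRST ' - '.
lazy = re.match(r"(.+?) - ", name).group(1)

# Groups can be reused in the replacement: keep only what follows the first ' - '.
swapped = re.sub(r"(.+?) - (.+)", r"\2", name)

print(greedy)   # Artist - Album
print(lazy)     # Artist
print(swapped)  # Album - Track
```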


More Help
* online tester:
- http://rereplace.com/
- http://www.regextester.com/
- http://www.regexlib.com/RETester.aspx

* online help:
- www.regular-expressions.info
- www.regexlib.com/
- www.regexlib.com/CheatSheet.aspx

See these two oldest threads in the "Regular Expressions" forum for a RegEx syntax overview:
=> Getting Started: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=5
=> Go ahead: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=27
There you will find more examples and tips, as you may in other threads in the "Regular Expressions" sub-forum.


Bye,
Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Re: Removing characters

Postby taconsta » Thu Jun 17, 2010 10:25 am

Hi Jesop!

First of all, please allow me to bash you a bit: you should open a new topic, not continue an existing one, even if it is remotely similar to what you want, since it is not the same problem. No harm done, though.

Back to your question: probably the best way to do it in BRU is by using "regular expressions" (also read the BRU help topics that appear when searching for "regular expression"; you have some examples there too). For your particular case, I would try something like this: in the RegEx (1) set of rules, type:
Match: ([a-zA-Z _]*)- \d+
Replace: \1
Now let's break it into pieces: you search for a Match and replace it with a pattern.
The "Match" pattern looks like this:
[a-zA-Z] matches any character in the a-z range (so a, b, c...) or the A-Z range (A, B, C...).
You might also have a blank or an underscore in your file name, so I added them as well; should you have other characters (like # or whatever), add them too.
You define a "set of characters" by putting them between square brackets: [abd] matches a OR b OR d, but only once. To match the set any number of times (including zero), you add a * after it.
I'll explain the round parentheses later.
After the series of letters and blanks, I presume you have your "- number" sequence.
Since you did not mention how many digits you have, I decided to say: digits (\d), at least one time (+), so you get the - \d+ part of the pattern.
Now, you want to replace all this with just the part before "- digits", and here the round parentheses come in: whatever is between them is considered one block and can be referred to in the "Replace" field by a number (representing its position) preceded by a backslash, so \1 means "whatever was matched by [a-zA-Z _]* should appear here". [You could have put \d+ between parentheses and referred to it as \2.]
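
As a rough check of the pattern above, here is a Python sketch that puts \d+ in a second group, as suggested in the bracketed remark. It assumes the pattern has to match the whole name; the helper name and the sample names are invented for the example.

```python
import re

# The pattern from this post, with \d+ captured as group 2.
PATTERN = re.compile(r"([a-zA-Z _]*)- (\d+)")

# Variant with the blank before the dash moved outside group 1.
TRIMMED = re.compile(r"([a-zA-Z _]*) - (\d+)")


def split_name(stem, pattern=PATTERN):
    """Return (group 1, group 2) if the whole name fits the pattern, else None."""
    match = pattern.fullmatch(stem)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_name("Journal_Name - 42"))           # ('Journal_Name ', '42')
print(split_name("Journal_Name - 42", TRIMMED))  # ('Journal_Name', '42')
print(split_name("File#1 - 42"))                 # None
```

Note that with the first pattern the blank before the dash ends up inside group 1, so \1 carries a trailing space; the variant avoids that. The last call returns None because # and 1 are not in the character set.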

Hope this will help, but you have to keep in mind one thing: if you have several files which have the same "name" but different numbers, you'll OVERWRITE them so think well before starting, maybe make a backup copy of all of your files.
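
One way to check for such clashes before renaming is to compute the new names first and look for duplicates. The following is a small Python sketch under the same whole-name matching assumption as before; the function name, default pattern, and sample names are hypothetical.

```python
import re
from collections import defaultdict


def find_collisions(names, pattern=r"(.+) - \d+", replacement=r"\1"):
    """Map each new name that several old names would share to those old names."""
    rule = re.compile(pattern)
    targets = defaultdict(list)
    for name in names:
        match = rule.fullmatch(name)
        new_name = match.expand(replacement) if match else name
        targets[new_name].append(name)
    return {new: olds for new, olds in targets.items() if len(olds) > 1}


print(find_collisions(["report - 1", "report - 2", "notes - 7"]))
# {'report': ['report - 1', 'report - 2']}
```

An empty result means no two names in the list would end up identical.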

And, like I said, spend some time reading the help on regular expressions, and the examples which are provided in the help.
taconsta
 
Posts: 5
Joined: Wed Apr 29, 2009 12:05 pm

