Need Regular Expressions help

Post any Bulk Rename Utility support requirements here. Open to all registered users.

Postby henry » Fri Oct 19, 2007 10:20 pm

Hi! I feel really, really sheepish in asking this question, but I'm pretty new to regular expressions & this one (which I'm sure is pretty basic) has me stumped. I've looked at various guides & tutorials recommended on this site (and elsewhere) but can't get this one to work.

Pretty simple (I thought!): replace the "." between letters with a space, but leave the dots between numbers alone.

Sample:

My.File.v.1.2.345

My.File.v1.2.345

this.Is.my.File.v.1.2.345

I.e., I have tons of files with a dot "." between words (which I don't want to keep) followed by a version number (whose dots I do want to keep).

So, results would be (for the above examples):

My File v.1.2.345

My File v1.2.345

this Is my File v.1.2.345 [words don't always have an uppercase first letter]


Sorry, but I've tried & tried & can't find out what I'm doing wrong.

In the above examples I simply want to replace the "dot" between words with a space, keeping the dot between numbers (version number).

It's not always consistent whether there's a "v.1" or a "v1" to indicate file version number. If the end-result is either of these, that's fine with me.

Also, the filenames *usually* are first letter uppercased, but not always.

Any help on how I do this would be =greatly= appreciated!

Again, so sorry to bother you with something so basic as this.

PS - I'm donating some $$ right now, to thank you for this great utility!

Thanks!

Henry
henry
 
Posts: 7
Joined: Wed Aug 01, 2007 8:18 pm

Postby Stefan » Sun Oct 21, 2007 2:58 pm

EDIT:
Note: the \w I use below is not correct!

It has to be [A-Za-z], as Glenn mentioned.
So replace my \w with [A-Za-z] below,
or with \D, which also works in this case.

/EDIT

-------------

Hi Henry,

you have: this.Is.my.File.v.1.2.345
you want: this Is my File v.1.2.345

You have to search for: char dot char
You want to keep the chars, so capture them in groups: (char) dot (char)


You can find a char with \w ==> a,b,c,...,A,B,C,...,0-9,_
And you can find a dot with \.
So you could search for (\w)\.(\w)

Then you can replace with group one, \1,
a manually added space,
and group two, \2.
So the dot is lost.


Unfortunately, this does not work the way you need in BRU.
In Perl you would search and replace with s/(\w)\.(\w)/\1 \2/g (the g, for global, replaces all occurrences of (\w)\.(\w))
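
For readers who want to try the global replace outside BRU, here is a minimal Python sketch of the same idea. It uses re.sub as a stand-in for Perl's s///g, and [A-Za-z] instead of \w as the edit note at the top of this post recommends. The function name is made up for illustration, and this is not something BRU itself can run.

```python
import re

# Letter, dot, letter. [A-Za-z] is used instead of \w so that digits never qualify.
PAIR = re.compile(r"([A-Za-z])\.([A-Za-z])")

def replace_all_pairs(name: str) -> str:
    """Replace every non-overlapping letter.letter with letter<space>letter."""
    return PAIR.sub(r"\1 \2", name)

print(replace_all_pairs("this.Is.my.File.v.1.2.345"))  # this Is my File v.1.2.345
print(replace_all_pairs("My.File.v1.2.345"))           # My File v1.2.345
```

One caveat with this form: each match consumes the letters on both sides of the dot, so a one-letter part squeezed between two dots (as in a.b.c) only gets its first dot replaced in a single run.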

In BRU (and some others too) you have to match all parts of the whole name.
So you have to know how many parts (separated by dots) are in the name.
But your examples show names of different lengths.

So you have to do several runs, one for each name length.

(as many)(\w)\.(\w)(as many)(as many)(\w)\.(\w) etc

(.+?)(\w)\.(\w)(.+?)(\w)\.(\w)(.+?) etc


Sorry, I know this is not clearly explained!
But this is not my day :cry:
I will try to do better next time.

bye
Last edited by Stefan on Mon Nov 05, 2007 3:04 pm, edited 1 time in total.
Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Postby Stefan » Mon Oct 22, 2007 8:35 am

I don't know how,
but we already have the PCRE.dll.
Maybe Jim can add an option
to use this "Perl style" s/ / /ig in BRU too?

Enabled by a check box in the original RegEx field,
or with a new dialog just for this purpose.

Postby Glenn » Mon Oct 22, 2007 7:32 pm

Hi,
Just wanted to add my 2 cents' worth.
There are a few ways of finding the required dots, but here are a couple of simpler ways.
The way shown by Stefan is a good one, except that the metacharacter \w stands for the
letters A-Z and a-z as well as the digits 0-9 and the underscore, so it would allow a match with a dot adjacent to numbers.
We want to find only those dots surrounded by letters.

One way would be with positive lookbehind/lookahead which would look like this:

((?<=[A-Za-z])\.(?=[A-Za-z]))
This says find dots that have letters on both sides. This is only part of the total expression which would be required.
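
As a rough illustration of the lookaround idea in an engine that does support global replace, here is a small Python sketch. Python's re module stands in for PCRE here, and the function name is hypothetical. In BRU the lookaround would still need to be embedded in a larger expression, as noted above.

```python
import re

# A dot with a letter immediately before it and a letter immediately after it.
# The lookbehind and lookahead test the neighbours without consuming them.
LETTER_DOT = re.compile(r"(?<=[A-Za-z])\.(?=[A-Za-z])")

def dots_to_spaces(name: str) -> str:
    """Replace only the dots that sit between two letters."""
    return LETTER_DOT.sub(" ", name)

for sample in ("My.File.v.1.2.345", "My.File.v1.2.345", "this.Is.my.File.v.1.2.345"):
    print(sample, "->", dots_to_spaces(sample))
```

Because the neighbouring letters are not part of the match, a one-letter part between two dots (a.b.c) has both of its dots replaced in the same pass.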
However, this can be confusing, so we will try a slightly modified version of Stefan's, which would look like:

^([A-Za-z ]+)\.([A-Za-z]+)([\w\.]+)

As Stefan said, the way BRU is set up now, it won't do a global search and replace. I'll get to that in a minute.
To work around that, we need to cycle through the filenames, replacing one dot each time.
The expression works as follows.
We want to capture whatever is on either side of the dot, but NOT the dot, and NOT if it is surrounded by numbers, so

([A-Za-z]+)

on either side would do for the first pass.
However, on subsequent passes there would be the replacement space(s) on the left side to allow for, so we will allow those to be included in front of the current dot so we would use

([A-Za-z ]+)
on the left side of the expression instead. Note there is now a space between the z and the ]

The replaceable dot has to also be followed by a letter so we follow it with

([A-Za-z]+)

Finally, we want also to capture whatever remains which we catch in

([\w\.]+)

This allows any combination of letters, numbers and dots downstream from the current capture.

Each renaming pass will replace a letter|dot|letter with a letter|space|letter until there aren't any letter|dot|letter combinations left, in which case the pattern will no longer match the filename.
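
The pass-by-pass behaviour can be simulated outside BRU. The following Python sketch applies the expression once per loop iteration until it stops matching. The replacement string \1 \2\3 is an assumption (the post does not spell it out), and the function name and the max_passes safety limit are invented for the example.

```python
import re

# The full expression from this post.
PATTERN = re.compile(r"^([A-Za-z ]+)\.([A-Za-z]+)([\w\.]+)")

# Assumed replacement: group 1, a space, then groups 2 and 3 unchanged.
REPLACEMENT = r"\1 \2\3"

def rename_in_passes(name: str, max_passes: int = 20):
    """Apply one replacement per pass; return the final name and the pass count."""
    passes = 0
    while passes < max_passes:
        new_name, count = PATTERN.subn(REPLACEMENT, name, count=1)
        if count == 0:
            break  # no letter|dot|letter left that the pattern can reach
        name = new_name
        passes += 1
    return name, passes

print(rename_in_passes("this.Is.my.File.v.1.2.345"))  # ('this Is my File v.1.2.345', 4)
print(rename_in_passes("My.File.v1.2.345"))           # ('My File v1.2.345', 2)
```

Each iteration corresponds to one renaming run in BRU, so a name with four replaceable dots needs four runs.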

There is a global search and replace option in the PCRE library. It is "-g" in the documentation I read.

However, only Jim would know whether it is possible, and if so, if the time/trouble was worth it

Anyway, for what it's worth

Cordially,
Glenn

Also, I think this post should be moved to the regular expression help section.
Glenn
 
Posts: 28
Joined: Fri Apr 14, 2006 4:53 pm
Location: Winnipeg, Canada

Postby Stefan » Mon Oct 22, 2007 8:12 pm

I am afraid you were right about \w :wink: Thanks.
Next I would try \D\.\D for this case (though your regex is more exact, to be clear).

\w is equivalent to [a-zA-Z_0-9].
\D matches any non-digit. Equivalent to [^0-9].

Reference:
http://www.night-ray.com/regex.pdf

Postby Glenn » Mon Oct 22, 2007 10:57 pm

Yes, you could use \D instead of [A-Za-z], but I chose not to because:

- some Regex engines do not support it, although I think the PCRE one in BRU does
- for folks who are trying to learn from this forum, it introduces more confusion; it is fairly obvious what [A-Za-z] represents
- \D is not the same character set as [A-Za-z]. With [A-Za-z] you know exactly what you match. Using a negated character class [^0-9], which \D represents (everything except 0-9) can be a slippery slope if you're not careful. It not only represents the letters of the alphabet, but most everything else on your keyboard plus low order ASCII, including the newline, which can cause a lot of debugging headaches if you forget that. Technically, because the BRU probably works on a single filename string at a time the newline may not cause a problem here, but it is something to keep in mind.
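
To make the difference concrete, here is a small Python comparison of the two character classes. Both are wrapped in lookarounds so that only the class differs; that wrapping is a choice made for this sketch, and the names are hypothetical. The second sample filename is invented to show a case where the classes disagree.

```python
import re

LETTERS_ONLY = re.compile(r"(?<=[A-Za-z])\.(?=[A-Za-z])")
NON_DIGITS = re.compile(r"(?<=\D)\.(?=\D)")

def compare(name: str):
    """Return the result of each variant so they can be put side by side."""
    return LETTERS_ONLY.sub(" ", name), NON_DIGITS.sub(" ", name)

# Same result when the name holds only letters, digits and dots.
print(compare("My.File.v.1.2.345"))  # ('My File v.1.2.345', 'My File v.1.2.345')

# Different result once other characters (here a hyphen) appear.
print(compare("My.File.-.v.1.2"))    # ('My File.-.v.1.2', 'My File - v.1.2')
```

Which outcome is preferable depends on the filenames at hand; the point is only that \D quietly accepts hyphens, underscores, spaces and other non-digits as neighbours of the dot.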

Cordially,
Glenn

Thanks!

Postby henry » Tue Oct 23, 2007 6:55 pm

Wow! Thanks to all for the wonderful advice. As soon as I get another batch of such files (probably a week or so) I'll give all your suggestions a try & report back here.

Thanks again to all!!

Henry

Postby Stefan » Tue Oct 23, 2007 8:27 pm

Jim was generous enough to allow me
to post some alternative renamers
that are better suited to the special task we talked about above
(search and replace a pattern in the whole file name, without knowing how often the pattern occurs).

Note:
none of these renamers is better or worse than the others;
one just does this better, another that.

In no particular order:

PFrank --- http://www3.telus.net/pfrank/


:==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==:

Siren --- http://www.scarabee-software.net/en/index.html


:==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==:

ReNamer by den4b --- http://www.den4b.com/

ReNamer may be the easiest one for novice users to understand.


:==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==:

Flexible Renamer --- http://hp.vector.co.jp/authors/VA014830 ... /FlexRena/


:==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==: :==:

Thank you Jim.

(Sorry Glenn, I used the shorter regex \D, which fits fantastically here and is easier on the eye.)
\D matches any non-digit. Equivalent to [^0-9].

Reference:
http://www.night-ray.com/regex.pdf

Another Thanks!

Postby henry » Wed Oct 24, 2007 7:43 pm

Stefan,
Thanks for your follow-up advice. I'll try BRU first (loyalty!) then the others.

I'll report back on this project when I get another bunch of "dotted" files, probably in 1-2 weeks.

Henry

