Add a space before each capital letter in the file name

A swapping-ground for Regular Expression syntax

Postby prlaba » Tue Jul 05, 2016 4:27 am

I need to 'correct' some of my MP3 music files that were created with all spaces in the filenames removed. For example, the MP3 file for the song 'I Feel The Earth Move' is named 'IFeelTheEarthMove.mp3' instead of 'I Feel The Earth Move.mp3'. Fortunately all words in the filenames are capitalized, so I can determine where the missing spaces need to be inserted (between two consecutive uppercase letters, or between a lowercase letter and the uppercase letter that immediately follows).

My first attempt was to use this regular expression:

Match: ([A-Za-z])([A-Z])
Replace: \1 \2

I expected this would locate any upper or lowercase letter followed immediately by an uppercase letter, and insert a space between the two letters, like this:

IFeelTheEarthMove.mp3 -> I Feel The Earth Move.mp3

(The above expressions worked as expected using another app called RegexRenamer. Unfortunately that app only lets you rename files one folder at a time, not a viable option for someone looking to rename files spread across hundreds of folders.)

But when I entered the above expressions, it replaced the entire filename with just the first matching letter pair and the inserted space:

IFeelTheEarthMove.mp3 -> I F.mp3

OK, I suppose that makes sense, since that is what the Replace expression says to do. I obviously need to also preserve the 'rest' of the filename before and after each matching letter pair.
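The difference between the two behaviours can be sketched in JavaScript. This is only an illustration: `replaceEveryPair` and `replaceWholeName` are hypothetical helper names, and the second function merely models the 'I F' result reported above. It is an assumption about what happened, not something taken from the renamer itself.

```javascript
// Pattern from the first attempt: any letter followed by an uppercase letter.
const PAIR = /([A-Za-z])([A-Z])/;

// Replace every match in place, leaving the text around each match untouched.
function replaceEveryPair(name) {
  const globalPair = new RegExp(PAIR.source, 'g');
  return name.replace(globalPair, '$1 $2');
}

// Assumed model of the observed behaviour: the replacement template,
// filled in from the first match only, becomes the entire new name.
function replaceWholeName(name) {
  const m = PAIR.exec(name);
  return m ? `${m[1]} ${m[2]}` : name;
}

console.log(replaceEveryPair('IFeelTheEarthMove')); // I Feel The Earth Move
console.log(replaceWholeName('IFeelTheEarthMove')); // I F
```

Seen this way, the Match has to cover the whole name whenever the replacement replaces the whole name, which is what the next attempt tries to do.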

So I next tried this:

Match: (.*[A-Za-z])([A-Z].*)
Replace: \1 \2

That did preserve the missing parts of the filename, but didn't exactly do what I wanted:

IFeelTheEarthMove.mp3 -> IFeelTheEarth Move.mp3

A single space was inserted between the last matching letter pair (the 'h' at the end of 'Earth' and the 'M' at the start of 'Move'). But what I wanted was to insert a space between every matching letter pair, not just one.

I suppose I could just apply this renaming multiple times to all of my files -- each 'apply' would insert one more space into each file's filename -- until all of the missing spaces were reinserted. But there must be a better way.
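The "apply it multiple times" idea can be sketched as a loop in JavaScript. This is a rough sketch and not a BRU feature; `insertSpacesRepeatedly` is a hypothetical name. Because `.*` is greedy, each pass splits at the last remaining letter pair, so the loop works from the end of the name backwards until nothing matches.

```javascript
// Second attempt: the greedy .* makes the match land on the LAST letter pair.
const LAST_PAIR = /^(.*[A-Za-z])([A-Z].*)$/;

// Re-apply the replacement until no letter pair is left.
// Each pass removes exactly one pair, so the loop always terminates.
function insertSpacesRepeatedly(name) {
  let passes = 0;
  let current = name;
  while (LAST_PAIR.test(current)) {
    current = current.replace(LAST_PAIR, '$1 $2');
    passes += 1;
  }
  return { name: current, passes };
}

console.log(insertSpacesRepeatedly('IFeelTheEarthMove.mp3'));
// { name: 'I Feel The Earth Move.mp3', passes: 4 }
```

A five-word name needs four passes, one per missing space.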

Can anyone suggest Match and Replace expressions that will insert a space between all matching letter pairs while preserving the rest of the filename?

Thanks.
prlaba
 
Posts: 1
Joined: Tue Jul 05, 2016 3:17 am

Re: Replace multiple occurrences?

Postby Admin » Thu Jul 07, 2016 3:42 pm

This can be done with a JavaScript function in BRU, if you are interested. Thanks.
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Replace multiple occurrences?

Postby therube » Thu Jul 07, 2016 6:32 pm

As a start only, certainly not complete, & very likely would not be right in any case, but it does sort of separate the CAPS (to a point).

Something like this:

Match:  ([A-Z]){0,1}([^A-Z]*)([A-Z])([^A-Z]*)([A-Z])([^A-Z]*)([A-Z])([^A-Z]*)([A-Z])([^A-Z]*)
Replace:  \1 \2  \3  \4 \5 \6 \7 \8 \9


Seems to be a limit on the number of replacements \x ?
Not sure if it will handle "abcU"?
The Replace: so far is simply to give an idea of what is going on.
Some names will result in a replacement containing an opening space. Don't know offhand what actually may happen if that is actually written? Would the file name be " filename" or will the file name be named "filename" in any case?

I'm sure other issues will unfold.

I don't know enough, or haven't figured out enough, to really go further than a method such as that.
(Often I don't really know just what might or might not be going on or why ;-).)
therube
 
Posts: 1314
Joined: Mon Jan 18, 2016 6:23 pm

Re: Replace multiple occurrences?

Postby Admin » Thu Jul 07, 2016 10:33 pm

BRU Javascript :

newName = name.replace(/([A-Z])/g, ' $1').trim()
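For anyone who wants to try the one-liner outside BRU, here is a small sketch in plain JavaScript. The function names are made up for illustration. The second function is an alternative using a lookahead, which is my own variation and not part of the snippet above; whether BRU's script engine behaves identically is an assumption.

```javascript
// The approach above: put a space before every capital,
// then trim the space that lands in front of the first letter.
function spaceBeforeCapitals(name) {
  return name.replace(/([A-Z])/g, ' $1').trim();
}

// Variation: add a space after any letter that is directly followed
// by a capital. The lookahead (?=[A-Z]) does not consume the capital,
// so runs of capitals are handled and no leading space is produced.
function spaceBetweenLetterPairs(name) {
  return name.replace(/([A-Za-z])(?=[A-Z])/g, '$1 ');
}

console.log(spaceBeforeCapitals('IFeelTheEarthMove'));     // I Feel The Earth Move
console.log(spaceBetweenLetterPairs('IFeelTheEarthMove')); // I Feel The Earth Move
```

The two differ on names that already contain some spaces: for 'I FeelThe' the first version leaves a double space after the 'I', while the lookahead version returns 'I Feel The'.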
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Add a space before each capital letter in the file name

Postby therube » Fri Jul 08, 2016 1:39 am

Oh, wow!
Then there is another way to solve "Add a space before each capital letter in the file name", as 7:Add -> Word Space does precisely that.


(LOL. One day I may just get around to reading the docs. Might help :-).)
therube
 
Posts: 1314
Joined: Mon Jan 18, 2016 6:23 pm

Re: Add a space before each capital letter in the file name

Postby Admin » Fri Jul 08, 2016 4:22 pm

Yes, I also overlooked that one! :)
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Add a space before each capital letter in the file name

Postby bru » Sat Feb 15, 2020 11:09 pm

Hi, I was asked to try to provide some purely-regex solutions, but I'm afraid this is probably the best BRU can do:

^([A-Z][^A-Z]*?){0,1}([A-Z][^A-Z]*?){0,1}([A-Z][^A-Z]*?){0,1}([A-Z][^A-Z]*?){0,1}([A-Z][^A-Z]*?){0,1}([A-Z][^A-Z]*?){0,1}([A-Z][^A-Z]*?)([A-Z][^A-Z]*)(.*)
\1 \2 \3 \4 \5 \6 \7 \8\9

It does word-space anything from 2-8 'words' (9 grabs everything-else), but check out all the spaces in the replacement.
While the {0,1}-match protects against non-occurrences in Groups 1-6 (for names of few words), the spaces are hard-coded.
So you still need something like #5 D/S to fix them.. Oh well, I tried.
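The extra spaces are easy to see if the same expression is run in JavaScript. This is a sketch only: `rawWordSpace` and `wordSpace` are hypothetical names, the regex is rebuilt from the pattern above, and the space-collapsing step stands in for the double-space cleanup mentioned. JavaScript substitutes an empty string for a group that did not take part in the match; that BRU's engine does the same is assumed from the results described here.

```javascript
// Six optional word groups, two mandatory ones, and a catch-all group 9.
const NINE_GROUPS = new RegExp(
  '^' + '([A-Z][^A-Z]*?){0,1}'.repeat(6) + '([A-Z][^A-Z]*?)([A-Z][^A-Z]*)(.*)'
);

// Straight replacement: the spaces between groups are hard-coded,
// so every empty group still contributes a space.
function rawWordSpace(name) {
  return name.replace(NINE_GROUPS, '$1 $2 $3 $4 $5 $6 $7 $8$9');
}

// Same replacement followed by a cleanup of repeated and leading spaces.
function wordSpace(name) {
  return rawWordSpace(name).replace(/ {2,}/g, ' ').trim();
}

console.log(JSON.stringify(rawWordSpace('IFeelTheEarthMove')));
console.log(wordSpace('IFeelTheEarthMove')); // I Feel The Earth Move
```

With a five-word name, groups 4-6 are empty, so the raw result has four spaces between 'The' and 'Earth' before the cleanup.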

With BRC, you can batch it like:
@echo off
Set reg=/Regexp:"^(.[^ ]*)([A-Z].*):\1 \2"
Set max10=%reg% %reg% %reg% %reg% %reg% %reg% %reg% %reg% %reg% %reg%
brc64 /Dir:"YourFolderPath" /Pattern:"*.mp3" %max10% %max10% /Execute
pause>nul

The regex just inserts a space in between a final (NonSpace)(Uppercase).
BRC runs it 20 times, so as-written handles up to 20 'words'.. Throw in another %max10% for 30, etc.
You can remove /Execute to preview the results.
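To see what each pass does, the same expression can be stepped through in JavaScript. This is a simulation only and does not call BRC; `runPasses` is a hypothetical name, and stopping early once a pass changes nothing is my own shortcut (the batch file simply runs all 20).

```javascript
// Same expression as in the batch file: split at the last uppercase
// letter found before the first space (or before the end of the name).
const FINAL_NONSPACE_UPPER = /^(.[^ ]*)([A-Z].*)/;

// Apply the replacement up to maxPasses times and record each result.
function runPasses(name, maxPasses = 20) {
  const steps = [];
  let current = name;
  for (let i = 0; i < maxPasses; i += 1) {
    const next = current.replace(FINAL_NONSPACE_UPPER, '$1 $2');
    if (next === current) break; // nothing left to split
    steps.push(next);
    current = next;
  }
  return steps;
}

console.log(runPasses('IFeelTheEarthMove'));
// [ 'IFeelTheEarth Move', 'IFeelThe Earth Move',
//   'IFeel The Earth Move', 'I Feel The Earth Move' ]
```

Each pass peels one word off the end of the still-unspaced front part, which is why the number of passes limits the number of words.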
Cheers!
bru
 
Posts: 62
Joined: Wed Jan 31, 2018 7:35 pm

Re: Add a space before each capital letter in the file name

Postby trm2 » Sun Feb 16, 2020 5:38 am

For BRU I had something similar that I told you about -

------------
Each Capture Group captures a full part of each word:

Match: (.+?)([A-Z].*?)([A-Z].*?)([A-Z].*?)([A-Z].*)
Replace: \1 \2 \3 \4 \5

This works but it is very limited. It will only work on strings that consist of 5 words. Sure it can be adapted for up to a total of 9 words (The total number of Capture Groups)
but it is too specific - just add an additional ([A-Z].*?) before the ([A-Z].*) at the end for each additional word.
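The limitation shows up quickly when the expression is tried on names with other word counts. A quick JavaScript sketch (the function name `splitFiveWords` is made up for illustration; BRU's engine is assumed to behave the same way for this pattern):

```javascript
// Five capture groups: a leading chunk plus four capital-led chunks.
const FIVE_WORDS = /(.+?)([A-Z].*?)([A-Z].*?)([A-Z].*?)([A-Z].*)/;

function splitFiveWords(name) {
  return name.replace(FIVE_WORDS, '$1 $2 $3 $4 $5');
}

console.log(splitFiveWords('IFeelTheEarthMove'));    // I Feel The Earth Move
console.log(splitFiveWords('IFeelTheEarth'));        // IFeelTheEarth (no match, unchanged)
console.log(splitFiveWords('IFeelTheEarthMoveNow')); // I Feel The Earth MoveNow
```

Fewer than five words means no match at all, and anything past the fifth word stays glued together in the last group.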

---------------

However the BRC that you added was just what I wanted. Thanks. I analyzed it and saw that you got around the problem of locking on to a hyphen by backtracking to capture the last word preceding
the final <space> for each subsequent run using (.[^ ]*). Nice. Simple too! My attempts were trying to capture moving forward through the string and that is how I ended up with I FeelTheEarthMove, etc.

Thanks again.
trm2
 
Posts: 156
Joined: Wed Jan 15, 2020 12:47 pm

Re: Add a space before each capital letter in the file name

Postby trm2 » Sun Feb 16, 2020 8:13 pm

Bru, I was wondering if you could shed some light on the following based on your RegEx (not the BRC) that I need to
complete my analysis - the part where I say 'I'm guessing' in the notation text below - I would rather not guess and if you could
clarify specifically as to whether I am correct as to why Capture Group one's value never changed.
- the following is a brief section taken from my analysis:
----------------------------------------------------

step 26. {0,1}
Specify that the last expression, ([A-Z][^A-Z]*?) run a minimum of Zero to 1 time.
Capture Group 5 = Unchanged

step 27. [A-Z]
Match against a class consisting of an uppercase letter
Already at EOF, RegEx engine backtracks to M.
Capture Group 6 =M

Changes Capture Group 5 = Earth
Changes Capture Group 4 = The
Changes Capture Group 3 = Feel
Changes Capture Group 2 = null
Capture Group 1 = Unchanged = I


Notes:

1. This changes all of the values in the Capture Groups except for Capture Group 1 because of two uppercase letters, ‘I’
and ‘F’, which didn’t match the [^A-Z] to what followed (or if backtracked – preceded? It gets confusing). I am
guessing at this point.

----------------------------

Thanks in advance
trm2
 
Posts: 156
Joined: Wed Jan 15, 2020 12:47 pm

Re: Add a space before each capital letter in the file name

Postby bru » Tue Feb 18, 2020 7:28 pm

No problem, always happy to help out.

The group, when it matches, is always: 1UppercaseOnly,Any#OfNonUppercases(including-none).
After looking at it, I should've colored groups 7/8.. It's a shame we can't edit them afterwards.
The way it's written, only Groups 7/8 are mandated, since they lack {0,#} or *

Crazy as it sounds: Matching begins at Groups 7/8, then goes from Group 1->6, and finally Group 9 for 9-or-more 'words'.
In the case of IFeelTheEarthMove: 1=I, 2=Feel, 3=The, 4=nul, 5=nul, 6=nul, 7=Earth, 8=Move, 9=nul

If I'm not making sense, create some names like: IFeel, IFeelThe, IFeelTheEarthMovingThruTheMilkyWayGalaxy, etc.
Then use a replacement like: 1\1 . 2\2 . 3\3 . 4\4 . 5\5 . 6\6 . 7\7 . 8\8 . 9\9 (spaces versus . since forum hates double-spacing).
That shows which groups are matching what, so wherever you see something like 3 . 4 . 5 without any text, you know those groups are nul.
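The same group-labelling trick can be reproduced in JavaScript for quick experiments. A sketch, with `labelGroups` as a hypothetical helper and the nine-group expression rebuilt from the earlier post; JavaScript reports a group that did not participate as `undefined`, which is printed here as empty text.

```javascript
// Nine-group expression: six optional words, two mandatory, one catch-all.
const NINE_GROUPS = new RegExp(
  '^' + '([A-Z][^A-Z]*?){0,1}'.repeat(6) + '([A-Z][^A-Z]*?)([A-Z][^A-Z]*)(.*)'
);

// Build "1<g1> . 2<g2> . ... . 9<g9>" so empty groups are easy to spot.
function labelGroups(name) {
  const m = NINE_GROUPS.exec(name);
  if (!m) return null; // fewer than two capital-led words
  return m
    .slice(1)
    .map((g, i) => `${i + 1}${g === undefined ? '' : g}`)
    .join(' . ');
}

console.log(labelGroups('IFeel'));
// 1 . 2 . 3 . 4 . 5 . 6 . 7I . 8Feel . 9
console.log(labelGroups('IFeelTheEarthMove'));
// 1I . 2Feel . 3The . 4 . 5 . 6 . 7Earth . 8Move . 9
```

A two-word name lands entirely in groups 7 and 8, which matches the point that only those two groups are mandatory.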

If you do a lot of capture experimenting, it helps to save just-the-replacement into a favourites-file like FindGroups.bru
Again, thanks for all the hard work in creating the manuals.. Any questions, please feel free to ask.
Hope it helps.
bru
 
Posts: 62
Joined: Wed Jan 31, 2018 7:35 pm

Re: Add a space before each capital letter in the file name

Postby trm2 » Tue Feb 18, 2020 9:19 pm

Perhaps I wasn't clear - I understand about groups 7-9 (already in the analysis) , but I am referring to the part in my analysis where
Capture Group 6 initially takes the value of 'M', and this changes the values in Capture Groups 5 -2 - as it should (not disputing this)
to reflect that Capture Group 6 = M where previously Capture Group 5 = M. This forces a recalculation of the values in Capture
Groups 2 - 5 - again, as I expect it to (again, not disputing this either).

But - My only question is to clarify why Group 1's value does not change (not recalculated). In other words, Group 2 changes to a null value.
So instead of:

Group 2 = I
Group 1= null

You have -

Group 2 = null
Group 1 =remains unchanged at I

I believe that this occurs because of the two uppercase letters in sequence 'IF' (the RegEx engine is backtracking). I only need for you to concur or provide an
explanation as to why I am wrong because I would rather not have misinformation in the book if it could be avoided.

Thanks.
trm2
 
Posts: 156
Joined: Wed Jan 15, 2020 12:47 pm

Re: Add a space before each capital letter in the file name

Postby bru » Wed Feb 19, 2020 3:05 am

To be honest, I'm not sure what you mean about only Group1 not changing, this should apply to all Groups 1-7 (once they match).
Note I can only see steps 26/27 from your excerpt.. What app are you using to generate the intermediate results??
I can definitely say that consecutive uppercases do not cause any backtracking.

Backtracking only occurs when an initial group captures too much, causing a subsequent match to fail.
The leading groups ([A-Z][^A-Z]*?){0,1} can fail to match (with names of too few words), but when they match:
It is always: 1Uppercase,Any#NonUppercase. Since AnyNextGroup must begin as 1Uppercase, it's impossible to match too much.

You could easily troubleshoot with names like IFTEM, but I've tried and get exactly the same results.
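The IFTEM comparison can also be checked outside BRU. The sketch below uses JavaScript's regex engine, on the assumption that it fills the groups the same way as BRU's for this expression; `matchedGroupNumbers` is a hypothetical helper that lists which of the nine groups captured any text.

```javascript
// Nine-group expression from the earlier post, rebuilt programmatically.
const NINE_GROUPS = new RegExp(
  '^' + '([A-Z][^A-Z]*?){0,1}'.repeat(6) + '([A-Z][^A-Z]*?)([A-Z][^A-Z]*)(.*)'
);

// Return the numbers of the groups that captured non-empty text.
function matchedGroupNumbers(name) {
  const m = NINE_GROUPS.exec(name);
  if (!m) return [];
  const numbers = [];
  for (let i = 1; i <= 9; i += 1) {
    if (m[i] !== undefined && m[i] !== '') numbers.push(i);
  }
  return numbers;
}

console.log(matchedGroupNumbers('IFTEM'));             // [ 1, 2, 3, 7, 8 ]
console.log(matchedGroupNumbers('IFeelTheEarthMove')); // [ 1, 2, 3, 7, 8 ]
```

Both names fill the same groups, so in this engine at least, the consecutive capitals in 'IF' make no difference to where the words end up.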
Would love to get to the bottom of this.
bru
 
Posts: 62
Joined: Wed Jan 31, 2018 7:35 pm

Re: Add a space before each capital letter in the file name

Postby trm2 » Wed Feb 19, 2020 5:14 am

I wish there was a way I could send you just one page of the analysis - I analyze each character - each part of the expression tracing where the
RegEx engine takes me - I don't just look at the end result - as for the program I use - painstakingly enter each character and test it in BRU for all possible values:
------------

Step 27 through 31 is an analysis of Capture Group 6 ([A-Z][^A-Z]*?){0,1}.

Current values before Capture Group 6 evaluation and at the conclusion of Capture Group 5 are:

Capture Group 5 = M
Capture Group 4 = Earth
Capture Group 3 = The
Capture Group 2 = Feel
Capture Group 1 = I

Current position of RegEx is M of Move

With the evaluation of {0,1} this changes to:
Capture Group 5,4,3 = Invalid
Capture Group 2 = F
Capture Group 1 = I

Current position is F of Feel

This is because the last expression [A-Z][^A-Z]*? runs zero times (the minimum)
that is until the next [A-Z] beginning Capture Group 6

27. [A-Z]
Match against a class consisting of an uppercase letter = Capture Group 6 =M

Forces Capture Group 6 = M, so previous capture of M in Capture Group 5 is invalidated.
All other Capture Group’s values are recalculated to reflect the backtracking. This leaves Capture Group 5 with a null value.

Changes Capture Group 5 = null
Changes Capture Group 4 = Earth
Changes Capture Group 3 = The
Changes Capture Group 2 = Feel
Capture Group 1 = Unchanged = I

Current position is M of Move

28. [^A-Z] Ignore next uppercase letter = Capture Group 6 = Mo
What changed is the RegEx engine finds the lowercase ‘o’ and captures it.

Current position is o of Move

29. * Make it Greedy = Capture Group 6 = Move

Current position is e of Move

30. ? But not too Greedy (Non-Greedy) = Capture Group 6 = M
RegEx engine backtracking

Current position is M of Move

31. {0,1} Specify that the last expression, ([A-Z][^A-Z]*?) run a minimum of Zero to 1 time = Capture Group 6,5,4,3 = Invalid
Changes Capture Group 2 = F

Current position is ‘F’ of ‘Feel’


Visualize:

Step 27 - Current position is ‘M’ of ‘Move’
Step 28 - Current position is ‘o’ of ‘Move’
Step 29 - Current position is ‘e’ of ‘Move’
Step 30 - Current position is ‘M’ of ‘Move’
Step 31 – Current position is ‘F’ of Feel
Running zero times, it backtracks to ‘F’ invalidating all of the values held by Capture Groups 6-3

-------------

No, this is not going to do it. I am not getting my point across, even as I read through this response. In the book I created a visualization, formatted with arrows,
to show how the RegEx engine moves through the string. I can't demonstrate that here. I see the engine moving back and forth trying to satisfy the expression.
I express this in the detailed analyses of many of the examples I write in the book. Too bad I can't send you a couple of pages of what I am talking about.
Geez, I hope I am not wrong - that would be a lot of analysis that would have to be done over or not done.. tired again..

anyway,
for example,

in steps 27 through 29 I see it moving forward as the current position is M, then O then E
but then the value changes back to M in step 30 and so there must be backtracking to move back to the M and then to the F in step 31.
As I said every step is traced. The same with the other Capture Groups until 8 and 9 where it only moves forward to EOF and ends.
It moves forward because there is no Non-Greedy metacharacter ? on the * for these last two groups - This is one of the reasons I see it
backtrack to M and {0,1} causing it to fall back to F before the next match of [A-Z] moving forward again.

Hard to explain. I tried.
trm2
 
Posts: 156
Joined: Wed Jan 15, 2020 12:47 pm

Re: Add a space before each capital letter in the file name

Postby trm2 » Wed Feb 19, 2020 4:33 pm

I have requested that my email be made available to you - hopefully they will comply. When and if they do, send me your email so that I may send
you the section.
trm2
 
Posts: 156
Joined: Wed Jan 15, 2020 12:47 pm

Re: Add a space before each capital letter in the file name

Postby trm2 » Thu Feb 20, 2020 1:10 am

Okay, bru. I got word that my email address was given to you. Please reach out - I have the section all ready to be sent.
trm2
 
Posts: 156
Joined: Wed Jan 15, 2020 12:47 pm
