Add Underscore to CamelCase

A swapping-ground for Regular Expression syntax

Add Underscore to CamelCase

Postby kjhatch » Wed Feb 20, 2008 8:47 pm

I'm hoping there is an easy solution to my regex replacement problem here. Detecting the matches seems to work fine; the problem is using the found item as a variable in the replacement.

My filenames are currently in CamelCase, for example: StaffMeeting.doc
I need to insert underscores between the words marked by the case changes, for example: Staff_Meeting.doc

I've pulled a lot of different sources from other sites while searching for good ways to approach this, and the regex I'm now working with, using The Regex Coach, is (((?<=[a-z])[A-Z])|([A-Z](?![A-Z]|$)))

That regex identifies the capitals fairly well (though it also picks up the leading capital, which I do not want changed), but my problem is that when doing a replacement I don't understand how to include the found item in the replacement string. For example, if I just use _ as the replacement string, I get this result:

_taff_eeting.doc

So is there a trick I'm missing that lets me include the text found by the regex in the replacement result? Something like:

_<variable>

And is it possible to make my find regex ignore the first character in the string? Thanks in advance for any help and/or nudging in the right direction.
--Kevin
kjhatch
 
Posts: 7
Joined: Wed Feb 20, 2008 8:20 pm

Re: Add Underscore to CamelCase

Postby kjhatch » Wed Feb 20, 2008 8:55 pm

Well, thankfully it was fairly obvious: the found text can be added back with a backreference, like so:

_\1

So now my issue is that the find needs to ignore the leading capital. Any help with that would of course still be greatly appreciated.
thanks,
--Kevin
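For anyone testing outside BRU, here is a minimal Python sketch (Python's `re` module is an assumption; it is neither PCRE as used by BRU nor The Regex Coach) of the original find expression with `_\1` as the replacement. It reproduces the leading-capital problem: the second alternative also matches the first letter of the name.

```python
import re

# Kevin's original find expression. Group 1 wraps both alternatives,
# so \1 always holds the matched capital letter.
FIND = r"(((?<=[a-z])[A-Z])|([A-Z](?![A-Z]|$)))"

def underscore_every_capital(name: str) -> str:
    # re.sub replaces every match, prefixing each matched capital with "_".
    return re.sub(FIND, r"_\1", name)

print(underscore_every_capital("StaffMeeting.doc"))  # _Staff_Meeting.doc
```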

Re: Add Underscore to CamelCase

Postby Admin » Wed Feb 20, 2008 9:02 pm

Can you go about this the opposite way, and look for a lower case letter followed by a capital?
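One way to express that idea, sketched in Python under the assumption that the engine supports lookbehind: match the empty position between a lowercase letter and a capital, so nothing is consumed and nothing has to be put back. Whether BRU's replacement mode can use a zero-width match like this is not established in this thread.

```python
import re

# Zero-width match: the gap right after a lowercase letter and right before a capital.
BOUNDARY = r"(?<=[a-z])(?=[A-Z])"

def underscore_at_boundaries(name: str) -> str:
    # The leading capital has no lowercase letter before it, so it is left alone;
    # runs of capitals (acronyms) are not split internally either.
    return re.sub(BOUNDARY, "_", name)

print(underscore_at_boundaries("StaffMeeting.doc"))  # Staff_Meeting.doc
print(underscore_at_boundaries("ReadXMLFile.doc"))   # Read_XMLFile.doc
```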


Jim
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Add Underscore to CamelCase

Postby kjhatch » Wed Feb 20, 2008 9:07 pm

Sure, I don't care what it looks for, as long as the result has underscores inserted between the words. Actually, looking for a lowercase character may be better, since it would leave the acronyms in some filenames alone.

Also, in testing this with BRU, the expression I have been using for the find doesn't work at all. It was fine in The Regex Coach with the g option checked, but not in BRU.

Thanks,
--Kevin

Re: Add Underscore to CamelCase

Postby kjhatch » Wed Feb 20, 2008 9:29 pm

OK, so going down that path I've written this simplified expression: [a-z]([A-Z]{1})

And again, in The Regex Coach it's close to working: StaffMeeting.doc becomes Staf_Meeting.doc, so the only remaining issue is that the lowercase character gets dropped. But when I tried it in BRU, the result was nothing like the Coach's. Is there some trick to getting the Coach to behave more like BRU, so that I can test in the right direction?
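The dropped letter can be reproduced with a short Python sketch. It assumes the replacement was `_\1`, which the post doesn't state but which matches the Staf_Meeting.doc result: `[a-z]` consumes the "f" without capturing it, so the replacement has no way to restore it. Capturing the lowercase letter as well fixes that.

```python
import re

name = "StaffMeeting.doc"

# [a-z] consumes the "f" but does not capture it, so the (assumed)
# replacement "_\1" only puts back the capital.
dropped = re.sub(r"[a-z]([A-Z]{1})", r"_\1", name)

# Capturing the lowercase letter too lets the replacement restore it.
kept = re.sub(r"([a-z])([A-Z])", r"\1_\2", name)

print(dropped)  # Staf_Meeting.doc
print(kept)     # Staff_Meeting.doc
```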

Using the BRU built-in preview is confusing to me, in that it doesn't seem to match my (albeit limited) understanding of regex. For example, with that same filename, if I look for [a-z] it does the replacement on all characters, including the uppercase ones, and even drops the period and extension, even though I've told it to ignore the extension in all cases. So now I'm totally confused.

Re: Add Underscore to CamelCase

Postby kjhatch » Wed Feb 20, 2008 9:39 pm

And finally, I have this expression that works perfectly in the Coach:

find: ([a-z])([A-Z]{1})
replace: \1_\2

but fails with BRU. Is there a way to make BRU work with that?
thanks,
--Kevin
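For comparison, here is the same find/replace pair as a global substitution in Python (an assumption; this is not how BRU applies it). `re.sub` replaces every non-overlapping match, which seems to be what the Coach's "g" option does. The `{1}` quantifier is redundant: `[A-Z]{1}` and `[A-Z]` match the same thing.

```python
import re

def insert_underscores(name: str) -> str:
    # ([a-z])([A-Z]) behaves the same as ([a-z])([A-Z]{1}).
    # re.sub replaces all non-overlapping matches in one call.
    return re.sub(r"([a-z])([A-Z])", r"\1_\2", name)

print(insert_underscores("StaffMeetingMinutes.doc"))  # Staff_Meeting_Minutes.doc
```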

Re: Add Underscore to CamelCase

Postby Admin » Wed Feb 20, 2008 9:50 pm

I don't know how the Coach works, as I've never tried it, but my guess is that the {1} notation is supported in the Coach but not in PCRE (the library I use). BRU just does a simple pattern match and then retrieves the \1, \2, \3, etc. values from the PCRE engine. Whatever expression you use, it will have to be able to return those \1, \2, \3, etc. values.

I do not know if PCRE supports Global Replace (which sounds like the "g" option in Coach).
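To make that behaviour concrete, here is a hypothetical Python emulation based only on the description above (it is not BRU's actual code): find a single match, then build the new name purely from the replacement template with \1, \2, ... filled in. Any text outside the match is lost, so for StaffMeeting.doc the pair ([a-z])([A-Z]) and \1_\2 yields just f_M.

```python
import re

def single_match_rename(name: str, find: str, replace: str) -> str:
    # Hypothetical emulation: one match only, and the new name is the
    # replacement template with the groups expanded. Text outside the match is dropped.
    m = re.search(find, name)
    if m is None:
        return name  # no match: leave the name unchanged
    return m.expand(replace)

print(single_match_rename("StaffMeeting.doc", r"([a-z])([A-Z])", r"\1_\2"))  # f_M
```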



Jim

Re: Add Underscore to CamelCase

Postby kjhatch » Wed Feb 20, 2008 10:02 pm

Yeah, I hadn't realized that the expression needs to match all parts of the string in order to turn them into variables. But now that I know, I see what needs to be done. What I was getting is that for the filename StaffMeeting.doc, the expression ([a-z])([A-Z]{1}) with the replacement \1_\2 returned f_M, so it was dropping the characters not picked up by the find expression. So I've made this change:

(.*)([a-z])([A-Z]{1})(.*)

And that works great in BRU to turn StaffMeeting.doc into Staff_Meeting.doc. But when I try it on StaffMeetingMinutes.doc, the result is StaffMeeting_Minutes.doc, so it only performs the replacement once, because the bookend groups pull in all the other characters.
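A Python sketch of one run with the bookend groups, assuming the replacement was \1\2_\3\4 (not stated in the post) and dropping the redundant {1}. Because the leading (.*) is greedy, the single match always lands on the last lowercase/capital pair in the name, which is why only Minutes gets its underscore.

```python
import re

# The greedy leading (.*) grabs as much as it can, so the one match
# is placed at the LAST lowercase/capital pair in the name.
BOOKENDED = r"(.*)([a-z])([A-Z])(.*)"

def one_pass(name: str) -> str:
    m = re.search(BOOKENDED, name)
    # Assumed replacement: \1\2_\3\4 (everything kept, "_" inserted once).
    return m.expand(r"\1\2_\3\4") if m else name

print(one_pass("StaffMeeting.doc"))         # Staff_Meeting.doc
print(one_pass("StaffMeetingMinutes.doc"))  # StaffMeeting_Minutes.doc
```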

Do you think there's any way to make BRU replace all occurrences of the found substring in a single run? I suppose I could run that change over and over until all filenames have been converted, but I need to change a truly large number of old files, so a single-run option would be much easier.
Thanks for the help,
--Kevin
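The "run it over and over" fallback can be sketched in Python as a loop that repeats the single-match rename until a run changes nothing. The replacement \1\2_\3\4 is again an assumption, and `max_runs` is a hypothetical safety limit. Each run adds one underscore, so a name with N word boundaries needs N runs plus one final run that confirms nothing changed.

```python
import re
from typing import Tuple

BOOKENDED = r"(.*)([a-z])([A-Z])(.*)"

def one_pass(name: str) -> str:
    # One run: insert "_" at the last lowercase/capital pair (assumed \1\2_\3\4).
    m = re.search(BOOKENDED, name)
    return m.expand(r"\1\2_\3\4") if m else name

def rename_until_stable(name: str, max_runs: int = 50) -> Tuple[str, int]:
    # Repeat until a run changes nothing; count the runs that modified the name.
    runs = 0
    while runs < max_runs:
        new_name = one_pass(name)
        if new_name == name:
            break
        name, runs = new_name, runs + 1
    return name, runs

print(rename_until_stable("StaffMeetingMinutes.doc"))  # ('Staff_Meeting_Minutes.doc', 2)
```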

Re: Add Underscore to CamelCase

Postby Admin » Wed Feb 20, 2008 10:26 pm

I've just trawled through the PCRE documentation, and from what I can see there's no way to force a global operation in the version I use. So I'm afraid it doesn't look likely.

A poor workaround would be to use the Character Translation feature. Set up 26 entries in the Character Translation list as follows:

A=_A
B=_B
C=_C

etc.

This will insert an underscore before every capital. Run all your files through this, and then, as a last command, simply remove the first character. Not ideal, but it would do the job.
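The effect of that workaround can be sketched in Python with `str.translate`, which here stands in for BRU's Character Translation list (an assumption about equivalence, not BRU's implementation). The trimming step only drops the first character when it is the inserted underscore, a small guard the original suggestion doesn't include. Note that acronyms get split letter by letter.

```python
import string

# Stand-in for the 26 Character Translation entries A=_A, B=_B, ... Z=_Z.
TABLE = str.maketrans({c: "_" + c for c in string.ascii_uppercase})

def translate_then_trim(name: str) -> str:
    inserted = name.translate(TABLE)  # "_" before every capital
    # Last step: remove the first character, here only if it is the inserted "_".
    return inserted[1:] if inserted.startswith("_") else inserted

print(translate_then_trim("StaffMeetingMinutes.doc"))  # Staff_Meeting_Minutes.doc
print(translate_then_trim("ReadXMLFile.doc"))          # Read_X_M_L_File.doc
```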


Jim

Re: Add Underscore to CamelCase

Postby kjhatch » Wed Feb 20, 2008 11:04 pm

I have a few thousand files to convert, but since the working expression just needs multiple runs, that will probably be faster than setting up the character replacements. I have to catch a few exceptions, convert hyphens and spaces, and drop to lowercase (the latter two of which are incredibly easy with BRU, of course), so I'll just run it repeatedly until it gets nearly all of them.

---
Yeah, it only took a dozen runs, so a little brute force, but so much easier than manual changes. Thanks for all the help (and the great tool).
--Kevin
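The whole clean-up described above might look like this as a Python sketch. It assumes hyphens and spaces become underscores (the post only says they are converted) and, like BRU's "ignore extension" setting, leaves the extension untouched. The exceptions Kevin mentions are not covered. The order matters: underscores must be inserted before lowercasing, or the case boundaries are lost.

```python
import os
import re

def normalize_filename(filename: str) -> str:
    stem, ext = os.path.splitext(filename)  # extension left as-is
    # 1. Underscore at each lowercase->uppercase boundary (must run before lowercasing).
    stem = re.sub(r"([a-z])([A-Z])", r"\1_\2", stem)
    # 2. Assumption: hyphens and spaces are converted to underscores.
    stem = re.sub(r"[- ]", "_", stem)
    # 3. Lowercase the name part.
    return stem.lower() + ext

print(normalize_filename("StaffMeeting-Notes Final.doc"))  # staff_meeting_notes_final.doc
```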

Re: Add Underscore to CamelCase

Postby Antoni Gual » Fri Jan 08, 2016 10:56 pm

Hi
I found this thread by Googling my problem (adding spaces to CamelCase file names). I found a solution myself, and I'm posting it here in case someone else reaches this dead thread...

I'm using BRC 1.3.3.0 (the console version).
I solved my problem by running the following several times:
brc /regexp:"(.+\S)([A-Z].+)":"\1 \2" /execute

Each execution adds a space before a capital, starting at the end of the name. It doesn't add spaces at the start, nor does it double spaces. When no further names are modified, you're finished. It should require fewer runs than substituting each uppercase letter in turn with a space followed by that letter.
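Here is a Python sketch of what repeated runs of that command do. Treating one BRC run as a single `re.sub(..., count=1)` is an assumption. The greedy `(.+\S)` puts the split before the last capital that follows a non-space character, and a capital already preceded by a space (or at the very start) can't match, so the loop stops once every word is separated.

```python
import re

FIND = r"(.+\S)([A-Z].+)"

def run_until_stable(name: str) -> list:
    # Emulate repeated runs; record the name after each run that changed it.
    history = [name]
    while True:
        new_name = re.sub(FIND, r"\1 \2", name, count=1)
        if new_name == name:
            return history
        name = new_name
        history.append(name)

print(run_until_stable("StaffMeetingMinutes.doc"))
# ['StaffMeetingMinutes.doc', 'StaffMeeting Minutes.doc', 'Staff Meeting Minutes.doc']
```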

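For kjhatch's problem, an additional REPLACE of spaces with underscores would be needed. The underscore counts as a non-space character (it matches \S), so putting it directly in the replacement would let the same capital match again on every run; it's better to use a space as the intermediate symbol.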
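A Python sketch of why the space is needed. As above, treating one run as `re.sub(..., count=1)` is an assumption. With an underscore, the capital is still preceded by a non-space character, so every run inserts another underscore in the same place. With a space, the loop settles, and a final replace turns the spaces into underscores. Note that this last step would also convert any spaces already in the name.

```python
import re

FIND = r"(.+\S)([A-Z].+)"

def one_run(name: str, sep: str) -> str:
    # One emulated run: a single substitution with sep between the two groups.
    return re.sub(FIND, r"\1" + sep + r"\2", name, count=1)

# "_" matches \S, so the same capital matches again on the next run.
once = one_run("StaffMeeting", "_")   # Staff_Meeting
twice = one_run(once, "_")            # Staff__Meeting

def to_snake(name: str) -> str:
    # Repeat with a space until nothing changes, then replace spaces with underscores.
    while True:
        new_name = one_run(name, " ")
        if new_name == name:
            break
        name = new_name
    return name.replace(" ", "_")

print(to_snake("StaffMeetingMinutes"))  # Staff_Meeting_Minutes
```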

Cheers
Antonio
Antoni Gual
 
Posts: 1
Joined: Fri Jan 08, 2016 10:09 pm

