Using Title Enhanced Case with Roman Numerals

Bulk Rename Utility How-To's

Re: Using Title Enhanced Case with Roman Numerals

Postby Admin » Sun Apr 18, 2021 3:38 am

I do not think CIVIC is a valid Roman numeral, because the C (=100) at the end comes after the smaller values I and V. The format is invalid even though it is made up of the valid Roman numeral letters C, I and V.
Some converters do seem to convert CIVIC to decimal 203, but 203 is actually written as CCIII. We need to integrate that validation logic into BRU to make it perfect. 8)
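
A minimal Python sketch of the difference between a lenient converter and a validating one. It is not BRU's code; the function names are made up for illustration, and the validation rule (convert to a number, convert back, compare) is just one possible approach. The lenient function assumes its input contains only Roman numeral letters.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def lenient_value(text):
    # Add every letter, but subtract it when a larger letter follows.
    # No format checking at all, which is how CIVIC ends up with a value.
    letters = text.upper()
    total = 0
    for i, ch in enumerate(letters):
        value = VALUES[ch]
        following = VALUES[letters[i + 1]] if i + 1 < len(letters) else 0
        total += -value if value < following else value
    return total

def to_roman(number):
    # Standard notation for 1..3999, built greedily from the largest piece down.
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while number >= value:
            out.append(symbol)
            number -= value
    return "".join(out)

def is_standard(text):
    # Valid only if converting the value back gives the same letters.
    value = lenient_value(text)
    return 0 < value < 4000 and to_roman(value) == text.upper()

print(lenient_value("CIVIC"))   # 203
print(to_roman(203))            # CCIII
print(is_standard("CIVIC"))     # False
print(is_standard("CCIII"))     # True
```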
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby vshanecurtis » Sun Apr 18, 2021 3:50 am

Yeah, that one probably is the only one that actually equates to a word. I will try to find that code when I am done with this reload; it may provide some insights.
vshanecurtis
 
Posts: 17
Joined: Fri Apr 16, 2021 5:35 am

Re: Using Title Enhanced Case with Roman Numerals

Postby Luuk » Sun Apr 18, 2021 4:53 am

Yes, Im finally started seeing the patterns!!!

The IXCM's are like 'counters'. They are used once in front of a larger letter to subtract (I before V or X, X before L or C, C before D or M), or repeated up to three times for adding.
After three times, we use the next 'non-counter', first with 'subtracter' to the left, then alone, then with up to three adders to the right.
Im calling the VLD's non-counters because they cant repeat, so to increase or decrease them, they get bordered on either side by 'counters'.
The left side decreases, and the right side increases... Very logical.

It's unfortunate there can be different notations, so I'm staying with the 'standard' one that the Romans probably used.
Many sites say that IIII is illegal and that the standard notation is IV, and the online converters also forbid IIII.
So now I'm experimenting with a regex that should only match the standard Roman-Numerals.

Do you still have many files with Roman-Numerals that did never get capitalized to test the regex?
I had to make a bunch of test files, so it was not really a complete test.
Luuk
 
Posts: 690
Joined: Fri Feb 21, 2020 10:58 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby vshanecurtis » Sun Apr 18, 2021 5:49 am

I have a few albums that have this type of notation and also some live albums of the studio albums
Rush and Dream Theater are the only ones that come to mind.
vshanecurtis
 
Posts: 17
Joined: Fri Apr 16, 2021 5:35 am

Re: Using Title Enhanced Case with Roman Numerals

Postby Luuk » Sun Apr 18, 2021 6:33 am

Which type of notation?? Where 'counters' get used more than three times like IIII ?
Or do you mean both notations, where some files use IIII, but then others use IV ?
Or do some files count even more like VIIIIIII, because most sites report this 'invalid'.

It really doesnt matter, the regex can conduct any of this, but first must know all of the rules.
But Im also thinking if your case is rare, they wont even try to convert the logic for <rnup>.
Im thinking now to maybe write a script, so the regexs can be more completely tested.
Luuk
 
Posts: 690
Joined: Fri Feb 21, 2020 10:58 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby vshanecurtis » Sun Apr 18, 2021 7:23 am

The song titles use standard Roman numeral notation. Look up the track lists for Rush 2112 and Dream Theater Metropolis Part 2: Scenes from a Memory. Also check out the Rush Hemispheres track list. They all use standard Roman numeral notation, none of the oddball variations. Those are not legit, just someone being lazy who doesn't know the Roman numeral system.
vshanecurtis
 
Posts: 17
Joined: Fri Apr 16, 2021 5:35 am

Re: Using Title Enhanced Case with Roman Numerals

Postby Luuk » Sun Apr 18, 2021 5:18 pm

Many thanks for clarification. Thats what Im thinking too, because so many websites reporting 'invalid'.
So for now, I have a "v2" regex with a Match and Replace like ...
(?i)((\b|_)(?=[IVXLCDM]{1,15})(M{1,3}){0,1}(C{1,3}|DC{0,3}|C[DM]){0,1}(X{1,3}|LX{0,3}|X[CL]){0,1}(I{1,3}|VI{0,3}|I[VX]){0,1}(\b|_))/g
\U$1

Right now it's overcomplicated, because I don't want to simplify the expression until the experiments are completed.
So far, I'm testing it with many song names and others, and it does seem to match only the standard Roman-Numerals.
If it handles everything properly, I'm thinking a simplified version would look more like...
(?i)((\b|_)(?=[IVXLCDM]{1,15})(M{1,3})?(D?C{0,3}|C[DM])?(L?X{0,3}|X[CL])?(V?I{0,3}|I[VX])?(\b|_))/g

Its using the standard \b word-separators, but I added _ because thinking many people also like _ for a word-separator?
It should only match the true Roman-Numeral 'words', but not the invalid words like 'civil' or 'civic' or 'iiii' or 'vv'.
You could just change the last two 3's --> 4's to also allow matching IIII, but I don't really like that format either.

In my first post, I thought many of those words were valid Roman-Numerals, because getting them from a website.
Lol, but then realized its a scrabble website, and people were just trying to invent words with the Roman-Numerals!!
So at least now, I think I understand all of the 'standard' rules, so this regex should only match those rules.

If you can find any exceptions, please to let me know, because I still believe it does need some more testing.
And the problem is, my different applications want the regex a little differently, so this one is just for BRU.
Also, you can put replacements like.. \U|$1| just for the experiments.
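
For anyone who wants to try the expression outside BRU, here is a rough Python port of the simplified Match above. It is only a sketch: Python's `re` module has no `\U$1` replacement, so a small function uppercases group 1 instead, and the trailing `/g` is dropped because `re.sub` already replaces every match. The name `upper_roman` and the sample file names are made up for the test.

```python
import re

# The simplified expression from this post, minus the trailing /g.
ROMAN_WORD = re.compile(
    r"(?i)((\b|_)(?=[IVXLCDM]{1,15})(M{1,3})?(D?C{0,3}|C[DM])?"
    r"(L?X{0,3}|X[CL])?(V?I{0,3}|I[VX])?(\b|_))"
)

def upper_roman(name):
    # Uppercase group 1 of every match; this stands in for the \U$1 replacement.
    return ROMAN_WORD.sub(lambda m: m.group(1).upper(), name)

for name in ["king henry vi", "civic center", "part iiii", "vv"]:
    print(name, "->", upper_roman(name))
```

With these four names only the first one changes (to "king henry VI"). Words like civic can still produce an empty match at the start of the word, because every numeral group is optional, but replacing nothing with nothing leaves the name untouched.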
Luuk
 
Posts: 690
Joined: Fri Feb 21, 2020 10:58 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby vshanecurtis » Sun Apr 18, 2021 7:40 pm

Glad I could help. I understand what you're facing, but again let's think of the source. The Romans were a logical, practical and powerful people who prided themselves on order and logic. They would not have had variations in their numbering system. One and done. I have never seen any of these oddball variations. I remember studying the subject in school, and what I showed you is the way I learned it. I don't remember all of it, but there was a pattern to the number sequences.
vshanecurtis
 
Posts: 17
Joined: Fri Apr 16, 2021 5:35 am

Re: Using Title Enhanced Case with Roman Numerals

Postby Luuk » Mon Apr 19, 2021 1:37 am

The logic is hard to describe with words, but this how Im seeing it...
There are always four 'digits' in a Roman-Numeral.. So with III, I see the whole Roman-Numeral as '000III'.
Or with 'M', I see it as 'M000', except zeroes just fill the place for missing RomanNumerals (not multiplying x10 lol).
So the Roman-Numerals just use a set of characters to create each number, except logically using 'nothing' for zero.

M creates the 1st-number. If not existing, the first number is 0 (nothing in Roman, just like us).
M <==> 1000
M -----> MM ----> MMM
1000 --> 2000 --> 3000

C,D,M create the 2nd-number. If not existing, its another 0 (nothing in Roman).
C <==> 100 (The modifier: used alone, or adds/subtracts to D, and subtracts from M).
D <==> 500
M <==> 1000
C ---> CC ---> CCC ---> CD ---> D ---> DC ---> DCC ---> DCCC ---> CM
100 -> 200 --> 300 ---> 400 --> 500 --> 600 --> 700 ----> 800 -----> 900

X,L,C create the 3rd-number. If not existing, just another 0 (nothing in Roman).
X <==> 10 (The modifier: used alone, or adds/subtracts to L, and subtracts from C).
L <==> 50
C <==> 100
X ---> XX ---> XXX ---> XL ---> L ---> LX ---> LXX ---> LXXX ---> XC
10 --> 20 ---> 30 ----> 40 ---> 50 --> 60 ----> 70 -----> 80 ------> 90

I,V,X create the 4th-number.. If not existing, its just nothing.
I <==> 1 (The modifier: used alone, or adds/subtracts to V, and subtracts from X).
V <==> 5
X <==> 10
I ---> II ---> III ---> IV ---> V ---> VI ---> VII ---> VIII ---> IX
1 ---> 2 --> 3 ----> 4 ----> 5 ---> 6 ----> 7 -----> 8 ------> 9

So this how I made the regex, with one group to match each of the four possible 'numbers', but also granting nothing for 'zero'.
I was planning to describe the regex, except now Im getting some errors, so just describing the math instead.
If I can get it to conduct only the valid Roman-Numerals, I will come back to better describe the groups.
Maybe others can look at this math syntax, to figure out some more.
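
The four 'digits' described above can be written as four lookup tables. This is only a Python sketch of the math, not the regex, and the table and function names are made up for illustration. An empty string plays the role of 'nothing' for a zero digit.

```python
# One table per decimal digit position; index 0 is 'nothing'.
THOUSANDS = ["", "M", "MM", "MMM"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

def to_roman(number):
    # Standard notation only covers 1..3999 with these tables.
    if not 1 <= number <= 3999:
        raise ValueError("standard Roman numerals cover 1..3999")
    return (THOUSANDS[number // 1000]
            + HUNDREDS[number // 100 % 10]
            + TENS[number // 10 % 10]
            + UNITS[number % 10])

print(to_roman(2112))  # MMCXII
print(to_roman(1003))  # MIII  (M, nothing, nothing, III)
print(to_roman(1994))  # MCMXCIV
```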
Luuk
 
Posts: 690
Joined: Fri Feb 21, 2020 10:58 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby vshanecurtis » Mon Apr 19, 2021 2:00 am

Yes you are right it is hard to explain, but there is a cycle to it. Three single digits before cycling to the next character then the digits shift to the right. Poor explanation but I think you get what I mean. Yeah the code that I've seen does some math based on 10 100 and 1000. Do you want me to see if I can find the code.
vshanecurtis
 
Posts: 17
Joined: Fri Apr 16, 2021 5:35 am

Re: Using Title Enhanced Case with Roman Numerals

Postby Luuk » Mon Apr 19, 2021 5:22 pm

It's ok, I understand the syntax now! It was just my poor explanations, but I think of each of the four 'digits' as...

<Digit>: <------------Possibile Strings-----------------> ..... <-------Group Testing------> ...... Simplified Untested
1stDigit: M,MM,MMM,nothing.................................. (M{1,3}){0,1} ....................... M{0,3}
2ndDigit: C,CC,CCC,CD,D,DC,DCC,DCCC,CM,nothing .... (C{1,3}|DC{0,3}|C[DM]){0,1} .... (D?C{0,3}|C[DN])?
3rdDigit: X,XX,XXX,XL,L,LX,LXX,LXXX,XC,nothing ......... (X{1,3}|LX{0,3}|X[LC]){0,1} ..... (L?X{0,3}|X[LC])?
4thDigit: I,II,III,IV,V,VI,VII,VIII,IX,nothing .................. (I{1,3}|VI{0,3}|I[VX]){0,1} ....... (V?I{0,3}|I[VX])?

So with Roman-Numeral MIII, its not that I really see 'M00III', but instead 'M,nothing,nothing,III' to describe the missing 'digits'.
The romans were too logical, and did not see a reason to use something for nothing, so never saying zero with Roman-Numerals.
But with regex you must match something, even if its nothing for the missing 'digit', if that makes any sense??

So now it's just the regex part giving me trouble, and I do not believe the above groups are responsible?
It still matches a few 'invalids', and sometimes if a filename has many Roman-Numerals, it will not match them all.
So first I will get some sleep, then do some testing, and report back when I can get some answers.
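
One way to test the four groups on their own, away from the word separators, is a small script. The Python sketch below joins the 'Group Testing' column into one pattern and checks it two ways: every standard numeral from 1 to 3999 must match as a whole string, and no other string of up to five Roman letters may match. The helper names are made up, and the empty string is excluded by hand because all four groups are optional.

```python
import re
from itertools import product

# The four groups from the 'Group Testing' column, joined in order.
DIGIT_GROUPS = re.compile(
    r"(M{1,3}){0,1}(C{1,3}|DC{0,3}|C[DM]){0,1}"
    r"(X{1,3}|LX{0,3}|X[LC]){0,1}(I{1,3}|VI{0,3}|I[VX]){0,1}"
)

def to_roman(number):
    # Reference conversion used to build the list of valid numerals.
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while number >= value:
            out.append(symbol)
            number -= value
    return "".join(out)

VALID = {to_roman(n) for n in range(1, 4000)}

def is_match(text):
    # Whole-string match; the empty string is rejected explicitly.
    return text != "" and DIGIT_GROUPS.fullmatch(text) is not None

# Valid numerals the groups fail to match.
missed = sorted(t for t in VALID if not is_match(t))

# Strings of 1..5 Roman letters that match without being valid numerals.
candidates = ("".join(p) for size in range(1, 6)
              for p in product("IVXLCDM", repeat=size))
false_hits = [t for t in candidates if is_match(t) and t not in VALID]

print(len(VALID), missed, false_hits)  # 3999 [] []
```

Both lists come back empty, which fits the idea that the groups themselves are fine and the remaining trouble sits in the word-separator parts.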
Luuk
 
Posts: 690
Joined: Fri Feb 21, 2020 10:58 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby vshanecurtis » Mon Apr 19, 2021 6:59 pm

Yes, I understand. Hopefully you will be able to sort this out. I don't completely understand how RegEx strings work; they have always confused me, so I can't be much help in that area.
vshanecurtis
 
Posts: 17
Joined: Fri Apr 16, 2021 5:35 am

Re: Using Title Enhanced Case with Roman Numerals

Postby Luuk » Fri Apr 23, 2021 12:54 am

I just wanted to report back, because now there is finally very good news, but also some bad news.
The good news is that the Roman-Numeral groups by themselves, do conduct perfectly!
I just needed to read my own post, because forgetting about song names like...
mi amor =====> MI amor
reggae mix ==> reggae MIX
princess di ==> princess DI

So really, Im thinking these to just be fixed by the user-controlled exception box?
The other problems were all coming from my other groups, besides the ones to match Roman-Numerals.
Its unfortunate, but using \b with _ and also lookarounds, is what presented troubles with names like...
i'm just a girl ===> I'M just a girl
i'd like to teach => I'D like to teach
ii_ii_ii_ii ======> II_ii_II_ii

These problems all came from the groups where Im trying to specify the word separators!
So what happens is that when "_" gets matched at the end of one word, it gets consumed ('stolen') by that match! So then the regex can't use it to start matching the very next word.
I verified this by making names like ii___ii___ii___ii, to discover it does then properly conduct everything.

But the biggest mystery was.. Why words like i'd and i'm? These 'words' are not even valid Roman-Numerals!!?
But it's because I'm unfamiliar with \b, I thought there were very few characters (like space) that let \b succeed.
So then researching, it seems \b succeeds between any word character (a letter, a digit or _) and any non-word character, and ' is one of the non-word characters!!

The Roman-Numeral groups were not matching I'm as one word, but as two separate words like... I and m, with the ' between them counting as a boundary!!
So this got me curious about <rnup>, and it behaves likewise! So <rnup> probably uses something similar to \b ?
So now I've finally discovered what's causing the troubles, so I'm reporting back with at least knowing what to fix.
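
The \b behaviour is easy to see in a few lines of Python. This is only a sketch using Python's `re` module, whose \b follows the same word-character rule for plain ASCII text; the helper names are made up for illustration.

```python
import re

def boundary_positions(text):
    # Offsets at which the zero-width \b assertion succeeds.
    return [m.start() for m in re.finditer(r"\b", text)]

def roman_letter_words(text):
    # Runs of Roman-Numeral letters that sit between two \b positions.
    return re.findall(r"(?i)\b[IVXLCDM]+\b", text)

print(boundary_positions("i'm"))    # [0, 1, 2, 3]
print(boundary_positions("ii_ii"))  # [0, 5]
print(roman_letter_words("i'm"))    # ['i', 'm']
print(roman_letter_words("ii_ii"))  # []
```

The apostrophe creates a boundary on both of its sides, so i and m are seen as two separate words. The underscore creates none, because _ counts as a word character, which is why a plain \b cannot separate ii_ii.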

At first, Im thinking the obvious solution... Stop using \b, and just specify your own custom word separators.
But then, what to do for users who like the Roman-Numeral format in their names to be like... King Henry 'VI' ??
I thought for certain, word separators to be the easy part, but its because Im not experienced with them.

Im actually surprised that regex grants ' to let \b succeed, because many languages do grant 'contraction words'.
So now it seems that the word separator groups is really going to be the most difficult part of the match expression.
The experts are all saying the same thing... The only "fix" is either not using \b, or using the lookaround groups.
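
One possible shape for the 'no \b, only lookarounds' idea, as a Python sketch. The separator rules are assumptions and not anything BRU does: a numeral may not touch a letter or digit on either side, and an apostrophe only blocks the match when it has a letter on its other side (so i'm is skipped but 'VI' in quotes still works). The names `ROMAN_ONLY` and `upper_roman` are made up.

```python
import re

ROMAN_ONLY = re.compile(
    r"(?<![A-Za-z0-9])"      # not glued to a letter or digit on the left
    r"(?<![A-Za-z]')"        # not the tail of a contraction such as i'm
    r"(?=[IVXLCDM])"         # at least one Roman letter must follow
    r"M{0,3}(?:C[DM]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[VX]|V?I{0,3})"
    r"(?![A-Za-z0-9])"       # not glued to a letter or digit on the right
    r"(?!'[A-Za-z])",        # not the head of a contraction such as i'd
    re.IGNORECASE,
)

def upper_roman(name):
    # Uppercase every standalone standard numeral.
    return ROMAN_ONLY.sub(lambda m: m.group(0).upper(), name)

for name in ["i'm just a girl", "i'd like to teach",
             "king henry 'vi'", "ii_ii_ii_ii", "mi amor"]:
    print(name, "->", upper_roman(name))
```

The two contractions stay unchanged, 'vi' becomes 'VI' and all four ii become II. Real words that happen to be valid numerals, like mi, are still uppercased, so those would still need an exception list.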

So I did some experiments and fixed a few things, but the lookaround groups do always seem to bring more troubles than help.
It seems that whenever I use a lookaround group to fix one thing, it always wants to lookaround for something else to break.
I seriously wonder if its how they get their name! Because if just one character off, it likes to destroy my whole expression.

But the good part is.. Im thinking the program that invents BRU, might have better ways to conduct word boundaries ?!
And also there is the exception box. So at least now I think its proven, that the Roman-Numeral groups do conduct properly.
Except I made a typo in the 'Simplified Untested' column, where there is N instead of M, but they are all tested ok now.

At first, Im going to post the improved match with better word separators, except that it does still need much improvement.
So Im still experimenting, if anybody thinks of more valid Roman-Numerals to not capitalize inside the 'contraction' problems.

I'm very glad you posted the issue, because it's the most I've ever learned in just one post. First the Roman-Numerals were hard enough.
But also, I didnt know you could just type <rnup> in the exception box, I thought you had to use it with 'Title' or something else.

Lol, then I finally discovered which characters do actually let \b succeed, and then even a little more about using /ig.
So now I just need to better learn the lookaround groups, because really Im often just experimenting until getting lucky.
But believe it or not, I already used one lookaround to fix the underscore problem by changing the last (\b|_) ==> (?=\b|_).
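
The underscore fix can be reproduced in Python as a sketch. Both patterns below are the simplified expression from earlier in the thread, differing only in the last group: one consumes the trailing separator with (\b|_), the other only peeks at it with (?=\b|_). A small function stands in for the \U$1 replacement, and the variable names are made up.

```python
import re

GROUPS = r"(M{1,3})?(D?C{0,3}|C[DM])?(L?X{0,3}|X[CL])?(V?I{0,3}|I[VX])?"
HEAD = r"(?i)((\b|_)(?=[IVXLCDM]{1,15})"

# The last group consumes the trailing separator.
CONSUMING = re.compile(HEAD + GROUPS + r"(\b|_))")
# The last group only looks ahead, leaving the separator for the next match.
LOOKAHEAD = re.compile(HEAD + GROUPS + r"(?=\b|_))")

def upper_roman(pattern, name):
    # Uppercase group 1 of every match.
    return pattern.sub(lambda m: m.group(1).upper(), name)

print(upper_roman(CONSUMING, "ii_ii_ii_ii"))        # II_ii_II_ii
print(upper_roman(CONSUMING, "ii___ii___ii___ii"))  # II___II___II___II
print(upper_roman(LOOKAHEAD, "ii_ii_ii_ii"))        # II_II_II_II
```

With the consuming version, every second word loses its leading underscore to the previous match and is skipped. With the lookahead, the same underscore is still available to start the next match.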

So Im very hopeful to fix the other problems also, but if anyone can think of more problems, please to present them here.
And thanks again for clarifying, because without first understanding the Roman-Numerals, I could not begin the experiments.
If I was hacker, I would destroy many websites who present the improper use for Roman-Numerals!
Luuk
 
Posts: 690
Joined: Fri Feb 21, 2020 10:58 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby Admin » Fri Apr 23, 2021 1:55 am

Hi Luuk,

Are these issues present also with <rnup> in Title Case?

i'm just a girl ===> I'M just a girl
i'd like to teach => I'D like to teach
ii_ii_ii_ii ======> II_ii_II_ii

mi amor =====> MI amor
reggae mix ==> reggae MIX
princess di ==> princess DI

thanks
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Using Title Enhanced Case with Roman Numerals

Postby Luuk » Fri Apr 23, 2021 2:58 am

Yes, the issues do also present when using <rnup> with 'Title', except looking like...
i'm just a girl ====> I'M Just A Girl
i'd like to teach ==> I'D Like To Teach

ii_ii_ii_ii ========> II_II_II_II
mi amor ========> MI Amor
reggae mix =====> Reggae MIX
princess di ======> Princess DI

My regex is doing mostly the exact same things, except I fixed a few of them, but still experimenting.
Many thanks for looking into this!
Luuk
 
Posts: 690
Joined: Fri Feb 21, 2020 10:58 pm
