Bulk Rename Utility

by **TheGhost78** » Wed Feb 19, 2025 1:31 pm

Here are examples of the combining diaeresis and combining cedilla:
https://qaz.wtf/u/show.cgi?show=a%CC%88%7Cc%CC%A7%7C&type=string

by **TheGhost78** » Wed Feb 19, 2025 9:15 pm

Can you check Î (u00CE) and Þ (u00DE), please, as I think BRU sees them as the same character. I am using Replace (3) and whichever character is first in the list, that character's replacement is set for both.

I then tried using Character Translations (14) instead of Replace (3), and for both Þ (u00DE) and þ (u00FE) it removed the character rather than replacing it.

Þ=Th
þ=th

by **TheGhost78** » Wed Feb 19, 2025 11:03 pm

Just realised you can only replace with a single character in Character Translations!

TheGhost78 wrote: I then tried using Character Translations (14) instead of Replace (3), and for both Þ (u00DE) and þ (u00FE) it removed the character rather than replacing it.

Þ=Th
þ=th

by **Admin** » Thu Feb 20, 2025 1:40 am

If you find additional accents that should be removed, they can be added to Remove (5) -> Accents.

by **Luuk** » Thu Feb 20, 2025 2:34 am

With Special(14)'s "Character Translations", when converting one character ---> many characters,
the replacement characters should be comma-separated, so more like...
Þ=T,h
þ=t,h
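
For anyone scripting this outside BRU, the same one-character-to-many idea can be sketched in Python. This is only an illustration, not BRU's Character Translations syntax: `TRANSLATIONS` and `translate_name` are made-up names, and only the two thorn mappings come from this thread.

```python
# Hypothetical Python illustration of a one-to-many character translation.
# Not BRU syntax; only the two mappings below come from the thread.
TRANSLATIONS = {
    "Þ": "Th",  # U+00DE, capital thorn
    "þ": "th",  # U+00FE, small thorn
}

# str.maketrans accepts single-character keys and replacement strings of any length.
TABLE = str.maketrans(TRANSLATIONS)


def translate_name(name: str) -> str:
    """Replace every mapped character in a file name with its replacement text."""
    return name.translate(TABLE)


print(translate_name("Þing and þorn.txt"))
```

Characters that are not in the table pass through unchanged, so the table can be grown one entry at a time as new problem characters turn up.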

by **TheGhost78** » Thu Feb 20, 2025 2:39 am

Thanks, Luuk! Will someone look into Î (u00CE) and Þ (u00DE)?

by **TheGhost78** » Thu Feb 20, 2025 3:24 am

The order makes a difference in Replace (3); not sure if this is deliberate. This does not work correctly for the e and i of the red group:

Replace="?|?|?|?|?| - -| - - | --- | -- |_ _|_ |ä|ç|~|¦|§|©|«|¬||®|±|²|³|´|µ|·|¹|»|½|Æ|Ð|Ø|ß|æ|ð|ø|?|?|?|?|?|Œ|œ|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|–|—|‘|’|‚|“|”|•|…|?|?|›|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|™|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????"

With="-|R|r|H|h| - | - | - | - | - | - |a|c|-|-|S||-|-|-||+-|2|3|'|u|-|1|-|1-2|Ae|D|O|ss|ae|d|o|D|H|i|L|l|Oe|oe|T|3|I|G|A|C|E|Y|G|I|N|r|u|L|'|A|E|H|O|I|A|M|N|E|O|P|T|Y|O|a|B|e|n|I|C|T|a|B|r|e|n|p|y|b|R|E|s|i|r|e|l|e|A|C|D|E|M|O|T|W|D|H|-|-|-|-|'|'|,|''|''|-|...|'|''|-|!|-|!|0|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|W|K|T|P|G|A|B|H|L||P|R||1-3|-|-|A|-|-|-|-| -|-|-|-|!|-| |(|)|[|]|-|s| -|4|-|(|)|-| -||A|B|D|E|F|H|I|M|N|O|R|S|T|V|W|Y| -|A|E|H|I|M|P|R|T|W|Y|a|b|e|f|g|h|i|l|o|p|r|s|t|u|y|e|K|a|l|r|u|y|A|e|i|l|m|e|f|r|y|A|D|G|I|J|K|L|O|S|T|W|X|Y|a|e|f|h|i|n|o|p|r|s|t|w|x|z|A|H|N|R|T|a|b|c|d|e|i|m|n|o|p|r|s|t|u|v|y|B|C|D|E|I|L|M|N|O|R|S|U|d|e|i|n|o|s|t"

But this does:

Replace="????|????|????|????|????|????|????|?|?|?|?|?| - -| - - | --- | -- |_ _|_ |ä|ç|~|¦|§|©|«|¬||®|±|²|³|´|µ|·|¹|»|½|Æ|Ð|Ø|ß|æ|ð|ø|?|?|?|?|?|Œ|œ|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|–|—|‘|’|‚|“|”|•|…|?|?|›|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|™|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????"

With="d|e|i|n|o|s|t|-|R|r|H|h| - | - | - | - | - | - |a|c|-|-|S||-|-|-||+-|2|3|'|u|-|1|-|1-2|Ae|D|O|ss|ae|d|o|D|H|i|L|l|Oe|oe|T|3|I|G|A|C|E|Y|G|I|N|r|u|L|'|A|E|H|O|I|A|M|N|E|O|P|T|Y|O|a|B|e|n|I|C|T|a|B|r|e|n|p|y|b|R|E|s|i|r|e|l|e|A|C|D|E|M|O|T|W|D|H|-|-|-|-|'|'|,|''|''|-|...|'|''|-|!|-|!|0|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|W|K|T|P|G|A|B|H|L||P|R||1-3|-|-|A|-|-|-|-| -|-|-|-|!|-| |(|)|[|]|-|s| -|4|-|(|)|-| -||A|B|D|E|F|H|I|M|N|O|R|S|T|V|W|Y| -|A|E|H|I|M|P|R|T|W|Y|a|b|e|f|g|h|i|l|o|p|r|s|t|u|y|e|K|a|l|r|u|y|A|e|i|l|m|e|f|r|y|A|D|G|I|J|K|L|O|S|T|W|X|Y|a|e|f|h|i|n|o|p|r|s|t|w|x|z|A|H|N|R|T|a|b|c|d|e|i|m|n|o|p|r|s|t|u|v|y|B|C|D|E|I|L|M|N|O|R|S|U"

by **TheGhost78** » Thu Feb 20, 2025 3:32 am

120410 / 1D65A - e
120414 / 1D65E - i
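
To see what these two code points look like at the byte level, a small Python sketch can list their UTF-8 bytes and UTF-16 code units. It is not BRU code, `describe` is a hypothetical helper, and it only shows the encodings; it makes no claim about how Replace (3) handles them internally.

```python
def describe(cp: int) -> dict:
    """Return the decimal value, U+ notation, UTF-8 bytes and UTF-16 units of a code point."""
    ch = chr(cp)
    utf16 = ch.encode("utf-16-be")
    return {
        "decimal": cp,
        "codepoint": f"U+{cp:04X}",
        # bytes.hex(" ") needs Python 3.8 or newer
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        # split the big-endian UTF-16 bytes into 16-bit code units
        "utf16_units": [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)],
    }


for cp in (0x1D65A, 0x1D65E):
    print(describe(cp))
```

Both characters need four bytes in UTF-8 and a surrogate pair in UTF-16, and the two pairs share the same first unit (D835).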

by **Luuk** » Thu Feb 20, 2025 5:54 am

In my original reply, the link was for another user, who was also having problems with Unicode characters and "|" in Replace(3).
So now I'm guessing it's probably best to avoid Replace(3) when there are many different Unicode characters to be replaced?
The original poster seemed to only have problems with some 3-byte characters, but now I see it's also some 2-byte ones!

It seems like the bytes of some characters keep Replace(3) from interpreting '|' as 'Next-Match'?
But RegEx(1), Character Translations, and JavaScript don't seem to have any problem with those characters.

Also, to remove the combining diaeresis/cedilla from 3-byte sequences (a base letter plus the 2-byte combining mark), the v2 RegEx(1) can use a "Match" like...
\xCC[\xA7\x88]/g

Or if you need it to look more logical...
(.)(\xCC\xA7|\xCC\x88)/g
$1

Also, you could just add either of the regexes into the JavaScript code, to keep everything there.
But of course, "Character Translations" still needs one line for each character to be translated.

Remember, some 2-byte/3-byte characters look identical, so ä (the 2-byte one) would not be converted.
So all the other non-3-byte characters would still have to be mapped, and I don't know any of them.
I'm just now learning about many of them, as you post the characters to other pages, etc.

If you have any kind of complete list that says which characters they should convert to...
I'm sure I could convert it into either regex, JavaScript, or whatever you prefer.
But of course if you did, you'd probably already be finished by now, so still experimenting.

Anyway, feel free to post any more problem characters!
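
The byte-level match for the combining diaeresis/cedilla can be tried outside BRU with a short Python sketch. Assumptions: it uses Python's `re` on the UTF-8 bytes of the name instead of BRU's v2 RegEx(1), and `strip_marks` is a made-up helper name.

```python
import re

# CC 88 is U+0308 (combining diaeresis), CC A7 is U+0327 (combining cedilla).
# 0xCC is only ever a lead byte in valid UTF-8, so this cannot match
# inside some other character.
COMBINING = re.compile(rb"\xCC[\xA7\x88]")


def strip_marks(name: str) -> str:
    """Remove the two combining marks, leaving the base letters in place."""
    raw = name.encode("utf-8")
    return COMBINING.sub(b"", raw).decode("utf-8")


decomposed = "a\u0308 c\u0327"   # base letter + combining mark (3 bytes each)
precomposed = "\u00e4 \u00e7"    # single 2-byte characters that look the same

print(strip_marks(decomposed))   # combining marks removed
print(strip_marks(precomposed))  # left unchanged, still needs its own mapping
```

The second call shows the caveat from the post: the precomposed 2-byte ä and ç contain no combining bytes, so the pattern leaves them alone.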

by **Admin** » Thu Feb 20, 2025 10:33 am

Hi, which characters / Unicode code points are not handled correctly in Replace (3)? I can check if it's a BRU issue... thanks

by **TheGhost78** » Thu Feb 20, 2025 11:43 am

I'm just adding them as I find them in filenames, which is a bit painstaking! However, it may just be better to remove whole blocks at a time for things like diacritics, as Luuk mentioned above.

This site may be useful:
https://www.fileformat.info/info/unicode/block/index.htm
https://www.fileformat.info/info/unicode/block/combining_diacritical_marks/index.htm
https://www.fileformat.info/info/unicode/block/combining_diacritical_marks/list.htm

by **TheGhost78** » Thu Feb 20, 2025 4:41 pm

120394 / 1D64A - 𝙊

It looks like this character may be a problem as well. I had to move it earlier in the Replace (3) list to get it to work properly.

by **Luuk** » Fri Feb 21, 2025 12:45 am

To remove all the combining characters listed at http://www.fileformat.info/info/unicode/block/combining_diacritical_marks/list.htm
the "v2" RegEx(1) could use a "Match" something like...
\xcc[\x80-\xbf]|\xcd[\x80-\xaf]/g

But I would recommend changing af--->a2, since the last 13 combining characters look exactly like regular letters!
Especially if you prefer mapping any of those characters into 2 letters, instead of just removing the combining one?
Also, any look-alike characters that don't use these combiners do still need to be mapped.
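
The block-wide match with the a2 cut-off can be sketched in Python as well. Assumptions: this runs Python's `re` on UTF-8 bytes rather than BRU's v2 RegEx(1), and `strip_block` and `letter_like` are hypothetical names used only here.

```python
import re
import unicodedata

# CC 80..CC BF covers U+0300..U+033F, CD 80..CD A2 covers U+0340..U+0362.
# Stopping at A2 leaves U+0363..U+036F (the 13 letter-like combiners) alone.
BLOCK = re.compile(rb"\xcc[\x80-\xbf]|\xcd[\x80-\xa2]")


def strip_block(name: str) -> str:
    """Remove combining diacritical marks U+0300..U+0362 from a file name."""
    return BLOCK.sub(b"", name.encode("utf-8")).decode("utf-8")


# The 13 code points that the a2 cut-off deliberately keeps.
letter_like = [chr(cp) for cp in range(0x0363, 0x0370)]
for ch in letter_like:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print(strip_block("e\u0301a\u0308o\u0340"))  # marks inside the range are removed
```

Keeping the last 13 out of the match means they survive for a separate mapping step instead of being silently deleted.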

by **TheGhost78** » Fri Feb 21, 2025 2:33 pm

Based on your previous responses, my current RegEx formula is:

\xcc[\xA7\x88]+/g(?X)(?!€|£)[\xc2-\xf4][\x80-\xbf]+/g(?X)[^]A-Za-z0-9@_',;!£$€%&=#~ `^\-+[(){}.]+/g

So would this be correct with the new additions?

\xcc[\xA7\x88\x80-\xbf]|\xcd[\x80-\xa2]+/g(?X)(?!€|£)[\xc2-\xf4][\x80-\xbf]+/g(?X)[^]A-Za-z0-9@_',;!£$€%&=#~ `^\-+[(){}.]+/g
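
One way to sanity-check the first part of the new formula outside BRU is a small Python comparison. This is a sketch under assumptions: it uses Python's `re` on UTF-8 bytes rather than BRU's RegEx(1), it ignores the `/g` and `(?X)` parts and the two later segments, and `strip` and `patterns_agree` are hypothetical helpers.

```python
import re

# First segment of the new formula, exactly as posted.
PROPOSED = re.compile(rb"\xcc[\xA7\x88\x80-\xbf]|\xcd[\x80-\xa2]+")
# Same segment without the repeated \xA7\x88 and without the trailing +.
SIMPLER = re.compile(rb"\xcc[\x80-\xbf]|\xcd[\x80-\xa2]")


def strip(pattern: re.Pattern, name: str) -> str:
    """Delete every match of a byte pattern from the UTF-8 form of a name."""
    return pattern.sub(b"", name.encode("utf-8")).decode("utf-8")


def patterns_agree() -> bool:
    """Compare both patterns on every 2-byte UTF-8 character (U+0080..U+07FF)."""
    for cp in range(0x80, 0x800):
        sample = "x" + chr(cp) + "y" + chr(cp) + chr(cp)
        if strip(PROPOSED, sample) != strip(SIMPLER, sample):
            return False
    return True


print(patterns_agree())
```

In this Python check `\xA7` and `\x88` are already covered by `\x80-\xbf`, so listing them again changes nothing. Whether BRU's engine treats the pattern the same way is not verified here.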

by **TheGhost78** » Fri Feb 21, 2025 2:38 pm

120398 / 1D64E / uDE4E - 𝙎 - appears to be another problem character that has to be near the start of the list in Replace (3).

Character replacement results in 'replacement character'
