Character replacement results in 'replacement character'


Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Wed Feb 19, 2025 1:31 pm

Here are examples of the combining diaeresis and combining cedilla:
https://qaz.wtf/u/show.cgi?show=a%CC%88%7Cc%CC%A7%7C&type=string
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Bug?

Postby TheGhost78 » Wed Feb 19, 2025 9:15 pm

Can you check Î (U+00CE) and Þ (U+00DE), please? I think BRU sees them as the same character. I am using Replace (3), and whichever of the two comes first in the list has its replacement applied to both.

I then tried Character Translations (14) instead of Replace (3), and for both Þ (U+00DE) and þ (U+00FE) it removed the character rather than replacing it.

Þ=Th
þ=th
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Re: Bug?

Postby TheGhost78 » Wed Feb 19, 2025 11:03 pm

Just realised you can only replace with a single character in Character Translations!

TheGhost78 wrote:
I then tried using Character Translations (14) instead of Replace (3) and for both Þ (u00DE) and þ (u00FE), it removed the character rather than replace it.

Þ=Th
þ=th
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Re: Character replacement results in 'replacement character'

Postby Admin » Thu Feb 20, 2025 1:40 am

If you find additional accents that should be removed, they can be added to Remove (5) -> Accents.
Admin
Site Admin
 
Posts: 2883
Joined: Tue Mar 08, 2005 8:39 pm

Character Translations

Postby Luuk » Thu Feb 20, 2025 2:34 am

With Special (14)'s "Character Translations", when converting one character into several characters,
the replacement characters should be comma-separated, so more like...
Þ=T,h
þ=t,h
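
For comparison outside BRU, the result these two rules are meant to produce can be sketched in Python. Here `str.translate` only stands in for Character Translations and says nothing about how BRU implements it; the function name and the sample filename are made up for illustration.

```python
# Hypothetical stand-in for the two Character Translations rules above:
# each single character maps to a two-letter replacement.
THORN_MAP = str.maketrans({
    "\u00de": "Th",  # Þ
    "\u00fe": "th",  # þ
})

def translate_thorn(name: str) -> str:
    """Replace Þ/þ with Th/th and leave every other character untouched."""
    return name.translate(THORN_MAP)

print(translate_thorn("\u00deing \u00feing.txt"))
```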
Luuk
 
Posts: 803
Joined: Fri Feb 21, 2020 10:58 pm

Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Thu Feb 20, 2025 2:39 am

Thanks, Luuk! Will someone look into Î (u00CE) and Þ (u00DE)?
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Thu Feb 20, 2025 3:24 am

The order makes a difference in Replace (3); not sure if this is deliberate. This does not work correctly for the e and i of the red group:

Replace="?|?|?|?|?| - -| - - | --- | -- |_ _|_ |ä|ç|~|¦|§|©|«|¬|­|®|±|²|³|´|µ|·|¹|»|½|Æ|Ð|Ø|ß|æ|ð|ø|?|?|?|?|?|Œ|œ|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|–|—|‘|’|‚|“|”|•|…|?|?|›|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|™|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????"

With="-|R|r|H|h| - | - | - | - | - | - |a|c|-|-|S||-|-|-||+-|2|3|'|u|-|1|-|1-2|Ae|D|O|ss|ae|d|o|D|H|i|L|l|Oe|oe|T|3|I|G|A|C|E|Y|G|I|N|r|u|L|'|A|E|H|O|I|A|M|N|E|O|P|T|Y|O|a|B|e|n|I|C|T|a|B|r|e|n|p|y|b|R|E|s|i|r|e|l|e|A|C|D|E|M|O|T|W|D|H|-|-|-|-|'|'|,|''|''|-|...|'|''|-|!|-|!|0|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|W|K|T|P|G|A|B|H|L||P|R||1-3|-|-|A|-|-|-|-| -|-|-|-|!|-| |(|)|[|]|-|s| -|4|-|(|)|-| -||A|B|D|E|F|H|I|M|N|O|R|S|T|V|W|Y| -|A|E|H|I|M|P|R|T|W|Y|a|b|e|f|g|h|i|l|o|p|r|s|t|u|y|e|K|a|l|r|u|y|A|e|i|l|m|e|f|r|y|A|D|G|I|J|K|L|O|S|T|W|X|Y|a|e|f|h|i|n|o|p|r|s|t|w|x|z|A|H|N|R|T|a|b|c|d|e|i|m|n|o|p|r|s|t|u|v|y|B|C|D|E|I|L|M|N|O|R|S|U|d|e|i|n|o|s|t"


But this does:

Replace="????|????|????|????|????|????|????|?|?|?|?|?| - -| - - | --- | -- |_ _|_ |ä|ç|~|¦|§|©|«|¬|­|®|±|²|³|´|µ|·|¹|»|½|Æ|Ð|Ø|ß|æ|ð|ø|?|?|?|?|?|Œ|œ|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|–|—|‘|’|‚|“|”|•|…|?|?|›|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|™|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????|????"

With="d|e|i|n|o|s|t|-|R|r|H|h| - | - | - | - | - | - |a|c|-|-|S||-|-|-||+-|2|3|'|u|-|1|-|1-2|Ae|D|O|ss|ae|d|o|D|H|i|L|l|Oe|oe|T|3|I|G|A|C|E|Y|G|I|N|r|u|L|'|A|E|H|O|I|A|M|N|E|O|P|T|Y|O|a|B|e|n|I|C|T|a|B|r|e|n|p|y|b|R|E|s|i|r|e|l|e|A|C|D|E|M|O|T|W|D|H|-|-|-|-|'|'|,|''|''|-|...|'|''|-|!|-|!|0|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|W|K|T|P|G|A|B|H|L||P|R||1-3|-|-|A|-|-|-|-| -|-|-|-|!|-| |(|)|[|]|-|s| -|4|-|(|)|-| -||A|B|D|E|F|H|I|M|N|O|R|S|T|V|W|Y| -|A|E|H|I|M|P|R|T|W|Y|a|b|e|f|g|h|i|l|o|p|r|s|t|u|y|e|K|a|l|r|u|y|A|e|i|l|m|e|f|r|y|A|D|G|I|J|K|L|O|S|T|W|X|Y|a|e|f|h|i|n|o|p|r|s|t|w|x|z|A|H|N|R|T|a|b|c|d|e|i|m|n|o|p|r|s|t|u|v|y|B|C|D|E|I|L|M|N|O|R|S|U"
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Thu Feb 20, 2025 3:32 am

120410 / 1D65A - 𝙚
120414 / 1D65E - 𝙞
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Remove combining-diaeresis/cedilla from 3-byte UTF8 unicodes

Postby Luuk » Thu Feb 20, 2025 5:54 am

In my original reply, the link was for another user, who was also having problems with unicodes and "|" in Replace(3).
So now I'm guessing it's probably best to avoid Replace(3) when there are many different unicodes to be replaced?
The original poster seemed to only have problems with some 3-byte unicodes, but now I see it's also some 2-byte ones!

It seems like the bytes of some characters keep Replace(3) from interpreting '|' as 'Next-Match'??
But RegEx(1), Character Translations, and JavaScript don't seem to have any problem with those characters.

Also, to remove the combining diaeresis/cedilla from 3-byte unicodes (an ASCII base letter followed by the 2-byte combining mark), the v2 RegEx(1) can use a "Match" like...
\xCC[\xA7\x88]/g

Or if you need it to look more logical, a "Match" and "Replace" like...
(.)(\xCC\xA7|\xCC\x88)/g
$1

Also, you could just add either of the regexes into the JavaScript code, to keep everything there.
But of course, "Character Translations" still needs one line for each character to be translated.

Remember, some 2-byte/3-byte unicodes look identical, so ä (the 2-byte, precomposed one) would not be converted.
So all the other non-3-byte characters would still have to be mapped, and I don't know them.
I'm only just learning about many of them, as you post the characters to other pages, etc.
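
To see the difference between the two look-alike forms, here is a minimal Python sketch (not BRU code). It assumes the Match above works on UTF-8 bytes, as the \x escapes suggest. The `strip_marks` helper and the NFD step are illustrative additions, not something BRU is known to do.

```python
import re
import unicodedata

# Byte-level equivalent of the Match \xCC[\xA7\x88]:
# CC 88 is U+0308 (combining diaeresis), CC A7 is U+0327 (combining cedilla).
COMBINING = re.compile(rb"\xCC[\xA7\x88]")

def strip_marks(name: str) -> str:
    """Remove the two combining marks from the UTF-8 form of a name."""
    return COMBINING.sub(b"", name.encode("utf-8")).decode("utf-8")

decomposed = "a\u0308"   # 'a' + combining diaeresis, 3 bytes in UTF-8
precomposed = "\u00e4"   # single precomposed code point, 2 bytes in UTF-8

print(decomposed.encode("utf-8").hex(" "))    # bytes of the 3-byte form
print(precomposed.encode("utf-8").hex(" "))   # bytes of the 2-byte form
print(strip_marks(decomposed))                # mark removed
print(strip_marks(precomposed))               # unchanged: nothing to match

# Assumption: decomposing first (Unicode NFD) turns the precomposed form
# into base letter + combining mark, so the same pattern then applies.
print(strip_marks(unicodedata.normalize("NFD", precomposed)))
```

The regex only sees the decomposed form, which is why the precomposed letters still need their own mapping unless they are decomposed first.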

If you have any kind of complete list that gives their conversion characters...
I'm sure I could convert it into regex, JavaScript, or whatever you prefer.
But of course if you had one, you'd probably already be finished by now, so I'm still experimenting.

Anyway, feel free to post any more problem characters!
Luuk
 
Posts: 803
Joined: Fri Feb 21, 2020 10:58 pm

Re: Character replacement results in 'replacement character'

Postby Admin » Thu Feb 20, 2025 10:33 am

Hi, which characters / Unicode code points are not handled correctly in Replace (3)? I can check if it's a BRU issue... thanks
Admin
Site Admin
 
Posts: 2883
Joined: Tue Mar 08, 2005 8:39 pm

Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Thu Feb 20, 2025 11:43 am

I'm just adding them as I find them in filenames, which is a bit painstaking! However, it may just be better to remove whole blocks at a time for things like diacritics, as Luuk mentioned above.

This site may be useful:
https://www.fileformat.info/info/unicode/block/index.htm
https://www.fileformat.info/info/unicode/block/combining_diacritical_marks/index.htm
https://www.fileformat.info/info/unicode/block/combining_diacritical_marks/list.htm
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Thu Feb 20, 2025 4:41 pm

120394 / 1D64A - 𝙊

It looks like this character may be a problem as well. I had to move it earlier in the Replace (3) list to get it to work properly.
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Remove combining-characters from 3-byte UTF8 unicodes

Postby Luuk » Fri Feb 21, 2025 12:45 am

To remove all the combining characters listed at http://www.fileformat.info/info/unicode/block/combining_diacritical_marks/list.htm
the "v2" RegEx(1) could use a "Match" something like...
\xcc[\x80-\xbf]|\xcd[\x80-\xaf]/g

But I would recommend changing \xaf to \xa2, since the last 13 combining characters (U+0363 to U+036F) look exactly like regular letters!
Especially if you prefer mapping any of those characters into 2 letters, instead of just removing the combining one?
Also, any look-alike characters that don't use these combiners still need to be mapped.
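
The effect of ending the second range at \xa2 instead of \xaf can be checked with a short Python sketch. This runs outside BRU and assumes the pattern is matched against UTF-8 bytes. The names `ALL_MARKS`, `NO_LETTERS` and `matched` are illustrative only.

```python
import re

# Both patterns are applied to UTF-8 bytes.
ALL_MARKS = re.compile(rb"\xcc[\x80-\xbf]|\xcd[\x80-\xaf]")    # whole block
NO_LETTERS = re.compile(rb"\xcc[\x80-\xbf]|\xcd[\x80-\xa2]")   # stops earlier

def matched(pattern):
    """Code points in U+0300..U+036F whose UTF-8 encoding the pattern matches."""
    return [cp for cp in range(0x0300, 0x0370)
            if pattern.fullmatch(chr(cp).encode("utf-8"))]

all_cps = matched(ALL_MARKS)
kept = sorted(set(all_cps) - set(matched(NO_LETTERS)))

print(len(all_cps), "code points matched by the full pattern")
print(len(kept), "left alone by the shorter range:")
print(" ".join(f"U+{cp:04X}" for cp in kept))
```

The code points left alone are the letter-shaped combining characters at the end of the block, which can then be mapped individually instead of being deleted.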
Luuk
 
Posts: 803
Joined: Fri Feb 21, 2020 10:58 pm

Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Fri Feb 21, 2025 2:33 pm

Based on your previous responses, my current RegEx formula is:

\xcc[\xA7\x88]+/g(?X)(?!€|£)[\xc2-\xf4][\x80-\xbf]+/g(?X)[^]A-Za-z0-9@_',;!£$€%&=#~ `^\-+[(){}.]+/g

So would this be correct with the new additions?

\xcc[\xA7\x88\x80-\xbf]|\xcd[\x80-\xa2]+/g(?X)(?!€|£)[\xc2-\xf4][\x80-\xbf]+/g(?X)[^]A-Za-z0-9@_',;!£$€%&=#~ `^\-+[(){}.]+/g
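
One way to sanity-check the first segment of that formula outside BRU is a small Python comparison. It assumes the segment is matched against UTF-8 bytes and ignores the BRU-specific (?X) separators and the later segments. The sample names and the `strip` helper are made up. Inside a character class, \xA7 and \x88 are already covered by \x80-\xbf, and the trailing + binds only to the last class, so a shorter pattern should behave the same on valid UTF-8 names.

```python
import re

# First segment as posted, and a shorter candidate for comparison.
POSTED = re.compile(rb"\xcc[\xA7\x88\x80-\xbf]|\xcd[\x80-\xa2]+")
SHORTER = re.compile(rb"\xcc[\x80-\xbf]|\xcd[\x80-\xa2]")

def strip(pattern, name: str) -> str:
    """Delete every match of the byte pattern from the UTF-8 form of name."""
    return pattern.sub(b"", name.encode("utf-8")).decode("utf-8")

# Hypothetical sample names: diaeresis/cedilla, stacked marks, a mark from
# the \xcd range, a letter-like combiner (U+0364) that should be kept, and
# a plain ASCII name.
samples = [
    "a\u0308c\u0327.txt",
    "e\u0301\u0302x\u0362y.txt",
    "o\u0364.txt",
    "plain.txt",
]

for s in samples:
    print(repr(strip(POSTED, s)), strip(POSTED, s) == strip(SHORTER, s))
```

If the two patterns ever disagreed on a real filename, the comparison would print False for it.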
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am

Re: Character replacement results in 'replacement character'

Postby TheGhost78 » Fri Feb 21, 2025 2:38 pm

120398 / 1D64E / uDE4E - 𝙎 - appears to be another problem character that has to be near the start of the list in Replace (3).
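
The three notations here are the decimal code point, the hex code point, and the low half of the UTF-16 surrogate pair. A minimal Python sketch (not BRU code; the `surrogates` helper is illustrative) derives them for the characters reported in this thread. All four share the same high surrogate; whether that has anything to do with the Replace (3) ordering behaviour is not established here.

```python
def surrogates(cp: int) -> tuple:
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

# Code points reported in this thread.
for cp in (0x1D64A, 0x1D64E, 0x1D65A, 0x1D65E):
    hi, lo = surrogates(cp)
    utf8 = chr(cp).encode("utf-8").hex(" ")
    print(f"{cp} / {cp:X} / u{hi:04X} u{lo:04X} / UTF-8: {utf8}")
```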
TheGhost78
 
Posts: 173
Joined: Fri Jul 19, 2024 11:25 am
