Bulk Rename Utility

by **TheGhost78** » Fri Feb 21, 2025 3:08 pm

Several character groups are replaced with a double space when ticking Accents in Remove (5):

https://qaz.wtf/u/show.cgi?show=%F0%9D%90%80%F0%9D%92%A6%F0%9D%93%90%F0%9D%94%B8%F0%9D%97%94%F0%9D%98%BD%F0%9D%90%9A%F0%9D%91%92%F0%9D%92%B6%F0%9D%93%AE%F0%9D%94%B6%F0%9D%95%92%F0%9D%97%AE%F0%9D%99%99%E3%80%80&type=string

by **Luuk** » Fri Feb 21, 2025 10:01 pm

So that nobody gets confused, this answer is only for... https://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=4&t=6882&start=15#p19518

Originally, when you posted about the combining diaeresis and combining cedilla, I thought those two were the only combining diacritics!
So I just provided \xcc[\xa7\x88]/g to match them, but once you provided http://www.fileformat.info/info/unicode/block/combining_diacritical_marks/list.htm
I'm changing it to \xcc[\x80-\xbf]|\xcd[\x80-\xa2]/g to match all of them (except the last 13, which look like regular letters).

So you could remove your \xA7\x88, since both of them are already within \x80-\xbf's range.
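
For anyone wanting to check the byte ranges outside of BRU, here is a minimal Python sketch. It assumes the match is applied to the UTF-8 bytes of the name, as the hex patterns above suggest; the names COMBINING and strip_combining are made up for illustration, and BRU's own engine may behave differently.

```python
import re

# Byte-level pattern from the post: the UTF-8 encodings of U+0300..U+0362.
COMBINING = re.compile(rb"\xcc[\x80-\xbf]|\xcd[\x80-\xa2]")

def strip_combining(name: str) -> str:
    """Delete the matched combining marks from the UTF-8 bytes of name."""
    return COMBINING.sub(b"", name.encode("utf-8")).decode("utf-8")

# Which code points of the Combining Diacritical Marks block are covered?
block = range(0x0300, 0x0370)
covered = [cp for cp in block if COMBINING.fullmatch(chr(cp).encode("utf-8"))]
missed = [cp for cp in block if cp not in covered]

print(hex(covered[0]), hex(covered[-1]), len(missed))  # 0x300 0x362 13
print(strip_combining("a\u0308 c\u0327"))              # a c
```

The old \xcc[\xa7\x88] pairs (cedilla and diaeresis) both fall inside the first alternative, which is why they can be dropped.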
Also, be careful adding + in front of /g: it won't hurt with your deletions, but replacements are different.
The /g always matches "all", but if you started using RegEx(1) to conduct re-mapping, the difference would be...

[ä]/g
a
AäääZ ----> AaaaZ

[ä]+/g
a
AääääZ ---> AaZ
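
The same difference can be reproduced with Python's re module, as a rough sketch only. re.sub replaces every match, which stands in for /g here; remap_each and remap_runs are illustrative names, not BRU features.

```python
import re

A_UMLAUT = "\u00e4"  # precomposed ä

def remap_each(name: str) -> str:
    """Replace every single ä with a (like [ä]/g)."""
    return re.sub(f"[{A_UMLAUT}]", "a", name)

def remap_runs(name: str) -> str:
    """Replace each whole run of ä with one a (like [ä]+/g)."""
    return re.sub(f"[{A_UMLAUT}]+", "a", name)

print(remap_each("A" + A_UMLAUT * 3 + "Z"))  # AaaaZ
print(remap_runs("A" + A_UMLAUT * 4 + "Z"))  # AaZ
```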

So the way you're using it so far (even with \xA7\x88), there shouldn't be any problems! Just make sure the re-mappings run first.
Also, if you wanted to make both of those minor changes, then the whole new "Match" would look like...
\xcc[\x80-\xbf]|\xcd[\x80-\xa2]/g(?X)(?!€|£)[\xc2-\xf4][\x80-\xbf]/g(?X)[^]A-Za-z0-9@_',;!£$€%&=#~ `^\-+[(){}.]/g

Also, if your remaps of both € and £ run before RegEx(1), then (?!€|£) is not needed to prevent their deletion.

by **Luuk** » Sun Feb 23, 2025 1:51 am

Many apologies!! If you decide to make both changes, the whole "Match" should instead look like...
\xcc[\x80-\xbf]|\xcd[\x80-\xa2]/g(?X)(?!€|£)[\xc2-\xf4][\x80-\xbf]+/g(?X)[^]A-Za-z0-9@_',;!£$€%&=#~ `^\-+[(){}.]/g

During the editing, I accidentally removed ALL of the + signs, instead of just the added-ones (again NO problem with deletions).
The original [\x80-\xbf]+ is critical, to match 1-or-more trailing UTF-8 continuation bytes (for all 2, 3, and 4-byte characters) to be deleted.
The (?!€|£) is only needed to exempt those 2-characters from deletion, if they're not getting replaced before RegEx(1) conducts?
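
To see why the + matters, here is a small Python sketch working on raw UTF-8 bytes. It assumes the match runs byte by byte, as the hex ranges imply; the € and £ are written out as their UTF-8 bytes, and WITH_PLUS, WITHOUT_PLUS and delete are illustrative names only.

```python
import re

# (?!€|£) written as UTF-8 bytes: € = E2 82 AC, £ = C2 A3.
WITH_PLUS = re.compile(rb"(?!\xe2\x82\xac|\xc2\xa3)[\xc2-\xf4][\x80-\xbf]+")
WITHOUT_PLUS = re.compile(rb"(?!\xe2\x82\xac|\xc2\xa3)[\xc2-\xf4][\x80-\xbf]")

def delete(pattern: re.Pattern, name: str) -> str:
    """Delete all matches from the UTF-8 bytes, then decode for display."""
    raw = pattern.sub(b"", name.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")

sample = "\u20ac5 \u2603 \u00a3"  # "€5 ☃ £", the snowman is 3 bytes: E2 98 83

print(delete(WITH_PLUS, sample))     # €5  £   (snowman removed cleanly)
print(delete(WITHOUT_PLUS, sample))  # lone byte 83 left behind, decoded as U+FFFD
```

Without the +, only the lead byte and the first continuation byte are removed, so a 3 or 4-byte character leaves stray bytes that no longer form valid UTF-8.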

by **TheGhost78** » Sun Feb 23, 2025 3:04 pm

Thanks, Luuk. No, I'm keeping the pound and Euro signs in the filenames.

by **Luuk** » Sun Feb 23, 2025 10:08 pm

TheGhost78 wrote:Several character groups are replaced with a double space when ticking Accents in Remove (5):
https://qaz.wtf/u/show.cgi?show=%F0%9D%90%80%F0%9D%92%A6%F0%9D%93%90%F0%9D%94%B8%F0%9D%97%94%F0%9D%98%BD%F0%9D%90%9A%F0%9D%91%92%F0%9D%92%B6%F0%9D%93%AE%F0%9D%94%B6%F0%9D%95%92%F0%9D%97%AE%F0%9D%99%99%E3%80%80&type=string

1st-paragraph: https://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=4&t=6882#p19471

by **Admin** » Thu Feb 27, 2025 4:36 am

TheGhost78 wrote:Several character groups are replaced with a double space when ticking Accents in Remove (5):

https://qaz.wtf/u/show.cgi?show=%F0%9D%90%80%F0%9D%92%A6%F0%9D%93%90%F0%9D%94%B8%F0%9D%97%94%F0%9D%98%BD%F0%9D%90%9A%F0%9D%91%92%F0%9D%92%B6%F0%9D%93%AE%F0%9D%94%B6%F0%9D%95%92%F0%9D%97%AE%F0%9D%99%99%E3%80%80&type=string

This is a bug with Remove -> Accents and characters which are surrogate pairs, like the examples you posted.
In a UTF-16 encoded string (what BRU supports), each character in the Basic Multilingual Plane (BMP) is a single 16-bit code unit. Characters outside the BMP are represented using a surrogate pair, a combination of two code units:
- The high surrogate is in the range 0xD800–0xDBFF.
- The low surrogate is in the range 0xDC00–0xDFFF.
A character that is a surrogate pair confuses Remove -> Accents in BRU.
We will fix this in the next build.
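
As an illustration of the ranges above, this Python sketch computes the surrogate pair for U+1D400 (the first character in the posted example) and compares it with Python's own UTF-16 encoder. It says nothing about how BRU handles this internally; to_surrogate_pair and utf16_units are hypothetical helper names.

```python
import struct

def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above the BMP into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP use surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

def utf16_units(text: str) -> list[int]:
    """Return the 16-bit code units of text as encoded in UTF-16."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))

print([hex(u) for u in to_surrogate_pair(0x1D400)])  # ['0xd835', '0xdc00']
print([hex(u) for u in utf16_units("\U0001D400")])   # ['0xd835', '0xdc00']
print([hex(u) for u in utf16_units("\u00e4")])       # ['0xe4']
```

Code that walks such a string one 16-bit unit at a time sees two units for one character, which is the situation described above.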

by **Admin** » Tue Mar 04, 2025 12:21 am

Released:
viewtopic.php?f=1&t=6903

Character replacement results in 'replacement character'
