Convert Non-Breaking Space to Space

Bulk Rename Utility How-To's


Postby blueboy714 » Mon Oct 10, 2022 8:56 pm

First time posting
I have non-breaking spaces (hex A0) that I need to convert to a space (hex 20). The only thing I can find says to convert from A0 to 20, but that doesn't make any sense, since I have "20" and "A0" character strings.

Do I need to use regular expressions? If so, how do I do the conversion? I can't find anything anywhere on this forum or in the BRU app.

Thanks

blueboy714
Posts: 3
Joined: Mon Oct 10, 2022 8:51 pm

Convert Non-Breaking Space to Space

Postby Luuk » Tue Oct 11, 2022 12:08 am

Greetings blueboy714.
With Replace(3), you can put a non-breaking space in the "Replace" box, and then a normal space in the "With" box.

With RegEx(1), you must first put a checkmark in the "v2" box, then use a "Match" and "Replace" like...
\xC2\xA0
\x20
Or, to replace both non-breaking spaces and narrow non-breaking spaces, it can be more like...
\xE2\x80\xAF|\xC2\xA0/g
\x20
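For anyone who wants to try the idea outside BRU, here is a rough Python equivalent of the Match/Replace pair above. It assumes, as the \x examples imply, that BRU's "v2" regex sees the filename as raw UTF-8 bytes; Python's re module behaves that way with bytes patterns. The helper name replace_nbsp is hypothetical and not part of BRU.

```python
import re

# Narrow NBSP (U+202F = bytes E2 80 AF) or regular NBSP (U+00A0 = bytes C2 A0)
NBSP_BYTES = re.compile(rb"\xE2\x80\xAF|\xC2\xA0")

def replace_nbsp(filename: str) -> str:
    """Replace every NBSP / narrow NBSP in filename with an ordinary space (hex 20)."""
    raw = filename.encode("utf-8")        # work on UTF-8 bytes, like the \x patterns do
    fixed = NBSP_BYTES.sub(b"\x20", raw)  # re.sub replaces all matches by default
    return fixed.decode("utf-8")

print(replace_nbsp("jsonview-1.1.1\u00a0.zip"))  # jsonview-1.1.1 .zip
```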
Luuk
Posts: 771
Joined: Fri Feb 21, 2020 10:58 pm

Re: Convert Non-Breaking Space to Space

Postby therube » Tue Oct 11, 2022 7:58 pm

blueboy714 wrote: I have non-breaking spaces

How did you come to realize that?


Oh, & I'll note that you can paste that NBSP into 1:RegEx, Match:  , v2, just as you can in 3:Replace.
(PS: I posted a NBSP after "Match: ", above. Not sure how it will turn out in the forum ;-).)
therube
Posts: 1360
Joined: Mon Jan 18, 2016 6:23 pm

Re: Convert Non-Breaking Space to Space

Postby Luuk » Wed Oct 12, 2022 1:02 am

With RegEx(1), if you want to replace standard non-breaking spaces (\xC2\xA0), it's fine to paste them into the "Match" box.
Also, the "Match" should have /g added to the end if you want to replace every one of them in the same filename, not just the first.
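As a rough analogy in Python (not BRU itself): a Match without /g behaves like re.sub with count=1, and a Match ending in /g behaves like the default count=0, which replaces every match. Treating the two as equivalent is an assumption made for illustration.

```python
import re

nbsp = re.compile(rb"\xC2\xA0")  # one regular NBSP as UTF-8 bytes
name = "my\u00a0holiday\u00a0photo.jpg".encode("utf-8")

first_only = nbsp.sub(b" ", name, count=1)  # like a Match without /g: first NBSP only
all_of_them = nbsp.sub(b" ", name)          # like a Match ending in /g: every NBSP

print(first_only.count(b"\xc2\xa0"))   # 1 NBSP left
print(all_of_them.count(b"\xc2\xa0"))  # 0 NBSPs left
```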

If you want to replace narrow non-breaking spaces, the pasted character will not work properly!
You must either put \ before the pasted character, or use \xE2\x80\xAF to represent the narrow non-breaking space.
I doubt the forum will display these characters properly, so I'm just using the \x formats.

For anybody who is curious, it's because the narrow non-breaking space is a 3-byte character...
viewtopic.php?f=2&t=5720#p16366
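The byte counts are easy to confirm with Python's built-in UTF-8 encoder: the regular NBSP takes 2 bytes and the narrow NBSP takes 3, which is why the \x forms differ in length.

```python
SPACES = {"NBSP (U+00A0)": "\u00a0", "narrow NBSP (U+202F)": "\u202f"}

for label, ch in SPACES.items():
    encoded = ch.encode("utf-8")
    # bytes.hex(sep) needs Python 3.8+
    print(f"{label}: {len(encoded)} bytes -> {encoded.hex(' ').upper()}")
# NBSP (U+00A0): 2 bytes -> C2 A0
# narrow NBSP (U+202F): 3 bytes -> E2 80 AF
```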
Luuk

Re: Convert Non-Breaking Space to Space

Postby blueboy714 » Wed Oct 12, 2022 5:35 pm


Thanks @Luuk - I thought I knew regex, although it's been a few years since I retired. I have a couple of questions:

Question 1 - How do I put a "non-breaking space" in "Replace(3)"?
Question 2 - specific to "20" and "A0": isn't "C2" the character Â (A with circumflex)? Why would I want to convert that? I want to convert a simple space to a non-breaking space or vice versa.

Can you give me the exact Regex code to do this?

Thanks
blueboy714

Re: Convert Non-Breaking Space to Space

Postby therube » Wed Oct 12, 2022 8:23 pm

1. copy & paste
so if you know you have a NBSP in a file name, you can "copy" that character - even though you can't see it, & paste it into 3:Replace - which also will not "show" it, except that you will "see" that it takes up 1 character position in the box.

2. oh that's a tough one (& something that generally twists my mind, in particular with something you cannot "see")
in UTF-8, in hex notation, a NBSP is "C2 A0" (\xC2\xA0)
how a program interprets that NBSP varies, sometimes even within a single program
a DIR will simply display what looks like a space
ls (a UNIX DIR, if you will) will show a character that looks like á
this board may yet again interpret things differently, so once I hit submit on this reply, what is displayed may change from what I'm looking at currently
& so while C2 A0 is a single NBSP when interpreted as UTF-8, the same two bytes look like "Â" followed by a space-like character when interpreted as ANSI (e.g. in Notepad++), & \xC2 on its own is Â in ANSI, or U+00C2 in Unicode
and, ? & Â are different too ;-)
^--- see that, the ?, which is really an "A" (looking character), mighty similar looking to the Â - but not, is not displayed "correctly" on this board.
also, one's "locale" may have an effect on what is seen
confused? me too :-).
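The "same bytes, different display" effect can be reproduced by decoding the NBSP's bytes with different codecs. Which code page a given tool really uses is an assumption here: Windows-1252 (cp1252) as "ANSI" and code page 437 as the console page are common defaults, and would explain the á in the ls and NDIR output, but your locale may differ. Character names are printed so the output stays readable on any console.

```python
import unicodedata

nbsp_utf8 = "\u00a0".encode("utf-8")  # b'\xc2\xa0'

as_utf8 = nbsp_utf8.decode("utf-8")   # one character: the NBSP itself
as_ansi = nbsp_utf8.decode("cp1252")  # two characters: U+00C2 then U+00A0
# The NBSP stored as its single ANSI byte (A0), then shown in console code page 437
as_oem = "\u00a0".encode("cp1252").decode("cp437")

print(len(as_utf8), len(as_ansi))        # 1 2
print(unicodedata.name(as_ansi[0]))      # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
print(unicodedata.name(as_oem))          # LATIN SMALL LETTER A WITH ACUTE
```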

the filename "jsonview-1.1.1 .zip", below, has a NBSP after the final '1', before the '.zip'
& as displayed by various programs:
Code:
ls.exe -l | clip.exe
-rw-rw-rw-  1 RUBEN7 0  28507 2021-07-15 13:00 jsonview-1.0.2.xpi - Copy.xpi.zip
-rw-rw-rw-  1 RUBEN7 0  28442 2017-05-21 02:53 jsonview-1.0.2.xpi.zip
-rw-rw-rw-  1 RUBEN7 0 248874 2022-10-12 12:00 jsonview-1.1.1á.zip
-rw-rw-rw-  1 RUBEN7 0 247166 2021-07-15 13:17 jsonview-1.2.3 - Copy.zip
-rw-rw-rw-  1 RUBEN7 0 245316 2021-07-15 13:16 jsonview-1.2.3.zip
-rw-rw-rw-  1 RUBEN7 0 245986 2021-07-15 13:08 jsonview-1.2.4.zip
-rw-rw-rw-  1 RUBEN7 0 388038 2021-07-15 13:06 jsonview-2.3.0.zip

NDIR.EXE, Version 2.52
        28,507 a____ Jul 15, 2021 13:00:53 jsonview-1.0.2.xpi - Copy.xpi.zip
        28,442 a____ May 21, 2017 02:53:09 jsonview-1.0.2.xpi.zip
       248,874 a____ Oct 12, 2022 12:00:06 jsonview-1.1.1á.zip
       247,166 a____ Jul 15, 2021 13:17:17 jsonview-1.2.3 - Copy.zip
       245,316 a____ Jul 15, 2021 13:16:43 jsonview-1.2.3.zip
       245,986 a____ Jul 15, 2021 13:08:50 jsonview-1.2.4.zip
       388,038 a____ Jul 15, 2021 13:06:29 jsonview-2.3.0.zip

DIR json*
07/15/2021  01:00 PM            28,507 jsonview-1.0.2.xpi - Copy.xpi.zip
05/21/2017  02:53 AM            28,442 jsonview-1.0.2.xpi.zip
10/12/2022  12:00 PM           248,874 jsonview-1.1.1 .zip
07/15/2021  01:17 PM           247,166 jsonview-1.2.3 - Copy.zip
07/15/2021  01:16 PM           245,316 jsonview-1.2.3.zip
07/15/2021  01:08 PM           245,986 jsonview-1.2.4.zip
07/15/2021  01:06 PM           388,038 jsonview-2.3.0.zip
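Since every tool renders the NBSP differently, one way to confirm that a name really contains one is to look at the code points rather than the display. This is a small sketch; find_hidden_spaces is a hypothetical helper, and the sample names come from the listing above.

```python
import unicodedata

SUSPECTS = {"\u00a0", "\u202f"}  # NBSP and narrow NBSP

def find_hidden_spaces(names):
    """Return (name, [(index, character name), ...]) for names containing NBSP-like characters."""
    hits = []
    for name in names:
        found = [(i, unicodedata.name(ch)) for i, ch in enumerate(name) if ch in SUSPECTS]
        if found:
            hits.append((name, found))
    return hits

names = ["jsonview-1.0.2.xpi.zip", "jsonview-1.1.1\u00a0.zip", "jsonview-1.2.3 - Copy.zip"]
for name, found in find_hidden_spaces(names):
    print(ascii(name), found)  # 'jsonview-1.1.1\xa0.zip' [(14, 'NO-BREAK SPACE')]
```

On a real folder, names could come from os.listdir(".") instead of the hard-coded list.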
Last edited by therube on Wed Oct 12, 2022 8:43 pm, edited 3 times in total.
therube

Convert Non-Breaking Space to Space

Postby Luuk » Wed Oct 12, 2022 8:34 pm

With Replace(3), you can just copy a non-breaking space from any filename, then paste it into your "Replace" box.
Then just put a normal space in the "With" box to make all of your replacements.
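The Replace(3) approach is a plain literal substitution with no regex involved. In Python terms it is roughly str.replace; this is an analogy, not BRU's own code, and literal_replace is a hypothetical name.

```python
NBSP = "\u00a0"  # the character you would paste into the "Replace" box

def literal_replace(name: str, find: str = NBSP, with_: str = " ") -> str:
    # str.replace changes every occurrence of `find`, with no pattern syntax at all
    return name.replace(find, with_)

print(literal_replace("a\u00a0b\u00a0c"))  # a b c
```

Note that a pasted regular NBSP does not match a narrow NBSP (U+202F), so names containing both kinds need a second pass or the RegEx(1) route.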
============================================================================================

With RegEx(1), just put a checkmark in the "v2" box, and use a "Match" and "Replace" like...
(\xC2\xA0|\xE2\x80\xAF)+/g
\x20
This replaces any run of regular and/or narrow non-breaking spaces with a single space, and will never replace characters like Â.
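A Python sketch of the same pattern, again assuming BRU's "v2" regex works on UTF-8 bytes. It shows both claims: the + collapses a run of mixed NBSPs into one space, and Â is safe because its UTF-8 bytes are C3 82, which never match C2 A0. collapse_nbsp_runs is a hypothetical helper name.

```python
import re

# One or more NBSPs and/or narrow NBSPs, written as UTF-8 byte sequences
RUN = re.compile(rb"(\xC2\xA0|\xE2\x80\xAF)+")

def collapse_nbsp_runs(filename: str) -> str:
    return RUN.sub(b"\x20", filename.encode("utf-8")).decode("utf-8")

print(collapse_nbsp_runs("a\u00a0\u202f\u00a0b"))  # a b  (three characters -> one space)
print(ascii(collapse_nbsp_runs("\u00c2 stays")))   # '\xc2 stays'  (Â untouched)
```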

Here is the UTF-8 chart from the link in my last post...
Code:
==UTF-8==   Byte1    Byte2    Byte3    Byte4     [Ranges] to match them with "v2" regexes
(ASCII)     00-7F    -----    -----    -----     [\x00-\x7f]
(2-bytes)   C2-DF    80-BF    -----    -----     [\xc2-\xdf][\x80-\xbf]
(3-bytes)   E0-EF    80-BF    80-BF    -----     [\xe0-\xef][\x80-\xbf][\x80-\xbf]
(4-bytes)   F0-F4    80-BF    80-BF    80-BF     [\xf0-\xf4][\x80-\xbf][\x80-\xbf][\x80-\xbf]

So the UTF-8 characters that BRU works with will each be 1, 2, 3, or 4 bytes long.
If the first byte is C2, it says that exactly 1 more byte (between 80 and BF) will follow to complete the character.
And with UTF-8, the Â character will always be encoded with 2 bytes, like this: \xC3\x82
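The chart and the Â example can be checked by hand. The first function reads the sequence length off the lead byte using the chart's ranges; the second builds a 2-byte sequence from the bit layout 110xxxxx 10xxxxxx, which turns U+00A0 into C2 A0 and U+00C2 (Â) into C3 82. Both function names are hypothetical.

```python
def utf8_sequence_length(lead: int):
    """Sequence length from the first byte, per the chart (None if not a valid lead byte)."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return None  # 80-BF are continuation bytes; C0, C1 and F5-FF never start a valid sequence

def encode_two_byte(code_point: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF as 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("not a 2-byte code point")
    return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])

print(encode_two_byte(0x00A0).hex(" ").upper())               # C2 A0  (NBSP)
print(encode_two_byte(0x00C2).hex(" ").upper())               # C3 82  (Â)
print(utf8_sequence_length(0xC2), utf8_sequence_length(0xE2))  # 2 3
```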

In the very ancient days, like Windows 98 and before, these characters were encoded with extended ASCII, using only 1 byte, like \xC2.
I think maybe they needed to be very conservative with bytes back then, so they didn't even consider using more than 1 byte per character?
But this application will never run on those older versions of Windows anyway.
Luuk

Re: Convert Non-Breaking Space to Space

Postby blueboy714 » Wed Oct 12, 2022 9:14 pm

Thanks - I will play around with it. It seems strange that no one really seems to know how BRU's RegEx works, or at least everyone struggles with it. BRU's RegEx is different from what I have used in the past.
blueboy714

Convert Non-Breaking Space to Space

Postby Luuk » Wed Oct 12, 2022 9:48 pm

If you need more help understanding how RegEx(1) works, you can always just ask, but so far all of your questions have been answered.
Luuk

