Accidentally figured out a way to crash BRU


Need both end-of-line chars in unicode rename-pairs file

Postby Luuk » Wed Sep 28, 2022 11:00 pm

@therube...
My other post is very misleading, because it should say that lines should always end with either '0d0a' or '0a0d'.
But since UTF-16 uses 2 bytes for every ordinary character, each end-of-line character always has a '00' byte next to it (in front of it in big-endian files).
This is why I'm just using \r and \n at the beginning, but in big-endian UTF-16 this means \r==000d and \n==000a.

I didn't even realize this until I saw the last post, because I wasn't actually looking inside the files with a hex editor.
So when I say a lonely end-of-line character, it's either '000d' or '000a' without being joined to the other one!
But everything else does seem to be correct so far.
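
To see these byte patterns without a hex editor, here is a minimal Python sketch. It only illustrates how \r and \n encode in the two UTF-16 byte orders; the helper name eol_bytes is made up for this example.

```python
def eol_bytes(encoding: str) -> dict:
    """Hex form of CR, LF and CRLF in the given encoding (no BOM is added)."""
    return {
        "CR": "\r".encode(encoding).hex(),
        "LF": "\n".encode(encoding).hex(),
        "CRLF": "\r\n".encode(encoding).hex(),
    }

for name in ("utf-16-be", "utf-16-le"):
    print(name, eol_bytes(name))
# utf-16-be {'CR': '000d', 'LF': '000a', 'CRLF': '000d000a'}
# utf-16-le {'CR': '0d00', 'LF': '0a00', 'CRLF': '0d000a00'}
```

So the '00' byte sits in front of the EOL byte only in big-endian files; in little-endian files it comes after.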
Luuk
 
Posts: 706
Joined: Fri Feb 21, 2020 10:58 pm

Re: Accidentally figured out a way to crash BRU

Postby therube » Thu Sep 29, 2022 4:56 pm

We are technically one version away from the latest version

3.4.4.0 & Win7 here.
Looking in hex at the literally "blank" file that causes the crash, it is pretty clearly related to it being this:
FE FF 00 0A

If I try to import that, I get, OH, wait, I had "FF FE" rather than "FE FF".

With "FE FF" I then do get the crash :-).

Oh, so FF FE is Unicode (UTF-16 little-endian) & FE FF is Unicode BE (UTF-16 big-endian).
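
A quick way to tell the two apart is to look at the first two bytes. This is a small Python sketch using the BOM constants from the standard codecs module; the function name sniff_utf16_bom is only for illustration.

```python
import codecs

def sniff_utf16_bom(data: bytes) -> str:
    """Classify a byte string by its leading UTF-16 byte-order mark."""
    # Caveat: a UTF-32 LE file also starts with FF FE (followed by 00 00).
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "UTF-16 LE"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "UTF-16 BE"
    return "no UTF-16 BOM"

print(sniff_utf16_bom(bytes.fromhex("FE FF 00 0A")))  # UTF-16 BE
print(sniff_utf16_bom(bytes.fromhex("FF FE 0A 00")))  # UTF-16 LE
```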
therube
 
Posts: 1319
Joined: Mon Jan 18, 2016 6:23 pm

Re: Accidentally figured out a way to crash BRU

Postby therube » Thu Sep 29, 2022 5:16 pm

So, a ...

Unicode BE file (so it starts with FE FF)
that ends in 00 0A or 00 0D

will cause a crash.


If it ends in 00 0A 00 0D (or 00 0D 00 0A), then no crash.


(Now, don't ask me what a Unicode / BE file should look like or what type of "ending marker" it should have, but certain combinations do cause a crash.)


And...

If you take a "normal" Unicode BE file, whatever normal might be, & let's say it simply ends with a "character" rather than 000a or anything like that, so
FE FF 00 63 00 72 00 61 00 73 00 68 00 7C 00 6D 00 65
& you then edit that file, say in Gvim, forcing it to be in UNIX format (rather than "DOS"), it adds a 00 0A to the end of the file,
FE FF 00 63 00 72 00 61 00 73 00 68 00 7C 00 6D 00 65 00 0A
which will then cause a crash.
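
The pattern described above can be turned into a pre-flight check before importing a file. The Python sketch below only restates the observations from this thread (BRU 3.4.4.0 on Win7); it is not BRU's actual logic, and the function name is hypothetical. How BRU treats other endings, such as a doubled 00 0A 00 0A, was not tested here, so the check simply flags anything that ends in an unpaired EOL.

```python
import codecs

def has_lone_trailing_eol(data: bytes) -> bool:
    """True if a UTF-16 BE file (BOM FE FF) ends in an unpaired CR or LF."""
    if not data.startswith(codecs.BOM_UTF16_BE):
        return False
    # errors="replace" keeps the check from raising on a truncated file.
    text = data[2:].decode("utf-16-be", errors="replace")
    if not text.endswith(("\r", "\n")):
        return False
    # \r\n and \n\r were both reported as safe endings.
    return text[-2:] not in ("\r\n", "\n\r")

sample = bytes.fromhex("FE FF 00 63 00 72 00 61 00 73 00 68 00 7C 00 6D 00 65")
print(sample[2:].decode("utf-16-be"))                         # crash|me
print(has_lone_trailing_eol(sample))                          # False
print(has_lone_trailing_eol(sample + b"\x00\x0a"))            # True
print(has_lone_trailing_eol(sample + b"\x00\x0d\x00\x0a"))    # False
```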


Good find :-).
therube
 
Posts: 1319
Joined: Mon Jan 18, 2016 6:23 pm

Re: Accidentally figured out a way to crash BRU

Postby howtocrashBRU » Thu Sep 29, 2022 6:47 pm

That is pretty much what we are seeing. Thank you for trying to figure it out! It seems that if you start with a 16-bit (UTF-16) text file in UNIX format, chosen to accommodate a wide variety of modern languages, BRU doesn't like importing that text file to use for sorting. BRU is a Windows program, so it's not exactly a main issue. But it is fairly easy to take a list made on UNIX and move it to Windows thinking it will work, since to the user it is just a text file anyway, and then it crashes.

Hopefully it would be an easy fix though for a future version and thank you for trying this out and taking a look into it!
howtocrashBRU
 
Posts: 14
Joined: Wed Sep 28, 2022 12:00 am

Re: Accidentally figured out a way to crash BRU

Postby howtocrashBRU » Thu Sep 29, 2022 6:59 pm

Here is a helpful guide on the EOL character differences. It doesn't go much into little versus big endian, how UTF-8 differs from UTF-16, or how UTF-16 comes in little- and big-endian types. For a single 4-byte value, big endian (BE) might be FF 01 02 03 and little endian (LE) might be 03 02 01 FF, but the data is otherwise "the same".
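
To make the byte-order point concrete, here is a short Python sketch (the helper name byte_orders is made up for this example). It shows the 4-byte value from the paragraph above in both orders, and then the same idea for UTF-16 text, where the swap happens inside each 2-byte unit rather than across the whole file.

```python
def byte_orders(value: int, width: int) -> dict:
    """Hex dump of one integer in big- and little-endian byte order."""
    return {
        "BE": value.to_bytes(width, "big").hex(" "),
        "LE": value.to_bytes(width, "little").hex(" "),
    }

print(byte_orders(0xFF010203, 4))
# {'BE': 'ff 01 02 03', 'LE': '03 02 01 ff'}

# In UTF-16 each character is its own 2-byte unit, so only the byte pairs flip:
print("me".encode("utf-16-be").hex(" "))  # 00 6d 00 65
print("me".encode("utf-16-le").hex(" "))  # 6d 00 65 00
```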

Anyway, it does show how to easily check for this in Notepad++ if hex isn't your preference (though trivially easy in UNIX with hexdump).

The lower right of the Notepad++ window clearly states the ASCII / UTF mode and whether the file is set up in Windows or UNIX format. That, plus looking at the hex directly, makes it a lot easier to double check during troubleshooting.

https://www.loginradius.com/blog/engineering/eol-end-of-line-or-newline-characters/

What is an End of Line (EOL) character:
It is a character in a string which represents a line break, which means that after this character, a new line will start. There are two basic new line characters:

LF (character : \n, Unicode : U+000A, ASCII : 10, hex : 0x0a): This is simply the '\n' character which we all know from our early programming days. This character is commonly known as the ‘Line Feed’ or ‘Newline Character’.

CR (character : \r, Unicode : U+000D, ASCII : 13, hex : 0x0d) : This is simply the '\r' character. This character is commonly known as ‘Carriage Return’.

As a matter of fact, \r also has a different meaning. On older printers, \r meant moving the print head back to the start of the line and \n meant starting a new line.

OS support:
Unix: Unix systems consider '\n' as a line terminator. Unix considers \r as going back to the start of the same line.

Mac (up to 9): Older Mac OSs consider '\r' as a newline terminator but newer OS versions have been made to be more compliant with Unix systems to use '\n' as the newline.

Windows: Windows has a different style of newline, Windows supports the combination of both CR and LF as the newline character - '\r\n'.
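
If hex or Notepad++ isn't handy, the three styles above can also be counted with a few lines of Python. This is only a sketch; count_eols is a name chosen for this example, and it works on text that was read without newline translation (for instance open(path, encoding="utf-16", newline="")).

```python
import re

def count_eols(text: str) -> dict:
    """Count Windows (CRLF), classic Mac (lone CR) and UNIX (lone LF) line ends."""
    return {
        "CRLF": len(re.findall(r"\r\n", text)),
        "CR": len(re.findall(r"\r(?!\n)", text)),    # CR not followed by LF
        "LF": len(re.findall(r"(?<!\r)\n", text)),   # LF not preceded by CR
    }

print(count_eols("one\r\ntwo\nthree\rfour"))
# {'CRLF': 1, 'CR': 1, 'LF': 1}
```

A file with more than one non-zero count has mixed line endings.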

How to check:
There are lots of ways to check this. I use Notepad++ as my text editor for this because it is easy to use and is widely used by developers.

Notepad++ show all characters:
Open any text file and click on the pilcrow (¶) button. Notepad++ will show all of the characters, with the newline characters displayed as CR and LF markers. If it is a Windows EOL encoded file, the newline characters will appear as CR LF (\r\n). If the file is UNIX or modern Mac EOL encoded, then it will only show LF (\n); a classic Mac file will only show CR (\r).

Notepad++ extended search:
Press the key combination of Ctrl + Shift + F and select 'Extended' under the search mode. Now search '\r\n' - if you find this at the end of every line, it means this is a Windows EOL encoded file. However, if it is only '\n' at the end of every line, then it is a Unix or Mac EOL encoded file.

How to convert:
Let's stick with Notepad++ for this, too. Open any file that you would like to convert, click on the Edit menu, scroll down to the EOL conversion option, and select the format that you would like to convert the file to.
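
For anyone who wants to script the same conversion for a UTF-16 rename-pairs file, here is a minimal Python sketch. It assumes the file starts with a BOM and keeps the original byte order; the function name is hypothetical, and this is not a BRU feature.

```python
import codecs

def utf16_to_windows_eol(data: bytes) -> bytes:
    """Rewrite every line end in a BOM-marked UTF-16 file as CR LF."""
    if data.startswith(codecs.BOM_UTF16_BE):
        bom, codec = codecs.BOM_UTF16_BE, "utf-16-be"
    elif data.startswith(codecs.BOM_UTF16_LE):
        bom, codec = codecs.BOM_UTF16_LE, "utf-16-le"
    else:
        raise ValueError("expected a UTF-16 file that starts with a BOM")
    text = data[len(bom):].decode(codec)
    # Normalise every line end to \n first, then expand each \n to \r\n.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    return bom + text.encode(codec)

unix_file = bytes.fromhex("FEFF 0063 000A")        # BOM, "c", LF
print(utf16_to_windows_eol(unix_file).hex(" "))    # fe ff 00 63 00 0d 00 0a
```

Read the file with open(path, "rb") and write the result back with "wb" so that Python does not translate the line ends a second time.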
howtocrashBRU
 
Posts: 14
Joined: Wed Sep 28, 2022 12:00 am

Re: Accidentally figured out a way to crash BRU

Postby Admin » Fri Sep 30, 2022 10:12 am

Thank you, we will need to fix this for the next release. It definitely should not crash: if it encounters a file that it can't process, it should just inform the user.
It would be nice if BRU could also support UTF-8, which I believe is the default for Excel CSV.
Thanks for the report for now!
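
As a rough illustration of "inform the user instead of crashing", here is a Python sketch of a tolerant loader. Everything in it is an assumption made for the example: the function name load_pairs, the old|new line format (guessed from the "crash|me" sample earlier in this thread), and the fallback to UTF-8 when no BOM is present. It is not BRU's code.

```python
import codecs

# Known byte-order marks and the codec each one implies.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def load_pairs(data: bytes):
    """Return (pairs, problems). Problems are reported, never raised."""
    encoding, skip = "utf-8", 0          # assumption: no BOM means UTF-8
    for bom, name in BOMS:
        if data.startswith(bom):
            encoding, skip = name, len(bom)
            break
    try:
        text = data[skip:].decode(encoding)
    except UnicodeDecodeError as exc:
        return [], [f"cannot decode file as {encoding}: {exc}"]
    pairs, problems = [], []
    # splitlines() accepts \r, \n and \r\n alike (plus a few rarer separators),
    # so a lone trailing EOL is harmless here.
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue
        old, sep, new = line.partition("|")
        if not sep:
            problems.append(f"line {number}: no '|' separator")
            continue
        pairs.append((old, new))
    return pairs, problems

crashing = bytes.fromhex("FEFF 0063 0072 0061 0073 0068 007C 006D 0065 000A")
print(load_pairs(crashing))   # ([('crash', 'me')], [])
```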
Admin
Site Admin
 
Posts: 2354
Joined: Tue Mar 08, 2005 8:39 pm

Learning about EOL-characters and UTF-16 rename-pairs

Postby Luuk » Fri Sep 30, 2022 2:23 pm

Many thanks for posting this information about the EOL characters, and also to the admins for making this a priority.
So now I'm realizing why it's good that there is no .bru line-option to automatically load a rename-pairs file when starting.
Because if it crashed, a user would have to find his .ini/.bru files and edit/rename them before he could ever use BRU again!!
So this would definitely be an issue for many users, because the crashes must be fixed before even considering any auto-loads.

At first I didn't understand: if the old printers and typewriters used both \r and \n, why should this even matter to our computers??
But apparently the old typewriters always needed fingers to push their buttons, so they started making some with data lines.
So then computers could send \r to go to the beginning of a line, like our 'Home' button, and then start modifying the same line.

So sending "\r____" could underline the first 4 characters, or \r followed by spaces and some characters could make those characters bolder.
So apparently this is why the EOL format of text files was so important, in case the file ever needed to be sent away for typing or printing.
At least now I finally understand where this war over the EOL characters got started!

So again, many thanks to everyone for posting and helping to learn different things, and maybe also discovering some possible solutions.
With Notepad++, I often discover many different EOL strings inside the same file, even besides what has already been posted here.
So I'd just like to describe the difference between using the Notepad++ EOL conversion and the regex replacement.

The EOL conversion replaces every \r or \n with \r\n, unless it's already paired like \r\n, so like...
\r\n -----> \r\n
\n\r -----> \r\n\r\n
\r\r\r ---> \r\n\r\n\r\n

Also, if Line-1 already ends with \r\n, the menu item for "Edit, EOL Conversion, Windows" will not be available.
The regex replacement converts each of those strings into a single \r\n, and doesn't care about the first line.
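
The two behaviours can be compared side by side in Python. This is only an emulation of what is described above, not Notepad++ code, and the regex [\r\n]+ is an assumption about what the regex replacement looks like.

```python
import re

def eol_conversion(text: str) -> str:
    """Each \\r or \\n becomes \\r\\n, except where already paired as \\r\\n."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

def regex_replacement(text: str) -> str:
    """Assumed pattern: any run of \\r and \\n collapses into one \\r\\n."""
    return re.sub(r"[\r\n]+", "\r\n", text)

for sample in ("\r\n", "\n\r", "\r\r\r"):
    print(repr(sample), repr(eol_conversion(sample)), repr(regex_replacement(sample)))
# '\r\n'   '\r\n'          '\r\n'
# '\n\r'   '\r\n\r\n'      '\r\n'
# '\r\r\r' '\r\n\r\n\r\n'  '\r\n'
```

One thing to keep in mind with the collapsing regex: it also removes blank lines, because consecutive line ends count as a single run.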
Again, many thanks to everyone!
Luuk
 
Posts: 706
Joined: Fri Feb 21, 2020 10:58 pm

Re: Accidentally figured out a way to crash BRU

Postby howtocrashBRU » Fri Sep 30, 2022 5:54 pm

As usual, it's one of those things nobody wants to fix since everybody now uses it this way that started with manual typing so instead we just work around it. Ironically, looking at you ASCII. We need a million plus characters!

Sounds like it's not too bad to work around, or at least to let the user know about instead of crashing. Thank you for looking into this. It is a somewhat minor issue anyway, though one that is likely to be encountered a fair bit by anyone who imports UNIX documents, since they are typically saved that way automatically. I would have just used UTF-8, but it completely ruined the import, so I had to use UTF-16 here, and the easiest approach at the time was to save it on UNIX and then move it back to Windows for the import.

Thank you for also pointing out the Notepad++ nuance here too. Subtle; but worth noting.
howtocrashBRU
 
Posts: 14
Joined: Wed Sep 28, 2022 12:00 am

Re: Accidentally figured out a way to crash BRU

Postby therube » Mon Oct 03, 2022 4:41 pm

(Unrelated - except that it deals with \r\n..., heh.

"- Fix a bug that prevented downloading of urls from a file if the file use "\r\n" as line separator."
media-downloader 2.6.0)
therube
 
Posts: 1319
Joined: Mon Jan 18, 2016 6:23 pm

Re: Accidentally figured out a way to crash BRU

Postby romelitzs » Thu Feb 09, 2023 2:18 pm

I often use UCS-2 LE BOM, but that's because I don't usually have any 'surrogate' characters, and I prefer notepad++.exe.
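
For context, UCS-2 and UTF-16 produce the same bytes as long as every character fits in a single 16-bit unit; only characters above U+FFFF need a surrogate pair. A small Python sketch (the function name needs_surrogates is made up for this example) to check a piece of text:

```python
def needs_surrogates(text: str) -> bool:
    """True if any character lies outside the 16-bit range (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

print(needs_surrogates("crash|me"))              # False
print(needs_surrogates("\U0001F600"))            # True
print("\U0001F600".encode("utf-16-be").hex())    # d83dde00 (two 16-bit units)
```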
romelitzs
 
Posts: 2
Joined: Thu Feb 09, 2023 2:16 pm

Re: Accidentally figured out a way to crash BRU

Postby romelitzs » Thu Feb 16, 2023 2:00 pm

Hopefully it would be an easy fix though for a future version and thank you for trying this out and taking a look into it!
romelitzs
 
Posts: 2
Joined: Thu Feb 09, 2023 2:16 pm
