Non-English characters are unsupported

A swapping-ground for Regular Expression syntax

Post by Zoli1972 » Wed Jan 06, 2010 9:25 pm

Hi,

Unfortunately, I found out that non-English characters (like äöüßáíó, etc.) are not supported by the regular expressions function in version 2.7.1.1.

Can you provide an update with working RegExp for non-English characters?

Zoli
Zoli1972
 
Posts: 3
Joined: Wed Jan 06, 2010 9:22 pm

Re: Non-English characters are unsupported

Post by Stefan » Wed Jan 06, 2010 10:59 pm

Zoli1972 wrote: Hi,

Unfortunately, I found out that non-English characters (like äöüßáíó, etc.) are not supported
by the regular expressions function in version 2.7.1.1.

Can you provide an update with working RegExp for non-English characters?

Zoli


That's RegEx.

EDIT: I was wrong here; BRU's RE engine doesn't support this at all. Read the next posts for a solution.
You can use \xFF, where FF are 2 hexadecimal digits.
It matches the character with the specified ASCII/ANSI value, which depends on the code page used. It can be used in character classes.
http://www.regular-expressions.info/reference.html


For my code page:
ä=E4
ö=F6
ü=FC
Ü=DC
...
Use the Windows tool 'charmap' to look up the values for your code page.
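
As a quick cross-check of the values above, here is a minimal Python sketch. It assumes the code page in question is Windows-1252 (the thread does not name it), and `codepage_hex` is a hypothetical helper, not part of any tool discussed here. Note that Python's `re` module works on Unicode strings, so `\xE4` there means code point U+00E4, which happens to coincide with the Windows-1252 byte for ä.

```python
import re


def codepage_hex(char: str, codepage: str = "cp1252") -> str:
    """Return the two-digit hex value of a single character in a code page.

    Assumption: the code page is Windows-1252; pass another codec name
    if your system uses a different one.
    """
    encoded = char.encode(codepage)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {codepage}")
    return f"{encoded[0]:02X}"


# The values listed in the post.
listed = {"ä": "E4", "ö": "F6", "ü": "FC", "Ü": "DC"}

for char, expected in listed.items():
    assert codepage_hex(char) == expected
    # \xHH inside a character class matches that one character.
    assert re.fullmatch(r"[\x" + expected + "]", char)
```

This only shows that the hex values and the `\xHH` syntax are sound in an engine that supports them; it says nothing about whether a particular renaming tool does.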
Last edited by Stefan on Sat Jan 30, 2010 4:26 pm, edited 1 time in total.
Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Re: Non-English characters are unsupported

Post by Zoli1972 » Fri Jan 29, 2010 5:30 pm

That's RegEx


Oh, sorry. I just took the term "RegExp" from a function in the programming language "AutoIt", where it is similarly named "RegExp", and I thought it would be the right term in English. Looks like I was wrong, huh? :-) Thanks for the correction.

I was unable to - RegEx - my filenames using your solution. My Windows charmap shows e.g. "E4" for an "ä", just like yours, but renaming a file whose name contains that character still doesn't work.

I was using the following RegEx line (example):
Match: ([A-Za-z ,!.\xE4]*)...
Replace: \1...

Please tell me if there's something wrong with it.

Zoli
Zoli1972
 
Posts: 3
Joined: Wed Jan 06, 2010 9:22 pm

Re: Non-English characters are unsupported

Post by Stefan » Fri Jan 29, 2010 10:25 pm

Zoli1972 wrote:
That's RegEx


Oh, sorry. I just took the term "RegExp" from a function in the programming language "AutoIt", where it is similarly named "RegExp", and I thought it would be the right term in English. Looks like I was wrong, huh? :-) Thanks for the correction.

I was unable to - RegEx - my filenames using your solution. My Windows charmap shows e.g. "E4" for an "ä", just like yours, but renaming a file whose name contains that character still doesn't work.

I was using the following RegEx line (example):
Match: ([A-Za-z ,!.\xE4]*)...
Replace: \1...

Please tell me if there's something wrong with it.

Zoli


:D I meant "That's RegEx" which doesn't support umlauts, not BRU :D
The abbreviation "RegExp" is fine too, just like "RE"; sorry I didn't make myself clear enough.

And you're right, it seems that BRU's RegEx engine doesn't support umlauts at all. Or, more correctly, I think the engine is not completely implemented.
Trying
Match: .*
Replace: A
on "Köln.txt" does nothing,
whereas on "Hamburg.txt" I get "A.txt" as the result.

The same with
Match: .\xF6.+
on "Köln.txt" does nothing.
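
For comparison, this is roughly what these tests would be expected to produce in an engine that handles umlauts. The sketch uses Python's `re` module, not BRU's engine, and it assumes (going by the "A.txt" result) that the expression is applied to the file name without its extension and has to match the whole name. `regex_rename` is a hypothetical helper.

```python
import os
import re


def regex_rename(filename: str, match: str, replace: str) -> str:
    """Apply a regex to the base name only and keep the extension.

    Assumption: the name is left unchanged when the expression does not
    match the whole base name, mirroring the "does nothing" case.
    """
    base, ext = os.path.splitext(filename)
    found = re.fullmatch(match, base)
    if found is None:
        return filename
    # expand() resolves backreferences such as \1 in the replacement.
    return found.expand(replace) + ext


print(regex_rename("Hamburg.txt", r".*", "A"))     # A.txt
print(regex_rename("Köln.txt", r".*", "A"))        # A.txt
print(regex_rename("Köln.txt", r".\xF6.+", "A"))   # A.txt
```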

Sorry for pointing you in the wrong direction :roll:


I just tested with ReNamer by Den4b and saw that ReNamer's RegEx engine does support umlauts:
Expression: (ö)
Replace: A$1
Test on: Köln.txt
Result: KAöln.txt

Another test:
Expression: (.[A-Za-z ,!.\xF6]*.+)
or
Expression: (.[A-Za-z ,!.ö]*.+)
Replace: $1A
Test on: Köln.txt
Result: KölnA.txt
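
The two ReNamer tests above can be reproduced in a Unicode-aware engine. The following is a sketch in Python's `re` module, not ReNamer itself. It assumes the expression is applied to the base name only and that every match is replaced; `renamer_like` is a hypothetical helper. ReNamer's `$1` backreference corresponds to `\g<1>` in Python.

```python
import os
import re


def renamer_like(filename: str, expression: str, replace: str) -> str:
    """Mimic a regex replace rule on the base name of a file.

    Assumptions: the extension is skipped and all matches are replaced.
    """
    base, ext = os.path.splitext(filename)
    # Translate $1-style backreferences into Python's \g<1> form.
    py_replace = re.sub(r"\$(\d)", r"\\g<\1>", replace)
    return re.sub(expression, py_replace, base) + ext


print(renamer_like("Köln.txt", r"(ö)", "A$1"))
# KAöln.txt
print(renamer_like("Köln.txt", r"(.[A-Za-z ,!.\xF6]*.+)", "$1A"))
# KölnA.txt
print(renamer_like("Köln.txt", r"(.[A-Za-z ,!.ö]*.+)", "$1A"))
# KölnA.txt
```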

So please try ReNamer => http://www.den4b.com/downloads.php?project=ReNamer
Use the latest beta.

If you have problems building the correct RegEx, just let us know.

All the best with getting your work done.


--

Oh, I forgot:
if you want to stay with BRU, you "could" use a very dumb workaround:
1.) Find one umlaut like ä and replace it with #, using
Repl.(3)
Replace: ä
With: #
2.) Then do your RegEx search and replace.
3.) Then replace # back with ä.

Just an idea... maybe it is worth doing for you.
But unlike ReNamer, BRU will then actually rename the files three times,
whereas with ReNamer you can stack several rules, check the result in the preview only,
and use the combination of all steps to build the new name, so the files are renamed only once.
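
The placeholder idea can be sketched in a few lines of Python. This is only an illustration of the three steps, not BRU or ReNamer behaviour: `rename_with_placeholder`, the sample name "Bär_old" and the pattern are all made up for the example.

```python
import re


def rename_with_placeholder(name: str, match: str, replace: str,
                            umlaut: str = "ä", placeholder: str = "#") -> str:
    """Swap an umlaut for a placeholder, run the regex, then swap it back."""
    if placeholder in name:
        # Otherwise step 3 would turn a genuine '#' into the umlaut.
        raise ValueError(f"{name!r} already contains {placeholder!r}")
    swapped = name.replace(umlaut, placeholder)    # step 1: ä -> #
    renamed = re.sub(match, replace, swapped)      # step 2: ASCII-only regex
    return renamed.replace(placeholder, umlaut)    # step 3: # -> ä


# The character class lists '#' instead of the umlaut.
print(rename_with_placeholder("Bär_old", r"([A-Za-z ,!.#]*)_old", r"\1"))
# Bär
```

All three steps run in memory here and only the final name would be written, which is the same advantage described for ReNamer's preview. The replacement text itself must not introduce the placeholder character, or step 3 will convert it as well.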
Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Re: Non-English characters are unsupported

Post by Zoli1972 » Sat Jan 30, 2010 12:24 pm

Hi,

I tested it too, and it also worked like a charm for me :-)

Thanks a lot.
My last question is: how can I save my RegEx settings for certain files in ReNamer? Unlike BRU, ReNamer seems to have no function for that. I just don't want to retype my RegEx line, which feels a kilometer long, every time I need it.

Zoli
Zoli1972
 
Posts: 3
Joined: Wed Jan 06, 2010 9:22 pm

Re: Non-English characters are unsupported

Post by Stefan » Sat Jan 30, 2010 3:50 pm

Zoli1972 wrote: Hi,

I tested it too, and it also worked like a charm for me :-)

Thanks a lot.
My last question is: how can I save my RegEx settings for certain files in ReNamer? Unlike BRU, ReNamer seems to have no function for that.
I just don't want to retype my RegEx line, which feels a kilometer long, every time I need it.

Zoli

Zoli> I tested it too,
Which of my suggestions did you try?

Zoli> how can I save my RegEx-Settings for certain files in ReNamer
Ah, I see ReNamer's Help doesn't help much here, because it is not finished yet.
All I find there is ==> "You can save the stack of rules and re-use it later"

The ReNamer wiki explains more ==> http://www.den4b.com/wiki/ReNamer:Using_presets
In short: use the "Presets" menu to save, modify and load rules and whole rule sets with all their settings, including your RegExes.
Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

