Converting Japanese mismatched dakuten and handakuten

Post by Inserio » Sun Apr 09, 2017 8:46 am

I have files containing text such as ????????. If you know Japanese, you would expect that to read ????????, but it does not. The dakuten and handakuten marks (the two dots and the small circle, respectively) at the top right of some characters are stored as separate characters from the kana they modify, even though you can't select them separately in ordinary text.
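One way to confirm this diagnosis is to list the code points in the affected text. The following is an illustrative Python sketch, not something from the original post. It assumes the marks are the combining forms U+3099 (dakuten) and U+309A (handakuten); the garbled sample text makes that impossible to verify, and the spacing forms U+309B and U+309C would behave differently. `describe` is a hypothetical helper name.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

# Assumed shape of the problem text: base kana followed by a combining mark
decomposed = "\u304b\u3099"   # か followed by combining dakuten
precomposed = "\u304c"        # が as a single code point

for label, sample in (("decomposed", decomposed), ("precomposed", precomposed)):
    print(label, len(sample), describe(sample))
```

The two strings render the same but compare unequal, which would explain why a search for the precomposed character finds nothing in such a file.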

In any case, I've already figured out the difficult work of separating those characters with the following regex
(.)[`?`?]
(Note: that last character is not a *, it's just the weirdness of the two Japanese symbols in the character class)


However, what that does is capture the character without the vocalizing mark. I.e., ? matches ? in the first capturing group, whereas my ultimate goal is for it to turn into ?. If you're confused why ? and ? are not the same, read my first paragraph again.

I'm not sure how to turn an unvoiced character into a voiced character with regex. If it's helpful, this was my latest regex attempt for matching and includes every voiced character I could think of.
([??????????]?[??????????]?[??????????]?[??????????]?[??]?)[`?]|([??????????]?)[`?]

However, it's not going to return anything useful, because all it matches is the dakuten or handakuten portion. To actually match the character the mark belongs to, I'd need to put the unvoiced form in my matching string. Is there a way to do this other than one character at a time? For example, a finished one-at-a-time solution would look like this:
Match: (.*)(?)[`?](.*)
Replace: \1?\3
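If a scripting language is an option, one alternative to writing a replacement per character is to let Unicode normalization do the composing. The sketch below is Python and rests on an assumption the thread cannot confirm: that the marks are the combining code points U+3099 and U+309A. The regex only locates "any character followed by a mark"; `unicodedata.normalize("NFC", ...)` then produces the precomposed character. `MARKED` and `compose_marks` are hypothetical names.

```python
import re
import unicodedata

# Any character followed by a combining dakuten (U+3099) or handakuten (U+309A)
MARKED = re.compile(r".[\u3099\u309a]", re.DOTALL)

def compose_marks(text):
    """Recompose base + combining-mark pairs; all other text is left untouched."""
    return MARKED.sub(lambda m: unicodedata.normalize("NFC", m.group(0)), text)

fixed = compose_marks("\u304b\u3099")        # か + dakuten
print(len(fixed), hex(ord(fixed[0])))
```

Calling `unicodedata.normalize("NFC", text)` on the whole string is shorter, but it also normalizes every other character in the file. Restricting it to the matched pairs keeps the change narrow. Pairs with no precomposed form are returned unchanged.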


Any ideas?


Edit: it seems this forum isn't designed to handle Japanese text, so I'll point you to the website I was using for testing, which has some sample text and the above matching string.
https://regex101.com/r/olkxL2/6

Re: Converting Japanese mismatched dakuten and handakuten

Post by therube » Sun Apr 09, 2017 2:51 pm

(
"it seems this forum isn't designed to handle Japanese text"

How do you like that, it can't.
Strange.
)

Re: Converting Japanese mismatched dakuten and handakuten

Post by Inserio » Sun Apr 09, 2017 5:17 pm

Did you check out the link? I also included everything from the forum post at the bottom of the test string section.
Perhaps thinking about it in a different way would help. Basically, I think I'm trying to do a whole bunch of individual if/thens: if "1", replace with "one"; if "2", replace with "two"; and so on, about 52 separate times. Is there a way to do that in a single regex?
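A plain regex replacement string cannot choose a different output per match, but many regex engines accept a callback as the replacement, and that is enough to express the whole set of if/thens in a single pass. The following is a minimal Python sketch of that idea, not a feature of any particular renaming tool. `replace_many` and `build_kana_table` are hypothetical names, and the kana table assumes the marks are the combining code points U+3099 and U+309A.

```python
import re
import unicodedata

def replace_many(text, table):
    """Replace every key of table found in text with its value, in one pass."""
    if not table:
        return text
    # Longest keys first, so a longer key wins over a shorter one it starts with
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

def build_kana_table():
    """Map each 'base kana + combining mark' pair to its precomposed kana."""
    table = {}
    for cp in range(0x3041, 0x3100):  # hiragana and katakana blocks
        composed = chr(cp)
        decomposed = unicodedata.normalize("NFD", composed)
        if len(decomposed) == 2 and decomposed[1] in "\u3099\u309a":
            table[decomposed] = composed
    return table

print(replace_many("1 and 2", {"1": "one", "2": "two"}))  # one and two
kana_table = build_kana_table()
print(len(replace_many("\u304b\u3099", kana_table)))
```

Deriving the table from the Unicode decomposition data avoids typing each pair by hand. A hand-written dictionary would work the same way with `replace_many`.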

