Removing Unicode and some special characters

Post by kenneyc » Thu Feb 02, 2017 2:20 am

Hey there. I have a crazy large number of directories, subdirectories, and files, and a good chunk of the file and directory names contain characters that are invalid on Windows, plus Unicode characters that are causing problems. I'm trying to rename all files and folders inside a directory (recursively, all the way down) so that the invalid characters become "_".

I tried PowerShell, but it had some issues with the path depth. Some forums recommended I look at this utility instead. In PowerShell, I used the range \u0020-\u007F to try to strip the Unicode characters, and I was also removing "/", "*", and a few other invalid characters (I think "?" was one of them too).

What would be a regular expression to swap these out for an underscore? They can be anywhere in the name, possibly many times (names like "Accounting 2012/2013", "files 08/13-06/14", "bob's burgers [invalid char].docx" and so on).

Thanks for any help anyone has, I am getting to the end of the things I know to try to clean this mess up. (Someone migrated an old Mac server to a Windows server and didn't clean the file names in the process, the Mac server is gone, and now the project has been tossed into my lap.)
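For readers who would rather script the cleanup, here is a minimal Python sketch of the recursive rename described above. It was not posted in the thread and rests on several assumptions: the names `clean_name` and `rename_tree` are hypothetical, the character set is the usual list of characters Windows rejects in names plus anything outside \u0020-\u007F, and a name that would collide with an existing one is simply skipped. It does nothing about the path-depth problem.

```python
import os
import re

# Assumption: characters Windows rejects in names, plus anything
# outside the \u0020-\u007F range mentioned in the post.
BAD_CHARS = re.compile(r'[<>:"/\\|?*]|[^\u0020-\u007F]')


def clean_name(name: str) -> str:
    """Replace every unwanted character with an underscore."""
    return BAD_CHARS.sub("_", name)


def rename_tree(root: str) -> list:
    """Rename files and folders under root; return (old, new) path pairs."""
    renamed = []
    # topdown=False visits the deepest folders first, so a folder is only
    # renamed after everything inside it has already been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = clean_name(name)
            if new_name == name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, new_name)
            if os.path.exists(dst):
                continue  # collision: leave the original name untouched
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Running `rename_tree` on a copy of the data first, and reviewing the returned list, would be a sensible precaution before touching the real tree.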

Re: Removing Unicode and some special characters

Post by KenP » Thu Feb 02, 2017 2:40 am

Rather than regular expressions I would try Special (14) - Character Translations.

Start by selecting the main directory that holds the sub-folders.

Filters (12)
Folders: checked
Files: checked
Subfolders: checked

Click on the icon under Character Translations to open the Character Translations dialogue box. In the box, type each character that you want to replace followed by the character that you want to replace it with, one pair per line.

Example:
/=_
?=_
?=_

Click OK to close the Character Translations dialogue box and select the files and folders you want to rename.
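The "one pair per line" idea is a plain character-to-character translation table. As a rough illustration outside the utility, the same mapping can be sketched in Python with `str.translate`. The names `INVALID_CHARS` and `translate_name` are hypothetical, and the character list is an assumption (the characters Windows rejects in names), not the contents of anyone's actual dialogue box.

```python
# Assumption: the characters Windows does not allow in file or folder names.
INVALID_CHARS = '<>:"/\\|?*'

# One entry per character, each mapped to an underscore,
# much like typing "/=_" and "?=_" on separate lines.
TRANSLATIONS = str.maketrans({ch: "_" for ch in INVALID_CHARS})


def translate_name(name: str) -> str:
    """Replace each listed character with an underscore."""
    return name.translate(TRANSLATIONS)


print(translate_name("Accounting 2012/2013"))  # Accounting 2012_2013
print(translate_name("files 08/13-06/14"))     # files 08_13-06_14
```

A table like this only touches the characters listed in it, so unlisted Unicode characters pass through unchanged.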

Re: Removing Unicode and some special characters

Post by kenneyc » Thu Feb 02, 2017 3:42 am

Oh nice! What about the Unicode set? They show up as a square with a ? in it in a command prompt, and most of them show as a messed-up version of a bullet point in Windows Explorer.

Re: Removing Unicode and some special characters

Post by Admin » Thu Feb 02, 2017 11:28 pm

Maybe try
Remove (5) -> High
This removes all characters above ASCII 127; maybe it will work for your case too.
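Since the thread sits in the regular expression forum, here is what the same idea might look like as a regex, sketched in Python rather than in the utility's own RegEx field. It assumes the \u0020-\u007F range quoted earlier in the thread; the names `NON_ASCII`, `remove_high`, and `replace_high` are hypothetical.

```python
import re

# Matches any character outside U+0020..U+007F,
# the range quoted earlier in the thread.
NON_ASCII = re.compile(r"[^\u0020-\u007F]")


def remove_high(name: str) -> str:
    """Drop every character outside the range."""
    return NON_ASCII.sub("", name)


def replace_high(name: str, filler: str = "_") -> str:
    """Swap every character outside the range for a filler character."""
    return NON_ASCII.sub(filler, name)


sample = "caf\u00e9\u2022notes.txt"  # an accented e followed by a bullet
print(remove_high(sample))   # cafnotes.txt
print(replace_high(sample))  # caf__notes.txt
```

Removing characters outright is more likely to produce two identical names in the same folder than replacing them with an underscore, so the replacing variant may be the safer starting point.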

