Help with duplicates file names!!

Postby Artex » Tue Dec 29, 2009 3:30 pm

I have about 20K mp3s that I am renaming, trying to reduce the number of duplicate files by only keeping a single file with the highest bitrate. Before I do that, I need to find all the dupes by file name (not by Mp3 tag).

The problem I am running into is as follows:
Typically I would remove the (starting number-space-dash-space) to leave just the (Artist-Space-Dash-Space-SongName.mp3). In this situation, however, nothing would be done because there would be three files with the same name. How can I get around this?

Example:
01 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman.mp3
101 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman.mp3
26 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman.mp3

Is there any way (with a reg exp) to remove the starting number-space-dash-space and then if duplicates are found, perhaps append a number at the end? That would at least help sort the files until I can hand pick the best one.

01 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman_1.mp3
101 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman_2.mp3
26 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman_3.mp3
Artex
 
Posts: 2
Joined: Tue Dec 29, 2009 3:22 pm

Re: Help with duplicates file names!!

Postby Stefan » Tue Dec 29, 2009 5:31 pm

Hi Artex, welcome.

> there would be three files with the same name. How can I get around this?
> and then if duplicates are found, perhaps append a number at the end?
You can't do this with RegEx alone, because a regular expression is just a pattern-matching system.
To make a distinction like "IF nameA already exists THEN DO..." you need a scripting language...

...or the renaming app has to handle this for you.
If you take a look at the BRU menu or read the help, you will find:
Options Menu > Prevent Duplicates
This option allows you to overcome the situation whereby a rename would fail because a file with the same name already exists. If you try to rename a file, and there's already a file with the same name, the software will make a subsequent attempt to rename the file but with a "_1" suffix. If this fails it will try with "_2" as the suffix, and will continue up to "_99". The limit of 99, and the separator character (underscore, _) are currently fixed and cannot be changed.

Try it on a copy of your folder first, in case something goes wrong, and see whether this works for you.
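
For anyone who prefers the scripting route, here is a minimal Python sketch of the "if the name already exists, try _1, _2, ... up to _99" logic described in the help text above. It only plans names in memory and does not touch any files. The function names unique_name and plan_renames are made up for this illustration, and this is not BRU's actual implementation, just my reading of the quoted description.

```python
import os


def unique_name(desired, taken, limit=99):
    """Return `desired`, or `desired` with a _1.._<limit> suffix if it is taken.

    `taken` is a set of names already in use. The suffix goes before the
    extension, mirroring the behaviour described in the BRU help text.
    """
    if desired not in taken:
        return desired
    stem, ext = os.path.splitext(desired)
    for n in range(1, limit + 1):
        candidate = f"{stem}_{n}{ext}"
        if candidate not in taken:
            return candidate
    raise FileExistsError(f"no free name for {desired!r} up to _{limit}")


def plan_renames(new_names):
    """Map a list of wanted names to collision-free names, in order."""
    taken = set()
    result = []
    for name in new_names:
        final = unique_name(name, taken)
        taken.add(final)
        result.append(final)
    return result


if __name__ == "__main__":
    for final in plan_renames(["Test.mp3"] * 4):
        print(final)
```

The comparison here is case-sensitive, which is an assumption of the sketch; on a case-insensitive file system you would want to compare lowercased names.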


----


> Is there any way (with a reg exp) to remove the starting number-space-dash-space

number-space-dash-space-theRest
RegEx(1)
Match: (.+?- )(.+)
Replace: \2

Explanation:
(.+?- ) is captured in group 1 and means: match everything up to a dash followed by a space. Because of the '?' the match is non-greedy, so it stops at the first such dash. Holds "101 - "
(.+) ... is captured in group 2 and means: match everything up to the end. Holds "Metallica - Enter Sandman"
\2 ..... means: give me back what was captured in group 2
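
To see what the two groups capture, the same pattern can be tried outside BRU. The sketch below uses Python's re module, which is a different regex engine from the one in BRU, although this particular pattern uses only syntax common to both. It assumes the name is passed without its extension. The helper names and the stricter digits-only pattern are my own additions for illustration.

```python
import re

# The pattern from the post: group 1 = everything up to the first "- ",
# group 2 = the rest of the name.
LOOSE = re.compile(r"(.+?- )(.+)")

# Hypothetical stricter variant: only strip a prefix made of digits.
STRICT = re.compile(r"\d+ - (.+)")


def strip_prefix(stem):
    """Apply the pattern from the post; return the name unchanged if no match."""
    m = LOOSE.fullmatch(stem)
    return m.group(2) if m else stem


def strip_numeric_prefix(stem):
    """Strip only a leading 'digits - ' prefix."""
    m = STRICT.fullmatch(stem)
    return m.group(1) if m else stem


if __name__ == "__main__":
    for name in ("01 - Metallica - Enter Sandman",
                 "101 - Metallica - Enter Sandman",
                 "Metallica - Enter Sandman"):
        print(repr(name), "->", repr(strip_prefix(name)),
              "| strict:", repr(strip_numeric_prefix(name)))
```

Note that the pattern from the post removes whatever comes before the first "- ", not only numbers, so a file that is already named "Metallica - Enter Sandman" would lose the artist. The digits-only variant leaves such names alone.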




This solution is untested, so try it on a copy of your folder in case something goes wrong.
See these older threads for a RegEx syntax overview:
=> Getting Started: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=5
=> Go ahead: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=27


HTH? :D
If yes: please help two others too.



Stefan
 
Posts: 736
Joined: Fri Mar 11, 2005 7:46 pm
Location: Germany, EU

Re: Help with duplicates file names!!

Postby Artex » Tue Dec 29, 2009 7:57 pm

Wow, sorry I missed that pretty obvious menu option. Your RegEx worked perfectly in conjunction with the "Prevent Duplicates" option. I just copied a couple of these as a test, and the results were exactly what I wanted. This is going to save me a bunch of time. Your input is greatly appreciated!

01 - Test.mp3 -> Test.mp3
02 - Test.mp3 -> Test_1.mp3
03 - Test.mp3 -> Test_2.mp3
04 - Test.mp3 -> Test_3.mp3

Re: Help with duplicates file names!!

Postby giles » Mon Apr 24, 2017 7:47 pm

Hi all, I've just downloaded the Bulk Rename Utility. Although I used to know a little about regular expressions, that was a long time ago and I'm not so hot with them now, nor do I know whether what I'm trying to do is a regular expression matter or not! In addition, I've been blind for 9 years now, so it takes me longer to get to grips with how programs are laid out and what information I need to type where.

I found this thread whilst looking for a way to replace underscores with spaces in filenames. Each week I get a download of various newspapers, and the filenames use underscores instead of spaces, which gets irritating when my talking devices announce every single one! This is an example filename from this week's Guardian newspaper:

The Guardian_-_22_April_2017_-_Cook_-_HTML.html

I'd like to simply replace those _ characters with spaces, resulting in The Guardian - 22 April 2017 - Cook - HTML.html

I tried to do this by putting _ in the Match and a space in the Replace box (I think that might be the reg expression section ... when I press tab to move from one control to the next the one I come to next after Replace is the Include EST checkbox) ... when that didn't work I tried the same in the next text box (pressing tab it is the one after the combo box for choosing keep / remove / fixed) but it didn't work there either.

I also tried using _-_ as the search text and ' - ' in the Replace boxes, again without success.

Any basic help or pointers to what I'm doing wrong would be greatly appreciated :)

Thank you,

Giles
giles
 
Posts: 2
Joined: Mon Apr 24, 2017 7:04 pm

Re: Help with duplicates file names!!

Postby KenP » Mon Apr 24, 2017 8:05 pm

You can use Character Translations to replace underscores with spaces.

Special (14)
Character Translations.

  1. Select Character Translations.
  2. In the dialogue box that opens, type "_= " without the quotes (underscore=space).
     (If you want to use any other character translations, type them on individual lines.)
  3. Select the files you want to rename.
  4. Check and rename the files.

Clicking the Reset button does not delete the character translations, so after you've renamed the files be sure to either deselect Character Translations or click on the Status icon to reopen the dialogue box and delete the character translations you've set.
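
For readers who would rather script this, the underscore-to-space translation can be sketched in a few lines of Python. The sketch models only the simple "from=to" form, one rule per line, as used in step 2; whatever else the Character Translations box accepts is not covered, and the function names are invented for this example. It leaves the extension alone, which is an assumption on my part.

```python
import os


def parse_translations(text):
    """Parse 'from=to' lines into a dict. Lines are not stripped,
    because a trailing space (as in "_= ") is significant."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        src, dst = line.split("=", 1)
        if src:
            table[src] = dst
    return table


def translate_name(filename, table):
    """Apply each translation to the name, leaving the extension alone."""
    stem, ext = os.path.splitext(filename)
    for src, dst in table.items():
        stem = stem.replace(src, dst)
    return stem + ext


if __name__ == "__main__":
    rules = parse_translations("_= ")
    print(translate_name("The Guardian_-_22_April_2017_-_Cook_-_HTML.html", rules))
```

This only computes the new name as a string; an actual rename would still need something like os.rename on a copy of the files.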
KenP
 
Posts: 199
Joined: Sat Jul 30, 2016 11:25 am

Re: Help with duplicates file names!!

Postby giles » Mon Apr 24, 2017 8:27 pm

KenP wrote: You can use Character Translations to replace underscores with spaces.


Thank you for that quick answer, Ken. That sounds perfect and I'll have a go with your instructions now :)

Giles

