Help with duplicates file names!!

Post by Artex » Tue Dec 29, 2009 3:30 pm

I have about 20K mp3s that I am renaming, trying to reduce the number of duplicate files by only keeping a single file with the highest bitrate. Before I do that, I need to find all the dupes by file name (not by Mp3 tag).

The problem I am running into is as follows:
Typically I would remove the (starting number-space-dash-space) to leave just the (Artist-Space-Dash-Space-SongName.mp3). In this situation, however, the rename does nothing, because all three files would end up with the same name. How can I get around this?

Example:
01 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman.mp3
101 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman.mp3
26 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman.mp3

Is there any way (with a reg exp) to remove the starting number-space-dash-space and then if duplicates are found, perhaps append a number at the end? That would at least help sort the files until I can hand pick the best one.

01 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman_1.mp3
101 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman_2.mp3
26 - Metallica - Enter Sandman.mp3 = Result => Metallica - Enter Sandman_3.mp3

Re: Help with duplicates file names!!

Post by Stefan » Tue Dec 29, 2009 5:31 pm

Hi Artex, welcome.

> there would be three files with the same name. How can I get around this?
> and then if duplicates are found, perhaps append a number at the end?
You can't do this with RegEx alone, because a regular expression is just a pattern-matching system.
To make a distinction like "IF nameA already exists THEN DO..." you need a scripting language...

...or the renaming app has to handle this for you.
If you take a look at the menus of BRU or read the help, you will find:
Options Menu > Prevent Duplicates
This option allows you to overcome the situation whereby a rename would fail because a file with the same name already exists. If you try to rename a file, and there's already a file with the same name, the software will make a subsequent attempt to rename the file but with a "_1" suffix. If this fails it will try with "_2" as the suffix, and will continue up to "_99". The limit of 99, and the separator character (underscore, _) are currently fixed and cannot be changed.

You can try this on a copy of your folder (in case something goes wrong) to see if it works for you.
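For anyone who prefers the scripting route, here is a rough Python sketch of the same "IF name already exists THEN add a suffix" idea. It is not how BRU implements Prevent Duplicates; it only mimics the "_1" to "_99" behaviour described in the help text, using the pattern suggested in this thread. The function name plan_renames and the max_suffix parameter are made up for illustration. It only plans names from a list and never touches the disk, so it ignores files that already exist in the folder and the case-insensitive matching that Windows uses.

```python
import os
import re

# Pattern from this thread: lazily match up to the first "- ", keep the rest.
PREFIX = re.compile(r"(.+?- )(.+)")

def plan_renames(filenames, max_suffix=99):
    """Map each original name to a unique new name (hypothetical helper).

    If the stripped name is already taken, try "_1", "_2", ... up to
    max_suffix, mirroring the fallback described in the BRU help text.
    """
    taken = set()
    plan = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)             # "01 - Test", ".mp3"
        new_stem = PREFIX.sub(r"\2", stem, count=1)    # "Test"
        candidate = new_stem + ext
        n = 0
        while candidate in taken:
            n += 1
            if n > max_suffix:
                raise RuntimeError(f"no free name left for {name!r}")
            candidate = f"{new_stem}_{n}{ext}"
        taken.add(candidate)
        plan[name] = candidate
    return plan

if __name__ == "__main__":
    files = ["01 - Test.mp3", "02 - Test.mp3", "03 - Test.mp3"]
    for old, new in plan_renames(files).items():
        print(f"{old} -> {new}")
```

To actually rename, you would loop over the returned mapping and call os.rename(old, new) inside the folder, again on a copy first.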


----


> Is there any way (with a reg exp) to remove the starting number-space-dash-space

number-space-dash-space-theRest
RegEx(1)
Match: (.+?- )(.+)
Replace: \2

Explanation:
(.+?- ) is captured in group 1 and means: match everything up to a dash followed by a space, but non-greedily because of the '?', so it stops at the first dash-space. Holds "101 - "
(.+) ... is captured in group 2 and means: match everything up to the end. Holds "Metallica - Enter Sandman"
\2 ..... means: give back what was captured in group 2
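If you want to check the pattern outside BRU first, a small Python sketch like the one below should do. It assumes, as in the explanation above, that the pattern is applied to the name without the ".mp3" extension. The helper names strip_prefix and strip_numeric_prefix are invented here, and the second, stricter pattern is my own variation, not something from this thread: it only strips a prefix that really is digits, whereas the original pattern removes everything up to the first dash-space whether or not it is a number.

```python
import re

# Pattern from the post: group 1 = up to the first "- ", group 2 = the rest.
LOOSE = re.compile(r"(.+?- )(.+)")

# Stricter variant (an assumption, not from the thread): digits only.
STRICT = re.compile(r"^\d+ - (.+)$")

def strip_prefix(stem):
    """Apply the thread's Match/Replace pair: keep only group 2."""
    return LOOSE.sub(r"\2", stem, count=1)

def strip_numeric_prefix(stem):
    """Strip a leading 'number - ' only; leave other names unchanged."""
    return STRICT.sub(r"\1", stem)

if __name__ == "__main__":
    for stem in ["01 - Metallica - Enter Sandman",
                 "101 - Metallica - Enter Sandman",
                 "Metallica - Enter Sandman"]:
        print(repr(stem), "->", repr(strip_prefix(stem)),
              "|", repr(strip_numeric_prefix(stem)))
```

The last test name shows the difference: with no leading number, the original pattern still cuts at the first dash-space and leaves only "Enter Sandman", while the stricter one leaves the name alone.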




This solution is untested, so try it on a copy of your folder in case something goes wrong.
See these older threads for a RegEx syntax overview:
=> Getting Started: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=5
=> Go ahead: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=27


HTH? :D
If yes: please help two others too.




Re: Help with duplicates file names!!

Post by Artex » Tue Dec 29, 2009 7:57 pm

Wow, sorry I missed that pretty obvious menu option. Your RegEx worked perfectly in conjunction with the "Prevent Duplicates" option. I just copied a couple of these files as a test, and the results were exactly what I wanted. This is going to save me a bunch of time. Your input is greatly appreciated!

01 - Test.mp3 -> Test.mp3
02 - Test.mp3 -> Test_1.mp3
03 - Test.mp3 -> Test_2.mp3
04 - Test.mp3 -> Test_3.mp3

