repositioning words in filenames

Postby stulinds » Tue Jan 26, 2010 2:43 pm

I am a professional botanist and have several hundred photographs of plants labelled in this general format:

Middleton et al 4614 Schizaea dichotoma PK_4766.JPG

The "Middleton et al" (= the name of the plant collector) is constant in all file names, the number after that (= the plant's collection number) is variable but is always 4 digits, and the photo identifier (in this case PK_4766) is always 2 letters (= photographer's initials) and 4 digits (= the photo number) joined by an underscore. The 2 words (always 2 words but always different and of variable lengths) between the plant's collection number and the photograph identifier represent the Latin name of the plant and it is this that I would like to move (but do not know how to do this efficiently).

I would like all plant names to be moved from their current position to the beginning of the file name. Everything else will stay the same.

So the above file name would become:

Schizaea dichotoma Middleton et al 4614 PK_4766.JPG

How can this be done using BRU?

Re: repositioning words in filenames

Postby Stefan » Tue Jan 26, 2010 8:51 pm

Hi stulinds, welcome.
A well-explained problem; that is how it should always be done.


Note:
Test this with test files first!
Select one or more files in the Name column to preview what the New Name will be.


FROM:
Middleton et al 4614 Schizaea dichotoma PK_4766.JPG
TO:
Schizaea dichotoma Middleton et al 4614 PK_4766.JPG

You can do this by using regular expressions.
First split your old name into parts as you have described already:
"Middleton et al" "4 digits" "2 words" "2 letters -- underscore -- 4 digits"

Now build a RegEx:
(.+) (\d\d\d\d) (.+) (\w\w_\d\d\d\d)


Explanation:
Each (...) group captures whatever its part of the regex matches, so you can reuse it later with the \n syntax, where n is the number of the group (a backreference).
(.+) ==> "Middleton et al" ==> \1
(\d\d\d\d) ==> "4 digits" ==> \2
(.+) ==> "2 words" ==> \3
(\w\w_\d\d\d\d) ==> "2 letters -- underscore -- 4 digits" ==> \4

Hint: the (.+) construct means:
. (a dot) ==> match any single character (letter, digit, whitespace, or any other sign)
+ ==> match one or more of whatever comes just before it, here the dot.
So (.+) matches every character, including the blanks, of "Middleton et al" and likewise of "Schizaea dichotoma".
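
To see what each group captures, the same pattern can be tried outside BRU. This is only a sketch in Python, assuming Python's re module treats this particular pattern the same way BRU's regex engine does, and assuming the regex is applied to the name without the .JPG extension:

```python
import re

# The pattern from this post, unchanged.
PATTERN = r"(.+) (\d\d\d\d) (.+) (\w\w_\d\d\d\d)"

# Sample file name, without its extension (an assumption for this sketch).
name = "Middleton et al 4614 Schizaea dichotoma PK_4766"

# fullmatch requires the whole name to fit the pattern.
match = re.fullmatch(PATTERN, name)
groups = match.groups()

# Print each captured group next to its backreference number.
for number, text in enumerate(groups, start=1):
    print(f"\\{number} ==> {text!r}")
```

The four printed values correspond to \1, \2, \3 and \4 in the replacement.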


Now just replace with what the regex has found, but in a new order:
\3 \1 \2 \4



Test it, use
RegEx(1)
Match: (.+) (\d\d\d\d) (.+) (\w\w_\d\d\d\d)
Replace: \3 \1 \2 \4
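
If you want to check the Match and Replace pair on a list of names before touching real files, a small preview can be sketched in Python. The function name preview_new_name is made up for this example, and the handling of the extension (split off first, appended unchanged afterwards) is an assumption, not a description of how BRU works internally:

```python
import os
import re

MATCH = r"(.+) (\d\d\d\d) (.+) (\w\w_\d\d\d\d)"
REPLACE = r"\3 \1 \2 \4"


def preview_new_name(filename: str) -> str:
    """Return the reordered name, or the original name if it does not fit."""
    # Assumption: the extension is left out of the match and re-added as-is.
    stem, ext = os.path.splitext(filename)
    match = re.fullmatch(MATCH, stem)
    if match is None:
        return filename  # leave non-matching names untouched
    # expand() fills the backreferences \1..\4 into the replacement template.
    return match.expand(REPLACE) + ext


print(preview_new_name("Middleton et al 4614 Schizaea dichotoma PK_4766.JPG"))
```

Names that do not fit the pattern come back unchanged, which makes it easy to spot files that need a closer look.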



HTH? :D
If yes: please help two others too.




------------------------------------------------------------------------------
Related info:
Depending on the file names, the regex could also be simplified or written as:
(.+) (\d+) (.+) (\w\w_\d+) where I search not for four digits \d\d\d\d, but for one or more digits \d+
(.+) (\d+) (.+) (\w+_\d+) where I search not for two chars \w\w, but for one or more chars \w+
(.+) (\d{4}) (.+) (\w\w_\d{4}) where \d{4} means: search for exactly 4 digits, the same as \d\d\d\d
(.+) (\d{4}) (.+) (\w+_\d{4}) which combines \d{4} with \w+
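
A quick way to convince yourself that these variants behave alike on the sample name is to run them side by side. Again only a Python sketch, assuming Python's re module handles these patterns like BRU's regex engine; it says nothing about names that differ from the sample format:

```python
import re

# The original pattern followed by the four variants listed above.
VARIANTS = [
    r"(.+) (\d\d\d\d) (.+) (\w\w_\d\d\d\d)",
    r"(.+) (\d+) (.+) (\w\w_\d+)",
    r"(.+) (\d+) (.+) (\w+_\d+)",
    r"(.+) (\d{4}) (.+) (\w\w_\d{4})",
    r"(.+) (\d{4}) (.+) (\w+_\d{4})",
]

name = "Middleton et al 4614 Schizaea dichotoma PK_4766"

# Apply the same replacement order with every variant.
results = [re.fullmatch(p, name).expand(r"\3 \1 \2 \4") for p in VARIANTS]

for pattern, new_name in zip(VARIANTS, results):
    print(f"{pattern}  ==>  {new_name}")
```

The looser variants (\d+, \w+) would also accept names with a different number of digits or initials, so pick the strict ones if you want such files to be skipped.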

See the two oldest threads in the "Regular Expressions" forum for a RegEx syntax overview:
=> Getting Started: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=5
=> Go ahead: http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=27

