by trm2 » Sat May 16, 2020 1:55 pm
Page 213, I believe: in the manual, under the heading 'Parentheses', item #2 explains Capture Groups and how to reference their values.
The \2 you refer to is a reference to the value captured in Capture Group 2.
Simply put, the first part, the RegEx string, parses the filename (breaks it apart) so that portions can be isolated and their values 'captured'.
The second part, the Replacement string, is used to rearrange those parts, add new characters, or remove characters by simply not referencing some of the parts that were isolated.
It is the Replacement string that determines what you see under the New name column. If successful, New name is shown in green; if not, it stays in normal black text; and if there is an error, New name may be shown in red as invalid. When you click Rename, that is what you get.
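To make the two parts concrete, here is a minimal Python sketch of what a preview like this does. `preview_new_name` is a hypothetical helper, not the program's code, and it assumes the RegEx has to match the whole filename (modelled with `re.fullmatch`); Python's `re` engine may also differ in details from the one the program uses.

```python
import re
from typing import Optional

def preview_new_name(name: str, pattern: str, replacement: str) -> Optional[str]:
    """Hypothetical helper: return the new name, or None when the RegEx does not match."""
    # Part 1: the RegEx parses the name (assumed here to have to match the whole name).
    m = re.fullmatch(pattern, name)
    if m is None:
        return None  # no match: the name would be left as it is
    # Part 2: the Replacement string rebuilds the name from the captured groups.
    return m.expand(replacement)

print(preview_new_name("Metallica - Enter Sandman", r"(.*)\s-\s(.*)", r"\2 - \1"))
print(preview_new_name("Enter Sandman", r"(.*)\s-\s(.*)", r"\2 - \1"))
```

The first call prints `Enter Sandman - Metallica`; the second prints `None` because there is no " - " to split on.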
Read the manual; it is pretty thorough. When Volume II is released (sooner than expected, by the way: I thought Fall, but it may be much sooner), there will be a companion RegEx manual in it as well.
As for (.*): the . (explained on page 208) matches any single character (other than a newline).
The * (explained on page 209) is a greedy quantifier. That page describes it as a Repetition Operator, and it is covered more thoroughly later under Greedy and Lazy. It matches the preceding item zero or more times, as many times as possible, meaning it captures as much of the filename (string) as it can.
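A quick check of the two pieces in Python's `re` module (a sketch; the program's engine is assumed to behave the same way for these simple cases):

```python
import re

name = "Metallica - Enter Sandman"

# '.' matches any single character, except a newline in the default mode.
print(bool(re.fullmatch(r".", "M")))   # True
print(re.fullmatch(r".", "\n"))        # None

# '*' repeats the preceding item zero or more times, as many as possible (greedy).
print(re.match(r".*", name).group())   # Metallica - Enter Sandman
print(repr(re.match(r".*", "").group()))  # '' (zero repetitions still counts as a match)
```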
Because it appears at the beginning, it 'eats up' the string and captures it in the first Capture Group, the first set of parentheses, (.*). This is Capture Group 1, referenced in the Replacement String as \1.
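The difference between greedy and lazy shows up in Capture Group 1 once a name contains the separator twice. The filename below is a hypothetical example, and `(.*?)` is the lazy form of `(.*)` in Python's syntax:

```python
import re

name = "Metallica - Enter Sandman - Live"  # hypothetical name with two " - " separators

greedy = re.match(r"(.*)\s-\s(.*)", name)   # (.*) grabs as much as it can
lazy = re.match(r"(.*?)\s-\s(.*)", name)    # (.*?) grabs as little as it can

print(greedy.groups())  # ('Metallica - Enter Sandman', 'Live')
print(lazy.groups())    # ('Metallica', 'Enter Sandman - Live')
```

With greedy (.*) the split happens at the last " - "; with the lazy version it happens at the first.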
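The \s components between the other parts of the RegEx are equally important. You are right that \s matches a <space> character (page 210); strictly, it matches any whitespace character, but in a filename that is almost always a <space>.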
You see, the evaluation is done by what is called the RegEx Engine. Think of it as a train on a track (heck, I'm giving away my Volume II info). It moves forward and backward trying to satisfy a match. When it moves back, however, it can change values that were previously captured.
In this case the RegEx is
(.*)\s-\s(.*)
and it works just as therube described it.
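Here is that RegEx in Python's `re` module, which is a backtracking engine of the kind described here (it may differ in details from the one the program uses), showing the two Capture Groups it produces:

```python
import re

# Group 1: (.*)   then \s, a literal '-', \s   then Group 2: (.*)
pattern = re.compile(r"(.*)\s-\s(.*)")
m = pattern.fullmatch("Metallica - Enter Sandman")

print(pattern.groups)  # 2 (the number of Capture Groups in the RegEx)
print(m.group(1))      # Metallica
print(m.group(2))      # Enter Sandman
```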
It starts by moving forward and captures the entire string in Capture Group 1, reaching the end of the string (called EOL, End of Line).
The next component, \s, looks for a <space>. Since nothing is left at the end, the engine moves backward (called backtracking): Capture Group 1 gives back characters one at a time until \s can match, which happens at the first <space> it reaches going backwards.
In your example:
Metallica - Enter Sandman
Capture Group 1 = that entire string value at first.
The first <space> it encounters moving backwards is the <space> after 'Enter'.
This changes the value of Capture Group 1 to the characters leading up to that <space>:
Capture Group 1 = Metallica - Enter
The \s value (the <space>) has no parentheses, so its value is not captured (retained).
The next component, the <hyphen>, changes things again. (By the way, people out there, it is a hyphen, not a dash. A dash is actually a rare character; in appearance it is an elongated hyphen. Everyone here confuses the two quite a bit (100%).)
The RegEx Engine now tries to match a <hyphen> at the very next character, but that is the 'S' of 'Sandman', so the attempt fails and the engine backtracks again. Going further back, \s can match the <space> before 'Enter', but 'E' is not a <hyphen> either, so it keeps going. This time it reaches the <space> after 'Metallica', which is followed by a <hyphen>, so the <hyphen> matches. This again is not 'Captured'.
This changes the PREVIOUS \s from the <space> after 'Enter' to the <space> after 'Metallica'.
You see how the values change?
So when the <space> changed, this in turn changed Capture Group 1 from 'Metallica - Enter' to 'Metallica'.
Now the word 'Metallica' has been isolated (parsed).
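The backtracking above can be traced with a small hand-rolled simulation. This is a sketch, not how a real engine works internally: it tries Capture Group 1 lengths from longest to shortest, which is the order a backtracking engine such as Python's `re` tries them for a greedy (.*), and checks the rest of the RegEx at each step. `rest_pattern` is a name made up for this illustration.

```python
import re

name = "Metallica - Enter Sandman"
rest_pattern = re.compile(r"\s-\s(.*)")  # everything in the RegEx after Capture Group 1

# Greedy (.*) first takes the whole name, then gives characters back one at a time.
for end in range(len(name), -1, -1):
    group1 = name[:end]
    rest = name[end:]
    if not rest[:1].isspace():
        continue  # \s cannot match at this position; keep backtracking
    m = rest_pattern.match(rest)
    if m:
        print(f"match:  group 1 = {group1!r}, group 2 = {m.group(1)!r}")
        break
    print(f"failed: group 1 = {group1!r}, rest of RegEx fails on {rest!r}")
```

It prints two failed attempts (group 1 = 'Metallica - Enter', then 'Metallica -') before the match with group 1 = 'Metallica' and group 2 = 'Enter Sandman'.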
The next component is also a <space>, the \s.
Recall that after the <hyphen> was matched, the RegEx Engine's current position on the track is just past that <hyphen>.
So in the string:
Metallica - Enter Sandman
the \s matches the <space> that follows the <hyphen>, just before 'Enter'; again, it is not captured. Why not capture the <space> <hyphen> <space>?
It is not needed, simple as that. If we want a <hyphen> or <space> in New name, we can add them back in using the Replacement String.
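Python's match object records where each Capture Group starts and ends, which shows that the " - " was matched by the engine but belongs to no group (a sketch using the same RegEx):

```python
import re

name = "Metallica - Enter Sandman"
m = re.fullmatch(r"(.*)\s-\s(.*)", name)

# The separator lies between the end of group 1 and the start of group 2,
# so it was consumed by the match but not captured.
separator = name[m.end(1):m.start(2)]
print(repr(separator))       # ' - '
print(m.span(1), m.span(2))  # (0, 9) (12, 25)
```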
So what's left?
The final (.*).
This tells the RegEx Engine to move forward and capture from the current position (remember, it is greedy and matches zero or more times, meaning it eats everything in sight) to the end of the string (the EOL). This value is captured in another Capture Group, the second (.*), which is Capture Group 2. Capture Groups are numbered in the order their opening parentheses appear from left to right, so this one comes after Capture Group 1.
So each set of parentheses in this RegEx is a Capture Group. (There are actually parenthesized groups in RegEx that are NOT Capture Groups: non-marking groups, also called non-capturing groups.)
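In Python a non-marking group is written (?:...). The sketch below uses one to accept either a hyphen or an en dash (U+2013) as the separator, which also shows why the hyphen/dash distinction matters: a plain `-` in the RegEx matches only the hyphen. The en-dash filename is a hypothetical example.

```python
import re

name_hyphen = "Metallica - Enter Sandman"
name_dash = "Metallica \u2013 Enter Sandman"  # same name with an en dash (U+2013)

# A plain '-' in the RegEx matches only the hyphen-minus character (U+002D).
print(re.fullmatch(r"(.*)\s-\s(.*)", name_dash))  # None

# (?:...) groups the alternatives without creating a Capture Group,
# so the groups are still numbered \1 and \2.
pattern = re.compile(r"(.*)\s(?:-|\u2013)\s(.*)")
print(pattern.groups)                         # 2
print(pattern.fullmatch(name_dash).groups())  # ('Metallica', 'Enter Sandman')
```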
So the rest of the string, from the RegEx Engine's current position (just past the <space> preceding 'Enter') forward, is captured in Capture Group 2, so Capture Group 2 = 'Enter Sandman'.
So now we have isolated 'Metallica' and 'Enter Sandman'. The parsing is finished: both the RegEx string and the filename string are done. Why?
#1 - No more components of the RegEx String remain; all have been evaluated.
#2 - The EOL has been reached. Typically, when .* is used at the end of the RegEx, it consumes everything left and the match succeeds there with no need to backtrack.
#3 - The filename string is what is referred to as exhausted, meaning no more characters are left to match, even if there were more components of the RegEx String remaining.
So this is where the fun comes in. The RegEx has done its job. There are no errors.
What is left to do is to use the Replacement String to piece together (rearrange) the filename in the order you want.
Remember, the Replacement string can use the 'captured' values by referencing them in the format \<reference number> (for example \1 or \2),
where the reference number is the Capture Group designation.
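In Python, `Match.expand` applies a replacement template to a match, so it shows directly what \1 and \2 stand for. Referencing a group the RegEx does not have raises an error in Python; whether the program reports that case the same way is an assumption.

```python
import re

m = re.fullmatch(r"(.*)\s-\s(.*)", "Metallica - Enter Sandman")

# \1 and \2 in a replacement template stand for group(1) and group(2).
print(m.expand(r"\1"))  # Metallica
print(m.expand(r"\2"))  # Enter Sandman

# This RegEx has only two Capture Groups, so \3 is an invalid reference.
try:
    m.expand(r"\3")
except re.error as exc:
    print("invalid replacement:", exc)
```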
In addition, literal characters can be added (a literal is a character that stands for itself, such as a <space> or a <hyphen>, rather than referring to a Capture Group).
therube used both the <space> and <hyphen> literal characters in the Replacement String.
Here is what you have:
Capture Group 1 = Metallica
Capture Group 2 = Enter Sandman
Replacement String = \2 <space> <hyphen> <space> \1 (typed as: \2 - \1)
He has done as you requested: he rearranged the two parts and separated them by <space> <hyphen> <space>.
So the final result = Enter Sandman - Metallica
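The whole round trip fits in one `re.sub` call in Python. The `r"..."` raw strings keep the backslashes for the RegEx Engine instead of letting Python interpret them, and the " - " in the replacement is made of literal characters:

```python
import re

old_name = "Metallica - Enter Sandman"
# RegEx parses the name; the Replacement String reorders the groups with literal " - " between them.
new_name = re.sub(r"(.*)\s-\s(.*)", r"\2 - \1", old_name)
print(new_name)  # Enter Sandman - Metallica
```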
Ta-dah! Look for Volume II, where full analyses like the one I just presented are done with a heck of a lot more insights, notations, contributions, etc.
It is coming as fast as I can do it (current page count 1,650 pages, not a typo)!