Since this thread is sticky, I use it to link to other posts:
First, here are the two very first posts of this regex sub-forum (they have since moved to the last page):
Getting Started - Overview over the RegEx syntax (Regular expressions, regexes. RegExp, RE)
http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=5
Regular expressions (regexes) are like super-intelligent wildcards.
If you learn regexes, you too can become super-intelligent (and have more fun using BRU).
WILDCARDS
---------
Wildcards are probably familiar to all BRU users.
You can experiment with wild cards using Windows Explorer:
Windows Explorer > "Search" dialog box > "Search for files or folders named"
Summary:
? means "any single character"
* means "zero or more characters"
------------------------------------------------------------
Expression   Matches                                Doesn't match
----------   ------------------------------------   -------------------
Notes*       Notes    Notes_2005_0302.txt           aNotes
*Notes*      Notes    Notes_2005_0302.txt
?Notes*      aNotes   aNotes_2005_0302.txt          Notes_2005_0302.txt
------------------------------------------------------------
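If you want to try the wildcard table outside of Windows Explorer, here is a minimal sketch in Python. It assumes that Python's `fnmatch` module treats `?` and `*` the same way as described above; Windows Explorer itself may behave a little differently, and the helper name `wildcard_matches` is only made up for this illustration.

```python
from fnmatch import fnmatchcase

# File names taken from the wildcard table.
names = ["Notes", "Notes_2005_0302.txt", "aNotes", "aNotes_2005_0302.txt"]

def wildcard_matches(pattern, candidates):
    """Return the candidates that match a ?/* wildcard pattern (case-sensitive)."""
    return [name for name in candidates if fnmatchcase(name, pattern)]

notes_star = wildcard_matches("Notes*", names)        # names starting with "Notes"
star_notes_star = wildcard_matches("*Notes*", names)  # names containing "Notes"
q_notes_star = wildcard_matches("?Notes*", names)     # exactly one character before "Notes"

print(notes_star)
print(star_notes_star)
print(q_notes_star)
```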
REGULAR EXPRESSIONS
-------------------
Regexes are the same general idea as wildcards, but are considerably more powerful.
Please glance at the table, then refer to the discussion, below.
-----------------------------------------------
        Equivalent
Regex   Wildcard     Matches
-----   ----------   ----------------------------
cat     cat          The literal characters "cat"
.       ?            Any single character
..      ??           Any two characters
...     ???          Any three characters
.*      *            Zero or more characters
..*     ?*           One or more characters
...*    ??*          Two or more characters
.+      ?*           One or more characters
..+     ??*          Two or more characters
Discussion:
* means "the preceding character occurs zero or more times"
+ means "the preceding character occurs one or more times"
Why would you ever want a character to occur zero or more times? It means that the character is optional. For example: ru.*n matches run, ruin, and ruffian
On the other hand, ru.+n matches ruin and ruffian, but not run (because we need at least one character between the "u" and "n".)
Note that there are two equivalent ways of saying "one or more characters": ..* and .+
-----------------------------------------------
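The ru.*n / ru.+n discussion can be checked with a few lines of Python. This is only a sketch: Python's `re` module is not the engine BRU uses, and `re.fullmatch` is my choice here so that the whole word has to match.

```python
import re

words = ["run", "ruin", "ruffian"]

# .* allows zero characters between "ru" and "n", so "run" matches too.
star = [w for w in words if re.fullmatch(r"ru.*n", w)]

# .+ needs at least one character between "ru" and "n", so "run" drops out.
plus = [w for w in words if re.fullmatch(r"ru.+n", w)]

# ..* is the other way of writing "one or more characters".
dot_dot_star = [w for w in words if re.fullmatch(r"ru..*n", w)]

print(star)
print(plus)
print(dot_dot_star)
```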
There are no wildcard equivalents to the following regexes (at least, not in Windows Explorer).
----------------------------------------------------------------
Regex            Matches
-------------    -------------------------------------------------
\.               A period.
\t               A tab character
\n               A newline character
ca?t             "c" followed by zero or one "a", followed by "t"
ca*t             "c" followed by zero or more "a", followed by "t"
ca+t             "c" followed by one or more "a", followed by "t"
[efgh]           any one of efgh
[e-h]            any one of efgh
[a-cF-H]         any one of abcFGH
[e-h]*           any one of efgh, occurring zero or more times
[e-h]+           any one of efgh, occurring one or more times
[a-c][e-h]+      any one of abc; followed by any one of efgh, occurring one or more times
([a-c][e-h])+    any one of abc followed by any one of efgh; this pair as a whole occurring one or more times
Discussion:
\ in front of a regex operator changes it to an ordinary ASCII character.
\ also introduces non-printable ASCII characters such as tab and newline (\t and \n).
? means "the preceding character occurs zero or one time"
? * + are called "quantifiers" because they specify the number of times a regex expression must occur
[] always refers to a single character, picked from all those in the square brackets
() parentheses are used for grouping expressions together.
()+ means the expression in the parentheses occurs one or more times
----------------------------------------------------------------
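A few rows of the table above, tried out in Python as a rough check. The helper `whole` is a made-up name for this sketch; it uses `re.fullmatch`, so the complete test string has to match.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the complete text."""
    return re.fullmatch(pattern, text) is not None

samples = ("ct", "cat", "caat")
optional_a = [s for s in samples if whole(r"ca?t", s)]     # zero or one "a"
one_or_more_a = [s for s in samples if whole(r"ca+t", s)]  # one or more "a"

escaped_dot = whole(r"file\.txt", "fileXtxt")  # \. is a literal period only
plain_dot = whole(r"file.txt", "fileXtxt")     # . is any single character

# Without parentheses, + repeats only the [e-h] part.
tail_repeats = whole(r"[a-c][e-h]+", "aeh")
# With parentheses, + repeats the whole two-character pair.
pair_repeats = whole(r"([a-c][e-h])+", "aebf")
pair_rejects = whole(r"([a-c][e-h])+", "aeh")

print(optional_a, one_or_more_a)
print(escaped_dot, plain_dot)
print(tail_repeats, pair_repeats, pair_rejects)
```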
BACKREFERENCING!!!!
-------------------
In addition to "grouping", there is a second, more powerful use for parentheses, called "backreferencing". The idea is that you can save the matching characters to be used later. For example, suppose you want to change date format from 12-31-2005 to 2005_1231.
Use this as your "search-regex":
(12)-(31)-(2005)
and use this as your "replace-regex":
\3_\1\2
In backreferencing, \1 always refers to the contents of the first pair of parentheses in the search-regex, \2 refers to the contents of the second pair, and \3 to the contents of the third pair.
Understanding and using backreferencing is essential if you want to take advantage of the powerful regex capability of BRU.
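Here is the date example as a small Python sketch. Python's `re.sub` happens to accept the same \1 \2 \3 syntax in the replacement. The second pattern with \d is my own generalisation (not part of the example above), added only to show that the groups do not have to contain fixed text.

```python
import re

# The literal search-regex and replace-regex from the example.
literal = re.sub(r"(12)-(31)-(2005)", r"\3_\1\2", "12-31-2005")

# Assumption for illustration: the same idea for any MM-DD-YYYY date.
general = re.sub(r"(\d\d)-(\d\d)-(\d{4})", r"\3_\1\2", "07-04-1999")

print(literal)
print(general)
```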
OTHER PROGRAMS
--------------
Here are some programs that can help you get comfortable with regexes, before you start changing your filenames with BRU.
1. TextPad is shareware with an unlimited trial duration http://www.textpad.com/
This is my favorite text editor. The main thing you need to know is that the grouping symbol is \( \) instead of (). Otherwise, regex-gurus-in-training can assume Textpad regexes are identical to BRU.
To get started:
- Open a text file
- Search menu > Find...
- Make sure that you've selected the "Regular expressions" check box.
- Type in a regex, and click the Find Next button.
To try out the above backreferencing example:
In a text file, type 12-31-2005
- Search menu > Replace...
- Make sure that you've selected the "Regular expressions" check box.
- Find what: \(12\)-\(31\)-\(2005\)
- Replace with: \3_\1\2
- Click the "Find Next" button
- Click the "Replace" button.
2. Visual Regex http://laurent.riesterer.free.fr/regexp/
Visual Regex is unique because it highlights each regex group () with a different color, then highlights the matching text in the same color. This lets you see what group is matching what text, helps you debug the regex, and helps you learn more about regular expressions.
3. Regex Buddy http://www.regexbuddy.com/
4. Regex Coach http://weitz.de/regex-coach/
5. Regex Designer http://www.radsoftware.com.au/regexdesigner/
6. The Regulator http://regex.osherove.com/
Go ahead - Some interesting sites about regex:
http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=27
Other threads with examples of common interest will follow:
My RegEx hints:
Expression:
. --> one single sign of any kind (char, digit, punctuation, blank)
a --> the char "a" itself, literally
abc --> the string "abc" itself, literally
(aa|bb) --> one of the alternatives "aa" or "bb", whichever is found first
(aa|bb|cc) --> one of the alternatives "aa" or "bb" or "cc", whichever is found first
[ab3-] --> one sign from the list ("a" or "b" or "3" or "-"). NOTE: a literal hyphen must be at the very beginning or at the very end of the list, NOT in between!
[^ab3-] --> one sign, but none of those in the list (no "a", no "b", no "3" and no hyphen)
[^-] --> one sign (char, digit, whitespace, punctuation), but not a hyphen
[a-z] --> one lower case char from the range "a", "b", "c", "d", ... to "z"
[A-Z] --> the same, but matches upper case letters.
[a-d] --> one from the range "a", "b", "c" or "d"
[A-D] --> the same, but matches upper case letters.
Note: all this A-Za-z stuff only matches plain 7-bit ASCII chars (English alphabet), no umlauts, accents or the like.
3 --> the digit "3"
2013 --> the number "2013" literally
[6-9] --> one digit from the range "6", "7", "8" and "9"
[^6-9] --> one sign (char, digit, whitespace, punctuation), but not a "6", "7", "8" or "9"
\w --> one word sign: any letter, digit or underscore
\d --> one digit from the range "0", "1", "2", "3" to "9"
\s --> one whitespace sign (blank, tab, line break)
- --> one hyphen, literally
_ --> one underscore, literally
\. --> one dot, literally (the dot has to be "escaped" with a backslash, because it is a RegEx MetaChar (see above))
\\ --> one backslash, literally (the backslash has to be "escaped" with a backslash, because it is a RegEx MetaChar)
(...) --> group an expression to apply operators to it, or for backreference. Instead of the three dots, write your expression.
(Note: those groups are counted from left to right and can be nested too)
\W, \D, \S --> the opposite of the lower case \w, \d, \s
\W, \D, \S mean: match one sign of ANY kind, but NOT a sign from the character class \w or \d or \s
\W --> match one sign but NOT a word sign, \D --> match one sign but NOT a digit, \S --> match one sign but NOT a whitespace
Note: each of the above matches only one single sign!
To match more than one sign, just repeat them:
aa --> match two 'a' literally
aaa --> match three 'a' literally
aaaa --> match four 'a' literally
\d\d\d\d --> match exactly four single digits like '1962' or '2013'
... --> (three dots) match three signs of any kind (they may differ)
\s\s --> match two whitespace signs
or use another meta sign as a quantifier.
Quantifier:
* --> match the previous expression zero or more times, greedy
+ --> match the previous expression one or more times, greedy
{3} --> match the previous expression exactly 3 times
{3,} --> match the previous expression at least 3 times, greedy
{,5} --> match the previous expression zero up to 5 times, greedy (not every flavour accepts this spelling: older PCRE versions, for example, take {,5} as literal text, so {0,5} is the safer way to write it)
{3,5} --> match the previous expression 5, or 4, or 3 times, greedy
? --> behind * or + or {,} limits the match to as few as possible (non-greedy)
? --> behind an expression matches zero or one occurrence
Example:
\d+ --> match one-or-two-or-three-or....-or-as-many-as-possible digits. Like '3', or '42', or '123', or '5782332'
\d* --> match zero(none)-or-one-or-two-or....-or-as-many-as-possible digits. Like '' (nothing at all), or '3', or '42', or '123', or '5782332'
\d{4} --> match exactly four digits. Like '1962' or '2013' or '1234'
\d{2,4} --> match two, or three, or four digits. Like '08' or '2013' or '123'. Works greedy: on '2013' it takes all four digits, not just '20'
\d{2}|\d{4} --> match exactly two or four digits. But it tries to match two first and then stops: even on '2013' it will get you only '20' and will never try to match four digits. Put the longer alternative first (\d{4}|\d{2}) if you prefer four digits.
a{4} --> match four 'a' s
(Ho){3} --> match three times 'Ho' >>> 'HoHoHo'
(the ){2} --> match doubled 'the '
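The quantifier examples above, tried in Python as a quick sketch. `re.search` returns the first match found, which makes the "tries two first and then stops" behaviour of the alternation easy to see. Python is only used for illustration here; BRU uses a different engine, though these particular patterns should behave the same.

```python
import re

year = "2013"

range_greedy = re.search(r"\d{2,4}", year).group()      # greedy: takes all four digits
range_lazy = re.search(r"\d{2,4}?", year).group()       # lazy: stops after two digits
short_first = re.search(r"\d{2}|\d{4}", year).group()   # left alternative wins
long_first = re.search(r"\d{4}|\d{2}", year).group()    # longer alternative tried first
ho_ho_ho = re.search(r"(Ho){3}", "HoHoHo!").group()     # the group repeated three times
doubled_the = re.search(r"(the ){2}", "fix the the typo").group()

print(range_greedy, range_lazy, short_first, long_first)
print(ho_ho_ho, repr(doubled_the))
```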
Boundaries:
\b --> Match at word boundary. Example: "\bfun\b" on "my fun function" will match 'fun' only.
\A or ^ --> at start of file name. Example: "^fun" on "fun function" will match first 'fun' only.
\Z or $ --> at end of file name. Example: "on$" on "onto my fun function" will match last 'on' only.
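The three boundary examples can be verified with a short Python sketch. It prints the start and end position of every match, so you can see which 'fun' or 'on' was hit. The helper name `spans` is just made up for this illustration.

```python
import re

def spans(pattern, text):
    """Return (start, end) positions of all matches of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]

plain_fun = spans(r"fun", "my fun function")        # hits 'fun' twice
word_fun = spans(r"\bfun\b", "my fun function")     # only the standalone word
start_fun = spans(r"^fun", "fun function")          # only at the very start
end_on = spans(r"on$", "onto my fun function")      # only at the very end

print(plain_fun)
print(word_fun)
print(start_fun)
print(end_on)
```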
Meta signs:
\ --> put the escape character "\" in front of a meta sign to match the meta sign itself
Meta signs are: ., \, (, ), [, {, }, +, *, ?, |, ^, $
Example:
\. --> one dot literally
\\ --> one backslash literally
Backreference on replacement:
\1 - insert here what was matched by the first (...)-group
\2 - insert here what was matched by second (...)-group
\3 ... \9 - insert what was matched by third, fourth, fifth,... till ninth group
(Note: some flavours use $1 syntax instead of \1)
(Note: those backreference groups are counted from left to right and can be nested too)
(1 ... (2 ... (3... ))) (4 ... ) (5 ... ) (6 ... (7 ... ) )
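To see the left-to-right counting of nested groups in practice, here is a small sketch in Python. The file name and the pattern are invented for this illustration; the point is only that a group's number is given by the position of its opening parenthesis.

```python
import re

# Group 1 is the outer pair, groups 2 and 3 are nested inside it, group 4 follows.
pattern = r"((\d{4})-(\d{2}))_(\w+)"
m = re.fullmatch(pattern, "2013-08_holiday")

outer = m.group(1)   # opened first, contains the two nested groups
year = m.group(2)    # first nested group
month = m.group(3)   # second nested group
label = m.group(4)   # opened after group 1 was closed

# The same numbers are used in the replacement.
renamed = re.sub(pattern, r"\4_\2\3", "2013-08_holiday")

print(outer, year, month, label)
print(renamed)
```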
Some RegEx implementations allow additional rules like:
- Named groups: (?<abc>pattern) >> (?P<my_description_here>pattern) (group names usually may contain only letters, digits and underscores)
- Non-capturing group: (?:pattern)
- Comments: (?#comment)
- Positive lookahead: (?=pattern)
- Negative lookahead: (?!pattern)
- Positive lookbehind: (?<=pattern)
- Negative lookbehind: (?<!pattern)
(and many more > http://www.regular-expressions.info/refadv.html)
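A short sketch of some of these extras in Python, which uses the (?P<name>...) spelling for named groups. Whether a given rule works in BRU depends on its regex engine, so take this as an illustration of the syntax only; the sample strings are invented.

```python
import re

# Named groups: fetch the match by name instead of by number.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "report 2013-08.txt")
year, month = m.group("year"), m.group("month")

# Non-capturing group: groups for the quantifier, but is not counted,
# so the first numbered group is the one around "!".
bang = re.search(r"(?:Ho){3}(!)", "HoHoHo!").group(1)

# Positive lookahead: a word only if ".mp3" follows (".mp3" is not part of the match).
mp3_stems = re.findall(r"\w+(?=\.mp3)", "song.mp3 notes.txt")

# Negative lookbehind and lookahead: exactly two digits, no digit before or after.
two_digit = re.findall(r"(?<!\d)\d{2}(?!\d)", "08 2013 7 42")

print(year, month, bang)
print(mp3_stems, two_digit)
```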
NOTE: for BRU, depending on what you want to do, you mostly have to match the whole file name, not only the part you are interested in.
Example: "Interpret 2013 - Song title.mp3"
Right way:
Match: "(.+) \d\d\d\d - (.+)"
Replace: "\1 \2"
Wrong way:
Match: "\d\d\d\d"
Replace: ""
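The difference between the two ways can be simulated in Python. This sketch ASSUMES that BRU only applies the replacement when the expression matches the complete name (modelled here with `re.fullmatch`); the function `rename` is a made-up helper, not anything from BRU. The extension is left out, as with "Include Ext." unchecked.

```python
import re

def rename(pattern, replacement, name):
    """Return the new name if the pattern matches the WHOLE name, else the old name."""
    m = re.fullmatch(pattern, name)
    return m.expand(replacement) if m else name

name = "Interpret 2013 - Song title"

# Right way: the pattern describes the complete name and keeps the wanted parts in groups.
right = rename(r"(.+) \d\d\d\d - (.+)", r"\1 \2", name)

# Wrong way: the pattern covers only the year, so the whole-name match fails
# and (under the assumption above) nothing is renamed.
wrong = rename(r"\d\d\d\d", "", name)

print(right)
print(wrong)
```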
Greed, greedy, OR non-greedy, reluctant: By default, *, ?, +, and {min,max} are greedy because they consume all characters up through the last possible one that still satisfies the entire pattern.
To instead have them stop at the first possible character, follow them with a question mark. For example, the pattern <.+> (which lacks a question mark)
means: "search for a <, followed by one or more of any character, followed by a >". To stop this pattern from matching the entire string <em>text</em>,
append a question mark to the plus sign: <.+?>. This causes the match to stop at the first '>' and thus it matches only the first tag <em>
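The <.+> versus <.+?> behaviour just described can be reproduced with a minimal Python sketch (again only as an illustration, Python's engine is not the one BRU uses):

```python
import re

text = "<em>text</em>"

# Greedy: .+ runs to the last possible '>' and so swallows the entire string.
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r"<.+?>", text).group()

print(greedy)
print(lazy)
```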
Example:
".+(\d\d)" on "Album1987" will get you "87", because ".+" is greedy and matches "19" too.
".+?(\d\d)" on "Album1987" will get you "19", because the '?' in ".+?" makes that expression non-greedy, so it matches only until the first two digits are found.
More examples of greedy and lazy matching:
Greedy lazy match, tested on the name "Artist - Album - Title"
The RegEx : "(.+) - (.+)"
Will match: "Artist - Album" - "Title"
Explanation: Match greedy until the last hyphen.
/(.+) - (.+)/
1st Capturing group (.+)
.+ matches any character (except newline)
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
- matches the characters - literally
2nd Capturing group (.+)
.+ matches any character (except newline)
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
http://www.debuggex.com/
The RegEx : "(.+?) - (.+)"
Will match: "Artist" - "Album - Title"
Explanation: Match lazy (non-greedy) until the first hyphen.
/(.+?) - (.+)/
1st Capturing group (.+?)
.+? matches any character (except newline)
Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
- matches the characters - literally
2nd Capturing group (.+)
.+ matches any character (except newline)
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
http://www.debuggex.com/
The RegEx : "(.+) - (.+?)"
Will match: "Artist - Album" - "T"itle
Explanation: The first group matches greedy until the last hyphen; the lazy second group then takes only one sign, the fewest that "one-or-more of any sign" allows.
/(.+) - (.+?)/
1st Capturing group (.+)
.+ matches any character (except newline)
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
- matches the characters - literally
2nd Capturing group (.+?)
.+? matches any character (except newline)
Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
http://www.debuggex.com/
The RegEx : "(.+) - (.+) - (.+)"
Will match: "Artist" - "Album" - "Title"
The RegEx : "(.+ - .+) - (.+)"
Will match: "Artist - Album" - "Title"
The RegEx : "(.+ - )(.+ - .+)"
Will match: "Artist" - "Album - Title"
Explanation: Because you have made the positions of the delimiters clear.
(not 100% accurate for simpleness)
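All the patterns of this greedy/lazy comparison can be run side by side with a short Python sketch. I assume the test name "Artist - Album - Title" and use `re.match`, which anchors at the start but does not force the pattern to reach the end of the name; that is why the lazy last group can stop after a single sign. The output also shows what the "not 100% accurate" remark hints at: the group (.+ - ) really contains the trailing " - ".

```python
import re

name = "Artist - Album - Title"

patterns = [
    r"(.+) - (.+)",         # greedy first group: splits at the last hyphen
    r"(.+?) - (.+)",        # lazy first group: splits at the first hyphen
    r"(.+) - (.+?)",        # lazy last group: takes one sign only
    r"(.+) - (.+) - (.+)",  # both delimiters spelled out
    r"(.+ - .+) - (.+)",    # first delimiter inside group 1
    r"(.+ - )(.+ - .+)",    # delimiter kept at the end of group 1
]

results = {p: re.match(p, name).groups() for p in patterns}

for p, groups in results.items():
    print(p, "-->", groups)
```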
Find more information:
http://www.regular-expressions.info/reference.html
http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
http://www.rexegg.com/
http://www.debuggex.com/
http://regex101.com/
########################## my template for my answers #########################
BEFORE (origin name):
Interpret 2013 - Song title.ext
AFTER (wanted new name):
Interpret - Song title.ext
Rule (what we want in plain english):
SOLUTION (our way to success):
USE (this rules/methods):
RegEx(1)
Search: "(.+) \d\d\d\d - (.+)"
Replace: "\1 - \2"
"[__] Include Ext." is unchecked.
Don't use the quotes "", they are only there for clarifying where the pattern begins and ends.
INSTRUCTIONS (how to use and which option to set):
= This solution is based on my tests, or on assumptions drawn from my past experience.
I can give no guarantee that your computer will not explode and delete all your files.
The solution is based on the provided information and may not work for other file name patterns.
= Remember to test this with some test files first. And always make a backup before you manipulate your important real files!
= Select a few files in the Name column to see what happens in the NewName column.
= Menu "Options > Ignore... > File Extensions" is unchecked.
= My pattern '.ext' stands for any file extension like '.mp3' or '.txt', as that often doesn't matter.
= Sometimes I use the sign '~' instead of a real space for better recognizability, like: "Interpret~~-~~Song.mp3" to "Interpret~-~Song.mp3"
= More about RegEx can be found there >>> http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=96
(that's: Board index ‹ Bulk Rename Utility ‹ Regular Expressions > "Getting help with Regular Expressions")
EXPLANATION (what have we done here step-by-step?):
HTH?
########################## /my template for my answers #########################