Since this thread is sticky, I use it to link to other posts.
First, here are the two very first posts of this regex sub-forum (they have since moved to the last page):
Getting Started - Overview of the RegEx syntax (regular expressions, regexes, RegExp, RE)
http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=5
Regular expressions (regexes) are like super-intelligent wildcards.
If you learn regexes, you too can become super-intelligent (and have more fun using BRU).
WILDCARDS
---------
Wildcards are probably familiar to all BRU users.
You can experiment with wild cards using Windows Explorer:
Windows Explorer > "Search" dialog box > "Search for files or folders named"
Summary:
? means "any single character"
* means "zero or more characters"
------------------------------------------------------------
Expression  Matches                        Doesn't match
----------  -----------------------------  -------------------
Notes*      Notes, Notes_2005_0302.txt     aNotes
*Notes*     Notes, Notes_2005_0302.txt
?Notes*     aNotes, aNotes_2005_0302.txt   Notes_2005_0302.txt
------------------------------------------------------------
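To try the wildcard table outside Windows Explorer, here is a small Python sketch. It assumes Python's `fnmatch` module is a close enough stand-in for Explorer's wildcards in these simple cases (Explorer itself ignores upper/lower case and may differ in details). The helper name `matching` is just made up for this example.

```python
from fnmatch import fnmatchcase

# The sample names from the table above.
names = ["Notes", "Notes_2005_0302.txt", "aNotes", "aNotes_2005_0302.txt"]

def matching(pattern):
    """Return the sample names that the wildcard pattern matches (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]

for pattern in ("Notes*", "*Notes*", "?Notes*"):
    print(pattern, "->", matching(pattern))
```

Note that `*Notes*` also picks up the names starting with "a", because `*` may stand for zero or more characters on either side.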
REGULAR EXPRESSIONS
-------------------
Regexes are the same general idea as wildcards, but are considerably more powerful.
Please glance at the table, then refer to the discussion, below.
-----------------------------------------------
         Equivalent
Regex    Wildcard     Matches
-----    ----------   ----------------------------
cat      cat          The literal characters "cat"
.        ?            Any single character
..       ??           Any two characters
...      ???          Any three characters
.*       *            Zero or more characters
..*      ?*           One or more characters
...*     ??*          Two or more characters
.+       ?*           One or more characters
..+      ??*          Two or more characters
Discussion:
* means "the preceding character occurs zero or more times"
+ means "the preceding character occurs one or more times"
Why would you ever want a character to occur zero or more times? It means that the character is optional. For example: ru.*n matches run, ruin, and ruffian
On the other hand, ru.+n matches ruin and ruffian, but not run (because we need at least one character between the "u" and "n".)
Note that there are two equivalent ways of saying "one or more characters": ..* and .+
-----------------------------------------------
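The run/ruin/ruffian example can be checked with a few lines of Python. This is only a sketch: it assumes Python's `re` module treats these simple patterns the same way BRU does, and `full_matches` is a made-up helper name.

```python
import re

words = ["run", "ruin", "ruffian"]

def full_matches(pattern):
    """Return the words that the pattern matches from first to last character."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(full_matches(r"ru.*n"))   # ['run', 'ruin', 'ruffian']
print(full_matches(r"ru.+n"))   # ['ruin', 'ruffian']
print(full_matches(r"ru..*n"))  # same as ru.+n: ['ruin', 'ruffian']
```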
There are no wildcard equivalents to the following regexes (at least, not in Windows Explorer).
----------------------------------------------------------------
Regex          Matches
-------------  -------------------------------------------------
\.             A period
\t             A tab character
\n             A newline character
ca?t           "c" followed by zero or one "a", followed by "t"
ca*t           "c" followed by zero or more "a", followed by "t"
ca+t           "c" followed by one or more "a", followed by "t"
[efgh]         any one of efgh
[e-h]          any one of efgh
[a-cF-H]       any one of abcFGH
[e-h]*         any one of efgh, occurring zero or more times
[e-h]+         any one of efgh, occurring one or more times
[a-c][e-h]+    any one of abc; followed by any one of efgh, occurring one or more times
([a-c][e-h])+  any one of abc followed by any one of efgh; this pair occurring one or more times
Discussion:
\ in front of a regex operator changes it to an ordinary ASCII character.
\ also introduces non-printable ASCII characters such as tab and newline (\t and \n).
? means "the preceding character occurs zero or one time"
? * + are called "quantifiers" because they specify the number of times an expression must occur
[] always refers to a single character, picked from all those in the square brackets
() parentheses are used for grouping expressions together.
()+ means the expression in the parentheses occurs one or more times
----------------------------------------------------------------
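A few of the table entries, tried out in Python. Again just a sketch, assuming Python's `re` module agrees with BRU on these basic patterns; `ok` is a made-up helper that tests whether a pattern matches the complete text.

```python
import re

def ok(pattern, text):
    """True if the pattern matches the complete text."""
    return re.fullmatch(pattern, text) is not None

# Quantifiers ? * + applied to the single character "a"
print(ok(r"ca?t", "ct"), ok(r"ca?t", "cat"), ok(r"ca?t", "caat"))  # True True False
print(ok(r"ca*t", "ct"), ok(r"ca*t", "caat"))                      # True True
print(ok(r"ca+t", "ct"), ok(r"ca+t", "caat"))                      # False True

# An escaped dot matches only a period; a bare dot matches any character
print(ok(r"a\.b", "a.b"), ok(r"a\.b", "axb"), ok(r"a.b", "axb"))   # True False True

# Without parentheses, + repeats only [e-h]; with them it repeats the pair
print(ok(r"[a-c][e-h]+", "aeh"), ok(r"[a-c][e-h]+", "aebf"))       # True False
print(ok(r"([a-c][e-h])+", "aeh"), ok(r"([a-c][e-h])+", "aebf"))   # False True
```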
BACKREFERENCING!!!!
-------------------
In addition to "grouping", there is a second, more powerful use for parentheses, called "backreferencing". The idea is that you can save the matching characters to be used later. For example, suppose you want to change date format from 12-31-2005 to 2005_1231.
Use this as your "search-regex":
(12)-(31)-(2005)
and use this as your "replace-regex":
\3_\1\2
In backreferencing, \1 always refers to the contents of the first pair of parentheses in the search-regex, \2 refers to the contents of the second pair, and \3 to the contents of the third pair.
Understanding and using backreferencing is essential if you want to take advantage of the powerful regex capability of BRU.
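Here is the date example as a Python sketch. It assumes Python's `re.sub` handles `\1`, `\2`, `\3` in the replacement the same way BRU does. The second pattern, with `\d\d` and `\d{4}` instead of the literal numbers, is my own generalization and not part of the example above; `reorder_date` is a made-up name.

```python
import re

# The literal search-regex and replacement from the example above.
print(re.sub(r"(12)-(31)-(2005)", r"\3_\1\2", "12-31-2005"))  # 2005_1231

def reorder_date(text):
    """Turn MM-DD-YYYY into YYYY_MMDD for any digits (generalized pattern)."""
    return re.sub(r"(\d\d)-(\d\d)-(\d{4})", r"\3_\1\2", text)

print(reorder_date("12-31-2005"))  # 2005_1231
print(reorder_date("01-15-1999"))  # 1999_0115
```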
OTHER PROGRAMS
--------------
Here are some programs that can help you get comfortable with regexes, before you start changing your filenames with BRU.
1. TextPad is shareware with an unlimited trial duration: http://www.textpad.com/
This is my favorite text editor. The main thing you need to know is that the grouping symbol is \( \) instead of (). Otherwise, regex-gurus-in-training can assume TextPad regexes are identical to BRU.
To get started:
- Open a text file
- Search menu > Find...
- Make sure that you've selected the "Regular expressions" check box.
- Type in a regex, and click the Find Next button.
To try out the above backreferencing example:
In a text file, type 12-31-2005
- Search menu > Replace...
- Make sure that you've selected the "Regular expressions" check box.
- Find what: \(12\)-\(31\)-\(2005\)
- Replace with: \3_\1\2
- Click the "Find Next" button
- Click the "Replace" button.
2. Visual Regex: http://laurent.riesterer.free.fr/regexp/
Visual Regex is unique because it highlights each regex group () with a different color, then highlights the matching text in the same color. This lets you see what group is matching what text, helps you debug the regex, and helps you learn more about regular expressions.
3. Regex Buddy: http://www.regexbuddy.com/
4. Regex Coach: http://weitz.de/regex-coach/
5. Regex Designer: http://www.radsoftware.com.au/regexdesigner/
6. The Regulator: http://regex.osherove.com/
Go ahead - Some interesting sites about regex:
http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=27
Other threads with examples of common interest will follow.
My RegEx hints:
Expression:
. --> any single character (letter, digit, sign, blank)
a --> the character "a", abc --> the string "abc" literally
(aa|bb) --> one of the alternatives "aa" or "bb"
[ab3-] --> one character from the list ("a" or "b" or "3" or "-")
[^ab3-] --> one character that is NOT in the list
[a-z] --> one character from the range "a", "b" through "z"
3 --> the digit "3", 2013 --> the number "2013" literally
[6-9] --> one digit from the range "6", "7" through "9"
\w --> one letter, digit or underscore
\d --> one digit from "0", "1" through "9"
\s --> one whitespace character (e.g. a blank), - --> one hyphen literally
_ --> one underscore, \. --> one dot literally
(...) --> group an expression to apply operators or for backreference
\W \D \S --> the opposites of lower-case \w \d \s
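Some of these expressions tried out in Python, as a rough sketch. I assume Python's `re` module is close enough to BRU for these basics (one known difference: Python's `\w` also accepts non-English letters by default). `whole` is a made-up helper name.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the complete text."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"(aa|bb)", "bb"))        # True: second alternative
print(whole(r"(aa|bb)", "ab"))        # False: neither alternative
print(whole(r"[ab3-]", "-"))          # True: the hyphen is in the list
print(whole(r"[^ab3-]", "-"))         # False: the hyphen is excluded
print(whole(r"[^ab3-]", "c"))         # True
print(whole(r"[6-9]\d\d\d", "2013"))  # False: "2" is outside the range 6-9
print(whole(r"\w+", "file_01"))       # True: letters, digits, underscore
print(whole(r"\w+", "file 01"))       # False: the blank is not a \w character
print(whole(r"\w+\s\d+", "file 01"))  # True: \s takes the blank
print(whole(r"\D+", "abc"))           # True: no digits at all
```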
Quantifier:
* --> match the previous expression zero or more times (greedy)
+ --> match the previous expression one or more times (greedy)
{3} --> match the previous expression exactly 3 times
{3,} --> match the previous expression at least 3 times (greedy)
{3,5} --> match the previous expression 5, or 4, or 3 times (greedy)
? --> behind *, + or {,}: limit the match to as few repetitions as possible (non-greedy)
? --> behind an expression: match zero or one occurrence
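The counting quantifiers in a short Python sketch (assuming, as before, that Python's `re` module behaves like BRU here; `whole` is a made-up helper name):

```python
import re

def whole(pattern, text):
    """True if the pattern matches the complete text."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"a{3}", "aaa"), whole(r"a{3}", "aa"))         # True False
print(whole(r"a{3,}", "aaaaaa"))                           # True: 3 or more
print(whole(r"a{3,5}", "aaaa"), whole(r"a{3,5}", "a" * 6)) # True False
print(whole(r"colou?r", "color"), whole(r"colou?r", "colour"))  # True True

# Greedy takes as many as allowed, non-greedy as few as allowed.
greedy = re.match(r"a{3,5}", "aaaaaaa").group()
lazy = re.match(r"a{3,5}?", "aaaaaaa").group()
print(greedy, lazy)  # aaaaa aaa
```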
Boundaries:
\b --> match at a word boundary
\A or ^ --> match at the start of the file name
\Z or $ --> match at the end of the file name
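A quick look at the boundaries in Python. This is a sketch only; I assume the anchors act the same in BRU for plain file names without line breaks. `found` is a made-up helper that looks for the pattern anywhere in the name.

```python
import re

def found(pattern, name):
    """True if the pattern is found somewhere in the name."""
    return re.search(pattern, name) is not None

print(found(r"\bcat\b", "my cat.txt"))     # True: "cat" stands alone as a word
print(found(r"\bcat\b", "concat.txt"))     # False: "cat" is glued to "con"
print(found(r"^IMG", "IMG_001"))           # True: name starts with IMG
print(found(r"^IMG", "my IMG_001"))        # False: IMG is not at the start
print(found(r"001$", "IMG_001"))           # True: name ends with 001
print(found(r"\AIMG.*001\Z", "IMG_001"))   # True: anchored at both ends
```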
Meta signs:
\ --> use the escape character "\" to match a meta sign itself
Meta signs are: ., \, (, ), [, {, }, +, *, ?, |, ^, $
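What escaping does, shown as a Python sketch. `re.escape` is a Python convenience that puts the backslashes in for you; it is not a BRU feature, and in BRU you type the backslashes yourself. `whole` is a made-up helper name.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the complete text."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"v1.2", "v1x2"))    # True: the bare dot accepts any character
print(whole(r"v1\.2", "v1x2"))   # False: the escaped dot accepts only "."
print(whole(r"v1\.2", "v1.2"))   # True

literal = "a+b (1).txt"
print(whole(literal, literal))             # False: + ( ) . act as operators
print(whole(re.escape(literal), literal))  # True: every meta sign is escaped
```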
backreference on replacement:
\1 - insert here what was matched by first (...)-group
\2 - insert here what was matched by second (...)-group
\3 ... \9 - insert what was matched by third, fourth, fifth,... till ninth group
(Note: some flavors use $1 syntax instead of \1)
NOTE: for BRU you always have to match the whole file name, not only the part you are interested in.
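To illustrate the whole-name rule, here is a Python sketch that imitates it. This is not how BRU is implemented; `bru_like_rename` is a hypothetical helper that simply refuses to rename unless the pattern covers the complete name.

```python
import re

def bru_like_rename(pattern, replacement, name):
    """Hypothetical helper: apply the replacement only on a whole-name match."""
    m = re.fullmatch(pattern, name)
    return m.expand(replacement) if m else name

# Matching only the interesting part (the year) is not enough: nothing happens.
print(bru_like_rename(r"(\d{4})", r"\1", "holiday 2005"))          # holiday 2005

# Matching the whole name and putting it back together works.
print(bru_like_rename(r"(.+) (\d{4})", r"\2 \1", "holiday 2005"))  # 2005 holiday
```

So the parts you do not want to change still need their own group in the search and their own backreference in the replacement.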
Greed, greedy, OR non-greedy, reluctant:
By default, *, ?, +, and {min,max} are greedy because they consume all characters up through the last possible one that still satisfies the entire pattern.
To instead have them stop at the first possible character, follow them with a question mark. For example, the pattern <.+> (which lacks a question mark)
means: "search for a <, followed by one or more of any character, followed by a >". To stop this pattern from matching the entire string <em>text</em>,
append a question mark to the plus sign: <.+?>. This causes the match to stop at the first '>' and thus it matches only the first tag <em>
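The same tag example in Python, as a sketch (assuming Python's `re` module shows the same greedy and non-greedy behavior as BRU):

```python
import re

text = "<em>text</em>"

greedy = re.search(r"<.+>", text).group()   # runs to the last ">"
lazy = re.search(r"<.+?>", text).group()    # stops at the first ">"

print(greedy)  # <em>text</em>
print(lazy)    # <em>
```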
Find more information:
http://www.regular-expressions.info/reference.html
http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
########################## my template #########################
BEFORE:
AFTER:
LEGEND:
USE:
RegEx(1)
Search: "(.+) - (.+)"
Replace: "\2 - \1"
INSTRUCTIONS:
Don't type the quotes ""; they are only there for clarity.
"[ ] Include Ext." is unchecked.
"Options > Ignore... > File Extensions" is unchecked.
Select a few files in the Name column to see what happens in the NewName column.
More about RegEx here >>> http://www.bulkrenameutility.co.uk/forum/viewtopic.php?f=3&t=96
Remember to test this with test files first. And always do backups before you manipulate your important real files!
EXPLANATION:
.
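For testing the template's Search/Replace pair before touching real files, a Python sketch like the following may help. It only imitates the settings above (extension left out of the match, whole name must match); `rename` is a made-up function, the file names are invented test data, and BRU's own engine may differ in details.

```python
import os
import re

def rename(filename, search=r"(.+) - (.+)", replace=r"\2 - \1"):
    """Swap the two parts around " - ", leaving the extension untouched."""
    stem, ext = os.path.splitext(filename)  # like "Include Ext." unchecked
    m = re.fullmatch(search, stem)          # the whole name has to match
    return (m.expand(replace) if m else stem) + ext

print(rename("Beatles - Help.mp3"))  # Help - Beatles.mp3
print(rename("A - B - C.txt"))       # C - A - B.txt (first group is greedy)
print(rename("nodash.txt"))          # nodash.txt (no match, name unchanged)
```

With more than one " - " in the name, the greedy first group takes everything up to the last " - ", so only the last part moves to the front.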