Getting Started

A swapping-ground for Regular Expression syntax

Postby DBeaton » Wed Mar 09, 2005 11:50 pm

Regular expressions (regexes) are like super-intelligent wildcards. If you learn regexes, you too can become super-intelligent (and have more fun using BRU).

WILDCARDS
---------
Wildcards are probably familiar to all BRU users.
You can experiment with wildcards using Windows Explorer:
Windows Explorer > "Search" dialog box > "Search for files or folders named"
Summary:
? means "any single character"
* means "zero or more characters"

------------------------------------------------------------
Expression  Matches                      Doesn't match
----------  ---------------------------  -------------------
Notes*      Notes Notes_2005_0302.txt    aNotes
*Notes*     Notes Notes_2005_0302.txt
?Notes*     aNotes aNotes_2005_0302.txt  Notes_2005_0302.txt
------------------------------------------------------------
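
The table can also be checked outside Windows Explorer. This is a minimal sketch using Python's standard fnmatch module, which understands the same ? and * wildcards. Python and the helper name matches() are choices made for this illustration only; they are not part of BRU or Windows Explorer, and Explorer's search box may behave a little differently.

```python
from fnmatch import fnmatchcase

def matches(pattern, names):
    """Return the names that the wildcard pattern matches in full."""
    return [name for name in names if fnmatchcase(name, pattern)]

# The file names used in the table.
names = ["Notes", "Notes_2005_0302.txt", "aNotes", "aNotes_2005_0302.txt"]

for pattern in ["Notes*", "*Notes*", "?Notes*"]:
    print(pattern, "->", matches(pattern, names))
```

Note that ?Notes* rejects "Notes" because ? must consume exactly one character before the literal "Notes".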

REGULAR EXPRESSIONS
-------------------
Regexes are the same general idea as wildcards, but are considerably more powerful.
Please glance at the table, then refer to the discussion, below.
-----------------------------------------------
       Equivalent
Regex  Wildcard    Matches
-----  ----------  ----------------------------
cat    cat         The literal characters "cat"
.      ?           Any single character
..     ??          Any two characters
...    ???         Any three characters
.*     *           Zero or more characters
..*    ?*          One or more characters
...*   ??*         Two or more characters
.+     ?*          One or more characters
..+    ??*         Two or more characters

Discussion:
* means "the preceding character occurs zero or more times"
+ means "the preceding character occurs one  or more times"
Why would you ever want a character to occur zero or more times? It means that the character is optional. For example: ru.*n matches run, ruin, and ruffian
On the other hand, ru.+n matches ruin and ruffian, but not run (because we need at least one character between the "u" and "n".)
Note that there are two equivalent ways of saying "one or more characters": ..* and .+
-----------------------------------------------
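
The run/ruin/ruffian example can be tried directly. This is a small sketch in Python, assuming its standard re module as a stand-in (BRU's own engine is not involved here); re.fullmatch is used so that the whole word has to match.

```python
import re

words = ["run", "ruin", "ruffian"]

# .* allows zero characters between "ru" and "n"; .+ and ..* need at least one.
star = [w for w in words if re.fullmatch(r"ru.*n", w)]
plus = [w for w in words if re.fullmatch(r"ru.+n", w)]
dot_dot_star = [w for w in words if re.fullmatch(r"ru..*n", w)]

print("ru.*n  ->", star)
print("ru.+n  ->", plus)
print("ru..*n ->", dot_dot_star)
```

The last two lists come out the same, which is the ".+ and ..* are equivalent" point from the discussion.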

There are no wildcard equivalents to the following regexes (at least, not in Windows Explorer).
----------------------------------------------------------------
Regex          Matches
-------------  -------------------------------------------------
\.             A period.
\t             A tab character
\n             A newline character
ca?t           "c" followed by zero or one  "a", followed by "t"
ca*t           "c" followed by zero or more "a", followed by "t"
ca+t           "c" followed by one  or more "a", followed by "t"
[efgh]         any one of efgh
[e-h]          any one of efgh
[a-cF-H]       any one of abcFGH
[e-h]*         any one of efgh, occurring zero or more times
[e-h]+         any one of efgh, occurring one or more times
[a-c][e-h]+    any one of abc; followed by any one of efgh, occurring one or more times
([a-c][e-h])+  any one of abc followed by any one of efgh; the pair occurring one or more times.

Discussion:
\   \ in front of a regex operator changes it to an ordinary ASCII character.
\   \ also introduces non-printable ASCII characters such as tab and newline (\t and \n).
?   means "the preceding character occurs zero or one time"
? * + are called "quantifiers" because they specify the number of times the preceding expression may occur
[]  always refers to a single character, picked from all those in the square brackets
()  parentheses are used for grouping expressions together.
()+ means the expression in the parentheses occurs one or more times
----------------------------------------------------------------
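
A few rows of this table, checked in code. This is only a sketch: it assumes Python's re module, and the helper full() and the sample strings are made up for the illustration. For these basic operators most regex engines should agree, but BRU itself is not being tested here.

```python
import re

def full(regex, text):
    """True if the regex matches the whole text, False otherwise."""
    return re.fullmatch(regex, text) is not None

# (regex, sample text) pairs drawn from the table.
checks = [
    (r"\.", "."), (r"\.", "x"),             # escaped period is a literal period
    (r"ca?t", "ct"), (r"ca?t", "caat"),     # zero or one "a"
    (r"ca*t", "caat"), (r"ca+t", "ct"),     # zero or more / one or more "a"
    (r"[a-cF-H]", "G"),                     # one character from the set
    (r"[a-c][e-h]+", "aeh"),                # one of abc, then one or more of efgh
    (r"([a-c][e-h])+", "aebf"),             # the PAIR repeats: "ae" then "bf"
    (r"([a-c][e-h])+", "aef"),              # "f" cannot start a new pair
]

results = {(regex, text): full(regex, text) for regex, text in checks}
for (regex, text), ok in results.items():
    print(f"{regex:15} {text:6} {ok}")
```

The last two checks show the difference grouping makes: with the parentheses, the + repeats the two-character pair rather than the second character class alone.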

BACKREFERENCING!!!!
-------------------

In addition to "grouping", there is a second, more powerful use for parentheses, called "backreferencing". The idea is that you can save the matching characters to be used later. For example, suppose you want to change date format from 12-31-2005 to 2005_1231.
Use this as your "search-regex":
(12)-(31)-(2005)
and use this  as your "replace-regex":
\3_\1\2
In backreferencing, \1 always refers to the contents of the first pair of parentheses in the search-regex, \2 refers to the contents of the second pair, and \3 to the contents of the third pair.
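
To see the date example run, here is a minimal sketch in Python. It assumes Python's re.sub, which happens to accept the same \1 \2 \3 replacement syntax; it says nothing about how BRU works internally. The second pattern, the name reorder_date, and the sample file name are hypothetical additions that use the [0-9] classes from the table so that any date in that format is rearranged, not only 12-31-2005.

```python
import re

# The literal search and replace strings from the example.
literal = re.sub(r"(12)-(31)-(2005)", r"\3_\1\2", "12-31-2005")
print(literal)

# Hypothetical generalisation: any MM-DD-YYYY date, built from [0-9] classes.
DATE = r"([0-9][0-9])-([0-9][0-9])-([0-9][0-9][0-9][0-9])"

def reorder_date(text):
    """Rewrite MM-DD-YYYY as YYYY_MMDD wherever it appears in text."""
    return re.sub(DATE, r"\3_\1\2", text)

renamed = reorder_date("Notes_03-02-2005.txt")
print(renamed)
```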

Understanding and using backreferencing is essential if you want to take advantage of the powerful regex capability of BRU.

OTHER PROGRAMS
--------------
Here are some programs that can help you get comfortable with regexes, before you start changing your filenames with BRU.

1. TextPad is shareware with an unlimited trial duration: http://www.textpad.com/
This is my favorite text editor. The main thing you need to know is that the grouping symbol is \( \) instead of (). Otherwise, regex-gurus-in-training can assume TextPad regexes are identical to BRU's.
To get started:
- Open a text file
- Search menu > Find...
-   Make sure that you've selected the "Regular expressions" check box.
-   Type in a regex, and click the Find Next button.

To try out the above backreferencing example:
In a text file, type 12-31-2005
- Search menu > Replace...
-   Make sure that you've selected the "Regular expressions" check box.
-   Find what: \(12\)-\(31\)-\(2005\)
-   Replace with: \3_\1\2
-   Click the "Find Next" button
-   Click the "Replace" button.


2. Visual Regex: http://laurent.riesterer.free.fr/regexp/
Visual Regex is unique because it highlights each regex group () with a different color, then highlights the matching text in the same color. This lets you see which group is matching which text, helps you debug the regex, and helps you learn more about regular expressions.

3. Regex Buddy: http://www.regexbuddy.com/

4. Regex Coach: http://weitz.de/regex-coach/

5. Regex Designer: http://www.radsoftware.com.au/regexdesigner/

6. The Regulator: http://regex.osherove.com/
DBeaton
 
Posts: 2
Joined: Wed Mar 09, 2005 6:14 pm

Postby Brum » Thu Mar 10, 2005 2:05 am

Wow, most impressive. And here's me thinking that BRU was just a little program I use (a lot) for renaming my groups of photos.
I can see now that I was mistaken.
Brum
 
Posts: 2
Joined: Thu Mar 10, 2005 1:43 am
Location: UK

Postby Bill S » Thu Mar 10, 2005 8:09 pm

Excellent explanation & tutorial DBeaton...
Bill S
 
Posts: 2
Joined: Wed Mar 09, 2005 9:04 pm
Location: Central NYS - USA

Postby Admin » Fri Mar 11, 2005 8:12 am

I'll second that!

Whilst I incorporated the code to make Regular Expressions work, I still approach them with fear!


Jim
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Getting Started

Postby Dustydog » Tue Aug 23, 2016 1:36 am

Just to put my two bits in: the same guy that makes RegExBuddy makes RegExMagic, which will do its best to walk you through creating a new RegEx. It's no panacea, and you still need to understand what you're doing (to a degree, especially depending on difficulty), but it does make things faster, and it comes with a very nice grep tool built in. It produces RegExes or RegEx code in pretty much any flavor you can think of. It works from a sample, and you can keep poking at it until you get it right. If you do decide to run it via the built-in grep, it shows you a preview of what's going to happen before you run it (sound familiar?).

I intend to try BRU's txt file rename feature soon with its help.

At the very least, it screams through changes within multiple files in multiple directories where before I would have had to struggle my way through writing a batch file. Not nearly as analytical as RegExBuddy, but very handy. The fellow also makes an expensive, exceedingly powerful, industrial-strength grep; if you need it, you'll know. Long money-back guarantee, and a brief full-featured trial (a week?). After studying RegExes a bit, I'm certainly convinced that for what they do, they're easier than writing appropriate batch files most of the time. Those often scare me more.
Dustydog
 
Posts: 11
Joined: Wed Mar 23, 2016 3:32 am

Re: Getting Started

Postby FileMangler » Sat Sep 03, 2016 6:49 am

When following the links to learning resources and help tools, I always end up with syntax that seems to work only partially in BRU. I've finally come to realize that one essential piece of information is missing:

WHAT FLAVOR or DIALECT of RegEx does BRU stick to?
Perl 5 or PCRE2 or what is it?
FileMangler
 
Posts: 9
Joined: Sun Aug 07, 2016 11:19 pm

Re: Getting Started

Postby Admin » Tue Sep 06, 2016 8:56 am

Hi, BRU supports PCRE regular expressions.
http://perldoc.perl.org/perlre.html
Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Getting Started

Postby FileMangler » Tue Sep 06, 2016 1:30 pm

Thanks. Very important info.
However, now I am totally confused. When I choose PCRE (10.10 or 10.20) from the engine dropdown of RegExBuddy, a quite respectable tool that has never let me down so far, it generates a regex that does NOT work in BRU.

The same pattern DOES work in every other PCRE software I tried, such as AutoHotKey, text editors, and even other renamers. This explains why I cannot rely on the learning resources mentioned here in the forum: a pattern may or may not work in BRU. One example is backreferences, which PCRE specifies as $1 and $2 etc. but BRU expects as \1 and \2, while all other PCRE engines I tried interpret \1 and \2 as literal 1 and 2 and not as backreferences.
FileMangler
 
Posts: 9
Joined: Sun Aug 07, 2016 11:19 pm

Re: Getting Started

Postby Admin » Thu Sep 08, 2016 1:25 am

Admin
Site Admin
 
Posts: 2343
Joined: Tue Mar 08, 2005 8:39 pm

Re: Getting Started

Postby regexbuddy » Fri Sep 09, 2016 3:23 am

PCRE itself cannot search-and-replace and does not have a replacement string syntax at all. RegexBuddy disables its Replace mode if you select PCRE as your application.

If BRU uses PCRE for its regex support, then it needs to be using something else or something of its own invention for replacement strings.

In RegexBuddy, Delphi, PHP preg, and R are examples of programming languages with regex support based on PCRE that have their own incompatible replacement syntax. If you select R, then RegexBuddy uses the \1 syntax for backreferences in the replacement string.

PCRE2 is rather different from PCRE. Though the regex syntax is similar, the replacement syntax is totally new and the API is totally different. You can't select PCRE2 in RegexBuddy for an application that uses PCRE.

http://www.regular-expressions.info/backref.html talks about backreferences in the regular expression itself which is not what you are talking about here.

http://www.regular-expressions.info/replacebackref.html talks about backreferences in the replacement string. There's no mention of PCRE on that page, because PCRE cannot search-and-replace.
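
How much the replacement-string syntax depends on the application can be seen in a few lines. This is a minimal sketch in Python, whose re module is neither PCRE nor BRU; it is used here only as one example of a tool that accepts \1 in the replacement and treats $1 as plain text. The pattern and sample text are invented for the illustration.

```python
import re

search = r"([0-9]+)-([0-9]+)"
text = "12-31"

# Backslash-style backreferences are understood by Python's re.sub ...
backslash_style = re.sub(search, r"\2-\1", text)

# ... but dollar-style ones are inserted literally, not as backreferences.
dollar_style = re.sub(search, "$2-$1", text)

print(backslash_style)
print(dollar_style)
```

The search pattern is identical in both calls; only the replacement string changes, which is the part each application defines for itself.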

Kind regards,
Jan Goyvaerts
regexbuddy
 
Posts: 1
Joined: Fri Sep 09, 2016 2:42 am
