RegEx Q: How to Remove All characters after a Number Series?



Post by henry » Wed Aug 01, 2007 9:24 pm

I have hundreds of files (and folders) with names such as the one below, and will have tons more on a continuing basis, so any help is greatly appreciated!!

Example:
---------

My File v1.2.3 msgid_2586720_My File v1.[2].[3]

The filename is always of variable length: any number of characters, alphanumeric, basically ANY valid keyboard character.

Two parts of the name are fixed:

1) "msgid", which immediately follows the base filename (and is often, but not always, preceded by a single space)

2) the 7-digit number (always exactly 7 digits), which has an _ (underscore character) both before and after it.

Ideally, what I want is a new filename/foldername that transforms the above to:

My File v1.2.3 _2586720

I.e., retain the base filename; delete the "msgid"; keep the 7-digit number with just the first underscore (_); and delete everything after the number.
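To pin down exactly what the rename rule above asks for, here is a minimal sketch in Python that uses plain string methods instead of a regex. `target_name` is a hypothetical helper invented for illustration; it is not part of Bulk Rename Utility. It assumes the marker is always the literal text "msgid_" followed by exactly 7 ASCII digits and an underscore.

```python
from typing import Optional

MARKER = "msgid_"  # fixed marker that follows the base filename (assumed literal)


def target_name(name: str) -> Optional[str]:
    """Apply the rule described above, or return None if the name doesn't fit it.

    Keep the base filename, drop "msgid", keep "_" plus the 7-digit number,
    and drop everything after the number.
    """
    start = name.find(MARKER)
    while start != -1:
        num_start = start + len(MARKER)
        digits = name[num_start:num_start + 7]
        after = name[num_start + 7:num_start + 8]
        # The number is always exactly 7 (ASCII) digits, followed by an underscore.
        if len(digits) == 7 and digits.isascii() and digits.isdigit() and after == "_":
            # name[:start] keeps any space before "msgid", as in the wanted result.
            return name[:start] + "_" + digits
        # Not a valid marker here; look for the next "msgid_".
        start = name.find(MARKER, start + 1)
    return None


print(target_name("My File v1.2.3 msgid_2586720_My File v1.[2].[3]"))
# My File v1.2.3 _2586720
```

This version uses the first "msgid_" that is followed by 7 digits and an underscore; a greedy regex such as `(.+)msgid_` would instead pick the last one, which only matters if a name contains the marker twice.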

Obviously, getting rid of the "msgid" and any extra spaces is no problem, thanks to Bulk Rename Utility!

Again, every part of the filename is of indeterminate length (i.e., not fixed) _except_ for the "msgid" and the number.

I know a bit about regular expressions (enough to do basic operations and a few advanced ones), but this one has me really stumped.

I'm guessing regex is the best/only way to tackle this problem?

Thanks in advance for any help you can provide. I did search through this (and other) forums for any pointers, but couldn't find anything that seemed to answer my problem. If I've overlooked a posting about a similar situation, please excuse me; you only need to paste that link here, so as not to waste anyone's time.

Thanks again for a _wonderful_ utility!!
henry
 

Post by Stefan » Wed Aug 01, 2007 10:31 pm

Hi henry, welcome!

FROM:
>My File v1.2.3 msgid_2586720_My File v1.[2].[3]
TO:
>My File v1.2.3 _2586720


SEARCH: (.+)(\s*msgid_)(\d{7})
Groups: (.+) = group 1, (\s*msgid_) = group 2, (\d{7}) = group 3; these are the backreference groups.

REPLACE: \1_\3



This means:
(.+) =========> match everything, up to
(\s*msgid_) ===> zero or more whitespace characters, followed by 'msgid' and an underscore
(\d{7}) =======> followed by exactly 7 digits

Replace with group 1 = "My File v1.2.3 " (including the trailing space, because (.+) is greedy and \s* then matches nothing)
followed by an underscore typed literally into the replacement
followed by group 3 = the seven digits


(Untested)

The trick is to split the filename into parts and rebuild it from only the parts you need.
The groups let you refer back to those parts (backreferences) and make sure the split really happens at THAT position.
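To see how the SEARCH pattern splits the example name into its three groups, here is a sketch using Python's `re` module as a stand-in for Bulk Rename Utility's regex engine. `re.match` anchors at the start of the name, and `Match.expand` builds the new name from the REPLACE template alone, so nothing after the seven digits is carried over. Whether BRU applies its Match/Replace boxes in exactly this way is an assumption, not something verified here.

```python
import re

# Stefan's SEARCH pattern, with the marker spelled "msgid" as in the filenames.
SEARCH = re.compile(r"(.+)(\s*msgid_)(\d{7})")
REPLACE = r"\1_\3"  # group 1, a literal underscore, group 3

name = "My File v1.2.3 msgid_2586720_My File v1.[2].[3]"
m = SEARCH.match(name)
new_name = None
if m:
    # Show how the name is split into the three backreference groups.
    for i, part in enumerate(m.groups(), start=1):
        print(f"group {i}: {part!r}")
    # expand() fills in the template from the groups only, so the tail after
    # the seven digits does not end up in the new name.
    new_name = m.expand(REPLACE)
    print(new_name)

# group 1: 'My File v1.2.3 '
# group 2: 'msgid_'
# group 3: '2586720'
# My File v1.2.3 _2586720
```

Because (.+) is greedy, the space before "msgid" lands in group 1 rather than in \s*, which is why the result keeps the space before the underscore.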

EDIT:

See here for more help
http://www.bulkrenameutility.co.uk/Foru ... ic.php?t=5
http://www.bulkrenameutility.co.uk/Foru ... c.php?t=27
Last edited by Stefan on Fri Aug 03, 2007 7:42 am, edited 2 times in total.

Post by Admin » Wed Aug 01, 2007 10:32 pm

Hi,

Others are much more adept at REs than I am, but my first stab would be this:

Match: (.*)(msgid)(_)(\d+)
Replace: \1\3\4

Box 11 (Extension): Remove


There will be more elegant options, and options that handle more situations, but this solves the case you gave as an example.
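The Match/Replace pair plus "Box 11 (Extension): Remove" can be emulated in Python as a rough sketch. Two assumptions are made here: the extension is taken to be everything from the last dot (as `os.path.splitext` does, so ".[3]" is the extension of the example), and text after the match is dropped. Python's `re.sub` would keep that tail, so the sketch uses `match()` and `expand()` instead. `admin_rename` is a hypothetical helper name.

```python
import os
import re

# Admin's Match/Replace pair, written for Python's re module.
MATCH = re.compile(r"(.*)(msgid)(_)(\d+)")
REPLACE = r"\1\3\4"  # group 1, the underscore, the digits ("msgid" is dropped)


def admin_rename(filename: str) -> str:
    # Stand-in for "Box 11 (Extension): Remove": split off the text from the
    # last dot onward (assumed to be what BRU treats as the extension).
    stem, _ext = os.path.splitext(filename)
    m = MATCH.match(stem)
    if not m:
        return stem  # no match: only the extension is removed
    # Build the new name from the groups only (drops anything after the digits).
    return m.expand(REPLACE)


print(admin_rename("My File v1.2.3 msgid_2586720_My File v1.[2].[3]"))
# My File v1.2.3 _2586720
```

Unlike Stefan's (\d{7}), the (\d+) here accepts a number of any length, so it would also rename names whose number is not exactly 7 digits.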


Jim

Thanks so much!

Post by henry » Thu Aug 02, 2007 10:37 pm

Totally amazing! It works, it really really works!

I can't believe how fast both of your responses were, either!

What an excellent renaming utility! I'm definitely going to be donating a bit of dough!

Thanks again.
henry
 

