always the same kind of (complex) swapping needed ...

Postby jhsgalaxy » Sat Aug 14, 2021 2:08 pm

Hello all,
I need help as I don't comprehend much of what is said in technical help.

I always get this file-format: (this is an example) >> 08-13 02-17-00_TV-Sender HD Title of recording <<
however, what I would like to have is as follows: >> Title of recording 2021-08-13 + 02'17 <<
Unfortunately it is NOT possible to change that in the settings of the recording software.
Therefore I need to use BRU.
As you can see, it is not only a matter of swapping parts of the filename, but also of adding, deleting and changing some.
I do NOT need the TV broadcaster, but I need to add the year of the recording. I also don't need the seconds of the time, that's truly stupid!
Furthermore, some broadcasters have the letters HD as an extra word and some do not, some have a double name and some do not; all this makes a lot of variables to consider.
That's why I can't get it done. I barely manage a simple swap of 2 words containing numbers ... but that's it.

Is there anybody who is able to help me with that? I'd appreciate it very much.
Thanks in advance, regards, JHS :roll:
jhsgalaxy
 
Posts: 5
Joined: Wed Dec 28, 2016 2:51 am

Re: always the same kind of (complex) swapping needed ...

Postby Luuk » Sun Aug 15, 2021 6:19 am

You will probably have to give more samples and explain where to find the 'year of recording', so this regex makes some guesses...
1) Year of recording == 2021.
2) Broadcasters == _TV-Broadcaster or _TV-DoubleName-Broadcaster (possibly followed by " HD ").
3) Filenames are like the example format, so they can use a 'Match' and 'Replace' like...

^(\d\d-\d\d) (\d\d)-(\d\d)-\d\d_TV-[^ ]+(?: HD | )?(.+)
\4 2021-\1 + \2'\3
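
For anyone who wants to try the Match/Replace pair above outside of BRU, here is a minimal sketch using Python's `re` module. This is only an assumption for testing: BRU uses its own regex engine, which may behave differently in edge cases, and the function name `rename` is made up for illustration.

```python
import re

# Match/Replace pair copied from the post above.
# Assumption: Python's re module is close enough to BRU's engine for this pattern.
MATCH = r"^(\d\d-\d\d) (\d\d)-(\d\d)-\d\d_TV-[^ ]+(?: HD | )?(.+)"
REPLACE = r"\4 2021-\1 + \2'\3"


def rename(name: str) -> str:
    """Return the renamed string, or the input unchanged if it does not match."""
    return re.sub(MATCH, REPLACE, name)


# Sample filename from the first post, with and without the extra " HD " word.
print(rename("08-13 02-17-00_TV-Sender HD Title of recording"))
print(rename("08-13 02-17-00_TV-Sender Title of recording"))
```

Both sample names should come out as `Title of recording 2021-08-13 + 02'17`, because the one-word broadcaster and the optional " HD " are matched but never captured.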
Luuk
 
Posts: 691
Joined: Fri Feb 21, 2020 10:58 pm

Re: always the same kind of (complex) swapping needed ...

Postby jhsgalaxy » Sun Aug 15, 2021 11:40 am

Thanks for that first hint.

I thought the year of recording might come from the system date.
However, that's no issue, as I can add it manually with the "add prefix" function before swapping the filename.

the filename always looks like this: >> 08-13 02-17-00_TV-broadcaster XY HD Title of recording <<
or: >> 08-13 02-17-00_TV-broadcaster abc Title of recording <<
or: >> 08-13 02-17-00_TV-broadcaster channel 3 Title of recording <<
or: >> 08-13 02-17-00_TV-broadcaster sender5 HD Title of recording <<
with or without underscores and even dots,
and so on.
However, I already delete all underscores and dots in a first step using BRU.

The differences are, first of all, in the 'title of recording'
and then, of course, in the name of the broadcaster, as this can be anything from one to several words, with or without digits.

It would already help to have the broadcaster at the end, so to speak, so that swapping gives this format:
>>Title of recording 2021-08-13 02-17-00 TV-Sender HD<<
Then I could, in a third step, manually delete the part of the filenames after the double 0 with the batch delete function, if necessary.

My main issue is the swapping, because the content is always different, never the same.
jhsgalaxy
 
Posts: 5
Joined: Wed Dec 28, 2016 2:51 am

Identify 2 space separated strings from one another

Postby Luuk » Sun Aug 15, 2021 6:36 pm

Since both "broadcaster name" and "Title of recording" are space-separated words, no regex will be able to separate them without rules.
You could use ' HD ' to separate some of them, but it's a very bad idea, because it makes it harder to discover rules for names without HD.

If it were me, I would build a list of either known broadcaster names or known broadcaster name-formats, and then test my regexes like...
^(\d\d-\d\d) (\d\d)-(\d\d)-\d\d_TV-broadcaster (?:.*? HD|[Cc]hannel \d+|[^ ]+) (.+)
\4 2021-\1 + \2'\3

But I would NOT use it to remove any broadcaster names until I'm certain that it matches at least 90% of them without any failures.
This is because the more broadcaster names you remove, the harder it will be to modify your rules for matching the remaining ones.
And then I would certainly save it, especially if your application cannot support using a separator-string after the broadcaster name.

The only three rules for broadcaster name in the above regex are...
1) Anything ending with ' HD '.
2) 'Channel' or 'channel' followed by 'space' and some digits.
3) Any one-word (any 1-or-more characters without spaces)

More rules will need to be inserted before that last one-word matcher, because matches are tried in that exact 1-2-3 order.
So don't use it as-is, because rule 3 removes the first word from any multiple-word broadcaster name not already matched by 1 or 2.
This will make it much more difficult to improve your list of rules before finally saving it.
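
The 1-2-3 ordering and the pitfall of rule 3 can be sketched in Python as below. This assumes Python's `re` module tries alternatives in the same left-to-right order as BRU's engine, and the last sample, with the two-word broadcaster "Some Name", is a made-up name for illustration only.

```python
import re

# Match/Replace pair copied from the post above; alternatives are tried in order:
# 1) anything ending in " HD", 2) "Channel"/"channel" plus digits, 3) any single word.
MATCH = (r"^(\d\d-\d\d) (\d\d)-(\d\d)-\d\d_TV-broadcaster "
         r"(?:.*? HD|[Cc]hannel \d+|[^ ]+) (.+)")
REPLACE = r"\4 2021-\1 + \2'\3"

samples = [
    "08-13 02-17-00_TV-broadcaster XY HD Title of recording",       # rule 1
    "08-13 02-17-00_TV-broadcaster channel 3 Title of recording",   # rule 2
    "08-13 02-17-00_TV-broadcaster abc Title of recording",         # rule 3
    "08-13 02-17-00_TV-broadcaster sender5 HD Title of recording",  # rule 1
    # Hypothetical two-word name that no rule covers: rule 3 eats only "Some".
    "08-13 02-17-00_TV-broadcaster Some Name Title of recording",
]

results = [re.sub(MATCH, REPLACE, s) for s in samples]
for before, after in zip(samples, results):
    print(f"{before}  =>  {after}")
```

The first four samples give a clean title, while the last one keeps "Name" stuck to the front of the title, which is exactly the partially-removed name that is hard to clean up afterwards.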
Luuk
 
Posts: 691
Joined: Fri Feb 21, 2020 10:58 pm

Re: always the same kind of (complex) swapping needed ...

Postby jhsgalaxy » Thu Sep 02, 2021 8:51 pm

Hello,
Thank you very much for your replies. Parts of them gave me the idea, and the trouble with the unwanted channel and/or broadcaster names is solved by a function called "remove".
However, what is left is a date and a name like this:
00-00 00-00-00 title of broadcasting 2021-

Therefore what I still need is a kind of regex formula that takes the first 2 groups of digits and dashes and simply puts them at the end of the filename, behind the 2021-, which is the same everywhere.
What I am trying to get is this: title of broadcasting 2021-00-00 00-00-00 (all these zeros are just placeholders for date and time)

I tried something like ^([0-9]*[^a-z]*[^A-Z]*[0-9]*)([^a-z])([0-9]*[^a-z]*[0-9]*[^a-z]*[0-9]*)([^0-9]*)(.*) and then, on the next line, a replacement of 2021-\3\1\2\4, but unfortunately I did not fully succeed.

My trouble still appears to be the number of words in each individual title.
Could you possibly help?
Thank you in advance :-)
jhsgalaxy
 
Posts: 5
Joined: Wed Dec 28, 2016 2:51 am

Re: always the same kind of (complex) swapping needed ...

Postby Luuk » Fri Sep 03, 2021 9:01 am

08-13 02-17-00_Some Broadcaster Name HD Title of recording =====> Title of recording 2021-08-13 + 02'17

Since both BroadcasterName and Title of recording are space-separated, no regex could separate them without first having some rules.
The reason I suggested not removing BroadcasterName was to first build a list of known BroadcasterNames or BroadcasterName-formats.
This is because it's much easier to build a list of known BroadcasterNames/formats than to build a list of known Titles of recording.
Once complete, the regex just removes BroadcasterName, so that you have a guaranteed Title of recording.

IF you already removed all BroadcasterNames, then it should have been done with RegEx(1), so that the renaming is now complete?
If you only removed some partial text of most BroadcasterNames, then I probably cannot help with matching your Titles of recording.
My recommendation would be to move the partially-removed BroadcasterNames to another folder, and then let your list build up again.
Then once you have enough BroadcasterNames, complete the rules for RegEx(1), so that everything gets renamed properly the first time.

IF you need 08-13 02-17-00 Title of recording 2021- ====> Title of recording 2021-08-13 02-17
^(\d\d-\d\d \d\d-\d\d)-\d\d (.+ 2021-)$
\2\1

IF you need 08-13 02-17-00 Title of recording 2021- ====> Title of recording 2021-08-13 + 02'17
^(\d\d-\d\d )(\d\d)-(\d\d)-\d\d (.+ 2021-)$
\4\1+ \2'\3
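
Both Match/Replace pairs above can be checked with a small Python sketch. This is only for testing, under the assumption that Python's `re` module treats these patterns the same way BRU does; the names `PLAIN` and `FANCY` are made up for illustration.

```python
import re

# Each entry is (Match, Replace), copied from the two variants above.
PLAIN = (r"^(\d\d-\d\d \d\d-\d\d)-\d\d (.+ 2021-)$", r"\2\1")
FANCY = (r"^(\d\d-\d\d )(\d\d)-(\d\d)-\d\d (.+ 2021-)$", r"\4\1+ \2'\3")

# Intermediate filename, as left after the broadcaster name was removed.
name = "08-13 02-17-00 Title of recording 2021-"

plain_result = re.sub(PLAIN[0], PLAIN[1], name)
fancy_result = re.sub(FANCY[0], FANCY[1], name)

print(plain_result)  # Title of recording 2021-08-13 02-17
print(fancy_result)  # Title of recording 2021-08-13 + 02'17
```

Because the title is captured as "everything up to the trailing 2021-", the number of words in the title does not matter here.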
Luuk
 
Posts: 691
Joined: Fri Feb 21, 2020 10:58 pm

