Is this possible with regular expressions? (tricky)

General discussions about Advanced Find and Replace
levil
Posts: 3
Joined: Mon Apr 10, 2006 6:39 pm

Post by levil »

I have a .txt file with a lot of paragraphs. (NOTE: the paragraphs are separated by one PARAGRAPH MARK)

Can I do the following: I know exactly the beginning of each paragraph, and I know how it ends (obviously, with the paragraph mark).
Can I APPEND some text after the PARAGRAPH MARK?
(context for advanced users: I want to add a closing tag in xml files)

I will illustrate this:
Example paragraphs:
(note that these are actually xml tags you will see)

<name>blablablalblabla, text ended by a PARAGRAPH MARK
<contact>blablablabla, text ended by a PARAGRAPH MARK

Remember: I want to add the closing tags.

I want to search for:
<name>, then some arbitrary text (characters as well as numbers, even punctuation etc...) and then the PARAGRAPH MARK
THEN,
I want to append </name> after the PARAGRAPH MARK.

The same goes for <contact>:
append </contact> after the PARAGRAPH MARK.

How do I set up such a SEARCH AND APPEND? My problem is twofold:
1. How do I express the paragraph mark: is this a carriage return, a line feed, a new line?
2. I know that a . stands for one arbitrary character. How do I formulate multiple arbitrary characters?

I am sure this can be done, but I'm losing my hair because I keep pulling it out! I can't figure this out. :shock:

Can somebody give some advice? It would be really appreciated! :)

Levil

Abacre
Site Admin
Posts: 1223
Joined: Mon Jan 31, 2005 5:32 pm

Post by Abacre »

Go to main menu - Action - Options - Batch Replace
check "Modifier S"

Go to the Batch Replace tab and check "Use regular expressions".
In the grid, enter under "Search for":
<name>.*PARAGRAPH MARK

Replace with:
$0</name>

That's all. I verified that it works perfectly.
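
For readers who want to try the same idea outside the tool, here is a minimal Python sketch of the search-and-append. It is not Advanced Find and Replace syntax: it uses Python's `re` module and assumes the paragraph mark is an ordinary line break (`\r\n`, `\r`, or `\n`). The helper name `append_closing_tag` and the sample text are made up for illustration.

```python
import re

def append_closing_tag(text: str, tag: str) -> str:
    """Append </tag> after every line that starts with <tag> (hypothetical helper)."""
    # Opening tag, then the rest of the line, then the line break itself.
    pattern = re.compile(r"<" + re.escape(tag) + r">[^\r\n]*(?:\r\n|\r|\n)")
    # m.group(0) is the whole match, the counterpart of $0 in the reply above.
    return pattern.sub(lambda m: m.group(0) + f"</{tag}>", text)

# Made-up sample with Windows-style line breaks.
sample = "<name>John Doe, 42\r\n<contact>john@example.com\r\n"

result = sample
for tag in ("name", "contact"):
    result = append_closing_tag(result, tag)

print(repr(result))
```

As in the reply above, the closing tag lands after the line break. To place it before the break instead, the break would need to be captured in its own group and emitted after the tag.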

So:
> 1. How do I express the paragraph mark: is this a carriage return, a line feed, a new line?
It's not clear what you mean. What do you have in your files?

> 2. I know that a . stands for one arbitrary character. How do I formulate multiple arbitrary characters?
Zero or more arbitrary characters are expressed as .*
One or more arbitrary characters are expressed as .+
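
The difference between the two quantifiers can be illustrated with a short Python sketch. This uses Python's `re` module, not the tool's own engine, and it assumes that "Modifier S" behaves like the single-line flag of Perl-style engines (`re.DOTALL` in Python), where the dot also matches line breaks. The sample text is made up.

```python
import re

# Made-up sample: one empty <name> line and one with content.
text = "<name>\n<name>Ann\n"

# .* accepts zero or more characters, so the empty <name> line matches too.
star = re.findall(r"<name>.*", text)

# .+ needs at least one character after the tag.
plus = re.findall(r"<name>.+", text)

# With DOTALL the dot also matches line breaks, so a greedy .* runs
# from the first <name> to the very end of the text.
dotall = re.findall(r"<name>.*", text, flags=re.DOTALL)

print(star)
print(plus)
print(dotall)
```

If the single-line flag is on, a pattern such as `[^\r\n]*` or a lazy `.*?` keeps each match within one paragraph.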
Kind regards,
Abacre Limited
http://www.abacre.com
support@abacre.com

levil
Posts: 3
Joined: Mon Apr 10, 2006 6:39 pm

Post by levil »

Hi Roman,

thank you for your quick reply. I think I did not make myself very clear when I spoke about the paragraph mark. I didn't mean the literal words, but the paragraph character itself, like this:
Image

So, the before and after should be: (this is an excerpt from my document)
Image

The problem is that I don't know how to formulate and use this paragraph mark in a regex. Is it \r or \l or \f or ...?

I hope you understand the problem of the paragraph mark.

Do you have a solution for this?

Thanks in advance for your time and help!

Levil

Abacre
Site Admin
Posts: 1223
Joined: Mon Jan 31, 2005 5:32 pm

Post by Abacre »

You should open your file with a hex editor.
You probably have the standard \r\n separator.
You may send your file to support@abacre.com and we may look at it.
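
As an alternative to a hex editor, a few lines of Python can count which line breaks a file contains. This is only a sketch: the function name `count_line_endings` is made up, and the byte counting assumes a single-byte or UTF-8 encoding (in UTF-16 each break character is two bytes wide, so the counts would be misleading).

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR, and lone LF breaks in raw file bytes (hypothetical helper)."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CRLF pair contains one CR and one LF, so subtract the pairs.
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }

# Made-up sample; for a real file, read the bytes with open(path, "rb").read().
sample = b"<name>Ann\r\n<contact>ann@example.com\r\n"
counts = count_line_endings(sample)
print(counts)
```

A file where only the CRLF count is non-zero uses the standard Windows separator mentioned above.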

levil
Posts: 3
Joined: Mon Apr 10, 2006 6:39 pm

Post by levil »

Hi Roman,

I think I first have to figure out what encoding and line ending I should use.

Because when I try a search without the paragraph mark in it, for example: acti.
then I would normally expect the word activ to be found. But it isn't.
(Whereas when I search for acti, it is found, because no regex is used.)

So it seems that whenever I try to use a regex in the find field, nothing is found. (I did check "use regular expressions" in the options and tried several modifiers, although I don't actually know what they do, but it doesn't work.)

Originally, I started with a Word 2003 document, which I converted to a txt document. I tried several times with different line ending options (CR only, LF only, CR/LF) and different encodings (Unicode, UTF-8, Windows West-European). In each case, the regex doesn't work in the find dialog.
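
To make the export options above concrete, this small Python sketch prints the raw bytes of the same short text under three encodings. It assumes that Word's "Unicode" option means UTF-16 little-endian and that "Windows West-European" means code page 1252. Both are assumptions, and the sketch does not establish why the regex searches fail.

```python
# The word from the search example, followed by a CR/LF line break.
text = "activ\r\n"

# Encode the same text three ways; the codec names are Python's.
encoded = {name: text.encode(name) for name in ("utf-8", "cp1252", "utf-16-le")}

for name, raw in encoded.items():
    print(f"{name:10} {len(raw):2} bytes  {raw!r}")
```

For plain ASCII text, UTF-8 and cp1252 produce identical bytes, while UTF-16 stores every character, including `\r` and `\n`, as two bytes.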

Is this maybe because a txt document loses all its formatting?
I know of the "search in MS word files" function, so I tried it with the original doc itself (without a regex in the find field, just finding a particular word), but when I inserted some text with the insert before function, my Word document was no longer readable in Word.

I'm going to send you the file to support@abacre.com. If you find some time, it would be great if you could take a look at it.

Thanks in advance,

Levil
