AFR doesn't find UTF-8 characters in UTF-8 files without BOM!

General discussions about Advanced Find and Replace
tom7418
Posts: 1
Joined: Sat Sep 12, 2009 7:16 pm

AFR doesn't find UTF-8 characters in UTF-8 files without BOM!

Post by tom7418 »

Hello,

I would like to run a find-and-replace on a batch of files, searching for UTF-8 characters in UTF-8 files without a BOM (that is, files that look like ANSI but contain UTF-8 byte sequences).
AFR reads these files in ANSI mode only and does not try to match the multi-byte characters.
Example:
Try to find the character "é" in a file saved as UTF-8 without BOM. AFR finds nothing.
But if I add a BOM to the file, AFR finds it. So I suspect that AFR parses the file as ANSI only and does not look for byte sequences that are not ANSI (as it would have to in order to detect a UTF-8 file without a BOM).
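
To illustrate the suspicion, here is a minimal Python sketch of what happens at the byte level. It does not show how AFR works internally. It only assumes that "ANSI" means Windows-1252 and that the search compares raw bytes. The names `needle`, `file_bytes` and `has_utf8_bom` are made up for the example.

```python
import codecs

needle = "é"

# The same character is stored as different bytes in each encoding.
ansi_bytes = needle.encode("cp1252")  # b'\xe9' (one byte)
utf8_bytes = needle.encode("utf-8")   # b'\xc3\xa9' (two bytes)

# Contents of a file saved as UTF-8 without a BOM.
file_bytes = "Le Café".encode("utf-8")


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# A search that encodes the needle as ANSI looks for the wrong byte.
found_as_ansi = ansi_bytes in file_bytes
# A search that encodes the needle as UTF-8 looks for the right bytes.
found_as_utf8 = utf8_bytes in file_bytes

print(has_utf8_bom(file_bytes), found_as_ansi, found_as_utf8)
```

Without the BOM, nothing tells the tool which of the two byte patterns to look for, which would explain why adding the BOM makes the search succeed.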

So my question:
Do you know how to use AFR with UTF-8 files without a BOM?
I think this is a critical problem.

Thanks.
Tom.

bfmy
Posts: 1
Joined: Tue Sep 15, 2009 3:49 am

Re: AFR doesn't find UTF-8 characters in UTF-8 files without BOM!

Post by bfmy »

Hello,

I have the same problem and found an "external" solution.

I use "EditPad Lite", a free text editor (search for its website).

In the editor window:
- menu "Convert"
- "Text Encoding"
select "Unicode, UTF-8"
OK

Paste your text: copyright © 2009 - Le Café - èôéç

- menu "Convert" - "Text Encoding"
select "Windows 1252: Western European"
with the radio button set to "interpret the original data as ..."

OK

copyright Â© 2009 - Le CafÃ© - Ã¨Ã´Ã©Ã§

... and the new string works well as the search text in "AFR"
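
The conversion above reinterprets the UTF-8 bytes of the text as Windows-1252 characters. The same search string can probably be produced without an editor. The following Python sketch shows the idea, assuming that AFR's ANSI mode corresponds to Windows-1252. The helper name `utf8_as_cp1252` is made up for the example.

```python
def utf8_as_cp1252(text: str) -> str:
    """Return the text as it looks when its UTF-8 bytes are read as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


original = "copyright © 2009 - Le Café - èôéç"
search_string = utf8_as_cp1252(original)

# Each accented character becomes two Windows-1252 characters,
# for example "é" becomes "Ã©".
print(search_string)
```

One limitation: Python's `cp1252` codec leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so this raises `UnicodeDecodeError` for characters whose UTF-8 form contains one of them, such as "Á".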

Better solutions are welcome!

cheers, bfmy

sorin
Posts: 6
Joined: Thu Nov 05, 2009 10:22 am

Re: AFR doesn't find UTF-8 characters in UTF-8 files without BOM!

Post by sorin »

I'm also interested in this issue. I think AFR should be able to find Unicode text in various encodings, so there should be an option to search text files without a BOM as ANSI and/or UTF-8.
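
As a rough idea of what such an option could do, here is a minimal Python sketch: try strict UTF-8 first and fall back to ANSI only if the bytes are not valid UTF-8. This is not AFR's behavior. The function name `decode_text` and the choice of Windows-1252 as the fallback are assumptions for the example.

```python
import codecs
from typing import Tuple


def decode_text(data: bytes, fallback: str = "cp1252") -> Tuple[str, str]:
    """Decode file contents and report which encoding was used."""
    # A BOM settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat the file as ANSI.
        return data.decode(fallback, errors="replace"), fallback


text, encoding = decode_text("Le Café".encode("utf-8"))
print(text, encoding)
```

Pure ASCII files are valid in both encodings, so they are reported as UTF-8 here, which makes no difference for searching.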

I don't want to open Pandora's box regarding Unicode support, but in the future it would be nice to be able to search Unicode text regardless of the normalization form used in the file :D
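
To show what the normalization problem looks like, here is a small Python sketch using the standard `unicodedata` module. The helper `contains_normalized` is a made-up name, and normalizing both sides to NFC is just one possible approach.

```python
import unicodedata


def contains_normalized(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Search after converting both strings to the same normalization form."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


composed = "Caf\u00e9"     # "é" as one code point (NFC)
decomposed = "Cafe\u0301"  # "e" followed by a combining acute accent (NFD)

# Both look the same on screen, but a plain search for the composed "é"
# misses the decomposed form.
plain_hit = "\u00e9" in decomposed
normalized_hit = contains_normalized(decomposed, "\u00e9")

print(plain_hit, normalized_hit)
```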
