davidg Newbie
Joined: 10 April 2007
Online Status: Offline Posts: 2
Posted: 10 April 2007 at 6:00pm
Hello,
I am using MailBee.IMAP4 to read email messages from a mail server. I then read each message's subject and body (via the AltBodyText property) and insert them into an XML document. The XML parser throws an 'invalid character was found in text content' error when the email body contains Japanese characters. I have not run into this problem with other languages, and I do not receive an error when I paste the Japanese characters directly into an XML document. I have tried setting IMAP4.CodepageMode = 0 and IMAP4.Codepage = 65001, but this did not help. Do you have any suggestions to help me solve this problem?
Thank you,
David
Andrew AfterLogic Support
Joined: 28 April 2006 Location: United States
Online Status: Offline Posts: 1189
Posted: 11 April 2007 at 7:31am
XML documents must not contain certain characters because they can break the document's structure. First of all, you should specify a Japanese encoding for the XML document in your XML parser. Also, text containing non-Latin characters should be enclosed in a CDATA section.
If this doesn't help, you should convert the text to UTF-8 before passing it to your XML parser, for instance with the ToUTF8 method.
Besides that, some special characters that are used in XML markup, such as < and >, must be encoded as XML entities (&lt;, &gt;, etc.).
To learn more about XML, please refer to http://www.w3schools.com/xml/ or similar resources.
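To make the escaping advice concrete, here is a minimal Python sketch. Python is an assumption, since the thread does not show the application's language, and the element names `message`, `subject`, and `body` are hypothetical. It shows that `<`, `>`, and `&` must be turned into entities when XML is assembled by hand, and that a tree-building API such as `xml.etree.ElementTree` does this on its own. In this sketch the Japanese text passes through as ordinary character data once the document is serialized as UTF-8.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hand-written XML: markup characters in the text must become entities.
raw_text = "件名 <urgent> & more"
escaped = escape(raw_text)  # replaces &, < and > with &amp;, &lt; and &gt;
manual_xml = "<subject>" + escaped + "</subject>"


def build_message_xml(subject, body):
    """Build a small document; ElementTree escapes text content itself."""
    root = ET.Element("message")  # hypothetical element names
    ET.SubElement(root, "subject").text = subject
    ET.SubElement(root, "body").text = body
    return ET.tostring(root, encoding="utf-8")  # UTF-8 encoded bytes


xml_bytes = build_message_xml(raw_text, "本文のテキスト")

# Parsing the result gives back the original, unescaped strings.
round_trip = ET.fromstring(xml_bytes)
print(round_trip.find("subject").text)
print(round_trip.find("body").text)
```

Letting the XML library build the tree is usually the safer choice, because escaping is then applied in exactly one place.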
Best regards,
Andrew
davidg Newbie
Joined: 10 April 2007
Online Status: Offline Posts: 2
Posted: 11 April 2007 at 8:55am
Hi Andrew,
Thanks for your response.
I am using UTF-8 encoding in my XML document, Japanese characters are valid UTF-8, and I am using CDATA sections in my XML. I have also tried using the ToUTF8 method without success.
I am able to manually add the exact same Japanese characters directly to an XML document without error. I only receive the error when I add the characters to the XML document from Message.AltBodyText or Message.BodyText. This leads me to believe that this is not an XML or XML parser issue. My guess is that MailBee adds some low-level ASCII control characters (i.e. below ASCII 32) during the conversion, which makes the XML invalid. Is that possible?
Also, I did not have any problems with Korean or Chinese characters (or any other language).
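The control-character hypothesis above can be checked directly by scanning the extracted text for characters that XML 1.0 forbids. The Python sketch below does that; the helper names are hypothetical. As a separate, unconfirmed assumption, it also illustrates one way control characters could show up in Japanese text in particular: ISO-2022-JP, an encoding commonly used for Japanese mail, switches character sets with ESC (0x1B) sequences, so a body that was never decoded from that encoding still contains raw ESC characters. Nothing in this thread confirms that this is what happens inside MailBee.

```python
import re

# Matches every character outside the XML 1.0 "Char" production:
# only tab, LF, CR and the ranges listed below are allowed.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 rejects."""
    return [(m.start(), "U+%04X" % ord(m.group()))
            for m in INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text):
    """Drop forbidden characters so the text can be stored in XML."""
    return INVALID_XML_CHARS.sub("", text)


# Assumed scenario: the message body arrives as ISO-2022-JP bytes.
raw = "日本語".encode("iso2022_jp")

undecoded = raw.decode("latin-1")   # bytes mistaken for single-byte text
decoded = raw.decode("iso2022_jp")  # bytes decoded with the right charset

print(find_invalid_xml_chars(undecoded))  # ESC characters are reported
print(find_invalid_xml_chars(decoded))    # nothing is reported
```

If the scan reports ESC or other control characters, decoding the body with the message's declared charset is likely a better fix than stripping them, because stripping would leave the remaining bytes unreadable.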
Do you have any other suggestions?
Thanks,
David
Andrew AfterLogic Support
Joined: 28 April 2006 Location: United States
Online Status: Offline Posts: 1189
Posted: 12 April 2007 at 3:52am
To let us investigate the issue, could you please provide us with a sample message containing Japanese characters and the part of your application's source code that tries to add the Japanese message body to the XML document? Please use our support form for this purpose.
Best regards,
Andrew