 AfterLogic Forum : MailBee IMAP4
Topic: Japanese characters & invalid UTF-8
davidg
Newbie
Joined: 10 April 2007
Posts: 2
Posted: 10 April 2007 at 6:00pm

Hello,

I am using MailBee.IMAP4 to read email messages from a mail server. I then read each message's subject and body (via the AltBodyText property) and insert them into an XML document. The XML parser throws an 'invalid character was found in text content' error when the email body contains Japanese characters. I have not run into this problem with other languages, and I do not get an error when I paste the Japanese characters directly into an XML document. I have tried setting IMAP4.CodepageMode = 0 and IMAP4.Codepage = 65001, but this did not help. Do you have any suggestions to help me solve this problem?

Thank you,
David
 
Andrew
AfterLogic Support
Joined: 28 April 2006
Location: United States
Posts: 1189
Posted: 11 April 2007 at 7:31am

XML documents must not contain certain characters because they can break the document's structure. First of all, you should specify a Japanese encoding for the XML document in your XML parser. Also, text containing non-Latin characters should be enclosed in a CDATA section.

If this doesn't help, you should convert the text to UTF-8 before passing it to your XML parser, for instance with the ToUTF8 method.

Besides that, some special characters which are used in the XML format itself must be encoded as XML entities, such as &lt; and &gt;.
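As an illustration of the escaping and encoding advice above, here is a minimal Python sketch. It is not MailBee code, and the function name `build_message_xml` and the element names are hypothetical. It relies on the standard library's ElementTree to escape markup characters and to serialize as UTF-8, which is usually enough for Japanese text without manual entity encoding.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_message_xml(subject: str, body: str) -> bytes:
    """Build a small XML document holding a message subject and body.

    ElementTree escapes &, < and > in text nodes on serialization,
    and encoding="utf-8" makes it return UTF-8 encoded bytes.
    """
    root = ET.Element("message")  # hypothetical element names
    ET.SubElement(root, "subject").text = subject
    ET.SubElement(root, "body").text = body
    return ET.tostring(root, encoding="utf-8")


# Manual escaping, for code that assembles XML by string concatenation.
escaped = escape("a < b & c")  # "a &lt; b &amp; c"

xml_bytes = build_message_xml("件名 <test>", "本文 & more")
```

Parsing `xml_bytes` back with `ET.fromstring` should return the original strings unchanged, which is a quick way to check that the text itself is acceptable to an XML parser.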

To learn more about XML, please refer to http://www.w3schools.com/xml/ or similar resources.

Best regards,
Andrew
 
davidg
Newbie
Joined: 10 April 2007
Posts: 2
Posted: 11 April 2007 at 8:55am

Hi Andrew,

Thanks for your response.

I am using UTF-8 encoding in my XML document, Japanese characters are valid UTF-8, and I am using CDATA sections in my XML. I have also tried using the ToUTF8 method without success.

I am able to manually add the exact same Japanese characters directly to an XML document without error. I only receive the error when I attempt to add the characters to the XML document using Message.AltBodyText or Message.BodyText. This leads me to believe this is not an XML or XML parser issue. I suspect that MailBee adds some low-level ASCII control characters (i.e. below ASCII 32) during the conversion, which makes the XML invalid. Is that possible?
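One way to test the control-character hypothesis above is to scan the text for code points that XML 1.0 does not allow. The following is a minimal Python sketch, not MailBee code; the helper names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are hypothetical. As background, and only as an assumption about this case, the ISO-2022-JP encoding commonly used for Japanese email switches character sets with ESC (0x1B) sequences, so an incomplete conversion could leave ESC bytes in the text. The thread does not establish that this is what happens here.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (position, hex code point) for each character XML 1.0 forbids."""
    return [(m.start(), hex(ord(m.group())))
            for m in _INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text):
    """Remove characters that XML 1.0 forbids, leaving everything else intact."""
    return _INVALID_XML_CHARS.sub("", text)


# Hypothetical sample: Japanese text followed by a leftover ESC sequence.
sample = "こんにちは\x1b$B"
offenders = find_invalid_xml_chars(sample)
cleaned = strip_invalid_xml_chars(sample)
```

Reporting the offending positions first, rather than silently stripping, shows whether the body text really contains control characters before any data is discarded.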

Also, I did not have any problems with Korean or Chinese characters (or any other language).

Do you have any other suggestions?

Thanks,
David
 
Andrew
AfterLogic Support
Joined: 28 April 2006
Location: United States
Posts: 1189
Posted: 12 April 2007 at 3:52am

To let us investigate the issue, could you please provide us with a sample message containing Japanese characters and the part of your application's source code which tries to add the Japanese message body to the XML document? Please use our support form for this purpose.


Best regards,
Andrew