XML errors due to Chinese characters


zyrorl
03-04-2006, 03:05 AM
Parsing the XML from the KML breaks almost every parser that expects UTF-8 when the data contains Chinese characters. Google Earth glosses over this error, but most parsers don't, including Internet Explorer.

Someone with Chinese characters in their name took over Suva. Those characters should be banned from people's names.

"Suva (b壭)
Owned By: b壭
Armies Inside: 2
Value: 893,354

View city details

Directions: To here - From here"

Illegal characters should not be allowed to be entered into people's names.
Either that, or the file has to be declared in the Chinese character set it actually uses, so it doesn't kill XML parsers.

However... the best way to deal with this is to simply convert to UTF-8.

There are two ways of doing this. PHP's utf8_encode (http://au.php.net/manual/en/function.utf8-encode.php) function will stop the parsers from choking, but it treats its input as ISO-8859-1, so Chinese names come out as valid UTF-8 that still looks like garbage. The other is the mbstring extension (which has to be compiled into PHP), whose mb_convert_encoding can convert from the real source encoding to UTF-8.

The XML is declared as UTF-8 but ends up containing illegal non-UTF-8 bytes whenever GEWar users enter names in Big5, SJIS, or other wacky character sets.
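
To make the idea concrete, here is a minimal sketch in Python rather than the PHP the game presumably runs on. The function name and the Big5 fallback are assumptions for illustration only; the real source charset would have to be known or guessed per user, and a few legacy byte sequences can happen to look like valid UTF-8.

```python
def to_utf8(raw: bytes, fallback_encoding: str = "big5") -> bytes:
    """Return raw unchanged if it is already valid UTF-8, else transcode it.

    fallback_encoding is an assumption: the charset the bytes were
    really written in (Big5 here, purely as an example).
    """
    try:
        raw.decode("utf-8")          # already valid UTF-8 (plain ASCII included)
        return raw
    except UnicodeDecodeError:
        # Not UTF-8: interpret the bytes in the legacy charset, re-encode as UTF-8.
        text = raw.decode(fallback_encoding, errors="replace")
        return text.encode("utf-8")


big5_name = "中文".encode("big5")     # stand-in for a username stored in Big5
utf8_name = to_utf8(big5_name)        # now safe to put in a file declared as UTF-8
```

Plain ASCII names pass through untouched, so running this over every name before writing the XML should be harmless for everyone else.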

Pestilence
03-04-2006, 03:15 AM
I can't read any of these Chinese names; they just show up as boxes, like Wingdings or something.

zyrorl
03-04-2006, 03:21 AM
They basically screw everything up... I've provided a solution; hopefully they will adopt it.

In the meantime I've put in a stopgap by forcing character set conversions before I parse the XML/KML files.

KingEmperor24
03-04-2006, 03:30 AM
I have posted this before, but a simple solution would be to restrict people from registering with non-ASCII characters. I'm not sure if Luke can control this on the forum, but maybe if someone uses non-ASCII characters when they click "Get Started", it redirects them to a post or a rule that says "Only ASCII characters allowed. Please create a different account following the rules." This way there will not be anyone playing with Chinese characters in their name.

axio
03-04-2006, 03:37 AM
For those interested - if you want to see all the Chinese characters in Google Earth instead of square boxes:

Right click in IE -> Encoding -> Chinese (I used Simplified)
It will then ask if you want to install the language pack; click OK.
Then set your language back to Western European (or whatever you were previously using)
It may require a reboot and your Windows install disk

There's probably some fancy file you can download, but this way takes about 5 seconds, heh.

birq
03-04-2006, 03:50 AM
The problem with this version of vBulletin is that you can't tell it to only accept certain characters. You CAN tell it which ones to not accept, but not the other way around. That means that Luke would have to enter every character in Chinese Simplified, Turkish (at least the ones that don't have an 8859-1 counterpart), every character in Thai, every character in Japanese, etc.

The new version of vB lets you specify the characters to accept, but that version breaks GEWar.

And ******, it's totally under control of vBulletin -- it's nothing that has been developed in the game. We're limited by what it can do.

Someone mentioned a solution of renaming the users. This breaks GEWar. The only real solution is to delete the users and let them re-register. That won't stop new users from making new accounts with Unicode characters.

We're still discussing how to go about dealing with this.

p0wderfinger
03-04-2006, 03:53 AM
Is there any way to have forum and game usernames separate? Like you have to make a "nickname" for the game after registering for the forums, and then limit the game names to regular characters.

Pestilence
03-04-2006, 04:01 AM
I would just like a way to tell which box name is attacking me so that I can get their past info and see how far out their armies are on Google Earth. Right now I've just got a bunch of boxes that all look the same. That's the main problem I see with it right now; I couldn't care less what the name actually means.

KingEmperor24
03-04-2006, 04:07 AM
I thought that might be the case. P0wder, I like what your idea tries to accomplish, but sending PMs would become very difficult: from the nickname you would have to find the forum name.
How about a small script in the homebase placer that goes something like this:
If non-ASCII characters are used, then do not allow placement of the homebase. This solves the game's problems but not the forum's. I'm more concerned about the game.
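
A rough sketch of that check, in Python rather than whatever the homebase placer is actually written in. The function name is made up for illustration, and it only assumes the username is available as a string:

```python
def is_ascii_name(name: str) -> bool:
    """True if the name is non-empty and made only of printable ASCII."""
    return name.strip() != "" and name.isascii() and name.isprintable()


def may_place_homebase(username: str) -> bool:
    # Hypothetical hook: refuse placement for names the KML output can't handle.
    return is_ascii_name(username)


allowed = may_place_homebase("KingEmperor24")   # plain ASCII name
blocked = may_place_homebase("CHINA_钓鱼岛")      # contains non-ASCII characters
```

Because this is an allow-list ("only these characters"), it doesn't need every Chinese, Thai, or Japanese character to be listed one by one.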

axio
03-04-2006, 04:49 AM
I forgot to mention that this method will also enable Chinese characters in Firefox (FF shares stuff) and should in other browsers.

Here is what my Firefox looks like after installing the language pack.

http://img233.imageshack.us/img233/2532/test3ws.gif

speedfreak227
03-04-2006, 04:59 AM
and for those of us using Firefox?

speedfreak227

birq
03-04-2006, 05:02 AM
You can set encoding in Firefox, too: View -> Character Encoding -> UTF-8

That's all it should take. If that doesn't do it, get a Mac. If that still doesn't do it, try Linux. :)

axio
03-04-2006, 05:05 AM
read my updated post in this thread :) it applies to FF too.

KingEmperor24
03-04-2006, 05:23 AM
If that doesn't do it, use FreeBSD

birq
03-04-2006, 05:25 AM
I was going to suggest writing an operating system next, but FreeBSD, Solaris, Solbourne, MS-DOS 2.1 and GeoWorks are all good options too.

speedfreak227
03-04-2006, 06:10 AM
You can set encoding in Firefox, too: View -> Character Encoding -> UTF-8

That's all it should take. If that doesn't do it, get a Mac. If that still doesn't do it, try Linux. :)

I tried that, but it didn't display the ?????? any differently. I tried refreshing the page and reloading the link to top players. Neither worked.

can we kill them now? ;uzi ???? :shoot

speedfreak227

zyrorl
03-04-2006, 06:11 AM
The problem with this version of vBulletin is that you can't tell it to only accept certain characters. You CAN tell it which ones to not accept, but not the other way around. That means that Luke would have to enter every character in Chinese Simplified, Turkish (at least the ones that don't have an 8859-1 counterpart), every character in Thai, every character in Japanese, etc.

The new version of vB lets you specify the characters to accept, but that version breaks GEWar.

And ******, it's totally under control of vBulletin -- it's nothing that has been developed in the game. We're limited by what it can do.

Someone mentioned a solution of renaming the users. This breaks GEWar. The only real solution is to delete the users and let them re-register. That won't stop new users from making new accounts with Unicode characters.

We're still discussing how to go about dealing with this.

Rubbish. You can modify it to convert the characters in the usernames to UTF-8.

Those characters are not UTF-8. vBulletin or not, it makes no difference: you can make those modifications, and if you're unable to, I'll gladly lend a helping hand. I'm an experienced programmer with expertise in PHP (proven).

Renaming users would not break GEWar if done correctly.

It's 100% possible to fix the Unicode characters.

Now the best part is this: YOU DON'T HAVE to modify vBulletin at all. All you have to do is make sure the XML and KML outputs are converted to UTF-8.

Yes, that's right: as you're generating the city data, army data, etc. and outputting it as KML, add a simple line like $kml_data = utf8_encode($original_screwedup_bad_data_with_bad_charsets) and then output $kml_data rather than $original_screwedup_bad_data_with_bad_charsets. (If the source charset is known, mb_convert_encoding does the same job without garbling the names.)

There are many, many other solutions I could present if you wish, but I think that's the easiest and most satisfactory one. It's also the most effortless, and it ensures you're outputting standards-compliant XML/KML.

Easy fix.
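
For anyone who wants to see the shape of it, here is a hedged sketch in Python (not the game's actual PHP). The function name, the Placemark layout, and the Big5 source charset are all assumptions for illustration. The point is only that the name is transcoded and escaped at output time, so the file really is the UTF-8 it claims to be.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def placemark_kml(name_bytes: bytes, source_encoding: str) -> bytes:
    """Build a tiny UTF-8 KML fragment from a name stored in another charset."""
    name = name_bytes.decode(source_encoding)        # legacy bytes -> text
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<Placemark><name>{escape(name)}</name></Placemark>"
    )
    return body.encode("utf-8")                      # text -> real UTF-8 bytes


# Hypothetical owner name stored in Big5; the converted output parses cleanly.
kml = placemark_kml("中文".encode("big5"), "big5")
parsed_name = ET.fromstring(kml).find("name").text
```

Dropping the same Big5 bytes straight into a file declared as UTF-8 is what makes strict parsers give up, which is the error this thread started with.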

speedfreak227
03-05-2006, 08:51 AM
so what's happening with this problem?

speedfreak227

chinarex
03-16-2006, 03:12 PM
欣赏这个兄弟熟练的使用WINDOWS系统的各个国家语音支持能力。..我的计算机就可以同时显示你的德国.英国,意大利,俄罗斯,法国以及日本鬼语.....
为了尊重一些对手和玩家我宁可花点点时间安装这些功能软件....但你们这些老英文人.不能抨击我们的国家和我们的文字。....怪你们计算机水平能力差运行WINDOWS2000以上系统的计算机都可以做到的基本功能被你们说成是泥斯湖怪兽....太可笑了...

Luke
03-16-2006, 03:17 PM
Working on the problem. It causes more problems than just the XML errors.

We are deleting every new user who signs up with Chinese characters, and some older ones (from a week back) that didn't have much money or many troops.
The other users will have a chance to change their usernames to English ones.

Melek~Taus
03-16-2006, 03:29 PM
欣赏这个兄弟熟练的使用WINDOWS系统的各个国家语音支持能力。..我的计算机就可以同时显示你的德国.英国,意大利,俄罗斯,法国以及日本鬼语.....
为了尊重一些对手和玩家我宁可花点点时间安装这些功能软件....但你们这些老英文人.不能抨击我们的国家和我们的文字。....怪你们计算机水平能力差运行WINDOWS2000以上系统的计算机都可以做到的基本功能被你们说成是泥斯湖怪兽....太可笑了...

for those who don't feel like translating:

Appreciates this Brother the skilled use WINDOWS system each national pronunciation to support the ability. My computer may simultaneously demonstrate your Germany England, Italy, Russia, French as well as Japanese clever language in order to respect some matches and plays the family I rather a water-drop design time to install these function software but your these old English people Cannot attack our country and our writing. Blamed your computer horizontal ability difference to move the basic function which above WINDOWS2000 the system computer all was allowed to achieve by you to talk into is putty Si lake 怪兽 too is laughable ~Babel fish

Luke
03-16-2006, 04:01 PM
To chinarex:

Please ONLY talk English on this forum.

wmax351
03-17-2006, 06:30 AM
Just post in big letters:


Registering with Special Characters will result in a permanent IP ban!


When the guy registers, IP ban him.

p0wderfinger
03-17-2006, 06:32 AM
To chinarex:

Please ONLY talk English on this forum.
No no that's not how you get his attention.

If I may clarify:

Please English talk ONLY on forum this.

grimsacre
03-17-2006, 10:46 AM
Him ban just!

apollosmith
03-18-2006, 02:07 AM
I hate to bring this topic back up, but the Chinese characters are still causing problems in the KML files. For instance, you can't see any details for the armies owned by China_??? (not sure what the ?'s are cuz I'm not gunna install Chinese fonts). All you see in GE is the army name, with no info about where it's going or when it's gunna get there. This clearly gives players like this a distinct advantage over the rest of us, especially if the army name is edited, because then you won't even know who owns it.

axio
03-18-2006, 03:53 AM
Interesting observation, apollosmith. I have the language pack installed and tried naming one of my armies "CHINA_钓鱼岛70561", and it said "Your army name cannot exceed 15 characters." ... a bug?
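
One possible explanation, and it is only a guess about how the game measures names: the limit may be applied to bytes rather than characters. A quick Python sketch (the function name is made up) shows that the name is 14 characters long but takes more than 15 bytes in the common encodings:

```python
def length_report(name: str) -> tuple:
    """Return (characters, UTF-8 bytes, GBK bytes) for a name."""
    return (
        len(name),                    # Unicode characters
        len(name.encode("utf-8")),    # 3 bytes per Chinese character here
        len(name.encode("gbk")),      # 2 bytes per Chinese character
    )


print(*length_report("CHINA_钓鱼岛70561"))   # prints: 14 20 17
```

If the length check counts bytes, either encoding would trip a 15-byte limit even though the name looks short enough.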

KingEmperor24
03-18-2006, 04:01 AM
I have noticed stuff like this before. Automatic naming does not have to comply with the 15-character limit. I have an army called "KingEmperor2476812" (18 characters).

coconut
03-29-2006, 09:34 PM
Edited:nevermind