I was fetching data from a website through its API, which returns the data in JSON format. The problem arose when the JSON contained umlaut characters: the API returned them as Unicode escapes, e.g. Münich came back as Mu\u0308nich.

When I passed this JSON string to the constructor of org.codehaus.jettison.json.JSONObject, Mu\u0308nich was converted to Munich with the umlaut on the n, which is wrong.

I realized this very late (after fetching all of the data). Now I use the following method to convert it back to the Unicode-escaped form, i.e. I pass Munich (with the umlaut on the n) to the method and it returns Mu\u0308nich.

I want to somehow convert this Mu\u0308nich to Münich. Any ideas?

Please note that the conversion is needed only for a base letter followed by \u0308: u\u0308 to ü, o\u0308 to ö, a\u0308 to ä, and so on.

Method used to convert back:

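import java.util.Formatter;

// Replaces every non-ASCII char with a backslash-u escape (four hex digits);
// ASCII characters pass through unchanged.
public static String escapeUnicode(String input) {
    StringBuilder b = new StringBuilder(input.length());
    // The Formatter writes directly into the same StringBuilder.
    Formatter f = new Formatter(b);
    for (char c : input.toCharArray()) {
        if (c < 128) {
            b.append(c);
        } else {
            f.format("\\u%04x", (int) c);
        }
    }
    return b.toString();
}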

1 Answer

These are called combining diacritics, and you can use java.text.Normalizer to combine each one with its base letter into a single Unicode character.

Use the normalize method with Normalizer.Form.NFKC. This first decomposes the full string into base letters and combining diacritics and then recomposes them, returning 'real' precomposed Unicode umlauts.

So: 'München' stays 'München' and 'Mu\u0308nchen' will become 'München'
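
As a minimal sketch of the approach described above: the class name UmlautNormalizer and the helpers compose and unescape are hypothetical names chosen for illustration, not part of any library. It assumes that the escaped text produced by escapeUnicode contains only four-digit \u escapes, and it uses only java.text.Normalizer and java.util.regex from the standard library.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class UmlautNormalizer {

    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Turns escaped text (as produced by escapeUnicode) back into real characters.
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Composes base letter + combining diacritic into one precomposed character.
    static String compose(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String decomposed = "Mu\u0308nchen";   // u followed by a combining diaeresis
        String composed = compose(decomposed);
        // The composed form is one char shorter because two chars merged into one.
        System.out.println(decomposed.length() + " -> " + composed.length());

        String escaped = "Mu\\u0308nchen";     // literal backslash-u text
        System.out.println(compose(unescape(escaped)).equals(composed));
    }
}
```

If the strings still hold the real combining character, compose alone should be enough. The unescape step is only needed for text that has already been turned into literal escape sequences.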

You will then have the string in a single form that no longer uses separate combining diacritics and is easy to port and display.

If you work with text from different platforms, some normalization is crucial, or you will end up with the problems you described.
