I have a huge NSString containing HTML text; it is more than 3,500,000 characters long. How can I convert this HTML into an NSString of plain text? I was using a scanner, but it is too slow. Any ideas?

7 solutions

#1


66

It depends on which iOS version you are targeting. Since iOS 7 there is a built-in method that not only strips the HTML tags but also applies the formatting to the string:

Xcode 9/Swift 4

if let htmlStringData = htmlString.data(using: .utf8), let attributedString = try? NSAttributedString(data: htmlStringData, options: [.documentType : NSAttributedString.DocumentType.html], documentAttributes: nil) {
    print(attributedString)
}

You can even create an extension like this:

extension String {
    var htmlToAttributedString: NSAttributedString? {
        guard let data = self.data(using: .utf8) else {
            return nil
        }

        do {
            return try NSAttributedString(data: data, options: [.documentType : NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
        } catch {
            print("Cannot convert html string to attributed string: \(error)")
            return nil
        }
    }
}

Note that this sample code uses UTF-8 encoding. You could also write a function instead of a computed property and take the encoding as a parameter.
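As a rough sketch of that idea, the extension below takes the encoding as a parameter and returns the plain text through the attributed string's `string` property, which is what the question asks for. The name `htmlToPlainText(encoding:)` and the UTF-8 default are illustrative assumptions, not part of the original answer; on macOS, import AppKit instead of UIKit.

```swift
import UIKit  // assumption: iOS target; on macOS use `import AppKit`

extension String {
    // Hypothetical helper: parses the receiver as HTML and returns only its text.
    func htmlToPlainText(encoding: String.Encoding = .utf8) -> String? {
        guard let data = self.data(using: encoding) else {
            return nil
        }
        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: encoding.rawValue
        ]
        // Apple's documentation says the HTML importer should not be called
        // from a background thread, so call this on the main thread.
        let attributed = try? NSAttributedString(data: data,
                                                 options: options,
                                                 documentAttributes: nil)
        // `string` drops all attributes and leaves the character contents.
        return attributed?.string
    }
}
```

The HTML importer does full parsing work, so on a multi-megabyte input it may still be slow; it is worth measuring before relying on it.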

Swift 3

let attributedString = try NSAttributedString(data: htmlString.data(using: .utf8)!,
                                              options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
                                              documentAttributes: nil)

Objective-C

NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)} documentAttributes:nil error:nil];

If you just need to remove everything between < and > (a quick and dirty approach, which can misfire if the text itself contains these characters), use this:

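// Category method on NSString: strips anything that looks like an HTML tag.
- (NSString *)stringByStrippingHTML {
   NSRange r;
   NSString *s = [self copy]; // under ARC; with manual reference counting use [[self copy] autorelease]
   // Each pass searches from the start and builds a new string, so this gets slow on very large inputs.
   while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
     s = [s stringByReplacingCharactersInRange:r withString:@""];
   return s;
}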
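
Because the question is about a string of several million characters, it may be worth noting that the same regular expression can be applied in a single pass rather than one match at a time. Here is a minimal Swift sketch with the same caveats as the loop above: a literal < or > in the text will confuse it, and HTML entities such as &amp; are left untouched. `strippingHTMLTags()` is a hypothetical helper name, not an existing API.

```swift
import Foundation

extension String {
    // Hypothetical helper: removes every substring matching <[^>]+>
    // in one regular-expression replacement pass.
    func strippingHTMLTags() -> String {
        return replacingOccurrences(of: "<[^>]+>",
                                    with: "",
                                    options: .regularExpression,
                                    range: nil)
    }
}

let html = "<p>Hello, <b>world</b>!</p>"
print(html.strippingHTMLTags())  // prints "Hello, world!"
```

This avoids rebuilding the whole string once per tag, which is the likely cost in the loop version, but it has not been measured on an input of the size described in the question.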
