Converting HTML text to plain text using Objective-C
I have a huge NSString with HTML text inside. The string is more than 3,500,000 characters long. How can I convert this HTML text to an NSString containing plain text? I was using a scanner, but it is too slow. Any ideas?
7 solutions
#1
66
It depends on which iOS version you are targeting. Since iOS 7 there is a built-in method that not only strips the HTML tags but also applies their formatting to the resulting string:
Xcode 9/Swift 4
    if let htmlStringData = htmlString.data(using: .utf8),
       let attributedString = try? NSAttributedString(
           data: htmlStringData,
           options: [.documentType: NSAttributedString.DocumentType.html],
           documentAttributes: nil) {
        // attributedString.string holds the text without the attributes
        print(attributedString)
    }
You can even create an extension like this:
    extension String {
        var htmlToAttributedString: NSAttributedString? {
            guard let data = self.data(using: .utf8) else {
                return nil
            }
            do {
                return try NSAttributedString(
                    data: data,
                    options: [.documentType: NSAttributedString.DocumentType.html,
                              .characterEncoding: String.Encoding.utf8.rawValue],
                    documentAttributes: nil)
            } catch {
                print("Cannot convert html string to attributed string: \(error)")
                return nil
            }
        }
    }
Note that this sample code uses UTF-8 encoding. You could also write a function instead of a computed property and pass the encoding as a parameter.
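The function variant mentioned above could look roughly like the sketch below. The name `htmlToPlainText(encoding:)` is hypothetical and not part of the original answer; it assumes an Apple platform where UIKit or AppKit provides the HTML importer, and it returns the `string` property of the attributed string, since the question asks for plain text.

```swift
import Foundation
#if canImport(UIKit)
import UIKit    // provides the NSAttributedString HTML importer on iOS
#elseif canImport(AppKit)
import AppKit   // provides it on macOS
#endif

#if canImport(UIKit) || canImport(AppKit)
extension String {
    /// Hypothetical helper: parses the receiver as HTML and returns only the text.
    /// The encoding is a parameter instead of being hard-coded to UTF-8.
    func htmlToPlainText(encoding: String.Encoding = .utf8) -> String? {
        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: encoding.rawValue
        ]
        guard let data = data(using: encoding),
              let attributed = try? NSAttributedString(data: data,
                                                       options: options,
                                                       documentAttributes: nil)
        else {
            return nil   // encoding or parsing failed
        }
        return attributed.string   // the text with all attributes dropped
    }
}
#endif
```

It would be called as `htmlString.htmlToPlainText()` or, for another encoding, `htmlString.htmlToPlainText(encoding: .utf16)`. Apple's documentation notes that the HTML importer should not be called from a background thread, which may matter for a string of several million characters.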
Swift 3
    let attributedString = try NSAttributedString(
        data: htmlString.data(using: .utf8)!,
        options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
        documentAttributes: nil)
Objective-C
    NSAttributedString *attributedString =
        [[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUTF8StringEncoding]
            options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                      NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
            documentAttributes:nil
            error:nil];
If you just need to remove everything between < and > (a dirty way!), which might be problematic if these characters also appear in the text itself, use this:
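    // Category method on NSString.
    - (NSString *)stringByStrippingHTML {
        NSRange r;
        // autorelease is for manual reference counting; under ARC use [self copy] alone.
        NSString *s = [[self copy] autorelease];
        // Remove the first tag-like match until none are left.
        while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
            s = [s stringByReplacingCharactersInRange:r withString:@""];
        }
        return s;
    }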
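The loop above searches from the start of the string and builds a new string for every tag it removes, which may be slow on input as large as the one in the question. A possible alternative, sketched here in Swift with the same pattern, is to let Foundation replace all matches in a single call. The function name `stripTags(from:)` is hypothetical and not from the original answer.

```swift
import Foundation

/// Hypothetical helper: removes everything that looks like an HTML tag
/// in one regular-expression replacement instead of one pass per tag.
func stripTags(from html: String) -> String {
    return html.replacingOccurrences(of: "<[^>]+>",
                                     with: "",
                                     options: .regularExpression)
}

print(stripTags(from: "<p>Hello <b>world</b></p>"))   // prints "Hello world"
```

The same caveats apply as for the loop: literal < and > characters in the text are treated as tag delimiters, and HTML entities such as `&amp;` are left undecoded.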