How to remove special characters using sed
I would like to find out how to use sed to remove ONLY the space and the odd characters from the output of the following echo command:
echo -e "A \xd8\xa8"
So I tried:
echo -e "A \xd8\xa8" | sed -r "s/[^[:print:]]//g"
but it doesn't remove anything,
echo -e "A \xd8\xa8" | sed -r "s/[^[:alnum:]]//g"
only removes the space,
echo -e "A \xd8\xa8" | sed -r "s/[^[:alpha:]]//g"
(same result),
echo -e "A \xd8\xa8" | sed -r "s/[^[:ascii:]]//g"
returns an error (invalid character class name), and
echo -e "A \xd8\xa8" | sed -r "s/[^\w ]//g"
removes everything...
Expected result: "A"
Any ideas?
Thanks!
3 solutions
#1
2
If you want sed to not treat e.g. Arabic characters as alphabetic (which they are), you need to set a locale that does not classify them that way.
The "C" locale only considers the basic character set, i.e. only [A-Za-z] are alphabetic. I am assuming that you want to delete everything that is not a character from that range (your question is fuzzy about what you really want):
echo -e "A \xd8\xa8" | LC_CTYPE=C sed -r "s/[^[:alpha:]]//g" | hexdump -C
Output:
00000000 41 0a
00000002
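If the locale trick has no effect on your machine, one possible cause is that LC_ALL is set in your environment, since it takes precedence over LC_CTYPE. Below is a minimal sketch of two variants under that assumption. The function names keep_ascii_letters and keep_ascii_printable are made up for illustration, and the input is written with printf and octal escapes (\330\250 are the same two bytes as \xd8\xa8) rather than echo -e, only because that is more portable across shells.

```shell
# Hypothetical helper: keep only A-Z, a-z and the newline.
# tr -c complements the set and -d deletes, so every other byte is dropped.
# LC_ALL=C forces byte-wise handling even if LC_ALL is set elsewhere.
keep_ascii_letters() {
    LC_ALL=C tr -cd 'A-Za-z\n'
}

# Hypothetical helper: drop only what is not printable ASCII.
# In the C locale, [:print:] covers 0x20-0x7e, so the space survives
# and only the two high bytes are removed.
keep_ascii_printable() {
    LC_ALL=C sed -E 's/[^[:print:]]//g'
}

# Letters only, matching the expected result in the question.
printf 'A \330\250\n' | keep_ascii_letters

# Letters plus other printable ASCII; note the trailing space is kept.
printf 'A \330\250\n' | keep_ascii_printable
```

The first variant matches the "A" asked for in the question. The second may be closer to what you want if punctuation and spaces in other inputs should survive.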