Extract specific text from a dataframe by delimiter count and position
16lz
2021-01-22
I'm learning regular expressions and have hit a bit of a wall. I have the following dataframe:
import pandas

item_data = pandas.DataFrame({'item': ['001', '002', '003'],
                              'description': ['Fishing,Hooks,12-inch', 'Fishing,Lines', 'Fish Eggs']})
For each description, I want to extract everything before the second comma (","). If there is no second comma, the original description is retained.
Results should look like this:
item_data=pandas.DataFrame({'item':['001','002','003'],
'description':['Fishing,Hooks,12-inch','Fishing,Lines','Fish Eggs'],
'new_description':['Fishing,Hooks','Fishing,Lines', 'Fish Eggs']})
Any pointers would be much appreciated.
Thanks.
2 solutions
#1
1
Using a regexp...
re.sub("^([^,]*,[^,]*),.*$", "\\1", x)
The meaning is:
^      start of string
(      start capture
[^,]   anything but a comma
*      zero or more times
,      a comma
[^,]   anything but a comma
*      zero or more times
)      end of capture
,      another comma
.*     anything
$      end of string
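As a quick check of the breakdown above, here is a minimal sketch using only the standard `re` module. The helper name `before_second_comma` and the compiled `PATTERN` constant are illustrative and not part of the original answer. The pattern itself is the one from the answer, written as a raw string.

```python
import re

# Same pattern as in the answer, written as a raw string.
PATTERN = re.compile(r"^([^,]*,[^,]*),.*$")


def before_second_comma(text):
    """Return everything before the second comma.

    Strings with fewer than two commas do not match the pattern,
    so re.sub leaves them unchanged.
    """
    return PATTERN.sub(r"\1", text)


descriptions = ["Fishing,Hooks,12-inch", "Fishing,Lines", "Fish Eggs"]
results = [before_second_comma(d) for d in descriptions]
print(results)
```

Because the pattern requires two commas to match at all, descriptions with one comma or none pass through untouched, which is the behaviour asked for in the question.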
Replacing the match with the content of group 1 (\1) drops the second comma and everything after it.
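To apply the same substitution to the dataframe from the question, one possible sketch uses the pandas `.str.replace` accessor. The `pd` alias and the `alt` variable are illustrative choices, and the split-and-rejoin variant at the end is an added regex-free alternative rather than something the answer proposes.

```python
import pandas as pd

item_data = pd.DataFrame({
    "item": ["001", "002", "003"],
    "description": ["Fishing,Hooks,12-inch", "Fishing,Lines", "Fish Eggs"],
})

# regex=True is passed explicitly because the default of this
# parameter differs between pandas versions.
item_data["new_description"] = item_data["description"].str.replace(
    r"^([^,]*,[^,]*),.*$", r"\1", regex=True
)

# Regex-free alternative (an addition, not from the answer):
# split on commas, keep at most the first two pieces, rejoin.
alt = item_data["description"].str.split(",").str[:2].str.join(",")

print(item_data)
```

Both approaches leave rows with fewer than two commas unchanged. The split-based version may be easier to adapt if the number of fields to keep changes.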