Browsing and extracting character classes from a regular expression
My problem is the reverse of the usual one: I have a regular expression and would like to extract the possibilities from it. I don't have a string yet; I just want to know what would match. For example:
import re
license = re.compile("^[0-9]{3}[A-Z][0-9]{3}$")
I know that when re.DEBUG is used, the parts of the pattern, character classes and static characters alike, are displayed in order. That is exactly what I'd like to get: a list of objects representing the "parts" of my regular expression. The first would represent the beginning of the string, the next a character class covering 0 to 9 and repeated three times, and so on.
Is that possible at all with regular expressions? I know they are meant to be used the other way around.
Thanks for your help,
1 solution
#1
This was very helpful, thank you. In case anyone needs it, here is what I found. The way to do this is to explore the parsed form of the regular expression. It is not straightforward and needs a bit of additional coding, but if your use case is simple (like mine), you don't have to worry much about branching or other advanced features.
>>> re.sre_parse.parse("^[0-9]{3}[A-Z][0-9]{3}$").data
[('at', 'at_beginning'), ('max_repeat', (3, 3, [('in', [('range', (48, 57))])])), ('in', [('range', (65, 90))]), ('max_repeat', (3, 3, [('in', [('range', (48, 57))])])), ('at', 'at_end')]
>>>
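As a rough sketch of how that list could be turned into readable "parts": the helper names describe and describe_node below are made up for illustration, and the code only covers the node types that appear in the example (AT, IN, RANGE, LITERAL, MAX_REPEAT). It also assumes Python 3, where the opcodes are constant objects with a .name attribute rather than plain strings, and where the parser lives in the internal, undocumented module sre_parse (re._parser from Python 3.11 on), so it may change between versions.

```python
try:
    from re import _parser as sre_parser  # Python 3.11 and later (internal module)
except ImportError:
    import sre_parse as sre_parser  # earlier Python 3 releases

# Readable names for the two anchors used in the example pattern.
ANCHORS = {"AT_BEGINNING": "start of string", "AT_END": "end of string"}


def describe_node(op, arg):
    """Describe one (opcode, argument) pair from the parse tree."""
    kind = op.name
    if kind == "AT":
        return ANCHORS.get(arg.name, arg.name)
    if kind == "LITERAL":
        return chr(arg)
    if kind == "RANGE":
        low, high = arg
        return "%s-%s" % (chr(low), chr(high))
    if kind == "IN":
        # A character class: describe each member and wrap it in brackets.
        return "[" + "".join(describe_node(o, a) for o, a in arg) + "]"
    if kind == "MAX_REPEAT":
        low, high, sub = arg
        inner = "".join(describe_node(o, a) for o, a in sub)
        count = "%d" % low if low == high else "%d to %d" % (low, high)
        return "%s repeated %s times" % (inner, count)
    # Branches, groups, categories and so on are deliberately not handled.
    return "unsupported node: " + kind


def describe(pattern):
    """Return one description per top-level part of the pattern."""
    return [describe_node(op, arg) for op, arg in sre_parser.parse(pattern).data]


for part in describe(r"^[0-9]{3}[A-Z][0-9]{3}$"):
    print(part)
```

Unknown node types are reported rather than raising, so the sketch degrades gracefully on patterns more complex than the one in the question.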
So the information is all there; we just need to process it as neatly as possible. Some third-party libraries can generate random strings from basic regular expressions, but this approach works out of the box in Python (probably even in very old versions).
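To illustrate the "what would match" idea without a third-party library, here is a minimal sketch that walks the same parse tree and builds one random matching string. The names sample and sample_node are hypothetical, and the cap of five extra repetitions for open-ended repeats is an arbitrary assumption. It assumes Python 3 and relies on the internal parser module, so treat it as a starting point and not a general solution; anything beyond anchors, literals, simple character classes and repeats raises NotImplementedError.

```python
import random
import re

try:
    from re import _parser as sre_parser  # Python 3.11 and later (internal module)
except ImportError:
    import sre_parse as sre_parser  # earlier Python 3 releases


def sample_node(op, arg, rng):
    """Produce a random string matching one (opcode, argument) pair."""
    kind = op.name
    if kind == "AT":
        return ""  # anchors consume no characters
    if kind == "LITERAL":
        return chr(arg)
    if kind == "IN":
        # Collect every character the class allows, then pick one.
        choices = []
        for item_op, item_arg in arg:
            if item_op.name == "RANGE":
                low, high = item_arg
                choices.extend(chr(code) for code in range(low, high + 1))
            elif item_op.name == "LITERAL":
                choices.append(chr(item_arg))
            else:
                raise NotImplementedError(item_op.name)
        return rng.choice(choices)
    if kind == "MAX_REPEAT":
        low, high, sub = arg
        # Arbitrary cap so that open-ended repeats such as '*' stay short.
        count = rng.randint(low, min(high, low + 5))
        return "".join(
            sample_node(o, a, rng) for _ in range(count) for o, a in sub
        )
    raise NotImplementedError(kind)


def sample(pattern, rng=random):
    """Return one random string that matches a simple pattern."""
    tree = sre_parser.parse(pattern).data
    return "".join(sample_node(op, arg, rng) for op, arg in tree)


pattern = r"^[0-9]{3}[A-Z][0-9]{3}$"
plate = sample(pattern)
print(plate, bool(re.match(pattern, plate)))
```

Passing a seeded random.Random instance as rng makes the output reproducible.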