I am writing a Python 3 script that takes the words in a text file and converts them into numbers (my own mapping, not ASCII, so no ord function). I have assigned each letter an integer and would like each word's value to be the sum of its letters' values. The goal is to group words with the same numerical value together in a dictionary. I am having great trouble turning the split words into numbers and adding them together. I am completely stuck with this script (it is not complete yet).

By the way, I know there is an easier way to create the l_n dictionary below, but since I've already written it out I am leaving it as is for now. I will change it once the script is complete.

l_n = {
    "A": 1, "a": 1,
    "B": 2, "b": 2,
    "C": 3, "c": 3,
    "D": 4, "d": 4,
    "E": 5, "e": 5,
    "F": 6, "f": 6,
    "G": 7, "g": 7,
    "H": 8, "h": 8,
    "I": 9, "i": 9,
    "J": 10, "j": 10,
    "K": 11, "k": 11,
    "L": 12, "l": 12,
    "M": 13, "m": 13,
    "N": 14, "n": 14,
    "O": 15, "o": 15,
    "P": 16, "p": 16,
    "Q": 17, "q": 17,
    "R": 18, "r": 18,
    "S": 19, "s": 19,
    "T": 20, "t": 20,
    "U": 21, "u": 21,
    "V": 22, "v": 22,
    "W": 23, "w": 23,
    "X": 24, "x": 24,
    "Y": 25, "y": 25,
    "Z": 26, "z": 26,
    }

words_list = []

def read_words(file):
    opened_file = open(file, "r")
    contents = opened_file.readlines()

    for i in range(len(contents)):
        words_list.extend(contents[i].split())

    opened_file.close()

    return words_list

read_words("file1.txt")
new_words_list = list(set(words_list))

numbers_list = []
w_n = {}

def words_to_numbers(new_words_list, l_n):
    local_list = new_words_list[:]
    local_number_list = []

    for word in local_list:
        local_number_list.append(word.split())
        for key in l_n:
            local_number_list = local_number_list.replace(  # I am stuck on the logic in this section.

words_to_numbers(new_words_list, l_n)
print(local_list)

I've tried looking for an answer on Stack Overflow but was unable to find one.

Thank you for your help.

3 Answers

#1


You will have to handle punctuation, but otherwise you just need to sum the values of each word's letters and group the words by that sum, which you can do with a defaultdict:

lines = """am writing a Python script that will take words in a text file and convert them into numbers (my own, not ASCII, so no ord function).
I have assigned each letter to an integer and would like each word to be the sum of its letters' numerical value.
The goal is to group each word with the same numerical value into a dictionary.
I am having great trouble recombining the split words as numbers and adding them together"""

from collections import defaultdict

d = defaultdict(list)
for line in lines.splitlines():
    for word in line.split():
        d[sum(l_n.get(ch,0) for ch in word)].append(word)

Output:

from pprint import pprint as pp

pp(dict(d))
{1: ['a', 'a', 'a'],
 7: ['be'],
 9: ['I', 'I'],
 14: ['am', 'am'],
 15: ['an'],
 17: ['each', 'each', 'each'],
 19: ['and', 'and', 'and'],
 20: ['as'],
 21: ['of'],
 23: ['in'],
 28: ['is'],
 29: ['no'],
 32: ['file'],
 33: ['the', 'The', 'the', 'the'],
 34: ['so'],
 35: ['to', 'to', 'goal', 'to'],
 36: ['have'],
 37: ['take', 'ord', 'like'],
 38: ['(my', 'same'],
 39: ['adding'],
 41: ['ASCII,'],
 46: ['them', 'them'],
 48: ['its'],
 49: ['that', 'not'],
 51: ['great'],
 52: ['own,'],
 53: ['sum'],
 56: ['will'],
 58: ['into', 'into'],
 60: ['word', 'word', 'with'],
 61: ['value.', 'value', 'having'],
 69: ['text'],
 75: ['would'],
 76: ['split'],
 77: ['group'],
 78: ['assigned', 'integer'],
 79: ['words', 'words'],
 80: ['letter'],
 85: ['script'],
 92: ['numbers', 'numbers'],
 93: ['trouble'],
 96: ['numerical', 'numerical'],
 97: ['convert'],
 98: ['Python', 'together'],
 99: ["letters'"],
 100: ['writing'],
 102: ['function).'],
 109: ['recombining'],
 118: ['dictionary.']}

sum(l_n.get(ch,0) for ch in word) gets the sum of all the letters in the word; any character that is not in l_n counts as 0. We use that sum as the key and append the word to the list stored under it. The defaultdict handles repeated keys, so we end up with all the words that have the same sum grouped in lists.
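
To make the grouping step concrete, here is a small sketch on three words. The helper name `word_value` is made up for this illustration, and the letter table is rebuilt inline (a=1 through z=26, same value for uppercase) only so the snippet runs on its own:

```python
from collections import defaultdict
import string

# Assumed to mirror the l_n table from the question: a=1 ... z=26,
# with each uppercase letter sharing the value of its lowercase form.
l_n = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}
l_n.update({ch.upper(): value for ch, value in list(l_n.items())})

def word_value(word):
    """Sum the letter values; characters not in l_n count as 0."""
    return sum(l_n.get(ch, 0) for ch in word)

groups = defaultdict(list)
for word in ["to", "goal", "be"]:
    # A key that does not exist yet is created as an empty list on first access.
    groups[word_value(word)].append(word)

print(dict(groups))  # {35: ['to', 'goal'], 7: ['be']}
```

"to" is 20 + 15 and "goal" is 7 + 15 + 1 + 12, so both land under the key 35.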

Also, as John commented, you can store only the lowercase letters in the dict and call .lower() on each word: sum(l_n.get(ch,0) for ch in word.lower())
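
A sketch of that suggestion, assuming the table holds only lowercase keys. It is built here with `string.ascii_lowercase` rather than typed out, and `word_value` is again just an illustrative name:

```python
import string

# Lowercase-only table: a=1 ... z=26
l_n = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}

def word_value(word):
    # Lowercase the word first so "The" and "the" get the same value.
    return sum(l_n.get(ch, 0) for ch in word.lower())

print(word_value("The"), word_value("the"))  # 33 33
```

This halves the size of the table, at the cost of one .lower() call per word.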

If you want to remove all punctuation you can use str.translate:

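from collections import defaultdict
from string import punctuation

# Python 3: str.translate takes a single table argument, so build a table
# that deletes every punctuation character.
# (word.translate(None, punctuation) is the Python 2 form.)
strip_punct = str.maketrans("", "", punctuation)

d = defaultdict(list)
for line in lines.splitlines():
    for word in line.split():
        word = word.translate(strip_punct)
        d[sum(l_n.get(ch,0) for ch in word)].append(word)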

Which would output:

{1: ['a', 'a', 'a'],
 7: ['be'],
 9: ['I', 'I'],
 14: ['am', 'am'],
 15: ['an'],
 17: ['each', 'each', 'each'],
 19: ['and', 'and', 'and'],
 20: ['as'],
 21: ['of'],
 23: ['in'],
 28: ['is'],
 29: ['no'],
 32: ['file'],
 33: ['the', 'The', 'the', 'the'],
 34: ['so'],
 35: ['to', 'to', 'goal', 'to'],
 36: ['have'],
 37: ['take', 'ord', 'like'],
 38: ['my', 'same'],
 39: ['adding'],
 41: ['ASCII'],
 46: ['them', 'them'],
 48: ['its'],
 49: ['that', 'not'],
 51: ['great'],
 52: ['own'],
 53: ['sum'],
 56: ['will'],
 58: ['into', 'into'],
 60: ['word', 'word', 'with'],
 61: ['value', 'value', 'having'],
 69: ['text'],
 75: ['would'],
 76: ['split'],
 77: ['group'],
 78: ['assigned', 'integer'],
 79: ['words', 'words'],
 80: ['letter'],
 85: ['script'],
 92: ['numbers', 'numbers'],
 93: ['trouble'],
 96: ['numerical', 'numerical'],
 97: ['convert'],
 98: ['Python', 'together'],
 99: ['letters'],
 100: ['writing'],
 102: ['function'],
 109: ['recombining'],
 118: ['dictionary']}

If you don't want duplicate words appearing then use a set:

d = defaultdict(set)
....
d[sum(l_n.get(ch,0) for ch in word)].add(word)
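
Putting the pieces together for a file, something along these lines should work. Treat it as a sketch rather than the one right way: the function name `group_words_by_value` and the constants are made up here, the table is assumed to be a=1 through z=26, and words are lowercased before grouping (so "The" and "the" become a single entry, which differs from the output shown earlier):

```python
from collections import defaultdict
from string import ascii_lowercase, punctuation

# Assumed letter table: a=1 ... z=26 (lowercase keys only)
L_N = {ch: i for i, ch in enumerate(ascii_lowercase, start=1)}
# Translation table that deletes every ASCII punctuation character
STRIP_PUNCT = str.maketrans("", "", punctuation)

def group_words_by_value(path):
    """Return {letter_sum: set of words} for the words in the file at path."""
    groups = defaultdict(set)
    with open(path, "r") as f:
        for line in f:
            for raw in line.split():
                word = raw.translate(STRIP_PUNCT).lower()
                if not word:
                    # The token was punctuation only; nothing to score.
                    continue
                groups[sum(L_N.get(ch, 0) for ch in word)].add(word)
    return dict(groups)
```

With the file from the question you would call group_words_by_value("file1.txt") and pass the result to pprint if you want the keys sorted.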
