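1. Word frequency statistics: preprocessing
  1. Download the lyrics of an English song, or an English article
  2. Replace all separators such as , . ? ! ’ : with spaces
  3. Convert all uppercase letters to lowercase
  4. Generate the word list
  5. Count word frequencies
  6. Sort by frequency
  7. Exclude function words: pronouns, articles, and conjunctions
  8. Output the TOP 10 most frequent words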
article ='''
Big data analytics and business analytics
by Duan, Lian; Xiong, Ye
Over the past few decades, with the development of automatic identification, data capture and storage technologies, 
people generate data much faster and collect data much bigger than ever before in business, science, engineering, education and other areas. 
Big data has emerged as an important area of study for both practitioners and researchers. 
It has huge impacts on data-related problems. 
In this paper, we identify the key issues related to big data analytics and then investigate its applications specifically related to business problems.
'''

split = article.split()
print(split)

# Replace punctuation with spaces (replacing with "" would glue neighbouring words together)
for ch in ",.?!’:;":
    article = article.replace(ch, " ")
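A minimal alternative sketch for the same step, assuming Python 3: `str.maketrans` and `str.translate` can map every separator to a space in one pass instead of chaining `replace` calls. The helper name `replace_separators` and the `SEPARATORS` constant are hypothetical, chosen here only for illustration.

```python
# Hypothetical separator set: the characters named in the task list, plus ';'
SEPARATORS = ",.?!’:;"


def replace_separators(text, separators=SEPARATORS):
    """Return text with every separator character replaced by a space."""
    # maketrans needs two strings of equal length: each separator maps to ' '
    table = str.maketrans(separators, " " * len(separators))
    return text.translate(table)


sample = "Hello, world! It’s: data-related?"
print(replace_separators(sample).split())
```

The hyphen is not in the set, so `data-related` stays one token; the curly apostrophe splits `It’s` into `It` and `s`, which is what the task list asks for.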


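# Convert uppercase letters to lowercase
exchange = article.lower()
print(exchange)

# Generate the word list (named words so the built-in list is not shadowed)
words = exchange.split()
print(words)

# Count word frequencies in a single pass over the list
dic = {}
for w in words:
    dic[w] = dic.get(w, 0) + 1
print(dic)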
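The counting loop can also be written with `collections.Counter` from the standard library, which tallies the list in one pass. Calling `list.count(i)` inside the loop, as the original version does, rescans the whole list for every word, so the work grows quadratically with the number of words; that is harmless for one short abstract but slow for a long text. A sketch, with `count_words` as a hypothetical name:

```python
from collections import Counter


def count_words(words):
    """Map each word to its number of occurrences.

    Counter is a dict subclass, so the result keeps first-appearance order,
    the same order the hand-written loop produces.
    """
    return dict(Counter(words))


words = "big data has emerged as big data".split()
print(count_words(words))
```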

# Remove specific function words; pop(..., None) avoids a KeyError when a word is absent
word = {'and', 'the', 'with', 'in', 'by', 'its', 'for', 'of', 'an', 'to'}
for i in word:
    dic.pop(i, None)
print(dic)
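Another way to drop the function words is to build a new dictionary that skips them, which never raises `KeyError` and leaves the input untouched. The stop-word set below is an assumption: the author's ten words plus `a`, `it` and `we`, added because the task list mentions pronouns and articles; adjust it to the text being analysed. `remove_stop_words` is a hypothetical helper name.

```python
# Assumed stop-word set: the original ten words plus 'a', 'it', 'we'
STOP_WORDS = {'and', 'the', 'with', 'in', 'by', 'its', 'for', 'of', 'an', 'to',
              'a', 'it', 'we'}


def remove_stop_words(freq, stop_words=STOP_WORDS):
    """Return a new dict without the stop words; the input dict is not modified."""
    return {word: n for word, n in freq.items() if word not in stop_words}


freq = {'big': 2, 'data': 2, 'and': 1, 'we': 1}
print(remove_stop_words(freq))
```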

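# Sort by frequency, highest first
dic1 = sorted(dic.items(), key=lambda d: d[1], reverse=True)
print(dic1)

# Output the ten most frequent words (slicing avoids an IndexError if there are fewer than ten)
for item in dic1[:10]:
    print(item)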
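If words with the same count should come out in a predictable order, a secondary sort key helps. Because `sorted` is stable, the script above lists tied words in the order they first appeared in the text. The sketch below (hypothetical helper `top_k`, toy input) breaks ties alphabetically instead and, because it slices the sorted list, returns fewer than `k` pairs rather than failing when the text is short.

```python
def top_k(freq, k=10):
    """Return up to k (word, count) pairs: highest count first, ties alphabetical."""
    # Negating the count sorts it descending while the word still sorts ascending
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


freq = {'data': 4, 'business': 2, 'big': 2, 'analytics': 2, 'paper': 1}
for word, n in top_k(freq, 3):
    print(word, n)
```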
