python pandas库的应用(类比mysql语言)
16lz
2021-01-22
转载自:http://blog.csdn.net/awayyao/article/details/17785473
- create table 我们尝试创建一张表,保存每天的三个值
from pandas import *Out[6]:
import numpy as np
randn = np.random.randn
random_integers = np.random.random_integers
# create table df (id int, dt datetime, a float, b float, c float, primary key (id));
df = DataFrame(randn(30, 3), date_range('1/1/2013', periods=30), columns=list('abc'))
# select * from df order by id asc limit 7;
df.head(7)
- select 语句
# select a,c from df limit 7Out[7]:
df.loc[:,['a','c']].head(7)
- where 子句
# select a,c from df where dt > "2013-01-03" and dt < "2013-01-10"Out[8]:
df.loc[(df.index>"2013-01-03") & (df.index<"2013-01-10"),['a','c']]
- order 子句
# select a,c from df where dt > "2013-01-03" and dt < "2013-01-10" order by a asc, c descOut[9]:
df.loc[(df.index>"2013-01-03") & (df.index<"2013-01-10"),['a','c']].sort_index(by=['a','c'], ascending=[1,0])
接下来我们尝试新建一张存放每天app_id安装记录的表,app_id是app的id,total是安装数。
- insert 子句
# create table install_log (id int, dt datetime, app_id int, total int, primary key (id))Out[10]:
install_log = DataFrame(columns=['app_id', 'total'])
# insert into install_log values(NULL, '2013-01-01', 100, 234);
# for aggregation, 10 times
app100_install = DataFrame([dict(app_id=100, total=random_integers(1000)) for i in range(30)],index=date_range('1/1/2013', periods=30))
install_log = install_log.append(app100_install)
app101_install = DataFrame([dict(app_id=101, total=random_integers(1000)) for i in range(30)],index=date_range('1/1/2013', periods=30))
install_log = install_log.append(app101_install)
install_log.head(3), install_log.tail(3)
- group by 子句
# select app_id, sum(total), avg(total) from install_log group by app_id;
for column_value, column_rows in install_log.groupby('app_id'):
print column_value, column_rows['total'].sum(), column_rows['total'].mean()
还有另一张表维护app_id和app名字之间的映射关系
In[12]:# create table dimapp (id int, app_id int, app_name varchar(255), primary key (id));Out[12]:
# insert into dimapp values(NULL, 100, 'facebook'), (NULL, 101, 'twitter'));
dimapp = DataFrame([dict(app_id=100, app_name='facebook'), dict(app_id=101, app_name='twitter')])
dimapp = dimapp.set_index('app_id')
dimapp
我们把最后的结果进行join
In[13]:# select dimapp.name, t.sum_total from (select app_id,sum(total) as sum_total from install_log group by app_id)t join dimapp on t.app_id=t.app_idOut[13]:
grouped = install_log.groupby('app_id').aggregate(np.sum)
result = grouped.join(dimapp).set_index('app_name')
result.plot(kind='bar')
更多相关文章
- 如何通过使用where子句与字符串格式(varchar(103),...,10)将103格式化
- 每条sql语句实际上都是一个事物(事物多种类型解读)
- sql 语句 更改默认的sql server 数据库
- 怎么用SQL语句实现表中的一个字段加1啊??
- PHP获取MySQL执行sql语句的查询时间
- 删除2行1 Sql语句mysql
- 使用IN语句缓慢mysql删除查询
- 1.4.6 收集sql语句的执行计划 2
- 在SQL SELECT语句中重用别名字段