I'm trying to merge a dataframe and a series (pandas 0.14.1). The series should form a new column, with some NaNs, since the index values of the series are a subset of the index values of the dataframe.

This works for a toy example, but not with my data (detailed below).

Example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'], index=pd.date_range('1/1/2011', periods=6, freq='D'))
df1

                   A         B         C         D
2011-01-01 -0.487926  0.439190  0.194810  0.333896
2011-01-02  1.708024  0.237587 -0.958100  1.418285
2011-01-03 -1.228805  1.266068 -1.755050 -1.476395
2011-01-04 -0.554705  1.342504  0.245934  0.955521
2011-01-05 -0.351260 -0.798270  0.820535 -0.597322
2011-01-06  0.132924  0.501027 -1.139487  1.107873

s1 = pd.Series(np.random.randn(3), name='foo', index=pd.date_range('1/1/2011', periods=3, freq='2D'))
s1

2011-01-01   -1.660578
2011-01-03   -0.209688
2011-01-05    0.546146
Freq: 2D, Name: foo, dtype: float64

pd.concat([df1, s1],axis=1)

                   A         B         C         D       foo
2011-01-01 -0.487926  0.439190  0.194810  0.333896 -1.660578
2011-01-02  1.708024  0.237587 -0.958100  1.418285       NaN
2011-01-03 -1.228805  1.266068 -1.755050 -1.476395 -0.209688
2011-01-04 -0.554705  1.342504  0.245934  0.955521       NaN
2011-01-05 -0.351260 -0.798270  0.820535 -0.597322  0.546146
2011-01-06  0.132924  0.501027 -1.139487  1.107873       NaN

The situation with my real data (see below) seems basically identical: concatenating a dataframe with a series whose DatetimeIndex values are a subset of the dataframe's. But it raises the ValueError in the title (with blah1 = (5, 286) and blah2 = (5, 276)). Why doesn't it work?

In[187]: df.head()
Out[188]:
                         high       low     loc_h     loc_l
time
2014-01-01 17:00:00  1.376235  1.375945  1.376235  1.375945
2014-01-01 17:01:00  1.376005  1.375775       NaN       NaN
2014-01-01 17:02:00  1.375795  1.375445       NaN  1.375445
2014-01-01 17:03:00  1.375625  1.375515       NaN       NaN
2014-01-01 17:04:00  1.375585  1.375585       NaN       NaN
In [186]: df.index
Out[186]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 271, Freq: None, Timezone: None

In [189]: hl.head()
Out[189]:
2014-01-01 17:00:00    1.376090
2014-01-01 17:02:00    1.375445
2014-01-01 17:05:00    1.376195
2014-01-01 17:10:00    1.375385
2014-01-01 17:12:00    1.376115
dtype: float64

In [187]:hl.index
Out[187]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 89, Freq: None, Timezone: None

In: pd.concat([df, hl], axis=1)
Out: [stack trace] ValueError: Shape of passed values is (5, 286), indices imply (5, 276)

5 Solutions

#1


34

I had a similar problem (join worked, but concat failed).

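Check for duplicate index values in df1 and s1 (e.g. df1.index.is_unique). pd.concat with axis=1 aligns the inputs on their index, and duplicate labels can break that alignment.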
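A minimal sketch of that check. The data here is hypothetical (a small frame named df whose 17:01 timestamp is repeated, and a series named hl), not the asker's actual data:

```python
import pandas as pd

# Hypothetical data: the 17:01 timestamp appears twice in the frame's index.
idx = pd.to_datetime([
    "2014-01-01 17:00", "2014-01-01 17:01",
    "2014-01-01 17:01", "2014-01-01 17:02",
])
df = pd.DataFrame({"high": [1.0, 2.0, 3.0, 4.0]}, index=idx)
hl = pd.Series([10.0, 40.0], index=idx[[0, 3]], name="hl")

# is_unique is False as soon as any index label repeats.
for name, obj in [("df", df), ("hl", hl)]:
    print(name, "index unique:", obj.index.is_unique)

# List the labels that occur more than once in the frame's index.
dup_labels = df.index[df.index.duplicated(keep=False)].unique()
print(dup_labels)
```

Running the same two checks on the real dataframe and series shows which of them, if either, carries repeated timestamps.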

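Removing the duplicate index values should resolve it, e.g. df = df[~df.index.duplicated(keep='first')] (available in more recent pandas versions) or one of the other methods at https://stackoverflow.com/a/34297689/7163376. Note that df.drop_duplicates(inplace=True) compares column values rather than index labels, so it removes a repeated label only when the rows' values are also identical, and it can drop rows with distinct labels that happen to hold equal values.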
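As a rough illustration of the fix, assuming it is acceptable to keep the first row for each repeated timestamp (the frame and series below are hypothetical stand-ins, and Index.duplicated requires a reasonably recent pandas):

```python
import pandas as pd

# Hypothetical data: the 17:01 timestamp appears twice in the frame's index.
idx = pd.to_datetime([
    "2014-01-01 17:00", "2014-01-01 17:01",
    "2014-01-01 17:01", "2014-01-01 17:02",
])
df = pd.DataFrame({"high": [1.0, 2.0, 3.0, 4.0]}, index=idx)
hl = pd.Series([10.0, 40.0], index=idx[[0, 3]], name="hl")

# Keep only the first row for each index label (an assumption about
# which duplicate is the one worth keeping).
deduped = df[~df.index.duplicated(keep="first")]

# With unique labels on both sides, concat aligns on the index and
# fills NaN where the series has no value.
result = pd.concat([deduped, hl], axis=1)
print(result)
```

If the duplicated rows carry different values that all matter, aggregating them (for example with groupby on the index) may be a better choice than keeping the first.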

