How to convert a dict to Spark map output

I'm working with Spark and Python. I would like to transform my input dataset.

My input dataset (RDD)

-------------------------------------------------------------
| id |                  var                                 |
-------------------------------------------------------------
| 1  |"[{index: 1, value: 200}, {index: 2, value: A}, ...]" |
| 2  |"[{index: 1, value: 140}, {index: 2, value: C}, ...]" |
| .. |                      ...                             |
-------------------------------------------------------------

I would like to have this DataFrame (output dataset)

----------------------
| id | index | value |
----------------------
| 1  |  1    | 200   |
| 1  |  2    | A     |
| 1  |  ...  | ...   |
| 2  |  1    | 140   |
| 2  |  2    | C     |
| ...|  ...  | ...   |
----------------------

I created a map function:

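def process(row):
    # Build one output record per item in the row's 'var' list.
    # Assumes row['var'] is already a list of dicts (not a raw string).
    records = []
    for item in row['var']:
        records.append({
            'id': row['id'],
            'index': item['index'],
            'value': item['value'],
        })

    return records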

I would like to map my process function like this:

output_rdd = input_rdd.map(process)

Is it possible to do it this way (or in a simpler way)?

1 solution

#1

I found the solution:

output_rdd = input_rdd.map(lambda row:process(row)).flatMap(lambda x: x)
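
A possibly simpler variant: the map followed by an identity flatMap can usually be collapsed into a single flatMap(process), as long as process returns a list with one record per item. The sketch below is only an illustration under some assumptions: the sample rows are made up to mirror the input table, 'var' is assumed to be an already-parsed list of dicts, values are kept as strings because the column mixes numbers and letters, and to_dataframe is a hypothetical helper name. The flattening itself is checked locally with itertools so it can be tried without a Spark cluster.

```python
from itertools import chain


def process(row):
    """Return one flat record per item in row['var'] (assumed to be a list of dicts)."""
    return [
        {"id": row["id"], "index": item["index"], "value": item["value"]}
        for item in row["var"]
    ]


# Hypothetical sample rows mirroring the input table in the question.
sample_rows = [
    {"id": 1, "var": [{"index": 1, "value": "200"}, {"index": 2, "value": "A"}]},
    {"id": 2, "var": [{"index": 1, "value": "140"}, {"index": 2, "value": "C"}]},
]

# Local stand-in for input_rdd.flatMap(process).collect()
flat_records = list(chain.from_iterable(process(r) for r in sample_rows))


def to_dataframe(spark, input_rdd):
    """With a live SparkSession: flatten the RDD, then build the DataFrame."""
    # Imported lazily so the local check above runs without Spark installed.
    from pyspark.sql import Row

    output_rdd = input_rdd.flatMap(process)
    return spark.createDataFrame(output_rdd.map(lambda d: Row(**d)))
```

With a SparkSession available, something like to_dataframe(spark, spark.sparkContext.parallelize(sample_rows)).show() should give the id / index / value layout from the question.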
