I'm trying to run the example from the Apache Spark MLlib website. Below is my code:

import sys
import os

os.environ['SPARK_HOME'] = "/usr/local/Cellar/apache-spark/1.2.1"
sys.path.append("/usr/local/Cellar/apache-spark/1.2.1/libexec/python")
sys.path.append("/usr/local/Cellar/apache-spark/1.2.1/libexec/python/build")

try:
    from pyspark import SparkContext, SparkConf
    from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
    print ("Apache-Spark v1.2.1 >>> All modules found and imported successfully.")

except ImportError as e:
    print ("Couldn't import Spark Modules", e)
    sys.exit(1)

# SETTING CONFIGURATION PARAMETERS
config = (SparkConf()
        .setMaster("local")
        .setAppName("Music Recommender")
        .set("spark.executor.memory", "16G")
        .set("spark.driver.memory", "16G")
        .set("spark.executor.cores", "8"))
sc = SparkContext(conf=config)

# Load and parse the data
data = sc.textFile("data/1aa")
ratings = data.map(lambda l: l.split('\t')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))

# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 10
model = ALS.train(ratings, rank, numIterations)

# Evaluate the model on training data
testdata = ratings.map(lambda p: (p[0], p[1]))
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1])**2).mean()
print("Mean Squared Error = " + str(MSE))

# Save and load model
model.save(sc, "/Users/kunal/Developer/MusicRecommender")
sameModel = MatrixFactorizationModel.load(sc, "/Users/kunal/Developer/MusicRecommender/data")

The code runs until it prints the MSE. The last step is to save the model to a directory, and that is where I get the error 'MatrixFactorizationModel' object has no attribute 'save'. I've pasted the last few lines of the log below:

15/10/06 21:00:16 INFO DAGScheduler: Stage 200 (mean at /Users/kunal/Developer/MusicRecommender/collabfiltering.py:41) finished in 12.875 s
15/10/06 21:00:16 INFO DAGScheduler: Job 8 finished: mean at /Users/kunal/Developer/MusicRecommender/collabfiltering.py:41, took 53.290203 s
Mean Squared Error = 405.148403002
Traceback (most recent call last):
  File "/Users/kunal/Developer/MusicRecommender/collabfiltering.py", line 47, in <module>
    model.save(sc, path)
AttributeError: 'MatrixFactorizationModel' object has no attribute 'save'

Process finished with exit code 1

I have reinstalled Spark and made sure I have the latest version, but that did not help. I am running this on a 10 MB file only, which is a tiny split of the larger file.

Operating System: OS X 10.11.1 Beta (15B22c)

1 Answer

This happens because you are using Spark 1.2.1, and the MatrixFactorizationModel.save method was introduced in Spark 1.3.0. Moreover, the documentation you are following covers the current version (1.5.1).
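
If you want the script to fail with a clearer message, you could compare the running Spark version against the release that added the method before calling it. Below is a minimal sketch, assuming the version is available as a dotted string (for example the value of `sc.version`). The helper names `parse_version` and `supports_model_save` are hypothetical and not part of Spark, and the 1.3.0 threshold is taken from this answer rather than checked against every release:

```python
import re

# Release that introduced MatrixFactorizationModel.save, as stated in this answer.
MODEL_SAVE_MIN_VERSION = (1, 3, 0)


def parse_version(version):
    """Turn a dotted version string such as '1.2.1' into a tuple of ints.

    Hypothetical helper: a missing patch number counts as 0, and any
    trailing suffix such as '-SNAPSHOT' is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def supports_model_save(version):
    """Return True if this Spark version is at least MODEL_SAVE_MIN_VERSION."""
    return parse_version(version) >= MODEL_SAVE_MIN_VERSION


print(supports_model_save("1.2.1"))  # the version used in the question
print(supports_model_save("1.5.1"))  # the version the documentation covers
```

A check that does not depend on version numbers at all is plain `hasattr(model, "save")` just before the call.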

Spark documentation URLs look like this:

http://spark.apache.org/docs/SPARK_VERSION/some_topic.html

So in your case you should use:

http://spark.apache.org/docs/1.2.1/mllib-collaborative-filtering.html
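
The URL pattern above can be filled in programmatically if you switch between Spark versions often. This is only a small sketch: the function name `spark_docs_url` is hypothetical, and it assumes the topic page exists under the same name for the version you ask for, which is not guaranteed for every release:

```python
# URL pattern for versioned Spark documentation, as described in this answer.
DOCS_URL_TEMPLATE = "http://spark.apache.org/docs/{version}/{topic}.html"


def spark_docs_url(version, topic):
    """Build the documentation URL for a given Spark version and topic page.

    Hypothetical helper: it only formats the string and does not check
    that the page actually exists.
    """
    return DOCS_URL_TEMPLATE.format(version=version, topic=topic)


print(spark_docs_url("1.2.1", "mllib-collaborative-filtering"))
```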
