I have 6 large files, each of which contains a dictionary object that I saved to a hard disk with pickle. Loading all of them sequentially takes about 600 seconds. I want to load them all at the same time to speed up the process. Since they are all roughly the same size, I hope to load them in about 100 seconds instead. I used multiprocessing with apply_async to load each of them separately, but it runs sequentially. This is the code I used, and it doesn't work. The code shows 3 of the files, but it would be the same for all six. I put the 3rd file on another hard disk to make sure IO is not the bottleneck.

import timeit
from multiprocessing import Pool

def loadMaps():
    start = timeit.default_timer()
    pool = Pool(3)
    # Note: load1() is *called* here in the parent process; only its
    # return value is passed to apply_async, so the three loads still
    # happen one after another.
    pool.apply_async(load1(),)
    pool.apply_async(load2(),)
    pool.apply_async(load3(),)
    pool.close()
    pool.join()
    stop = timeit.default_timer()
    print('loadMaps took %.1f seconds' % (stop - start))

1 Answer

#1

If your code is primarily limited by IO and the files are on multiple disks, you might be able to speed it up using threads:

import concurrent.futures
import pickle

def read_one(fname):
    # Unpickle one file; the underlying disk reads release the GIL.
    with open(fname, 'rb') as f:
        return pickle.load(f)

def read_parallel(file_names):
    # One task per file; the thread pool overlaps their IO waits.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(read_one, f) for f in file_names]
        # Results are returned in the same order as file_names.
        return [fut.result() for fut in futures]
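
For example, a minimal usage sketch (the file names below are placeholders for your six pickle files):

file_names = ['map1.pkl', 'map2.pkl', 'map3.pkl',
              'map4.pkl', 'map5.pkl', 'map6.pkl']  # hypothetical paths
maps = read_parallel(file_names)  # one dict per file, in order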

The GIL will not force the IO operations to run serially, because Python consistently releases it while doing IO.

Several remarks on alternatives:

  • multiprocessing is unlikely to help because, while it does its work in multiple processes (and is therefore free of the GIL), it also requires the loaded content to be pickled and transferred from the subprocess back to the main process, which takes additional time.

  • asyncio will not help you at all because it doesn't natively support asynchronous file system access (and neither do the popular OSes). While it can emulate it with threads, the effect is the same as the code above, only with much more ceremony.

  • Neither option will speed up loading the six files by a factor of six. Consider that at least some of the time is spent building the dictionaries, which is serialized by the GIL. If you want to really speed up startup, a better approach is not to create the whole dictionary upfront at all, but to switch to an in-file database, possibly using a dictionary to cache access to its contents (see the sketch after this list).

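As a minimal sketch of that last idea, assuming the standard-library shelve module as the in-file database (shelve stores each value as a separate pickle, so only the entries you actually touch are read from disk; the file name and class here are hypothetical):

import shelve

class CachedShelf:
    # Wraps a shelve database with a plain-dict cache, so each key is
    # unpickled from disk at most once. shelve keys must be strings.
    def __init__(self, path):
        self._db = shelve.open(path)  # e.g. 'map1.db' -- hypothetical name
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._db[key]  # loads just this entry
        return self._cache[key]

    def close(self):
        self._db.close()

Converting an existing pickled dictionary into a shelf is a one-time cost (open a shelf and copy the items into it); after that, startup no longer pays for entries that are never used.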
