I have put together a simple Python script that reads a large list of algebraic expressions from a text file (one expression per line), evaluates each line and puts the results into a numpy array. The eigenvalues of this matrix are then found. The parameters A, B, C will then be changed and the program run again, which is why the evaluation is wrapped in a function.


Some of these text files will have millions of lines of equations, and profiling the code shows that the call to eval accounts for approximately 99% of the execution time. I am aware of the dangers of using eval, but this code will only ever be used by myself. Every other part of the code is fast.
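Because the same expressions are re-evaluated every time A, B, C change, one option worth trying is to parse each line only once with Python's built-in `compile()` and then evaluate the resulting code objects against a namespace dict holding the current parameters. This is a minimal sketch, not the code from the question: the helper names `compile_lines`, `build_matrix` and `largest_eigenvalue` are hypothetical, the small in-memory `lines` list stands in for the real file, and it assumes the expressions reference only A, B and C (any other names, such as `sqrt`, would have to be added to the namespace). It should remove the repeated parsing cost from every run after the first; how much that saves depends on how much of eval's time goes to parsing rather than arithmetic.

```python
import numpy as np


def compile_lines(lines):
    """Parse every expression once; the code objects can be reused for any A, B, C."""
    return [compile(line.strip(), "<matrix-element>", "eval") for line in lines]


def build_matrix(codes, mat_size, A, B, C):
    """Evaluate the pre-compiled expressions for one set of parameters."""
    # Assumption: the expressions only use A, B, C; add other names here if needed
    namespace = {"A": A, "B": B, "C": C}
    values = [eval(code, namespace) for code in codes]
    # reshape (unlike np.resize) raises if there are not exactly mat_size**2 values
    return np.array(values, dtype=np.float64).reshape(mat_size, mat_size)


def largest_eigenvalue(codes, mat_size, A, B, C):
    # eigvalsh returns eigenvalues in ascending order, so the last one is the largest
    return np.linalg.eigvalsh(build_matrix(codes, mat_size, A, B, C))[-1]


# Usage: compile once, then re-evaluate for each parameter set
lines = ["A + B\n", "C\n", "C\n", "A*B\n"]
codes = compile_lines(lines)
for params in [(1.0, 2.0, 3.0), (7.65, 5.38, 4.00)]:
    print(params, largest_eigenvalue(codes, 2, *params))
```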


Here is the code, with mat_size set to 500, which represents a 500*500 array, meaning 250,000 lines of equations are read in from the file. I cannot provide the file as it is ~0.5 GB in size, but I have provided an example of what it looks like below; it only uses basic mathematical operations.


import numpy as np
from numpy import *  # makes numpy functions (sqrt, exp, ...) visible to eval
from scipy.linalg import eigvalsh

mat_size = 500

# Read the whole file into a list of lines (one expression per line)
with open("test_file.txt", 'r') as f:
    lines = f.readlines() 

# Function to evaluate the maths and build the numpy array
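def my_func(A, B, C):
    lst = []
    for i in lines:
        # Strip the \n, evaluate the expression and keep the result
        # (A, B, C are local variables here, so eval can see them)
        lst.append(eval(i.rstrip()))
    # Build the numpy array
    AA = np.array(lst, dtype=np.float64)
    # Reshape it to mat_size x mat_size; unlike np.resize, this raises an error
    # if the file does not contain exactly mat_size**2 lines
    matt = AA.reshape(mat_size, mat_size)
    return matt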

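# Function to find the largest eigenvalue of the matrix
def optimise(x):
    A, B, C = x
    test = my_func(A, B, C)
    # eigvalsh returns the eigenvalues in ascending order, so the last is the largest
    ev = eigvalsh(test)
    return ev[-1]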

# Define what A,B,C are, this can be changed each time the program is run
x0 = [7.65,5.38,4.00]

# Print result
print(optimise(x0))

A few lines of an example input text file (set mat_size to 2 to run this file):



I am aware eval is usually bad practice and slow, so I looked for other means of achieving a speedup. I tried the methods outlined here, but none of them appeared to work. I also tried applying sympy to the problem, but that caused a massive slowdown. What is a better way of approaching this problem?
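Since the expressions use only basic arithmetic and A, B, C are meant to be varied, a different technique to consider is evaluating each expression once with NumPy arrays for A, B and C, so that one pass over the file yields the matrix for every parameter set at once. This is a sketch under stated assumptions: `build_matrices` is a hypothetical helper, the expressions are assumed to use only operators such as `+ - * / **` (which NumPy applies element-wise; functions from the `math` module would fail on arrays), and memory grows with the number of lines times the number of parameter sets.

```python
import numpy as np


def build_matrices(lines, mat_size, A, B, C):
    """Evaluate each expression once for whole arrays of A, B, C values."""
    A, B, C = (np.asarray(v, dtype=np.float64) for v in (A, B, C))
    shape = np.broadcast(A, B, C).shape  # e.g. (n_parameter_sets,)
    namespace = {"A": A, "B": B, "C": C}
    values = []
    for line in lines:
        value = eval(line.strip(), namespace)
        # A constant line such as "2.5" gives a scalar; broadcast it to the parameter shape
        values.append(np.broadcast_to(value, shape))
    stacked = np.stack(values, axis=-1)  # shape + (mat_size * mat_size,)
    return stacked.reshape(shape + (mat_size, mat_size))


lines = ["A + B", "C", "C", "A*B"]
mats = build_matrices(lines, 2, A=[1.0, 0.0], B=[2.0, 1.0], C=[3.0, 0.0])
# np.linalg.eigvalsh accepts a stack of matrices; [..., -1] picks each largest eigenvalue
largest = np.linalg.eigvalsh(mats)[..., -1]
print(largest)
```

The eigenvalue step then runs on the whole stack in one call, so the per-line eval cost is paid once per sweep rather than once per parameter set.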




Following the suggestion to use numexpr instead, I have run into an issue: it grinds to a halt compared with the standard eval. In some instances the matrix elements contain quite long algebraic expressions. Here is an example of just one matrix element, i.e. one of the equations in the file (it contains a few more terms not defined in the code above, but these can easily be defined at the top of the code):



numexpr completely chokes when the matrix elements have this form, whereas eval evaluates them almost instantly. For just a 10*10 matrix (100 equations in the file), numexpr takes about 78 seconds to process the file, whereas eval takes 0.01 seconds. Profiling the numexpr version shows that numexpr's getExprNames and precompile functions cause the problem: precompile takes 73.5 seconds of the total time and getExprNames 3.5 seconds. Why would precompile and getExprNames cause such a bottleneck in this particular calculation? Is this module simply not well suited to long algebraic expressions?


1 solution



I found a way to speed up eval() in this particular case by using the multiprocessing library. I read the file in as usual, then break the list into equal-sized sub-lists, which are processed separately on different CPUs, and recombine the evaluated sub-lists at the end. This gives a nice speedup over the original method. I am sure the code below can be simplified and optimised, but for now it works (for instance, what if the number of list elements is not divisible by 4, e.g. a prime number? That will mean unequal lists). Some rough benchmarks show it is about 3 times faster using the 4 CPUs of my laptop. Here is the code:


from multiprocessing import Process, Queue

# Declare the variables used in the expressions (module level, so worker processes see them too)
A = 0.1
B = 2
C = 2.1
m3 = 1
z3 = 2

# One function processes any chunk; A, B, C are passed in so eval sees them as locals
def my_funcHH(chunk, A, B, C, que):
    lstHH = []
    for i in chunk:
        lstHH.append(eval(i))
    # Send the evaluated chunk back to the parent process
    que.put(lstHH)

if __name__ == "__main__":
    with open("test.txt", 'r') as h:
        linesHH = h.readlines()

    # Get the number of list elements
    size = len(linesHH)

    # Break the list into (at most) 4 chunks; ceiling division keeps any leftover lines
    n_chunks = 4
    chunk_size = max(1, -(-size // n_chunks))
    chunks = [linesHH[x:x + chunk_size] for x in range(0, size, chunk_size)]

    # One queue and one process per chunk
    queues = [Queue() for _ in chunks]
    processes = [Process(target=my_funcHH, args=(chunk, A, B, C, que))
                 for chunk, que in zip(chunks, queues)]

    # Start them
    for p in processes:
        p.start()

    # Collect the results in chunk order before joining, so a full queue cannot deadlock join()
    results = [que.get() for que in queues]
    for p in processes:
        p.join()

    # Obtain the final result by combining the lists together again
    mergedlist = [value for part in results for value in part]
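On the question of unequal lists when the number of lines is not a multiple of the number of CPUs: a small helper can split the lines into chunks whose lengths differ by at most one, so no line is dropped and the work stays balanced. This is a hypothetical helper (`split_into_chunks`), not part of the answer's code; `numpy.array_split` does the same kind of split, and `multiprocessing.Pool.map` with its `chunksize` argument is another way to hand out the work.

```python
def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces whose lengths differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for k in range(n_chunks):
        # The first `extra` chunks each take one additional item
        end = start + base + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


chunks = split_into_chunks(list(range(10)), 4)
print([len(c) for c in chunks])  # [3, 3, 2, 2]
```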

