I was looking at the source of sorted_containers and was surprised to see this line:

我看着sorted_container的源代码，惊讶地看到这一行:

self._load, self._twice, self._half = load, load * 2, load >> 1

Here load is an integer. Why use bit shift in one place, and multiplication in another? It seems reasonable that bit shifting may be faster than integral division by 2, but why not replace the multiplication by a shift as well? I benchmarked the the following cases:

这里的load是一个整数。为什么在一个地方使用位移，在另一个地方使用乘法?似乎合理的一点是，移位可能比整数除法快2倍，但是为什么不能用移位来代替乘法呢?我对下列案例进行了基准测试:

(times, divide)
(分次)
(shift, shift)
(转变,转变)
(times, shift)
(倍转变)
(shift, divide)
(转变,除)

and found that #3 is consistently faster than other alternatives:

并且发现#3始终比其他选择要快:

# self._load, self._twice, self._half = load, load * 2, load >> 1

import random
import timeit
import pandas as pd

x = random.randint(10 ** 3, 10 ** 6)

def test_naive():
    a, b, c = x, 2 * x, x // 2


def test_shift():
    a, b, c = x, x << 1, x >> 1


def test_mixed():
    a, b, c = x, x * 2, x >> 1


def test_mixed_swaped():
    a, b, c = x, x << 1, x // 2


def observe(k):
    print(k)
    return {
        'naive': timeit.timeit(test_naive),
        'shift': timeit.timeit(test_shift),
        'mixed': timeit.timeit(test_mixed),
        'mixed_swapped': timeit.timeit(test_mixed_swaped),
    }


def get_observations():
    return pd.DataFrame([observe(k) for k in range(100)])

The question:

一个问题:

Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?

我的测试有效吗?如果是，为什么(乘、移)比(移、移)快?

I run Python 3.5 on Ubuntu 14.04.

我在Ubuntu 14.04上运行Python 3.5。

Edit

编辑

Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.

上面是问题的原始陈述。丹·盖兹在他的回答中给出了一个极好的解释。

For the sake of completeness, here are sample illustrations for larger x when multiplication optimizations do not apply.

为了完整起见，这里有一些大小写x，当乘法优化不适用时。

1 个解决方案

#1

134

This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.

这似乎是因为小数字的乘法在CPython 3.5中得到了优化，而小数字的左移却没有。正左移位总是创建一个较大的整型对象来存储结果，作为计算的一部分，而对于您在测试中使用的那种乘法，特殊的优化避免了这种情况，并创建了一个大小正确的整型对象。这可以在Python的整数实现的源代码中看到。

Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platform, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on if x were less than 32,768 or not.)

因为Python中的整数是任意精度的，所以它们被存储为整数“位数”数组，并且每个整数位数的比特数受到限制。因此，在一般情况下，涉及整数的操作不是单个操作，而是需要处理多个“数字”的情况。在pyport。h，这个位限制定义为64位平台上的30位，否则是15位。(从这里开始我就叫它30，这样解释就简单了。但是请注意，如果您使用32位的Python编译，那么基准测试的结果将取决于x是否小于32,768。

When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:

当一个操作的输入和输出保持在30位范围内时，操作可以用一种优化的方式来处理，而不是一般的方式。整数乘法实现的开始如下:

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don't have long long then we're almost certainly
           using 15-bit digits, so v will fit in a long.  In the
           unlikely event that we're using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }

So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.

因此，当两个整数相乘时，每一个都是30位的数字，这是用CPython解释器直接相乘，而不是用整数作为数组。(在一个正整数对象上调用MEDIUM_VALUE()只获取它的前30位数字。)如果结果适合一个30位的数字，PyLong_FromLongLong()将在相对较少的操作中注意到这一点，并创建一个单位数的整数对象来存储它。

In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)

相反，左移不是这样优化的，每一个左移都处理作为数组移位的整数。特别是，如果您查看long_lshift()的源代码，对于一个很小但正的左移，总是会创建一个2位数的整型对象，即使它的长度稍后被截断为1:(我在/*** */中的注释)

static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/

    wordshift = shiftby / PyLong_SHIFT;   /*** zero for small w ***/
    remshift  = shiftby - wordshift * PyLong_SHIFT;   /*** w for small w ***/

    oldsize = Py_ABS(Py_SIZE(a));   /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;   /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);

    /*** ... ***/
}

Integer division

You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.

你没有问过整数层除法比右移差多少，因为它符合你(和我)的期望。但是用一个小正数除以另一个小正数也不如小乘法优化。每个//都使用函数long_divrem()计算商和余数。这个余数用于一个带乘法的小除数，并存储在一个新分配的整数对象中，在这种情况下，该对象立即被丢弃。

比位移位快2倍?

1 个解决方案

#1

Integer division

更多相关文章

随机推荐