Christopher Sardegna's Blog

Thoughts on technology, design, data analysis, and data visualization.


Floor Division in Python

Floor Division in Python

The Problem

In solving data analytics problems, occasionally we must bin data into intervals or categories based on some arbitrary stratification. A common example outside of the world of computer science is letter grades, a concept that can be abstracted1 to:

def grade(score):
    grades = ['F', 'D', 'C', 'B', 'A']
    if score >= 100:
        return 'A'
    if score <= 50:
        return 'F'
    return grades[(score - 50) // 10]

Thus, for:

scores = [88, 72, 61, 39, 97]
results = [grade(score) for score in scores]

We get:

['B', 'C', 'D', 'F', 'A']

However, if any of our score values2 are not integers we will encounter a TypeError:

>>> grade(61.2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
TypeError: list indices must be integers or slices, not float

This is caused by the following type coercion, as lists cannot be indexed by floats:

>>> (61.2 - 50) // 2
1.0

Avoiding this Error

Of course, we can cast to an int:

def grade(score):
    grades = ['F', 'D', 'C', 'B', 'A']
    if score > 99:
        return 'A'
    if score <= 50:
        return 'F'
    return grades[(int(score) - 50) // 10]

However, for large datasets this is not an efficient operation:

chris% python -m timeit 'int(102.3 // 2)'
5000000 loops, best of 5: 99.6 nsec per loop
chris% python -m timeit '102.3 // 2'     
50000000 loops, best of 5: 7.05 nsec per loop

In my real-world example, the score equivalent is always a float, so we can map the float integer values to their respective categories3, similar to:

>>> grades
{0.0: 'F', 1.0: 'D', ...}
>>> grades[0]
'F'
>>> grades[0.]
'F'
>>> grades[1]
'D'
>>> grades[1.]
'D'

However, I was still curious why floor division was returning a float and not an integer.

Definition of Floor Division

According to Concrete Mathematics4:

⌊xβŒ‹ = the greatest integer less than or equal to x

⌈xβŒ‰ = the least integer greater than or equal to x

Note the key word integer; while this may be the official mathematical definition, Python does not seem to follow this to the letter and returns a type-coerced value.

Python Arithmetic

It is documented when types are coerced. From the Python docs:

The same is true for binary operators:

The / (division) and // (floor division) operators yield the quotient of their arguments. The numeric arguments are first converted to a common type. Division of integers yields a float, while floor division of integers results in an integer; the result is that of mathematical division with the β€˜floor’ function applied to the result.

float_floor_div

The float divmod work occurs in float_floor_div, which comes from the cpython source. All this function does is call float_divmod and dereferences the quotient:

static PyObject *
float_floor_div(PyObject *v, PyObject *w)
{
    PyObject *t, *r;

    t = float_divmod(v, w);
    if (t == NULL || t == Py_NotImplemented)
        return t;
    assert(PyTuple_CheckExact(t));
    r = PyTuple_GET_ITEM(t, 0);
    Py_INCREF(r);
    Py_DECREF(t);
    return r;
}

The bulk of the work occurs inside of the called function, float_divmod.

float_divmod

The source for float_divmod lives here. The first step of this function handles the type cast:

float_divmod(PyObject *v, PyObject *w)
{
    double vx, wx;
    double div, mod, floordiv;
    CONVERT_TO_DOUBLE(v, vx);
    CONVERT_TO_DOUBLE(w, wx);
    ...

Where vx is the dividend, and wx is the divisor. The rest of this function handles the various cases that can occur:

Division by Zero

If the divisor is zero, we raise an error:

...
if (wx == 0.0) {
    PyErr_SetString(PyExc_ZeroDivisionError, "float divmod()");
    return NULL;
}
...

Quotient and Dividend

Before we can do any work, we must calculate the quotient and dividend:

mod = fmod(vx, wx);
div = (vx - mod) / wx;

We get the modulus using the fmod function from the C Standard Library, and the quotient using float division, also leveraging the C standard library.

Remainder Sign

Once we have these values, we check to ensure the sign is correct.

...
if (mod) {
    if ((wx < 0) != (mod < 0)) {
        mod += wx;
        div -= 1.0;
    }
}
...

Remainder is Zero

If the remainder (i.e., the modulus) is zero, we copy the sign to the divisor with copysign(0.0, wx)5.

Calling Floor

If the quotient is not zero, we call the floor function:

...
if (div) {
    floordiv = floor(div);
    if (div - floordiv > 0.5)
        floordiv += 1.0;
}
...

Numerator is Zero

If the dividend is zero, we copy the sign of the quotient onto zero with copysign(0.0, vx / wx).

Returning the Tuple

Once all of this has completed, we return a tuple like (floor_qotient, modulus).

Design Decisions

As Python is a mature language, the changes that affected these behaviors underwent much debate.

PEP 328

Since the proposal of the Python 3 change to the division operator in PEP 238, it was decided that the result of floor division with floats would be a float, following the above arithmetic conversions:

In particular, if a and b are both ints or longs, the result has the same type and value as for classic division on these types (including the case of mixed input types; int // long and long // int will both return a long).

For floating point inputs, the result is a float. For example: 3.5 // 2.0 == 1.0

PEP 3141

This PEP defined the changes to the numerical stack in Python and explicitly noted the following:

__floor__(self), called from math.floor(x), which returns the greatest Integral <= x.

Thus, it would follow that the floor call in float_divmod should return an integer and not a float value. However, this is not the case.

Further Debate

Thirteen years later, Alexander Belopolsky reported Issue 22444: Floor divide should return int:

PEP 3141 defines floor division as floor(x/y) and specifies that floor() should return int type. Builtin float type has been made part of the PEP 3141 numerical tower, but floor division of two floats still results in a float.

In this thread, my exact issue accessing a list index above was raised:

This is one of the common uses of floor division - to find an index of a cell in a regular grid: (x - start) // step. In this situation, it is convenient to have the result ready to be used as an index without a cast.

However, the decision to not make this change ended with Raymond Hettinger's reply:

  1. The current behavior has been around for a long time and is implemented in several modules, including decimal and fraction. As core devs, we need to keep focused on a priority of making the language stable (not making changes that truly necessary and invalidating all previously published material) and more importantly not adding yet more obstacles to converting from Python 2 to Python 3 (which Guido has called "death by a thousand cuts").

  2. The current behavior can be useful it that it allows floor division operations without unexpected type conversions occurring in the middle of an expression. We really don't want to break those use cases.

Conclusions

Python's philosophy means the language provides high-level access to data structures. Here, the coercion avoids dealing with the nuances of numeric types that one would have to wrangle with when writing C. In C, given the different numerical types, you have to return a double, because the range of a double is so much larger than that of an integer.

Breaking Changes

These decisions were made during the early days of the Python 3 transition, where the core developers wanted to minimize the friction of the upgrade from 2 to 3. As a result, to not break code that depended on floor division being type preserving6, it was decided that floor division should return a value equal to an integer7 but agnostic of type.

In a post-Python 2 era, these concerns do not hold the same weight. This is an example of some technical debt that Python has accrued, as other parts of Python do not behave this way:

>>> type(timedelta(2.2) // timedelta(3))
<class 'int'>
>>> type(Fraction(2.3) // Fraction(1))
<class 'int'>

Simplicity

Python makes sacrifices to retain its beauty and simplicity, this being one of them. As a result, the programmer, in this instance, is unable to choose the functionality that best addresses a specific problem. A possible solution would be to provide both floor() and float_floor() for accessing these values in both types. Also possible would be to make lists accessible by the float counterparts of their integer indices, since 1 == 1.0.

Speed

Further, the language is far more performant as a result of this decision: float // any == float is a computationally simple and fast operation; float // any == int is significantly more complicated and expensive, especially for large inputs due to the inefficient cast from float back to int.

chris% python -m timeit 'int(102.3 // 2)'
5000000 loops, best of 5: 99.6 nsec per loop
chris% python -m timeit '102.3 // 2'     
50000000 loops, best of 5: 7.05 nsec per loop

This performance hit does not stem from the float conversion itself 8 but rather from the fact that integer and float values are stored in different registers: the CPU must first store the float from the FPU register in memory, then read that memory address into the int register. This is a classic example of a load-hit-store stall.

Polymorphism

Since we can combine types in this way in Python, we need to be cognizant of the pitfalls that we can run into when writing polymorphic functions. Most of the time, the type of number will not matter to the Python interpreter. When type does matter, understanding how these data are handled internally leads to a smoother development process.


Discussion: Hacker News, Reddit


  1. This is an oversimplification of the problem I was solving but serves to demonstrate the issue at hand. ↩︎

  2. ex 61 -> 61.2 ↩︎

  3. Additionally, this means we can lookup based on integer or float keys, because in Python, 1.0 == 1, thus hash(1.0) == hash(1). ↩︎

  4. Graham, Ronald Lewis, et al. Concrete Mathematics: a Foundation for Computer Science. 2nd ed., Addison-Wesley, 2017. P 67. PDF. ↩︎

  5. copysign comes from the math module. ↩︎

  6. Like it was in Python 2. ↩︎

  7. As mentioned above, 1.0 == 1. ↩︎

  8. The conversion operation just truncates at the decimal. ↩︎