Why does `PyObject_GenericSetAttr` behave differently with lambdas than named functions

Question

I was recently experimenting with monkey patching built in types (Yes I know this is a terrible idea---trust me, this is for educational purposes only).

I discovered that there is an odd distinction between lambda expressions and functions declared with def. Take a look at this iPython session:

In [1]: %load_ext cython

In [2]: %%cython
   ...: from cpython.object cimport PyObject_GenericSetAttr
   ...: def feet_to_meters(feet):
   ...: 
   ...:     """Converts feet to meters"""
   ...: 
   ...:     return feet / 3.28084
   ...: 
   ...: PyObject_GenericSetAttr(int, 'feet_to_meters', feet_to_meters)

In [3]: (20).feet_to_meters()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-63ba776af1c9> in <module>()
----> 1 (20).feet_to_meters()

TypeError: feet_to_meters() takes exactly one argument (0 given)

Now, if I wrap feet_to_meters with a lambda, everything works!

In [4]: %%cython
   ...: from cpython.object cimport PyObject_GenericSetAttr
   ...: def feet_to_meters(feet):
   ...: 
   ...:     """Converts feet to meters"""
   ...: 
   ...:     return feet / 3.28084
   ...: 
   ...: PyObject_GenericSetAttr(int, 'feet_to_meters', lambda x: feet_to_meters(x))

In [5]: (20).feet_to_meters()
Out[5]: 6.095999804928006

What is going on with this?

I don't know if this is relevant but trying to set `int.feet_to_meters` in Python doesn't work — DavidW, Aug 09 '18 at 17:17
What you're seeing is a Cython thing, not a regular Python thing. — user2357112, Aug 09 '18 at 17:44

ead · Accepted Answer · 2018-08-11T00:37:45.100

Your problem can be reproduced in Python and without (very) dirty tricks:

class A:
   pass

A.works = lambda x: abs(1)
A.dont = abs

A().works()  # works
A().dont()   # error

The difference is, that abs is a builtin-function of type PyCFunctionObject, while lambda is of type PyFunctionObject (a C is missing compared to PyCFunction...).

Those cfunctions just cannot be used for patching, see for example PEP-579.

The problem, which is also mentioned in PEP-579, that cython-functions are PyCFunctions and thus are seen as builtin-functions:

%%cython
def foo():
    pass

>>> type(foo)
builtin_function_or_method

That means, you cannot use the Cython-function directly for monkey patching, but have to wrap them into a lambda or similar, as you already do. One should not worry about performance, because due to method-lookup there is already overhead, having a little bit more of it doesn't change things dramatically.

I must confess, I don't know why this is the case (historically). But in the current code (Python3.8) you can easily find the crucial line in _PyObject_GetMethod, which makes the difference:

descr = _PyType_Lookup(tp, name);
    if (descr != NULL) {
        Py_INCREF(descr);
        if (PyFunction_Check(descr) ||  # HERE WE GO
                (Py_TYPE(descr) == &PyMethodDescr_Type)) {
            meth_found = 1;
} else {

After looking-up the function (here descr) in the dictionary _PyType_Lookup(tp, name), method_found is set to 1 only if the found function is of type PyFunction, which is not the case for builtin-PyCFunctions. Thus abs and Co aren't seen as methods, but stay kind of "staticmethod".

The easiest way to find a starting point for the investigation, is to inspect the produced opcode for:

import dis
def f():
  a.fun()

dis.dis(f)

i.e. the following opcode(and seems to have changed since Python3.6):

2         0 LOAD_GLOBAL              0 (a)
          2 LOAD_METHOD              1 (fun)  #HERE WE GO
          4 CALL_METHOD              0
          6 POP_TOP
          8 LOAD_CONST               0 (None)
         10 RETURN_VALUE

We can inspect the corresponding part in ceval.c:

TARGET(LOAD_METHOD) {
            /* Designed to work in tamdem with CALL_METHOD. */
            PyObject *name = GETITEM(names, oparg);
            PyObject *obj = TOP();
            PyObject *meth = NULL;

            int meth_found = _PyObject_GetMethod(obj, name, &meth);
            ....

and let the gdb take us from there.

As @user2357112 has rightly pointed out, if PyCFunctionObject would support the descriptor protocol (more precisely to provide tp_descr_get), even after meth_found = 0; it still would have a fall back which would lead to the desired behavior. PyFunctionObject does provide it, but PyCFunctionObject does not.

Older versions used LOAD_ATTR+CALL_FUNCTION for a.fun() and in order to work, function-objects had to support the descriptor protocol. But now it seems not to be mandatory.

My quick tests with extending the crucial line with PyCFunction_Check(descr) to:

 if (PyFunction_Check(descr) || PyCFunction_Check(descr) ||
                (Py_TYPE(descr) == &PyMethodDescr_Type))

have shown, that then also builtin-methods would work as bound-methods (at least for the case above). But this would probably break something - I didn't run any bigger tests.

However, as @user2357112 mentioned (thanks again), this would lead to a inconsistency, because meth = foo.bar still uses LOAD_ATTR and thus depends on the descriptor protocol.

Recommendation: I found this answer helpful in understanding, what is going on in the case of LOAD_ATTR.

The line you describe as the "crucial line" isn't really the crucial part. The crucial part is that C functions don't support the descriptor protocol. The code you're looking at is a fast-path to skip bound method object creation for a few common descriptor types; it's only an optimization, and even if functions written in Python didn't take that path, you'd see the same behavior due to the regular descriptor handling. — user2357112, Aug 10 '18 at 23:11
@user2357112 actually I'm asking myself, what prevents me to accept also `PyCFunctionObject` here in the same way I accept `PyFunctionObject`? The descriptors are no longer created/used when LOAD_METHOD` + `CALL_METHOD` are used... — ead, Aug 10 '18 at 23:58
Accepting PyCFunctionObject would be wrong because PyCFunctionObject does not support the descriptor protocol, and the LOAD_METHOD/CALL_METHOD fast path is supposed to produce the same behavior as the regular path. Otherwise, there would be a behavior difference between `foo.bar()` and `meth = foo.bar; meth()`. — user2357112, Aug 11 '18 at 00:07

Why does `PyObject_GenericSetAttr` behave differently with lambdas than named functions

1 Answers1