"is" operator behaves unexpectedly with integers

оператор "is" неожиданно ведет себя с целыми числами

Почему следующее неожиданно ведет себя в Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly

Я использую Python 2.5.2. Попробовав несколько разных версий Python, оказывается, что Python 2.3.3 показывает описанное выше поведение между 99 и 100.

Основываясь на вышесказанном, я могу выдвинуть гипотезу, что Python внутренне реализован таким образом, что "маленькие" целые числа хранятся иначе, чем целые числа большего размера, и is оператор может заметить разницу. Почему дырявая абстракция? Какой лучший способ сравнить два произвольных объекта, чтобы увидеть, совпадают ли они, когда я заранее не знаю, являются ли они числами или нет?

Переведено автоматически

Ответ 1

Взгляните на это:

>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False

Вот что я нашел в документации для "Простых целочисленных объектов":

Текущая реализация сохраняет массив целочисленных объектов для всех целых чисел между -5 и 256. Когда вы создаете int в этом диапазоне, вы фактически просто получаете обратно ссылку на существующий объект.

Итак, целые числа 256 идентичны, а 257 - нет. Это деталь реализации CPython, и она не гарантируется для других реализаций Python.

Ответ 2

Оператор Python “is" неожиданно ведет себя с целыми числами?

В заключение позвольте мне подчеркнуть: Не используйте is для сравнения целых чисел.

Это не то поведение, относительно которого вам следует ожидать.

Вместо этого используйте == и != для сравнения на равенство и неравенство соответственно. Например:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don't do this! - Don't use `is` to test integers!!
False

Объяснение

Чтобы знать это, вам нужно знать следующее.

Во-первых, что is делает? Это оператор сравнения. Из документации:

Операторы is и is not проверяют идентичность объекта: x is y истинно тогда и только тогда, когда x и y - один и тот же объект. x is not y выдает обратное значение истинности.

И поэтому следующие действия эквивалентны.

>>> a is b
>>> id(a) == id(b)

Из документации:

id Возвращает “идентификатор” объекта. Это целое число (или длинное целое число), которое гарантированно будет уникальным и постоянным для этого объекта в течение его срока службы. Два объекта с неперекрывающимися сроками службы могут иметь одинаковое id() значение.

Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.

So what is the use-case for is? PEP8 describes:

Comparisons to singletons like None should always be done with is or
is not, never the equality operators.

The Question

You ask, and state, the following question (with code):

Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.

But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.

>>> 257 is 257
True           # Yet the literal numbers compare properly

Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the referent implementation of Python, which is CPython. Good for CPython.

It might be even better if CPython could do this globally, if it could do so cheaply (as there would a cost in the lookup), perhaps another implementation might.

But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.

What `is` does

is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:

>>> a is b

is the same as

>>> id(a) == id(b)

Why would we want to use `is` then?

This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (will work in Python 2 and 3) e.g.

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()

Which prints:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.

Ответ 3

I'm late but, you want some source with your answer? I'll try and word this in an introductory manner so more folks can follow along.

A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.

In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)

(My italics)

Don't know about you but I see this and think: Let's find that array!

If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.

PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:

PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v; 
}

Now, we're no C master-code-haxxorz but we're also not dumb, we can see that CHECK_SMALL_INT(ival); peeking at us all seductively; we can understand it has something to do with this. Let's check it out:

#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

So it's a macro that calls function get_small_int if the value ival satisfies the condition:

if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)

So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

So our condition is if (-5 <= ival && ival < 257) call get_small_int.

Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):

PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);

Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:

v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've know all along!:

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

So yup, this is our guy. When you want to create a new int in the range [NSMALLNEGINTS, NSMALLPOSINTS) you'll just get back a reference to an already existing object that has been preallocated.

Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.

Но когда они выделяются??

Во время инициализации в _PyLong_Init Python с радостью введет цикл for, чтобы сделать это за вас:

for (ival = -NSMALLNEGINTS; ival <  NSMALLPOSINTS; ival++, v++) {

Ознакомьтесь с исходным кодом, чтобы прочитать тело цикла!

Я надеюсь, что мое объяснение теперь прояснило вам ситуацию с C (каламбур явно намеренный).

Но, `257 is 257`? Что случилось?

На самом деле это проще объяснить, и я уже пытался это сделать; это связано с тем, что Python выполнит этот интерактивный оператор как единый блок:

>>> 257 is 257

Во время выполнения этого оператора CPython увидит, что у вас есть два совпадающих литерала, и будет использовать одно и то же PyLongObject представление 257. Вы можете убедиться в этом, если самостоятельно выполните компиляцию и изучите ее содержимое:

>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)

Когда CPython выполняет операцию, теперь он просто загружает точно такой же объект:

>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)

Поэтому is вернет True.

Ответ 4

Это зависит от того, хотите ли вы увидеть, равны ли две вещи или это один и тот же объект.

is проверяет, являются ли они одним и тем же объектом, а не просто равны. Небольшие целые числа, вероятно, указывают на одну и ту же ячейку памяти для экономии места.

In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144

Вы должны использовать == для сравнения равенства произвольных объектов. Вы можете указать поведение с помощью __eq__, и __ne__ атрибуты.