Lesson 1

Deep Dive into Python Object Internals

Introduction

Every value in Python, from a simple integer to a complex function, is an object. In CPython, the reference implementation, each object is represented by a C struct. In this lesson, we will peel back the abstraction layer to reveal how these objects are laid out in memory and how the CPython runtime manages their lifecycle.

The Foundation: The PyObject Structure

In the eyes of the CPython interpreter, every object starts with a common C struct header known as PyObject. This structure is the base of all Python data types and is defined in the source file Include/object.h. In a default build, PyObject contains only two fields: ob_refcnt (the reference count) and ob_type (a pointer to the type object). Concrete types append their own fields after this header.

When you create a Python object, the interpreter allocates memory for its struct. The ob_refcnt field is central to memory management: it tracks how many references point to the object. When this count drops to zero, CPython deallocates the object immediately, without waiting for a separate collection pass. The ob_type pointer tells Python whether the object is an int, str, list, or an instance of a custom class, which determines what operations can be performed on the raw bytes stored in the object.
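Both fields can be observed indirectly from Python code. The sketch below assumes CPython and uses sys.getrefcount and type() as windows onto ob_refcnt and ob_type; the Tracked class is a hypothetical helper invented for this illustration. Because the absolute count reported by sys.getrefcount can include temporary references, the example only compares the count before and after adding a reference.

```python
import sys


class Tracked:
    """Hypothetical helper that records when an instance is deallocated."""

    events = []

    def __del__(self):
        Tracked.events.append("deallocated")


obj = Tracked()

before = sys.getrefcount(obj)  # absolute value may include temporary references
holder = [obj]                 # the list stores one more reference to obj
after = sys.getrefcount(obj)
print(after - before)          # 1

print(type(obj) is Tracked)    # True: type() reports what ob_type points to

holder.clear()                 # drop the list's reference
del obj                        # drop the last reference; the count reaches zero
print(Tracked.events)          # ['deallocated']
```

The final line shows that deallocation happens as soon as the last reference disappears, which is behavior specific to reference-counting implementations such as CPython.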

Which two fields are mandatory for every PyObject structure?

Variable Sized Objects

Not all objects have a fixed size. A float always occupies the same number of bytes, but a tuple or a bytes object holds a varying number of items, and even a Python integer grows with its magnitude because it has arbitrary precision. For these cases, CPython uses the PyVarObject structure, which extends PyObject by adding an ob_size field. For containers such as lists and tuples, this field records the number of items.
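The effect of variable sizing can be seen from Python through the type attributes __basicsize__ and __itemsize__ together with sys.getsizeof. The following is a minimal sketch assuming CPython; the exact byte counts differ between versions and platforms, so only relationships between sizes are shown.

```python
import sys

empty = tuple()
three = tuple(range(3))

# __itemsize__ is the number of bytes added per item for variable-sized types.
print(tuple.__itemsize__ > 0)   # True: each tuple slot holds one pointer
print(float.__itemsize__)       # 0: every float has the same size

# Three extra items cost exactly three extra item slots.
extra = sys.getsizeof(three) - sys.getsizeof(empty)
print(extra == 3 * tuple.__itemsize__)  # True

# For a tuple, len() reports the item count kept in ob_size.
print(len(three))               # 3
```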

Memory for these objects is managed by the obmalloc allocator (also known as pymalloc). Calling the standard C malloc for every tiny object would be inefficient and would lead to memory fragmentation, so CPython uses a multi-layered allocator instead. It pre-allocates pools of memory for small objects (up to 512 bytes) and falls back to the standard system allocator for larger ones. This design significantly speeds up the creation and destruction of short-lived objects.
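The routing decision described above can be modeled in a few lines. This is a simplified toy model, not CPython's actual code: the function route_request is hypothetical, the 512-byte threshold comes from the text above, and the 16-byte alignment is an assumption that matches typical 64-bit builds but may differ elsewhere.

```python
SMALL_REQUEST_THRESHOLD = 512  # bytes; the small-object limit stated above
ALIGNMENT = 16                 # assumed block granularity (typical 64-bit build)


def route_request(nbytes):
    """Return (allocator, bytes_reserved) for a request of nbytes >= 1.

    Toy model: small requests are rounded up to the next multiple of
    ALIGNMENT and served from a pool; larger ones go to the system allocator.
    """
    if nbytes > SMALL_REQUEST_THRESHOLD:
        return ("system", nbytes)
    block = (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT
    return ("pool", block)


print(route_request(24))    # ('pool', 32)
print(route_request(512))   # ('pool', 512)
print(route_request(513))   # ('system', 513)
```

Rounding requests into a small set of block sizes is what lets a pool hand out and reuse same-sized blocks cheaply, at the cost of a few wasted bytes per object.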

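Note: Reference counting alone cannot reclaim circular references (where object A points to B, and B points to A), because each object keeps the other's count above zero even after the program can no longer reach either one. Left alone, such cycles would leak memory. CPython solves this with a separate cyclic garbage collector that periodically detects and cleans up these unreachable islands of objects.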
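A short experiment makes the cycle problem visible. The sketch below assumes CPython; the Node class is a hypothetical helper, and automatic collection is paused only so that the result is deterministic. A weak reference is used as a probe because it does not raise the reference count.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()  # pause automatic collection so the demo is deterministic
try:
    a, b = Node(), Node()
    a.other, b.other = b, a    # a -> b and b -> a: a reference cycle
    probe = weakref.ref(a)     # weak reference: does not keep a alive

    del a, b                   # unreachable now, but counts are still nonzero
    alive_before = probe() is not None

    gc.collect()               # the cyclic collector finds and frees the island
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after)  # True False
```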

Type Objects and Method Resolution

If a PyObject holds the data, where are the methods and attributes stored? They reside in the type object. For a class, the type object contains a dictionary (tp_dict) of methods. When you call my_list.append(), Python follows the object's ob_type pointer to the list type object and looks up the append function in its dictionary. If the name is not found there, the search continues through the base types in method resolution order.
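The lookup described above can be retraced by hand. In this sketch, type(my_list) stands in for following ob_type, and the type's __dict__ attribute is the Python-level view of tp_dict.

```python
my_list = [1, 2]

list_type = type(my_list)               # follow the object's type pointer
print(list_type is list)                # True

append = list_type.__dict__["append"]   # look the method up in the type's dict
append(my_list, 3)                      # same effect as my_list.append(3)
print(my_list)                          # [1, 2, 3]

# The types searched during lookup, in order.
print(list.__mro__ == (list, object))   # True
```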

This indirection underpins method resolution order and much of Python's dynamic behavior. Because an object refers to its type through a pointer, you can change an object's __class__ at runtime so that it points to a different type object, provided the two classes have compatible memory layouts. This alters the behavior of the object mid-execution, though it is rarely done in production code because of the stability risks involved.

The object's own data (like the characters in a string) is stored in the instance's struct, after the PyObject header, rather than in the PyTypeObject.
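Reassigning __class__ looks like this in practice. The Duck and Robot classes are hypothetical and exist only for illustration; the last step shows CPython refusing a swap to a type with an incompatible layout.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


pet = Duck()
first = pet.speak()
pet.__class__ = Robot      # repoint the instance at a different type object
second = pet.speak()
print(first, second)       # Quack Beep

# Types with incompatible layouts are rejected.
try:
    pet.__class__ = int
    rejected = False
except TypeError:
    rejected = True
print(rejected)            # True
```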

Small Object Optimization

CPython caches small integers (typically -5 to 256) and interns certain strings. Because small integers are used so frequently, CPython creates a pre-allocated array of them during startup and reuses those objects instead of allocating new ones.

When you define a = 10 and b = 10, Python does not create two separate objects. Instead, both names refer to the same object in the cached array. This saves memory, and identity comparison is fast because the interpreter only compares memory addresses rather than values. This is why a is b returns True for small integers but may return False for larger ones.
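The difference can be checked directly. This sketch assumes CPython and builds its values at runtime with int() and str.join, because equal literals in the same source file may be merged by the compiler regardless of caching. The results are implementation details, so production code should compare values with == rather than is.

```python
import sys

small_a = int("10")
small_b = int("10")
big_a = int("1000000")
big_b = int("1000000")

print(small_a is small_b)   # True: both names refer to the cached object
print(big_a is big_b)       # False: two separate objects
print(big_a == big_b)       # True: the values are still equal

# Strings built at runtime are distinct objects unless explicitly interned.
s1 = "".join(["py", "object"])
s2 = "".join(["py", "object"])
print(s1 is s2)                            # False
print(sys.intern(s1) is sys.intern(s2))    # True
```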

___ is the specific CPython memory allocator system designed to efficiently manage small objects and object pools.

Key Takeaways

Every Python name is a reference to an object whose struct begins with a PyObject header containing a reference count and a type pointer.
PyVarObject extends the base header with an ob_size field to support variable-sized objects such as lists, tuples, and bytes.
CPython improves performance by using obmalloc for small objects, caching small integers, and interning certain strings.
Methods and behaviors are not defined on instances themselves, but reside in the tp_dict of their corresponding type object.

Go deeper

How does CPython handle circular references during garbage collection?
Can I inspect the ob_refcnt of an object in Python?
What happens if ob_refcnt becomes negative?
How does PyObject_VAR_HEAD differ from a standard PyObject header?
Are custom class instances also stored as PyObject structs?