Internals

Momtchil Momtchev edited this page Dec 4, 2022 · 1 revision

Object lifecycle overview

Everything is organized around PyObjectWrap, the JS representation of a Python reference, which is itself a PyObject* pointer.

A PyObjectWrap exists both in C++ and in JS through the ObjectWrap interface.

A Python reference that is not represented in JS is entirely managed by the Python GC.

Every time JavaScript needs to access a PyObject, it goes through PyObjectWrap::New()/PyObjectWrap::NewCallable(). These check whether this PyObject already has a JavaScript wrapper and, if not, create a new one. Once a PyObjectWrap has been created for a given PyObject, all subsequent "creations" of a new object will return the existing object through the ObjectStore API. As a PyObjectWrap holds a strong reference to the PyObject, Python cannot GC objects which are referenced in JS.

Almost all Python references are managed by the two classes PyWeakRef and PyStrongRef. If you are coming from the Python world, you should read these as PyBorrowedRef and PyOwnedRef. These two classes take care of the reference counting; their main purpose is to make it harder to forget to increment or decrement a reference count. A PyStrongRef can be used wherever a PyWeakRef is expected, but if only a PyWeakRef is available, a new PyStrongRef has to be constructed from it.

When the PyObjectWrap is no longer referenced by JS, V8 will eventually GC the object, which will trigger the C++ destructor. This destructor releases its reference to the PyObject (decrementing the reference count), signaling to Python that this object can be GCed, and erases the wrapper from the ObjectStore.
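The lifecycle described above can be modeled in a few lines of plain JavaScript. This is only a toy sketch, not pymport's code: `makeFakePyObject`, `WrapperStore`, `wrap` and `release` are hypothetical names, the "Python object" is a plain record with a counter, and `release` is called by hand where the real implementation relies on the V8 GC triggering the C++ destructor.

```javascript
// Toy model of the wrapper lifecycle. All names are hypothetical.

// Stand-in for a Python object: an address plus a reference count.
function makeFakePyObject(address) {
  return { address, refcnt: 1 }; // 1 = the reference held on the Python side
}

class WrapperStore {
  constructor() {
    // address -> existing wrapper. A real store would have to reference the
    // wrappers weakly, otherwise they could never be collected; this model
    // uses a plain Map for simplicity.
    this.byAddress = new Map();
  }

  // Analogue of PyObjectWrap::New(): reuse the wrapper if one exists.
  wrap(pyObject) {
    const existing = this.byAddress.get(pyObject.address);
    if (existing !== undefined) return existing;
    pyObject.refcnt += 1; // the new wrapper owns a strong reference
    const wrapper = { pyObject };
    this.byAddress.set(pyObject.address, wrapper);
    return wrapper;
  }

  // Analogue of the destructor that runs once the wrapper is collected.
  release(wrapper) {
    wrapper.pyObject.refcnt -= 1; // Python is free to collect the object again
    this.byAddress.delete(wrapper.pyObject.address);
  }
}

const store = new WrapperStore();
const obj = makeFakePyObject(0x1000);
const first = store.wrap(obj);
const second = store.wrap(obj);
console.log(first === second, obj.refcnt); // true 2
store.release(first);
console.log(obj.refcnt, store.byAddress.size); // 1 0
```

Wrapping the same object twice yields the same wrapper and a single extra reference, which is the property the ObjectStore exists to guarantee.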

Converting objects

The heart of the translation layer is the pair of recursive methods PyObjectWrap::FromJS() and PyObjectWrap::ToJS(), along with all their subroutines.

Both of them have local object stores that exist only for the duration of the recursion. These are only for detecting and handling circular references. The PyObjectWrap::ToJS() object store cannot be merged with the environment object store, because the environment object store stores only one reference to the top-most Python object, while PyObjectWrap::ToJS() performs a deep recursive transform.
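A minimal sketch of how a store that lives only for the duration of a recursion can handle circular references. The `deepConvert` function and its `seen` map are hypothetical illustrations in plain JavaScript; the actual conversion is done in C++ between Python and JS values.

```javascript
// Hypothetical deep conversion with a per-call store (the `seen` map).
// The store is created by the top-level call and discarded when it returns.
function deepConvert(value, seen = new Map()) {
  // Primitives are returned as-is.
  if (value === null || typeof value !== "object") return value;

  // Already converted in this recursion: reuse the result. This is what
  // stops the recursion when the input contains a cycle.
  if (seen.has(value)) return seen.get(value);

  const result = Array.isArray(value) ? [] : {};
  seen.set(value, result); // register before recursing into the children
  for (const [key, item] of Object.entries(value)) {
    result[key] = deepConvert(item, seen);
  }
  return result;
}

const node = { name: "root", children: [] };
node.children.push(node); // the object now contains itself
const copy = deepConvert(node);
console.log(copy !== node); // true
console.log(copy.children[0] === copy); // true
```

Registering the result before descending into the children is the important step: a child that points back to its parent then finds the parent's (still incomplete) result in the store instead of recursing forever.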

PyObjectWrap::FromJS() has two functions: at the low level, it can produce a raw PyObject from a JS object, which is needed for calling into Python. At the higher level, it produces a new PyObjectWrap representation of a JS object. Both functions use the same inner methods. FromJS returns strong references. PyObjectWrap::FromJS() can also extract PyObject references from PyObjectWrap objects, and it recognizes proxified objects and JS trampolines for Python functions. This is the PyObject pass-through.

PyObjectWrap::ToJS() accepts a weak reference which is kept only for the duration of the recursion. It constructs JS objects from Python objects. In some cases these new JS objects may in fact be PyObject instances: when dealing with functions and when encountering Python objects without a JS equivalent.

Functions

Converting a Python function to JS function

assert(py_fn.callable);
const js_fn = py_fn.toJS();

In Python a function is also an object. Thus, a function is also a PyObjectWrap.

V8 allows the construction of a function reference with a C++ callback - this API is exported by Node.js through node-addon-api. When creating this object, one can associate a C++ structure to be passed as an argument to the C++ function - this structure contains the PyObject. The C++ trampoline performs the argument conversion and then calls the Python function.

In JavaScript, a function is also an object and can have properties. Functions carry the underlying PyObjectWrap in a hidden __PyObject__ property. This allows PyObjectWrap::FromJS() to extract the PyObject reference from it if this object is used as an argument when calling another function, i.e. when passing a callback to Python. This is what allows passing arguments such as dtype=int16 in numpy or the subscript iterators in pandas, as these are in fact functions.
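The hidden-property mechanism can be illustrated in plain JavaScript. In this sketch `makeTrampoline` and `unwrapArgument` are hypothetical helpers, the wrapped value is a plain record standing in for a PyObjectWrap, and making the property non-enumerable is an assumption about what "hidden" means here.

```javascript
const HIDDEN_KEY = "__PyObject__";

// Build a JS function that forwards calls to `invoke` and carries the
// wrapped object as a non-enumerable property.
function makeTrampoline(pyObjectWrap, invoke) {
  const fn = (...args) => invoke(pyObjectWrap, args);
  Object.defineProperty(fn, HIDDEN_KEY, {
    value: pyObjectWrap,
    enumerable: false,
  });
  return fn;
}

// Stand-in for the pass-through branch of FromJS(): if the argument is a
// trampoline, hand back the original wrapped object instead of converting
// the JS function.
function unwrapArgument(arg) {
  if (
    typeof arg === "function" &&
    Object.prototype.hasOwnProperty.call(arg, HIDDEN_KEY)
  ) {
    return arg[HIDDEN_KEY];
  }
  return arg;
}

const fakeInt16 = { pythonName: "int16" }; // stand-in for a wrapped Python callable
const int16 = makeTrampoline(
  fakeInt16,
  (wrap, args) => `${wrap.pythonName}(${args.join(", ")})`
);

console.log(int16(7)); // int16(7)
console.log(unwrapArgument(int16) === fakeInt16); // true
console.log(Object.keys(int16).includes(HIDDEN_KEY)); // false
```

The function stays callable from JS, yet when it is passed back as an argument the original object can be recovered without a round trip through the conversion layer.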

Converting a JS function to Python function

const py_fn = PyObject.fromJS((x) => +x + 1); // x will be a PyObject

pymport registers a new Python type, pymport.js_function. This type is callable - it implements the tp_call method. When Python invokes this object, a C++ trampoline wraps the arguments in JS PyObjectWrap objects. If the invocation is from the main V8 thread, JavaScript can be entered immediately. Otherwise the invoking thread must block, releasing the GIL so other Python code can run, and waiting for the main V8 thread to become available. The communication mechanism used is uv_async_send abstracted by ThreadSafeFunction in node-addon-api.

pymport.js_function also has a custom deallocator which decrements the V8 reference counter.

Execution contexts

Unless explicitly noted in the comments, all functions are expected to run exclusively on the V8 main thread. Notable exceptions are the AsyncWorker class (PympWorker) - used for asynchronous calling of Python code by callAsync - and the Python trampolines used for calling into JavaScript.

The GIL locking convention is that each C++ function that is entered from JavaScript, i.e. all the JavaScript calling convention methods, has to obtain the GIL. When calling JavaScript from a Python context, the lock is to be released. Additionally, all V8 finalizers that are called directly from the Node.js event loop have to obtain the GIL.

When called from a Python context, V8 finalizers are executed on the V8 main thread by using the RunInV8Context function through a uv_async_send handle.

Exceptions

Exceptions are not directly converted. They are always caught in C++ and re-emitted as new objects according to each language's semantics.
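As a rough illustration of "caught and re-emitted as a new object", the sketch below models the boundary in plain JavaScript. The record shape (`type`, `message`), the helpers `toJsError` and `callAcrossBoundary`, and the `pythonTypeName` property are all hypothetical; the real translation happens in C++ and its details may differ.

```javascript
// Build a brand new JS error from a description of a caught Python
// exception. The Python exception object itself never crosses over.
function toJsError(pyException) {
  const ErrorType = pyException.type === "TypeError" ? TypeError : Error;
  const err = new ErrorType(pyException.message);
  err.pythonTypeName = pyException.type; // hypothetical property name
  return err;
}

// Stand-in for the boundary: run the "Python" call, and if it reports an
// exception, throw the newly built JS error instead.
function callAcrossBoundary(pythonLikeCall) {
  const outcome = pythonLikeCall();
  if (outcome.exception) throw toJsError(outcome.exception);
  return outcome.value;
}

let caught = null;
try {
  callAcrossBoundary(() => ({
    exception: { type: "TypeError", message: "unsupported operand" },
  }));
} catch (e) {
  caught = e;
}
console.log(caught instanceof TypeError, caught.pythonTypeName); // true TypeError
console.log(callAcrossBoundary(() => ({ value: 3 }))); // 3
```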