Introducing modelx-cython: Boosting modelx with Cython Compilation
Oct 21, 2023
modelx’s Push Towards Native-Code Performance
The recent addition of the export feature has substantially enhanced modelx’s capabilities. As shown in our previous post, exported models gain a notable performance boost.
However, the journey of modelx doesn’t end here. Models written purely in Python (without numpy assistance) lag significantly behind their native-code equivalents in speed. For computationally demanding tasks, compiled languages like C or C++ are typically favored. Thus, our next milestone for modelx is to create native-code models that far outpace exported models.
To this end, we’re introducing modelx-cython, an experimental Python package. This tool transforms exported modelx models into native code using Cython.
Demystifying Cython
Cython is a well-established tool that accelerates Python code by translating it to C and then compiling it to native code with an external C compiler. It has been under development for over two decades, and foundational Python libraries like numpy, pandas, scipy, and scikit-learn use it to build their C extension modules.
While Cython can translate nearly all Python code to C, a direct translation doesn’t always yield significant performance gains. This limitation stems from the use of Python objects in the translated C code, even for primitive values like integers. To truly harness the performance benefits, the original Python code needs tweaking.
Cython extends the Python language with its own set of syntax, which allows users to optimize Python code by defining static types for variables and functions, as depicted in the following illustration:
However, a newer Cython feature lets users adjust the original Python code using pure Python syntax, leveraging decorators and Python’s type annotations. This enables Cython to generate C code in which objects, especially numeric ones, are represented as C types, giving performance close to that of code originally written in a compiled language.
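As a minimal sketch of this pure Python style, the function below declares C types through annotations from the `cython` module. The function `present_value` and its parameters are hypothetical and chosen only for illustration, and the fallback class is an assumption added so the snippet also runs as plain Python when Cython is not installed. It is not taken from modelx-cython's generated code.

```python
try:
    import cython  # Cython's pure Python mode module; importable uncompiled too
except ImportError:
    class cython:  # hypothetical stand-in so the sketch runs without Cython
        int = int
        double = float


def present_value(cashflow: cython.double, rate: cython.double,
                  periods: cython.int) -> cython.double:
    """Discount a single cashflow over a number of periods."""
    # When compiled by Cython, these locals become C double / C int variables.
    factor: cython.double = 1.0
    i: cython.int
    for i in range(periods):
        factor /= 1.0 + rate
    return cashflow * factor


print(present_value(110.0, 0.1, 1))
```

Run uncompiled, this behaves like ordinary Python. The annotations only take effect as static C types once the file is compiled with Cython.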
Introducing modelx-cython
modelx-cython, our new experimental package, creates native-code models from exported modelx models.
In the visualization below, modelx-cython handles the latter half of the conversion, while modelx’s export functionality manages the former half:
Converting a pure-Python model exported by modelx into native code comprises two primary steps:
- Preparing Cython Source Files:
  - First, modelx-cython executes a sample script you provide to gather run-time type information.
  - It then creates a copy of the original pure-Python model, appending “_cy” to the model’s name.
  - Using the type information and a spec file you supply, it modifies the source files of the copied model to optimize them for Cython compilation, so that as many objects as possible are statically typed.
  - Finally, it generates a setup file for invoking Cython in the next step.
- Cython Compilation:
  - Cython compiles the prepared source files.
  - Upon successful compilation, binary files (*.pyd on Windows, *.so on Linux) are produced alongside the source files. As a result, when you import the “_cy” version of the model, Python loads these faster, compiled binary modules.
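To give a feel for what gathering run-time type information from a sample run can mean, here is a minimal sketch that records the return types of Python functions using the standard library's `sys.setprofile`. This post does not describe how modelx-cython collects types internally, so the technique, the helper `collect_return_types`, and the toy functions `disc_factor` and `pv` are all assumptions for illustration.

```python
import sys
from collections import defaultdict

# Maps a function name to the set of type names it was seen returning.
observed = defaultdict(set)


def _profiler(frame, event, arg):
    # For a Python-level "return" event, arg is the value being returned.
    if event == "return":
        observed[frame.f_code.co_name].add(type(arg).__name__)


def collect_return_types(func, *args):
    """Run func under the profiler, recording return types of nested calls."""
    sys.setprofile(_profiler)
    try:
        return func(*args)
    finally:
        sys.setprofile(None)


# Hypothetical stand-ins for model formulas.
def disc_factor(t):
    return 1.0 / 1.05 ** t


def pv(n):
    total = 0.0
    for t in range(n):
        total += disc_factor(t)
    return total


collect_return_types(pv, 3)
print(dict(observed))
```

Type information of this kind can only reflect what the sample run actually exercised, which is why a separate spec file is useful for overriding or supplementing it.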
A lifelib Example
A Walkthrough Using the BasicTerm_S Model
To demonstrate the process, we’ll use the BasicTerm_S model from the basiclife library in lifelib. For reference, all code snippets provided here are available in the example directory of the modelx-cython GitHub repository.
- Exporting the Model:
  Start by exporting the BasicTerm_S model using modelx:

      import modelx as mx
      mx.read_model("BasicTerm_S").export("BasicTerm_S_nomx")

- Setting Up the Sample File:
  You’ll need to create a sample Python script, which is executed in the first step to capture run-time type information. Below is an example of this sample file:

      from BasicTerm_S_nomx import BasicTerm_S
      for i in range(1, 11):
          BasicTerm_S.Projection[i].pv_net_cf()

  Change the current directory to the one containing BasicTerm_S_nomx. Then create a sample.py file and populate it with the code above.

- Preparing the Spec File:
  In addition to the sample file, you also need a spec file. This file specifies supplementary configuration settings that are not derived from the sample run. In this example, the nested dict expression below specifies that the parameter t in the Projection space has a size of 241 and that the return type of disc_factors should be object, irrespective of the type observed in the sample run:

      {"spaces": {"Projection": {"cells_params": {"t": {"size": 241}}, "cells": {"disc_factors": {"return_type": "object"}}}}}

  In the same directory, create a spec.py file and insert the code snippet above.

- Invoking mx2cy:
  modelx-cython installs the mx2cy command. Invoke the mx2cy command from your terminal:

      mx2cy BasicTerm_S

  On successful completion, a new package named BasicTerm_S_nomx_cy is created alongside BasicTerm_S_nomx. Within this directory, you’ll find the compiled Python module binary files (*.pyd on Windows or *.so on Linux) produced during the mx2cy operation.
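If you want to confirm that the binaries were produced, a small check along the following lines may help. It is only a sketch using the standard library: the helper name `find_compiled_modules` is hypothetical, and the package directory name is the one created in the walkthrough above.

```python
from pathlib import Path


def find_compiled_modules(package_dir):
    """Return sorted paths of *.pyd / *.so files found under package_dir."""
    root = Path(package_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.suffix in {".pyd", ".so"})


# Assumes the current directory is the one containing BasicTerm_S_nomx_cy.
for path in find_compiled_modules("BasicTerm_S_nomx_cy"):
    print(path)
```

An empty result suggests that the compilation step did not complete, or that the script was run from a different directory.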
Performance Comparison
Let’s measure the performance of our compiled model against its pure Python counterpart. The following script computes net cashflow present values for 10,000 model points and aggregates them.
    import sys
    import os
    import timeit
    import psutil

    is_nomx = '--nomx' in sys.argv
    is_cy = '--cython' in sys.argv

    process = psutil.Process(os.getpid())

    def get_mem_use():
        """Return the memory usage of the current process in MB"""
        return process.memory_info().rss / (1024 ** 2)

    if is_nomx:
        from BasicTerm_S_nomx import BasicTerm_S
    elif is_cy:
        from BasicTerm_S_nomx_cy import BasicTerm_S
    else:
        raise RuntimeError("Parameter missing")

    result = {}

    def run_model(m, size):
        mem_use = get_mem_use()
        result['value'] = sum(m.Projection[i].pv_net_cf() for i in range(1, size+1)) / size
        result['mem_use'] = get_mem_use() - mem_use

    result['time'] = timeit.timeit(
        'run_model(BasicTerm_S, 10000)', number=1, globals=globals())

    print(result)
The summarized performance results on both Windows and Linux (Windows Subsystem for Linux) are:
Run Time (sec.)
Model | Windows | Linux |
---|---|---|
Pure-Python | 17.7 | 14.1 |
Cythonized | 7.8 | 6.4 |
Ratio | 2.3 | 2.2 |
Memory Usage (MB)
Model | Windows | Linux |
---|---|---|
Pure-Python | 1831 | 1736 |
Cythonized | 440 | 443 |
Ratio | 4.2 | 3.9 |
On both platforms, the compiled model runs more than twice as fast as its pure-Python counterpart and consumes roughly one-fourth of the memory.
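The ratios in the tables can be reproduced from the raw measurements with a few lines of Python. The figures are copied from the tables above; the variable and function names are only illustrative.

```python
# Measurements from the tables above, as (pure-Python, Cythonized) pairs.
run_time_sec = {"Windows": (17.7, 7.8), "Linux": (14.1, 6.4)}
memory_mb = {"Windows": (1831, 440), "Linux": (1736, 443)}


def ratios(measurements):
    """Return the pure-Python / Cythonized ratio per platform, to 1 decimal."""
    return {
        platform: round(pure / compiled, 1)
        for platform, (pure, compiled) in measurements.items()
    }


speedup = ratios(run_time_sec)
memory_saving = ratios(memory_mb)
print(speedup)        # {'Windows': 2.3, 'Linux': 2.2}
print(memory_saving)  # {'Windows': 4.2, 'Linux': 3.9}
```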
What’s Next?
Though promising, the current release of modelx-cython is experimental and may not work with many modelx models without manual tweaks.
Despite our compiled model being over twice as fast as the pure-Python version, we believe a tenfold speed increase is attainable with native code. The reason the potential performance isn’t fully realized is that while Cython compiles numeric objects (like integers and floats) into primitive C types, other objects, such as lists and pandas objects, remain as Python objects. As a result, the generated binary code relies on the Python interpreter to call these objects’ methods. To circumvent this inefficiency, the original modelx formulas should be written with Cython’s translation and compilation performance in mind. Alternatively, modelx-cython could analyze the original formulas for users, identify inefficient segments, and then adjust them for optimal compilation.
It’s a challenging path ahead, but one we’re excited to traverse. We envision modelx evolving into the premier tool for building full-fledged production models.