Running modelx in parallel using ipyparallel

Mar 13, 2022

In the post “How fast are lifelib models now?”, we saw that lifelib models run pretty fast. But each of those tests ran in a single process, and CPU utilization during any of the test runs was only about 8%.

How can modelx take full advantage of multiple CPU cores?

The answer is multiprocessing. You can write a parallel program yourself using multiprocessing, the module in the Python standard library that was briefly discussed before. However, a much easier approach is to use ipyparallel.
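
For comparison, here is a minimal sketch of the standard-library route. It is not the code used in this post: `run_chunk` is a hypothetical stand-in for a model run over one chunk of model points, and `split` and `run_parallel` are illustrative helper names.

```python
from multiprocessing import Pool


def run_chunk(chunk):
    """Hypothetical stand-in for running a model on one chunk of model points."""
    return sum(x * x for x in chunk)


def split(items, n):
    """Split items into n nearly equal contiguous chunks."""
    size, rem = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `rem` chunks take one extra item each.
        end = start + size + (1 if i < rem else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_parallel(items, n_procs):
    """Run run_chunk on each chunk in its own worker process."""
    with Pool(processes=n_procs) as pool:
        return pool.map(run_chunk, split(items, n_procs))
```

A call such as `run_parallel(list(range(1000)), 4)` returns one partial result per chunk. On Windows, that call has to sit under an `if __name__ == "__main__":` guard, because worker processes are started by re-importing the main module.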

ipyparallel

ipyparallel lets you run multiple instances of your model in multiple processes, on one machine or across several. Here are some of its features:

Handles both synchronous and asynchronous communication easily.
Sends and receives numpy and pandas objects to and from engines quickly.
Communicates with remote machines in the same way as with the localhost.
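
As a rough illustration of the synchronous case, the sketch below starts a local cluster and runs one chunk of work per engine. It is not the notebook used for the test: `make_chunks`, `run_chunk` and `run_on_engines` are hypothetical names, `run_chunk` stands in for an actual model run, and the `ipp.Cluster` context manager assumes ipyparallel 7 or later.

```python
def make_chunks(items, n):
    """Deal items round-robin into n chunks (hypothetical helper)."""
    return [items[i::n] for i in range(n)]


def run_chunk(chunk):
    """Hypothetical stand-in for a model run over one chunk of model points."""
    return sum(chunk)


def run_on_engines(items, n_engines):
    """Start a local cluster, run one chunk per engine, and collect the results."""
    import ipyparallel as ipp  # third-party package, imported only when needed

    # The context manager starts the engines and stops them again on exit.
    with ipp.Cluster(n=n_engines) as rc:
        view = rc[:]  # a view on all engines
        # map_sync blocks until every engine has returned its result;
        # map_async would return at once with a result object to wait on.
        return view.map_sync(run_chunk, make_chunks(items, n_engines))
```

Calling `run_on_engines(list(range(100)), 4)` would start four local engines and return a list of four partial sums.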

Testing ipyparallel

The machine used for the test is the same as the one used for the previous test.

CPU: 12th Gen Intel Core i7-12700KF (20 logical CPU cores)
OS: Windows 11
Memory: 64GB

The relevant results from the previous test are shown below. These results are from single-process runs.

Model	Python ver.	# Model points	# Steps	# Calcs	Run time (Sec.)
`CashValue_ME`	3.9.7	10K	1141	46820	13.61
`CashValue_ME`	3.10.2	10K	1141	46820	9.85

Now we run up to 100K model points on up to 10 processes using ipyparallel. For example, if 10 processes each get 10K model points and the parallel runs scale perfectly, the run time should be the same as that of a single-process run with 10K model points.

The notebook used for the test is available here, together with the model point samples and a notebook to generate the model points.

Test results

The table below shows the test summary. The run time does not include data loading and transfer, and shows only the processing time to calculate result_pv.

ID	Model	Python ver.	# Processes	# Model points	Run time (Sec.)
1	`CashValue_ME`	3.9.7	1	10K	13.61
2	`CashValue_ME`	3.9.7	1	100K	116
3	`CashValue_ME`	3.9.7	5	50K	15
4	`CashValue_ME`	3.9.7	5	100K	30
5	`CashValue_ME`	3.9.7	10	100K	22
6	`CashValue_ME`	3.10.2	1	10K	9.85
7	`CashValue_ME`	3.10.2	1	100K	65
8	`CashValue_ME`	3.10.2	5	50K	11
9	`CashValue_ME`	3.10.2	5	100K	19
10	`CashValue_ME`	3.10.2	10	100K	14
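
The scaling in the table can be quantified as speedup (single-process run time divided by parallel run time) and parallel efficiency (speedup divided by the number of processes). The sketch below uses only the 100K-model-point rows from the table; the function and variable names are illustrative.

```python
# (Python version, number of processes) -> run time in seconds, 100K model points
run_times = {
    ("3.9.7", 1): 116.0,
    ("3.9.7", 5): 30.0,
    ("3.9.7", 10): 22.0,
    ("3.10.2", 1): 65.0,
    ("3.10.2", 5): 19.0,
    ("3.10.2", 10): 14.0,
}


def speedup(version, n_procs):
    """Single-process run time divided by the run time with n_procs processes."""
    return run_times[(version, 1)] / run_times[(version, n_procs)]


def efficiency(version, n_procs):
    """Speedup per process; 1.0 would mean perfect scaling."""
    return speedup(version, n_procs) / n_procs


for version, n_procs in sorted(run_times):
    if n_procs > 1:
        print(f"{version} x{n_procs}: speedup {speedup(version, n_procs):.2f}, "
              f"efficiency {efficiency(version, n_procs):.2f}")
```

An efficiency below 1.0 that falls as processes are added means each process contributes less than a full single-process run's worth of throughput.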

Here are the findings from the test.

The runs with 10 processes are about 5 times faster than the single-process runs on 100K model points (22 vs. 116 seconds on Python 3.9, 14 vs. 65 seconds on Python 3.10).
Per-process performance gets gradually worse as the number of processes increases.
A number of formulas do not scale with the number of model points. These formulas should take a fixed amount of time in each process regardless of the number of processes, so that part of the run time does not shrink when the work is split.
The parallel runs with 10 processes reach 65% to 70% CPU utilization, while the single-process runs reach only 8%. The 5-process parallel runs reach 35%.
Run times on Python 3.10 are 25% to 45% shorter than on Python 3.9.
Garbage collection needed to be turned off to get the results above. When garbage collection was turned on, the run speed deteriorated significantly.

Note: Garbage collection is a mechanism that frees the memory held by data no longer referenced by any variable.
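
One way to turn garbage collection off around a run is the standard gc module. The sketch below assumes that is the collector meant here; it is not taken from the test notebook. `gc_disabled` and `heavy_calculation` are hypothetical names, and `heavy_calculation` merely stands in for a model run that allocates many objects.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_disabled():
    """Temporarily disable automatic garbage collection, then restore the prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def heavy_calculation(n):
    """Hypothetical stand-in for a model run that allocates many small objects."""
    return sum(len([i, i + 1]) for i in range(n))


with gc_disabled():
    result = heavy_calculation(10_000)
```

In CPython, `gc.disable()` only switches off the cyclic collector; objects are still freed by reference counting when their last reference goes away. In a parallel run each engine is a separate process with its own collector, so the setting would have to be applied in every engine.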
