Running notebooks with the PyMPX jupyter module

The PyMPX jupyter module is designed to make it easier to build good ETL processes with Jupyter notebooks.

Jupyter notebooks are easy to write and debug, but converting them to .py files each time they are edited is time-consuming and error-prone.

The PyMPX jupyter module includes a function that can run a Jupyter notebook, optionally passing in parameters as keyword arguments.

To pass parameters to a notebook with the jupyter.run() function, you must do the following:

  • Move hardcoded values to the first code cell of your Jupyter notebook.

  • Call the jupyter.run() function from your batch Python code, passing in keyword arguments whose names match the variables assigned in the first code cell of the notebook.

The notebook is then run, with the assignments in its first code cell overridden by the parameters passed in the call to run().
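The override step can be pictured with a small sketch. This is not PyMPX's actual implementation, which the text above does not show; it is one possible way to replace simple top-level assignments in a cell's source. The function name override_assignments and the variable count are hypothetical, chosen only for illustration.

```python
import ast


def override_assignments(cell_source, **params):
    """Return cell_source with simple `name = value` statements replaced.

    Hypothetical helper: only top-level, single-target assignments whose
    name appears in params are rewritten; everything else is left alone.
    """
    tree = ast.parse(cell_source)
    lines = cell_source.splitlines()
    for node in tree.body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id in params):
            name = node.targets[0].id
            new_line = f"{name} = {params[name]!r}"
            # A statement may span several lines; pad with blank lines so
            # the line numbers of later statements stay valid.
            padding = [""] * (node.end_lineno - node.lineno)
            lines[node.lineno - 1:node.end_lineno] = [new_line] + padding
    return "\n".join(lines)


# Assumed first-cell contents, for illustration only.
first_cell = 'text = "Hello World"\ncount = 3'
print(override_assignments(first_cell, text="Goodbye World"))
```

Only the names passed as keyword arguments are touched, so any other defaults in the first cell keep their notebook values.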

Getting Started

To use the Jupyter notebook runner in Python, you first need to import the module:

>>> from pympx import jupyter

If you get an error, check that you have installed pympx correctly. See Installation.

Running the notebook - simple version

The simplest way to run a Jupyter notebook is to run it as it is, without passing any parameters.

The example notebook hello_world.ipynb (found in pympx\doc\examples\hello_world.ipynb) looks like this:

[Image: the hello_world.ipynb notebook]

Assuming you are running your script from the root pympx directory, you can simply pass the path to the notebook to jupyter.run():

>>> from pympx import jupyter
>>> jupyter.run(r'template_etl\code\scripts\hello_world.ipynb')

Running the notebook - with parameters

To pass in parameters, you need to alter the Jupyter notebook.

Move any hardcoded values that you wish to parameterise for the batch job to the first code cell of the notebook.
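As an illustration, a notebook prepared this way might be laid out as follows. The contents are an assumption, not the actual hello_world.ipynb: the default value "Hello World" and the helper build_greeting are hypothetical, and only the parameter name text comes from the example call in this section.

```python
# --- First code cell: every value a batch job may want to override ---
text = "Hello World"  # assumed default; replaced when run() is given text=...

# --- Later cells: refer to the names above, never to the literal values ---
def build_greeting(message):
    """Return the line the notebook would display (hypothetical helper)."""
    return f"Message: {message}"


greeting = build_greeting(text)
print(greeting)
```

Keeping every overridable value in the first cell means the rest of the notebook runs unchanged whether it is opened interactively or called from a batch job.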

Then pass the path to the notebook to jupyter.run(), followed by any parameters you want to change as keyword arguments. The path below assumes you are running from the root pympx directory. In this case, text is the parameter being changed.

>>> from pympx import jupyter
>>> jupyter.run(r'template_etl\code\scripts\hello_world.ipynb', text="Goodbye World")

The notebook will be transformed before it is run:

[Image: the transformed notebook]
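To make the transformation concrete, here is a rough sketch that works on the notebook file format itself. A .ipynb file is JSON with a "cells" list, and each cell has a "cell_type" and a "source". The sketch locates the first code cell and appends override assignments to it, so the new values win when the cell executes. This is an assumed approach for illustration only; the function names are hypothetical and PyMPX may transform the notebook differently.

```python
import copy


def first_code_cell_index(notebook):
    """Return the index of the first code cell in a notebook dict."""
    for index, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] == "code":
            return index
    raise ValueError("notebook has no code cells")


def with_overrides(notebook, **params):
    """Return a copy of notebook whose first code cell ends with overrides."""
    result = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    cell = result["cells"][first_code_cell_index(result)]
    source = cell["source"]
    if isinstance(source, list):  # the format allows a list of line strings
        source = "".join(source)
    overrides = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    cell["source"] = source.rstrip("\n") + "\n" + overrides
    return result


# A tiny in-memory notebook with assumed contents.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Hello"},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": 'text = "Hello World"'},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "print(text)"},
    ],
}

transformed = with_overrides(notebook, text="Goodbye World")
print(transformed["cells"][1]["source"])
```

Appending the overrides rather than editing the original lines keeps the notebook's defaults visible in the transformed cell, which can help when debugging a batch run.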

Jupyter module - reference

PyMPX.jupyter module - reference