Running notebooks with the PyMPX jupyter module
The PyMPX jupyter module is designed to make it easier to create good ETL processes using Jupyter notebooks.
Jupyter notebooks are easy to write and debug, but converting them to .py files every time they are edited is time-consuming and error-prone.
The PyMPX jupyter module includes a function, run(), that runs a Jupyter notebook directly, optionally passing in parameters as keyword arguments.
When you use the jupyter.run() function, you must do the following:
Move hardcoded values to the first code cell of your Jupyter notebook.
Call the jupyter.run() function from your batch Python code, passing in keyword arguments whose names match the variables that hold the hardcoded values in the notebook.
When the notebook is run, the assignments in its first code cell are overridden with the parameters passed to run().
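The steps above only work if each keyword-argument name matches a variable assigned in the first code cell. A small pre-flight check in your batch code can catch a misspelt parameter before the job starts. The sketch below assumes the notebook has been loaded with json.load into an nbformat-4 dict; parameter_names and check_parameters are hypothetical helpers, not part of pympx, and the sketch says nothing about whether jupyter.run() performs any such check itself.

```python
import ast


def first_code_cell_source(nb: dict) -> str:
    """Return the source of the first code cell of an nbformat-4 notebook dict."""
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # nbformat allows the source to be a single string or a list of lines
            return source if isinstance(source, str) else "".join(source)
    raise ValueError("notebook has no code cells")


def parameter_names(nb: dict) -> set:
    """Names bound by simple top-level assignments in the first code cell."""
    tree = ast.parse(first_code_cell_source(nb))
    names = set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            # e.g. text = "Hello World"
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            # e.g. text: str = "Hello World"
            if isinstance(node.target, ast.Name):
                names.add(node.target.id)
    return names


def check_parameters(nb: dict, **params) -> None:
    """Raise ValueError if a keyword argument has no matching first-cell assignment."""
    unknown = set(params) - parameter_names(nb)
    if unknown:
        raise ValueError(f"not assigned in the first code cell: {sorted(unknown)}")
```

For the hello_world notebook, check_parameters(nb, text="Goodbye World") passes silently, while a typo such as txt="Goodbye World" raises ValueError.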
Getting Started
To use the Jupyter notebook runner in Python, you first need to import the module:
>>> from pympx import jupyter
If you get an error, check that you have installed pympx correctly. See Installation.
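A quick way to tell whether an import error is an installation problem is to ask the running interpreter whether it can find the pympx package at all. The sketch below uses only the standard library; pympx_installed is a hypothetical helper name. Printing sys.executable also shows which Python is running, which helps when pympx was installed into a different environment.

```python
import importlib.util
import sys


def pympx_installed() -> bool:
    """Return True if the current interpreter can locate the pympx package."""
    # find_spec returns None when no top-level module or package of that name exists
    return importlib.util.find_spec("pympx") is not None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("pympx found" if pympx_installed() else "pympx not found: see Installation")
```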
Running the notebook - simple version
The simplest way to run a Jupyter notebook is to run it without changing any of its contents.
The example notebook hello_world.ipynb (found in pympx\doc\examples\hello_world.ipynb) contains the following code, which produces the output shown on the last line:
text = "Hello World"
print(text)
Hello World
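On disk, an .ipynb file is a JSON document in the nbformat 4 format, holding a list of cells. The sketch below builds a minimal equivalent of hello_world as a Python dict, which is handy for experimenting. It assumes the hardcoded value and the print call sit in separate code cells, with the hardcoded value in the first one as recommended above; make_hello_world_notebook and save_notebook are hypothetical names.

```python
import json
from pathlib import Path


def make_hello_world_notebook() -> dict:
    """Build a minimal nbformat-4 notebook resembling hello_world.ipynb."""

    def code_cell(source: str) -> dict:
        # The fields the nbformat 4.4 schema requires for a code cell
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        }

    return {
        "cells": [
            code_cell('text = "Hello World"'),  # first cell: hardcoded values only
            code_cell("print(text)"),
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,  # 4.5+ would additionally require a cell "id"
    }


def save_notebook(nb: dict, path: Path) -> None:
    """Write the notebook dict to disk; .ipynb files are plain JSON."""
    path.write_text(json.dumps(nb, indent=1), encoding="utf-8")
```

Calling save_notebook(make_hello_world_notebook(), Path("hello_world_sketch.ipynb")) writes a file that Jupyter can open.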
Assuming you are running your script from the root pympx directory, you can simply pass the path to the notebook to the function:
>>> from pympx import jupyter
>>> jupyter.run(r'template_etl\code\scripts\hello_world.ipynb')
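Conceptually, running a notebook means executing its code cells in order in one shared namespace, as a Jupyter kernel would. The toy function below sketches that idea; it is not how pympx implements run(). run_code_cells is a hypothetical name, the notebook is assumed to be an nbformat-4 dict, and IPython magics (%...), shell escapes (!...) and rich outputs are not handled.

```python
import contextlib
import io


def run_code_cells(nb: dict) -> str:
    """Execute the code cells of an nbformat-4 notebook dict in order.

    All cells share one namespace, so a variable assigned in an early cell
    is visible in later ones. Anything printed is captured and returned.
    """
    namespace = {"__name__": "__main__"}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        for cell in nb["cells"]:
            if cell["cell_type"] != "code":
                continue  # skip markdown and raw cells
            source = cell["source"]
            source = source if isinstance(source, str) else "".join(source)
            exec(compile(source, "<notebook cell>", "exec"), namespace)
    return buffer.getvalue()
```

To try it on a real file, load the notebook with json.load first; for the hello_world example the function returns "Hello World\n".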
Running the notebook - with parameters
To pass in parameters, you first need to alter the Jupyter notebook: move any hardcoded values that you wish to parameterise for the batch job to the first code cell of the notebook.
Pass the path to the notebook to the function; the path below assumes you are running from the root pympx directory.
Then add a keyword argument for each value that you want to change. In this case, text is the parameter being changed:
>>> from pympx import jupyter
>>> jupyter.run(r'template_etl\code\scripts\hello_world.ipynb', text = "Goodbye World")
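The notebook is transformed before it is run: an assignment for each keyword argument is added after the hardcoded assignments, so the new value replaces the old one. The transformed code and its output are: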
text = "Hello World"
text = "Goodbye World"
print(text)
Goodbye World
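The transformation shown above can be sketched as a function that copies the notebook and appends one assignment per keyword argument to the end of its first code cell. This is an illustration under assumptions, not pympx's code: override_first_cell is a hypothetical name, the notebook is taken to be an nbformat-4 dict, and values are written with repr(), so a string comes out in single quotes ('Goodbye World') rather than the double quotes shown above.

```python
import copy


def override_first_cell(nb: dict, **params) -> dict:
    """Return a copy of nb with one assignment per parameter appended to the
    first code cell, so the new values replace the hardcoded ones.

    Values are rendered with repr(), so they should be literals that
    round-trip (str, int, float, bool, None, and lists/dicts of those).
    """
    new_nb = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in new_nb["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            source = source if isinstance(source, str) else "".join(source)
            overrides = [f"{name} = {value!r}" for name, value in params.items()]
            cell["source"] = "\n".join([source.rstrip("\n"), *overrides])
            return new_nb
    raise ValueError("notebook has no code cells")
```

Because the override lines come after the original assignments, the later value wins when the cell executes; executing the returned notebook's cells in order would print Goodbye World.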