MLflow tracking with hierarchy
Introduction
MLflow Tracking is an organized mechanism that lets you track your entire data science code execution through the APIs it offers.
APIs are available for the following languages:
- Python
- Scala / Java
- REST
- R
The recorded information is broadly divided into segments based on the nature of the data and its category:
Code version: Git commit hash on which your code has run
Start time: Start time for the run
End time: End time for the run
Artifacts: Output files in various formats, such as Parquet files, images, or scikit-learn models
Metrics: Numeric key-value pairs; a metric can be updated throughout the run, so its full history can be visualized
Parameters: String-type key-value pairs selected by the user
Source: Name of the file that has launched the run
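To illustrate how these segments get populated, here is a minimal sketch using the standard logging APIs (the parameter, metric, and file names are hypothetical):

import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)     # parameter: string key-value pair
    mlflow.log_metric("rmse", 0.98, step=1)     # metric: numeric, keeps full history
    mlflow.log_metric("rmse", 0.87, step=2)
    mlflow.log_artifact("model_report.html")    # artifact: an output file (must exist locally)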
Creation of Hierarchy and logging
There are two core concepts involved in creating/initiating MLflow logging via the APIs:
- Experiments: For each data science code base we can create a separate experiment and create multiple runs under it to analyze and compare the logged values. An experiment is identified by an experiment ID and an experiment name throughout the workflow lifecycle.
- Runs: Runs are created under an experiment and can be nested at multiple levels, i.e. parent run -> child run -> sub-child run
The entire flow is depicted below.
An experiment has two essential pieces of information: an experiment ID and an experiment name. We will recreate the depicted hierarchy via the Python APIs.
import mlflow
experiment_name = "explorer_history"
experiment_id = mlflow.create_experiment(name=experiment_name)
Since we now have the experiment ID, we can set it as the active experiment for the current flow using the API below:
mlflow.set_experiment(experiment_id=experiment_id)
The code above creates an experiment with the passed name and returns the experiment ID, which we then set as active and can use further for building the hierarchy.
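In practice the experiment may already exist from an earlier execution; a minimal get-or-create sketch (assuming the same experiment_name as above):

existing = mlflow.get_experiment_by_name(experiment_name)
if existing is not None:
    experiment_id = existing.experiment_id      # reuse the existing experiment
else:
    experiment_id = mlflow.create_experiment(name=experiment_name)
mlflow.set_experiment(experiment_id=experiment_id)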
The mlflow.client module provides a Python CRUD interface to MLflow Experiments, Runs, Model Versions, and Registered Models. This is a lower-level API that translates directly to MLflow REST API calls.
client = mlflow.MlflowClient()
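For example, the client can read back the experiment we created above; a small sketch using only the IDs already defined:

experiment = client.get_experiment(experiment_id)
print(experiment.name, experiment.lifecycle_stage)   # e.g. "explorer_history", "active"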
Now we can create the parent run (or, at this point, simply a run) using the mlflow API; this applies when we are creating a brand-new run at the beginning of the flow.
run_name = 'parent_run_name'
active_run = mlflow.start_run(experiment_id=experiment_id, run_name=run_name)
current_active_run_id = active_run.info.run_id
The code above gives us the run ID of the newly created parent run, which can be used later for creating children that reference this parent; we will come back to that shortly.
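As a side note, the fluent API can also nest runs directly through the nested=True flag; a minimal sketch of that alternative, assuming no other run is currently active (e.g. after calling mlflow.end_run()):

with mlflow.start_run(experiment_id=experiment_id, run_name="fluent_parent"):
    with mlflow.start_run(experiment_id=experiment_id, run_name="fluent_child", nested=True):
        mlflow.log_metric("loss", 0.1)   # logged against the child run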
The next question is how to locate or search all the runs, including the parent we created above. For that we use the search_runs() API; you can find brief details here.
experiment_runs = client.search_runs(experiment_ids=[experiment_id])
The contents of experiment_runs depend on the output_format: if output_format is list, the result is a list of mlflow.entities.Run; if output_format is pandas, it is a pandas.DataFrame of runs, where each metric, parameter, and tag is expanded into its own column named metrics.*, params.*, or tags.* respectively. For runs that don't have a particular metric, parameter, or tag, the value of the corresponding column is (NumPy) NaN, None, or None respectively. The client.search_runs call above returns the list form.
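A quick sketch of iterating over the list form returned above:

for run in experiment_runs:
    parent_id = run.data.tags.get("mlflow.parentRunId")   # None for top-level runs
    print(run.info.run_id, run.data.tags.get("mlflow.runName"), parent_id)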
Now, moving on to the creation of child runs.
In the earlier code blocks we created the parent run, so now we extend the hierarchy with a child.
from mlflow.utils.mlflow_tags import MLFLOW_PARENT_RUN_ID

active_run = client.create_run(
    run_name="child_1",
    experiment_id=experiment_id,
    tags={MLFLOW_PARENT_RUN_ID: f"{current_active_run_id}"})
Here, notice that we have passed current_active_run_id, which is nothing but the run ID of the parent, so child_1 will come under that parent.
Likewise, we can create any number of children under the same parent:
active_run = client.create_run(
    run_name="child_2",
    experiment_id=experiment_id,
    tags={MLFLOW_PARENT_RUN_ID: f"{current_active_run_id}"})
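If the number of children is not fixed up front, they can also be created in a loop; a small sketch where the names child_3 to child_5 are just placeholders:

for i in range(3, 6):
    client.create_run(
        run_name=f"child_{i}",
        experiment_id=experiment_id,
        tags={MLFLOW_PARENT_RUN_ID: f"{current_active_run_id}"})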
Now, moving on to creating the sub-child runs.
A sub-child is created under a child; for our example let's pick child_1.
We will simply search for the parent and child via the search_runs() API, but this time our search parameters play a vital role.
Every run returned by an experiment-based search has various details, like:
- tags.mlflow.rootRunId
- tags.mlflow.parentRunId
- tags.mlflow.runName
These are the attributes we can pass to the search_runs() API, via its filter string, to fetch the details we need:
from mlflow.entities import ViewType

searched_runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string=f"tags.mlflow.rootRunId ILIKE '%{root_run_id}%' and tags.mlflow"
                  f".runName ILIKE '%{parent_run_name}%'",
    run_view_type=ViewType.ACTIVE_ONLY)
So, to relate this to our case:
- tags.mlflow.rootRunId : parent run ID
- tags.mlflow.parentRunId : child_1 run ID
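If the search finds the child_1 run, its run ID can be picked up for the next step; a small sketch assuming a single match:

child_1_run_id = searched_runs[0].info.run_id if searched_runs else None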
If the child_1 run details are found, we can fetch the details of an existing sub-child and replace it with the new one if it already exists; otherwise we simply create a new run with child_1 as its parent. This way we avoid duplication.
If one doesn't want to touch the existing sub-child and instead creates a new one each time, the name should be dynamic, e.g. auto-generated or suffixed/prefixed with the current date-time.
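A minimal sketch of such a dynamic name (the timestamp format is just an example):

from datetime import datetime

run_name = f"sub_child_{datetime.now().strftime('%Y%m%d_%H%M%S')}"   # e.g. sub_child_20240101_120000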
run_name = "sub_child_1"
active_run = client.create_run(run_name=f"{run_name}", experiment_id=experiment_id,
tags={MLFLOW_PARENT_RUN_ID: f"{child_1_run_id}"})
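Finally, a small sketch to verify the hierarchy by listing every run in the experiment together with its parent, using the tag names mentioned above:

for run in client.search_runs(experiment_ids=[experiment_id]):
    name = run.data.tags.get("mlflow.runName")
    parent = run.data.tags.get("mlflow.parentRunId", "-")   # "-" for the top-level parent
    print(f"{name}: run_id={run.info.run_id}, parent_run_id={parent}")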