MLflow tracking with hierarchy

Amit Prasad
4 min read · Sep 23, 2023

Introduction

MLflow Tracking is an organized mechanism for recording your entire data science code execution through its offered APIs.

APIs are available for the following languages

  • Python
  • Scala / Java
  • REST
  • R

The recorded information is broadly divided into segments based on the nature and category of the data:

Code version: Git commit hash of the code that was run

Start time: Start time of the run

End time: End time of the run

Artifacts: Output files in various formats, e.g. Parquet files, images, scikit-learn models

Metrics: Numeric key-value entries; a metric can be updated throughout a run, and MLflow lets you visualize its full history

Parameters: String key-value pairs of the user's choosing

Source: Name of the file that launched the run

Creation of Hierarchy and logging

There are two core concepts for creating/initiating MLflow logging via the APIs.

  • Experiments: For each data science code base we can create a separate experiment and create multiple runs under it to analyze and compare the logged values. An experiment is identified by its experiment ID and experiment name throughout the workflow lifecycle.
  • Runs: Runs are created under an experiment and can be nested at multiple levels, i.e. parent run -> child run -> sub-child run

The entire flow is depicted below.

MLflow Hierarchy

An experiment carries two essential pieces of information: the experiment ID and the experiment name. Let's recreate the hierarchy from the image using the Python APIs.

import mlflow
experiment_name = "explorer_history"
experiment_id = mlflow.create_experiment(name=experiment_name)

Since we now have the experiment ID, we can set it as the experiment for the current flow using the API below:

mlflow.set_experiment(experiment_id=experiment_id)

The first code block creates an experiment with the passed experiment name and returns its experiment ID, which we can use further for creating the hierarchy.

The mlflow.client module provides a Python CRUD interface to MLflow Experiments, Runs, Model Versions, and Registered Models. It is a lower-level API that translates directly into MLflow REST API calls.

client = mlflow.MlflowClient()

Now we can create the parent run (or, at this point, just a run) using the MLflow fluent API; this applies when we are creating a new run from scratch.

run_name = 'parent_run_name'
active_run = mlflow.start_run(experiment_id=experiment_id, run_name=run_name)
current_active_run_id = active_run.info.run_id

The code above gives us the run ID of the newly created parent run, which we will use later to create children under the same parent.

The next question is how to find all the runs (including the parent) that we created above. The answer is the search_runs() API; you can find brief details in the MLflow documentation.

experiment_runs = client.search_runs(experiment_ids=[experiment_id])

The shape of experiment_runs depends on which search_runs you call. The fluent mlflow.search_runs takes an output_format argument: with "list" it returns a list of mlflow.entities.Run; with "pandas" (the default) it returns a pandas.DataFrame of runs, where each metric, parameter, and tag is expanded into its own column named metrics.*, params.*, or tags.* respectively, and runs that lack a particular metric, parameter, or tag get NaN, None, or None in the corresponding column. MlflowClient.search_runs, used above, always returns a list of Run entities.

Now, moving on to the creation of child runs.

Earlier in the code blocks, we created the parent run so we are going to create a further hierarchy for the child.


from mlflow.utils.mlflow_tags import MLFLOW_PARENT_RUN_ID

active_run = client.create_run(
    run_name="child_1",
    experiment_id=experiment_id,
    tags={MLFLOW_PARENT_RUN_ID: f"{current_active_run_id}"})

Here, observe that we passed current_active_run_id, which is nothing but the parent run's ID, so child_1 is going to come under that parent.

Likewise, we can create any number of children under the same parent:

active_run = client.create_run(
    run_name="child_2",
    experiment_id=experiment_id,
    tags={MLFLOW_PARENT_RUN_ID: f"{current_active_run_id}"})

Now, moving on to creating the sub-child runs.

A sub-child is created under a child; for our example, let's pick child_1.

We will simply search for the parent and child via the search_runs() API, but this time the search parameters play a vital role.

Every run returned by an experiment search carries various details, such as:

tags.mlflow.rootRunId 
tags.mlflow.parentRunId
tags.mlflow.runName

These are the attributes we can pass to the search_runs() API to fetch the details:

from mlflow.entities import ViewType

searched_runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string=f"tags.mlflow.rootRunId ILIKE '%{root_run_id}%' "
                  f"and tags.mlflow.runName ILIKE '%{parent_run_name}%'",
    run_view_type=ViewType.ACTIVE_ONLY)

To relate this to our case:

  • tags.mlflow.rootRunId : parent run id
  • tags.mlflow.parentRunId : child 1 run id

If the child_1 run is found, we can fetch the details of any existing sub-child and replace it with the new one if it exists; otherwise we simply create a new run with child_1 as its parent. This way we avoid duplication.

If one doesn’t want to touch the existing sub-child and instead creates a new one each time, the name should be dynamic: auto-generated, or suffixed/prefixed with the current date and time.
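For example, a timestamp suffix (the helper name is mine):

```python
from datetime import datetime, timezone

def timestamped_run_name(base: str) -> str:
    # Suffix the run name with a UTC timestamp so repeated runs never collide.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{base}_{stamp}"

print(timestamped_run_name("sub_child_1"))  # e.g. sub_child_1_20230923_101500
```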

run_name = "sub_child_1"
active_run = client.create_run(
    run_name=f"{run_name}",
    experiment_id=experiment_id,
    tags={MLFLOW_PARENT_RUN_ID: f"{child_1_run_id}"})


Amit Prasad

Engineer by profession, Scala | Data engineering | Distributed Systems. LinkedIn: https://www.linkedin.com/in/amitprasad119/