

opsml logo

A Universal Artifact Registration System for Machine Learning



What is it?

OpsML provides tooling that enables data science and engineering teams to better govern and manage their machine learning projects and artifacts by providing a standardized and universal registration system and repeatable patterns for tracking, versioning and storing ML artifacts.

Why?

The core focus of OpsML is artifact registration, management and governance. After using various open-source and vendor tools to manage different aspects of the machine learning project lifecycle, we found that we were still spending considerable time gluing those tools together in order to adequately govern, manage and deploy artifacts. Moreover, machine learning currently lacks a universal standard for artifact registration and governance, which makes managing machine learning projects and systems challenging. Given how expansive the machine learning tooling ecosystem is, and how each tool tends to have its own way of saving, storing and tracking artifacts, we decided to build OpsML with the goal of providing a common framework.

Features:

  • Simple Design: Standardized design that can easily be incorporated into existing projects.

  • Cards: Track, version and store a variety of ML artifacts via cards (data, models, runs, projects) and a SQL-based card registry system. Think trading cards for machine learning.

  • Type Checking: Strongly typed codebase with type checking for data and model artifacts.

  • Support: Robust support for a variety of ML and data libraries.

  • Automation: Automated processes including ONNX model conversion, metadata creation and production packaging.

Incorporate into Existing Workflows

Add quality control to your ML projects with little effort! With opsml, data and models are represented as cards and stored in a card registry. This allows for easy versioning, tracking and storage of ML artifacts.
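To make the card and registry idea concrete, here is a minimal sketch using only the Python standard library. It is not the OpsML API: the `Card` and `CardRegistry` classes, the table layout, the `local://` URIs and the "bump the minor version on each registration" rule are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Card:
    """Hypothetical stand-in for a card: a named, versioned record of an artifact."""
    name: str
    card_type: str      # e.g. "data" or "model"
    uri: str            # where the artifact itself is stored
    version: str = ""   # assigned by the registry on registration


class CardRegistry:
    """Toy SQL-backed registry (illustrative only, not the OpsML implementation)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cards ("
            "name TEXT, card_type TEXT, major INTEGER, minor INTEGER, "
            "patch INTEGER, uri TEXT, "
            "PRIMARY KEY (name, card_type, major, minor, patch))"
        )

    def register(self, card: Card) -> Card:
        # Find the latest version already registered under this name, if any.
        row = self.conn.execute(
            "SELECT major, minor, patch FROM cards "
            "WHERE name = ? AND card_type = ? "
            "ORDER BY major DESC, minor DESC, patch DESC LIMIT 1",
            (card.name, card.card_type),
        ).fetchone()
        # Assumed rule: first registration is 1.0.0, later ones bump the minor version.
        major, minor, patch = (1, 0, 0) if row is None else (row[0], row[1] + 1, 0)
        self.conn.execute(
            "INSERT INTO cards VALUES (?, ?, ?, ?, ?, ?)",
            (card.name, card.card_type, major, minor, patch, card.uri),
        )
        self.conn.commit()
        card.version = f"{major}.{minor}.{patch}"
        return card

    def list_versions(self, name: str, card_type: str) -> list[str]:
        rows = self.conn.execute(
            "SELECT major, minor, patch FROM cards "
            "WHERE name = ? AND card_type = ? ORDER BY major, minor, patch",
            (name, card_type),
        ).fetchall()
        return [f"{a}.{b}.{c}" for a, b, c in rows]


registry = CardRegistry()
first = registry.register(Card("churn", "data", "local://churn/a.parquet"))
second = registry.register(Card("churn", "data", "local://churn/b.parquet"))
print(first.version, second.version)  # prints "1.0.0 1.1.0"
```

The point of the sketch is the division of labor: the card carries the metadata, while the registry owns version assignment and lookup, so callers never decide version numbers themselves.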



Our Goal

Our goal is twofold: (1) we want to provide a simple and consistent interface for managing and tracking ML artifacts that is easy for data scientists to use and incorporate, and (2) we want to inject quality control by giving data scientists standardized interfaces that automate the creation and governance of ML artifacts for our engineers.

Why Use OpsML vs other open source or vendor tooling?

With the plethora of available ML tooling, it can be difficult to decide which tooling to use. The following are some reasons why you might want to use OpsML and why we created it.

  • You need a consistent and standardized ML workflow to use in your organization
  • You are not enthusiastic about vendor lock-in, or about paying a vendor for an SDK and UI while still having to create and maintain your own infrastructure
  • You want to use a tool that is open source and continually developed
  • You want all artifacts to be given the same priority (no more treating data as less of a priority than models)
  • You don't want to worry about implementation details (how to version, store and track artifacts)
  • You'd like to have auto-generated metadata that meets engineering standards and can be used in production
  • You want to be able to share artifacts and workflows across teams

Nitty Gritty Details and OpsML

  • Supports streaming (can use big data and models!)
  • Supports a variety of storage backends (local, gcs, s3)
  • Supports a variety of databases (sqlite, postgres, mysql)
  • Supports many ML and data libraries (sklearn, pytorch, tensorflow, huggingface, xgboost, lightgbm, catboost, polars, pandas, numpy, pyarrow, onnx, etc.)
  • You can create your own interfaces if we're missing anything!
  • Supports semantic versioning for all cards
  • Type checking for all interfaces
  • Strongly typed codebase
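As a rough illustration of what type-checked interfaces and user-defined interfaces can look like, the sketch below uses plain dataclasses. The `DataInterface` base class, the `expected_type` attribute and the `DictData` subclass are hypothetical names chosen for this example; they are not taken from the OpsML codebase, which may structure its interfaces differently.

```python
from dataclasses import dataclass
from typing import Any, ClassVar


@dataclass
class DataInterface:
    """Hypothetical base interface that validates the type of the wrapped data."""
    data: Any
    # Subclasses override this with the type they accept.
    expected_type: ClassVar[type] = object

    def __post_init__(self) -> None:
        # Fail at construction time so bad artifacts never reach a registry.
        if not isinstance(self.data, self.expected_type):
            raise TypeError(
                f"{type(self).__name__} expects {self.expected_type.__name__}, "
                f"got {type(self.data).__name__}"
            )


@dataclass
class DictData(DataInterface):
    """A custom interface for plain dict records (illustrative only)."""
    expected_type: ClassVar[type] = dict


ok = DictData({"feature": 1.0})  # passes the type check

try:
    DictData([1, 2, 3])          # a list is rejected
except TypeError as err:
    print(err)  # prints "DictData expects dict, got list"
```

In this sketch, adding support for a new data type only requires a small subclass that declares the accepted type; the validation logic stays in one place.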

To get started using OpsML, check out the installation and quickstart guides.

Supported Libraries

OpsML is designed to work with a variety of ML and data libraries. The following libraries are currently supported:

Data Libraries

Name     OpsML Implementation
Pandas   PandasData
Polars   PolarsData
Torch    TorchData
Arrow    ArrowData
Numpy    NumpyData
Sql      SqlData
Text     TextDataset
Image    ImageDataset

Model Libraries

Name             OpsML Implementation   Example
Sklearn          SklearnModel           link
LightGBM         LightGBMModel          link
XGBoost          XGBoostModel           link
CatBoost         CatBoostModel          link
Torch            TorchModel             link
Torch Lightning  LightningModel         link
TensorFlow       TensorFlowModel        link
HuggingFace      HuggingFaceModel       link
VowpalWabbit     VowpalWabbitModel      link

Contributing

If you'd like to contribute, be sure to check out our contributing guide! If you'd like to work on any outstanding items, check out the roadmap section in the docs and get started 😃

Thanks go to these phenomenal projects and people for creating a great foundation to build from!