A Universal Artifact Registration System for Machine Learning
What is it?
OpsML provides tooling that helps data science and engineering teams govern and manage their machine learning projects and artifacts through a standardized, universal registration system and repeatable patterns for tracking, versioning and storing ML artifacts.
Why?
The core focus of OpsML is artifact registration, management and governance. After using various open-source and vendor tooling to manage different aspects of the machine learning project lifecycle, we found that we were still spending considerable time gluing different tools together in order to adequately govern, manage and deploy artifacts. Moreover, machine learning currently lacks a universal standard for artifact registration and governance, which makes managing machine learning projects and systems challenging. Given how expansive the machine learning tooling ecosystem is, and how each tool tends to have its own way of saving, storing and tracking artifacts, we decided to build OpsML with the goal of providing a common framework.
Features:
- Simple Design: Standardized design that can easily be incorporated into existing projects.
- Cards: Track, version and store a variety of ML artifacts via cards (data, models, runs, projects) and a SQL-based card registry system. Think trading cards for machine learning.
- Type Checking: Strongly typed interfaces with type checking for data and model artifacts.
- Support: Robust support for a variety of ML and data libraries.
- Automation: Automated processes including ONNX model conversion, metadata creation and production packaging.
Incorporate into Existing Workflows
Add quality control to your ML projects with little effort! With OpsML, data and models are represented as cards and stored in a card registry. This allows for easy versioning, tracking and storage of ML artifacts.
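To make the card and registry idea concrete, here is a minimal conceptual sketch built only on the Python standard library. It is not the OpsML API: the `Card` and `ToyRegistry` names, the table layout, the integer version counter and the example URIs are all hypothetical, chosen only to illustrate how a SQL-backed registry can assign versions and track where artifacts live.

```python
import sqlite3
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    """Hypothetical stand-in for a card: metadata describing one artifact."""

    name: str
    card_type: str      # e.g. "data" or "model"
    artifact_uri: str   # where the artifact itself is stored
    version: int = 0    # assigned by the registry on registration
    uid: str = ""       # assigned by the registry on registration


class ToyRegistry:
    """Toy SQL-backed registry for illustration; not the OpsML implementation."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cards ("
            "uid TEXT PRIMARY KEY, name TEXT, card_type TEXT, "
            "version INTEGER, artifact_uri TEXT)"
        )

    def register(self, card: Card) -> Card:
        # Next version = highest existing version for this name and type, plus one.
        row = self.conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM cards "
            "WHERE name = ? AND card_type = ?",
            (card.name, card.card_type),
        ).fetchone()
        card.version = row[0] + 1
        card.uid = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO cards VALUES (?, ?, ?, ?, ?)",
            (card.uid, card.name, card.card_type, card.version, card.artifact_uri),
        )
        self.conn.commit()
        return card

    def latest(self, name: str, card_type: str) -> Optional[Card]:
        # Return the most recently registered version, or None if nothing matches.
        row = self.conn.execute(
            "SELECT name, card_type, artifact_uri, version, uid FROM cards "
            "WHERE name = ? AND card_type = ? ORDER BY version DESC LIMIT 1",
            (name, card_type),
        ).fetchone()
        return Card(*row) if row else None


registry = ToyRegistry()
registry.register(Card("churn", "data", "file:///tmp/churn-v1.parquet"))
second = registry.register(Card("churn", "data", "file:///tmp/churn-v2.parquet"))
latest = registry.latest("churn", "data")
print(latest.version, latest.artifact_uri)
```

The point of the sketch is the separation of concerns: the card carries metadata, the registry owns version assignment and lookup, and the artifact itself sits in a storage backend referenced by URI.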
Our Goal
Our goal is twofold: (1) we want to provide a simple and consistent interface for managing and tracking ML artifacts that is easy for data scientists to use and incorporate, and (2) we want to inject quality control by providing data scientists with standardized interfaces that automate the creation and governance of ML artifacts for our engineers.
Why Use OpsML vs other open source or vendor tooling?
With the plethora of available ML tooling it can be difficult to decide which tooling to use. The following are some reasons why you might want to use OpsML and why we created it.
- You need a consistent and standardized ML workflow to use in your organization
- You're not enthusiastic about vendor lock-in, or about paying a vendor for an SDK and UI while still having to create and maintain your own infrastructure
- You want to use a tool that is open source and continually developed
- You want all artifacts to be given the same priority (no more treating data as less of a priority than models)
- You don't want to worry about implementation details (how to version, store and track artifacts)
- You'd like to have auto-generated metadata that meets engineering standards and can be used in production
- You want to be able to share artifacts and workflows across teams
Nitty Gritty Details and OpsML
- Supports streaming (can use big data and models!)
- Supports a variety of storage backends (local, gcs, s3)
- Supports a variety of databases (sqlite, postgres, mysql)
- Supports many ML and data libraries (sklearn, pytorch, tensorflow, huggingface, xgboost, lightgbm, catboost, polars, pandas, numpy, pyarrow, onnx, etc.)
- Lets you create your own interfaces if we're missing anything!
- Supports semantic versioning for all cards
- Type checking for all interfaces
- Strongly typed codebase
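As an illustration of what semantic versioning means for a card, the following sketch shows the usual major/minor/patch bump rules in plain Python. The `SemVer` class and its `bump` method are hypothetical helpers written for this example and are not part of OpsML; how OpsML itself selects and applies version increments is described in its own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SemVer:
    """Hypothetical helper for major.minor.patch versions (not an OpsML class)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "SemVer":
        # Expects exactly three dot-separated integers, e.g. "1.2.3".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, kind: str) -> "SemVer":
        # Bumping a component resets every component to its right.
        if kind == "major":
            return SemVer(self.major + 1, 0, 0)
        if kind == "minor":
            return SemVer(self.major, self.minor + 1, 0)
        if kind == "patch":
            return SemVer(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown bump kind: {kind!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = SemVer.parse("1.2.3")
print(current.bump("minor"))
```

Because the fields are compared as integers in order, `SemVer.parse("1.10.0")` sorts after `SemVer.parse("1.9.5")`, which a plain string comparison would get wrong.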
To get started using OpsML, check out the installation and quickstart guides.
Supported Libraries
OpsML is designed to work with a variety of ML and data libraries. The following libraries are currently supported:
Data Libraries
Name | OpsML Implementation |
---|---|
Pandas | PandasData |
Polars | PolarsData |
Torch | TorchData |
Arrow | ArrowData |
Numpy | NumpyData |
Sql | SqlData |
Text | TextDataset |
Image | ImageDataset |
Model Libraries
Name | OpsML Implementation | Example |
---|---|---|
Sklearn | SklearnModel | link |
LightGBM | LightGBMModel | link |
XGBoost | XGBoostModel | link |
CatBoost | CatBoostModel | link |
Torch | TorchModel | link |
Torch Lightning | LightningModel | link |
TensorFlow | TensorFlowModel | link |
HuggingFace | HuggingFaceModel | link |
VowpalWabbit | VowpalWabbitModel | link |
Contributing
If you'd like to contribute, be sure to check out our contributing guide! If you'd like to work on any outstanding items, check out the roadmap section in the docs and get started.
Thanks go to these phenomenal projects and people for creating a great foundation to build from!