This is an ML pipeline with both CI & CD components*.
The story today is about CD - Continuous Deployment**. I’m assuming as the Data Scientist I don’t have access to the production environment (no “one-click deploy to prod” for me)***.
I develop my model (new feature branch) locally, using our dev database and local compute. I track experiments with MLflow and orchestrate my code with Dagster.
Once satisfied with my new model, I do the following to deploy:
Check my code into a feature branch in source control
Open a pull request to the dev branch of our code base
Kick back and watch
This starts a CI job via GitHub Actions that:
Builds my project
Tests my code
Deploys to dev environment (teeny little deployment :)
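The CI job above can be sketched as an ordered list of stages where a failure stops everything after it. This is a minimal illustration in plain Python, not the actual GitHub Actions workflow: `run_stages`, `StageResult`, and the three stage functions are hypothetical stand-ins for the real build, test, and deploy commands.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    name: str
    ok: bool


def run_stages(stages: List[Tuple[str, Callable[[], bool]]]) -> List[StageResult]:
    """Run stages in order and stop at the first failure, like a CI job."""
    results: List[StageResult] = []
    for name, step in stages:
        ok = bool(step())
        results.append(StageResult(name, ok))
        if not ok:
            break  # later stages (e.g. the dev deployment) never run
    return results


# Hypothetical stand-ins for the real build, test, and deploy commands.
def build_project() -> bool:
    return True


def run_tests() -> bool:
    return True


def deploy_to_dev() -> bool:
    return True


ci_job = [
    ("build", build_project),
    ("test", run_tests),
    ("deploy-dev", deploy_to_dev),
]

results = run_stages(ci_job)
for r in results:
    print(f"{r.name}: {'ok' if r.ok else 'failed'}")
```

If the test stage returned False, the results would end at "test" and the dev deployment would never be attempted, which is the behavior you want before asking a teammate to merge.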
If successful, another team member will merge my code into the dev branch.
The merge into dev automatically triggers (dare I say, continuously) a deployment job that:
Deploys my code to Staging
In my case, this job re-runs my ML pipeline and tests on staging data
The benefit here is that typically staging data is closer to production data than whatever I was using in dev
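One way to picture "same code, different environment" is a pipeline function that takes the environment as a parameter and retrains on whatever data that environment provides. The sketch below is a toy under stated assumptions: `EnvConfig`, `ENVIRONMENTS`, and `run_pipeline` are hypothetical names, the "model" is just a mean, and the in-memory lists stand in for the dev and staging databases.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class EnvConfig:
    name: str
    rows: List[float]  # stand-in for a connection to that environment's data


# Hypothetical per-environment settings; real ones would hold connection details.
ENVIRONMENTS: Dict[str, EnvConfig] = {
    "dev": EnvConfig("dev", [1.0, 2.0, 3.0]),
    "staging": EnvConfig("staging", [1.0, 2.0, 3.0, 10.0]),
}


def train(config: EnvConfig) -> float:
    """Toy 'model': the mean of the values available in this environment."""
    return mean(config.rows)


def run_pipeline(env_name: str) -> Dict[str, object]:
    """Re-run the identical pipeline code against the chosen environment."""
    config = ENVIRONMENTS[env_name]
    model = train(config)
    # Toy check standing in for the real tests run against staging data.
    passed = min(config.rows) <= model <= max(config.rows)
    return {"env": config.name, "model": model, "passed": passed}


for env in ("dev", "staging"):
    print(run_pipeline(env))
```

The code is identical in both runs; only the data changes, so the model trained in staging differs from the one trained in dev. That is the point of moving code rather than model artifacts between environments.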
If successful, this job initiates a manual review process for deployment to prod:
Prompts an admin to review my code and choose whether to run the final job in the workflow: deploying to production.
As the admin, I approve the deployment and the CD job finishes by training and deploying the model in prod.
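The manual review step boils down to a gate: the production job only runs when staging succeeded and a reviewer explicitly approved. Here is a minimal sketch of that logic in plain Python. `Decision`, `promote_to_prod`, and `fake_train_and_deploy` are hypothetical names, and how the approval is actually collected in the workflow is not shown here.

```python
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


def promote_to_prod(
    staging_passed: bool,
    decision: Optional[Decision],
    train_and_deploy: Callable[[], str],
) -> str:
    """Run the production job only after staging passed and a reviewer approved."""
    if not staging_passed:
        return "blocked: staging failed"
    if decision is None:
        return "waiting: review pending"
    if decision is Decision.REJECTED:
        return "stopped: rejected by reviewer"
    return train_and_deploy()


# Hypothetical stand-in for the final job that trains and deploys in prod.
def fake_train_and_deploy() -> str:
    return "deployed: model trained in prod"


print(promote_to_prod(True, None, fake_train_and_deploy))
print(promote_to_prod(True, Decision.APPROVED, fake_train_and_deploy))
```

Note that the production job is passed in as a callable and never invoked unless every check passes, so a pending or rejected review cannot trigger training in prod by accident.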
Lots of hand-waving in this example, but I hope it helps show the git-based workflow moving between environments and the larger theme of the significant work required to actually deploy an ML project.
* This is part of my ongoing saga to get closer to something resembling an actual production deployment instead of the notebook-based fit/predict/API patterns you tend to see.
** I have a tendency to use CI and CD interchangeably (read: incorrectly). Setting up this example has really helped clarify where
*** In my examples, I only move code between environments, never models. This is the pattern I see most often with ML teams. It’s possible you might deploy your model from dev to staging to prod.
Continual - we're lucky to work with ML teams that care about software engineering best practices. If that sounds like you, please hit us up.
#python #ml #dagster #mlflow
Feel free to connect with me on LI: / gustafrcavanaugh
Video: Adding CI/CD to ML Pipeline with MLflow, Dagster, and Github Actions. Added by Gus Cavanaugh on 08 December 2022.