A DevOps approach to research
Sebastian Pölsterl
sebastian.poelsterl@icr.ac.uk
C4RR, 28 June 2017
Challenges in Research #1
- A piece of code tends to work only at a single point in time and space
Challenges in Research #2
- Errors are often discovered months after they have been introduced
Challenges in Research #3
- We want to repeat the same analysis with different parameters and/or data, sometimes 6 months later
Docker
- Allows bundling software and all of its dependencies into an image
- Built images can be shared easily
- Docker containers are more lightweight than virtual machines: they share the host's kernel instead of running a full guest operating system
- Easy to get started: Docker Hub has a rich collection of pre-built images
GitLab
- GitLab is an open-source platform for managing projects, similar to GitHub
- Features:
  - Source code browser (Git repository)
  - Repository and activity monitoring
  - Issue management
  - Code review
  - Wiki
  - Continuous integration/deployment
  - Docker registry
Continuous integration (CI)
- Frequently merge all developers' working copies, and identify and resolve problems as early as possible
GitLab CI
- GitLab CI allows defining pipelines to automate certain tasks
- Jobs can be triggered manually or automatically on specific events (e.g. when a new commit is pushed)
- GitLab CI supports Docker!
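A minimal sketch of a `.gitlab-ci.yml` illustrating the points above. It assumes a Python project. The image `python:3.6`, the job names `test` and `train`, and the files `requirements.txt`, `tests/` and `train.py` are hypothetical. The `test` job runs automatically on every push, while `train` only runs when started by hand. Both jobs run inside a Docker container.

```yaml
# Every job runs inside a container started from this Docker Hub image
image: python:3.6

# No constraints: runs automatically for every pushed commit
test:
  script:
    - pip install -r requirements.txt   # hypothetical dependency list
    - python -m pytest tests/           # hypothetical test directory

# Only runs when started by hand from the pipeline page in the web interface
train:
  script:
    - pip install -r requirements.txt
    - python train.py                   # hypothetical training script
  when: manual
```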
Example Project
Learn the distribution of handwritten digits using a Restricted Boltzmann Machine (RBM) and sample from it.
Ingredients:
- Code based on Theano’s RBM tutorial
- Docker image containing the code and its dependencies
- GitLab CI configuration
GitLab CI configuration
- GitLab CI uses a YAML file (.gitlab-ci.yml)
- Defines a set of jobs with constraints stating when they should be run
- Jobs are picked up by Runners (shell, Docker, SSH, VirtualBox, …)
- Each job runs independently of the others
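The following sketch shows how constraints and Runner selection can look in practice. The Runner tags `shell` and `docker` are hypothetical: they only match Runners that an administrator registered with those tags, here assumed to be a shell Runner with Docker installed and a Docker Runner. Because jobs are independent, `unit_tests` starts in a fresh container and does not see the image built by `build_image` unless it is passed on explicitly, for example through a registry.

```yaml
stages:
  - build
  - test

build_image:
  stage: build
  tags:
    - shell         # hypothetical tag: only Runners registered with it pick up this job
  script:
    - docker build -t rbm-example .
  only:
    - master        # constraint: run only for commits on the master branch

unit_tests:
  stage: test
  image: python:3.6
  tags:
    - docker        # hypothetical tag for a Runner using the Docker executor
  script:
    - pip install -r requirements.txt
    - python -m pytest tests/
  except:
    - tags          # constraint: skip pipelines created for Git tags
```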
Docker Image in Registry
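One way to get an image into the project's GitLab registry is a job that builds and pushes it on every commit. This is a sketch assuming a Runner that supports Docker-in-Docker; the exact `docker:dind` setup (for example the TLS variable below) depends on how the Runner is configured. `CI_REGISTRY`, `CI_REGISTRY_IMAGE`, `CI_REGISTRY_USER`, `CI_REGISTRY_PASSWORD` and `CI_COMMIT_SHA` are variables predefined by GitLab CI. The job name is hypothetical.

```yaml
build_image:
  stage: build
  image: docker:latest
  services:
    - docker:dind                 # Docker daemon running as a sidecar service
  variables:
    DOCKER_TLS_CERTDIR: "/certs"  # may need adjusting to the Runner's Docker-in-Docker setup
  script:
    # Log in to the project's container registry with credentials provided by GitLab
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    # Tag the image with the commit hash, so every commit gets its own image
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```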
Coverage report
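GitLab can display a coverage percentage by applying a regular expression, given by the `coverage` keyword, to the job log. The sketch below assumes tests are run with `pytest-cov`, whose terminal report ends with a `TOTAL` line. The package name `rbm` is hypothetical. The HTML report, which `pytest-cov` writes to `htmlcov/` by default, is kept as an artifact so it can be browsed after the job finishes.

```yaml
test:
  image: python:3.6
  script:
    - pip install -r requirements.txt pytest pytest-cov
    # --cov=rbm assumes the code lives in a (hypothetical) package called rbm
    - python -m pytest --cov=rbm --cov-report=term --cov-report=html tests/
  # Regex GitLab applies to the job log to extract the total coverage percentage
  coverage: '/TOTAL.+ ([0-9]{1,3}%)/'
  artifacts:
    paths:
      - htmlcov/
```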
Artifacts
Artifacts - samples.png
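Files such as `samples.png` become downloadable from the job page when they are declared as artifacts. This is a minimal sketch; the script `sample_rbm.py` and its `--output` flag are hypothetical stand-ins for whatever code draws samples from the trained RBM.

```yaml
sample:
  image: python:3.6
  script:
    - pip install -r requirements.txt
    - python sample_rbm.py --output samples.png   # hypothetical script and flag
  # Files listed here are uploaded to GitLab after the job finishes
  artifacts:
    paths:
      - samples.png
```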
Challenge #3
- We want to repeat the same analysis with different parameters and/or data, sometimes 6 months later
- Solution
  - Create a new Docker image for every single analysis you perform
  - Automatically push it to a Docker registry (a Dockerfile alone is not enough, because rebuilding it months later may install different package versions)
  - GitLab CI can trigger your analysis and collect and store its results
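Putting these pieces together, an analysis job can run inside the image that was pushed for the same commit, with its parameters stored as variables and its results kept as artifacts. This is a sketch under several assumptions: the image is tagged `$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA` by an earlier build job in the `build` stage, and the script `train_rbm.py`, its flags and the variable names `N_HIDDEN` and `LEARNING_RATE` are hypothetical. Values passed when running a pipeline manually or through a trigger take precedence over the defaults in the file, so the same analysis can be rerun with different parameters.

```yaml
analysis:
  # Run inside the image that was built and pushed for this very commit
  image: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
  variables:
    N_HIDDEN: "500"         # hypothetical analysis parameters
    LEARNING_RATE: "0.1"
  script:
    - python train_rbm.py --n-hidden "$N_HIDDEN" --learning-rate "$LEARNING_RATE" --output-dir results/
  # Keep everything the analysis produced
  artifacts:
    paths:
      - results/
  when: manual              # start the analysis by hand from the pipeline page
```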