A DevOps approach to research

Sebastian Pölsterl

Institute of Cancer Research

C4RR, 28 June 2017

Challenges in Research #1

  • A piece of code tends to work only for a single point in time and space


Challenges in Research #2

  • Errors are often discovered months after they have been introduced


Challenges in Research #3

  • We want to repeat the same analysis with different parameters and/or data, sometimes 6 months later


The DevOps Approach

  • DevOps engineers are responsible for developing, testing, and operating a piece of software
  • This is only possible with a large set of tools that help automate parts of their work (version control, unit testing, deployment, monitoring)
  • Git and Docker are two of the most fundamental tools

Tools: Jenkins, Bamboo, Chef, Travis CI, Selenium, Docker, Git, Puppet, Snort, Logstash

Docker

  • Allows bundling software and all of its dependencies into an image
  • Built images can be shared easily
  • Docker is more lightweight than virtual machines
  • Easy to get started: Docker Hub has a rich collection of pre-built images

GitLab

  • GitLab is an open-source platform, similar to GitHub, for managing projects
  • Features:
    • Source code browser (Git repository)
    • Repository and activity monitoring
    • Issue management
    • Code review
    • Wiki
    • Continuous integration/deployment
    • Docker registry

Continuous integration (CI)

  • Frequently merge all developers' working copies and identify/resolve problems as early as possible


GitLab CI

  • GitLab CI allows defining pipelines to automate certain tasks
  • Jobs can be triggered manually or automatically on specific events (e.g. a new commit is pushed)
  • GitLab CI supports Docker!


Example Project

Learn the distribution of handwritten digits using a Restricted Boltzmann Machine (RBM) and sample from it.

Ingredients:
  1. Code based on Theano’s RBM tutorial
  2. Docker image containing the code and its dependencies
  3. GitLab CI configuration

GitLab CI configuration

  • GitLab CI uses a YAML file (.gitlab-ci.yml)
  • Defines a set of jobs with constraints stating when they should be run
  • Jobs are picked up by Runners (shell, Docker, SSH, VirtualBox, …)
  • Jobs run independently of each other
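
As a rough illustration of the structure such a file describes, the sketch below mirrors a hypothetical .gitlab-ci.yml as a Python dictionary and groups its jobs by stage. The keys `stages`, `stage`, `script`, `artifacts`, and `only` are GitLab CI keywords; the job names, the scripts, and the helper `jobs_by_stage` are assumptions made for this example and are not taken from the example project.

```python
# Python mirror of a hypothetical .gitlab-ci.yml; job names and scripts are illustrative.
config = {
    "stages": ["build", "test", "deploy"],
    "build-image": {
        "stage": "build",
        "script": ["docker build -t $CI_REGISTRY_IMAGE ."],
    },
    "unit-tests": {
        "stage": "test",
        "script": ["pytest"],
    },
    "sample-digits": {
        "stage": "deploy",
        "script": ["python sample.py"],  # hypothetical script name
        "artifacts": {"paths": ["samples.png"]},
        "only": ["master"],  # constraint: run this job only for the master branch
    },
}

# Top-level keys that configure the pipeline rather than define a job (not exhaustive).
GLOBAL_KEYS = {"stages", "image", "variables", "before_script",
               "after_script", "cache", "services"}


def jobs_by_stage(config):
    """Group job names by stage, keeping the order given under 'stages'."""
    grouped = {stage: [] for stage in config["stages"]}
    for name, job in config.items():
        if name in GLOBAL_KEYS:
            continue
        # A job without an explicit stage is assigned to 'test'.
        grouped[job.get("stage", "test")].append(name)
    return grouped
```

Calling `jobs_by_stage(config)` lists which jobs belong to the build, test, and deploy stages. Stages run in the listed order, while the jobs within a stage do not depend on each other.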

Docker Registry

Build stage

Docker Image in Registry


Test stage

Coverage report


Deploy stage

Artifacts


Artifacts - samples.png


Challenge #1

  • A piece of code tends to work only for a single point in time and space

  • Solution
    • Always use Git, and commit often!
    • Manage your dependencies (e.g. by using Docker)
    • Use continuous integration
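
Besides pinning dependencies in an image, it can help to store the exact versions in use next to each result. The following is a minimal sketch using only the Python standard library; the function name `snapshot_environment` and the package names in the usage note are assumptions for illustration.

```python
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Return the Python version, platform, and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }
```

For example, `snapshot_environment(["numpy", "theano"])` could be written to a JSON file and committed or stored together with the analysis output.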

Challenge #2

  • Errors are often discovered months after they have been introduced

  • Solution
    • Use continuous integration
    • Unit tests would be ideal, but static code analysis or regression tests are a good start
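
A regression test can be as simple as comparing today's result with a stored reference. The sketch below assumes a deterministic, seeded analysis; `run_analysis` is a hypothetical stand-in for real research code, and the temporary directory only keeps the example self-contained (in a real project the reference file would live in the Git repository).

```python
import json
import random
import tempfile
from pathlib import Path


def run_analysis(seed, n=1000):
    """Hypothetical stand-in for a real analysis: mean of n seeded random draws."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


def check_against_reference(result, reference_file, tol=1e-9):
    """Compare result with the stored reference; create the reference on first run."""
    reference_file = Path(reference_file)
    if not reference_file.exists():
        reference_file.write_text(json.dumps({"result": result}))
        return True
    expected = json.loads(reference_file.read_text())["result"]
    return abs(result - expected) <= tol


def test_analysis_matches_reference():
    with tempfile.TemporaryDirectory() as tmp:
        ref = Path(tmp) / "reference.json"
        # First call stores the reference value.
        assert check_against_reference(run_analysis(seed=0), ref)
        # Same seed reproduces the stored value.
        assert check_against_reference(run_analysis(seed=0), ref)
        # A changed result (here: a different seed) is detected.
        assert not check_against_reference(run_analysis(seed=1), ref)
```

A test runner such as pytest would collect `test_analysis_matches_reference` automatically, so a CI job running it on every push would flag a changed result shortly after the change was introduced.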

Challenge #3

  • We want to repeat the same analysis with different parameters and/or data, sometimes 6 months later
  • Solution
    • Create a new Docker image for every analysis you perform
    • Automatically push it to a Docker registry (a Dockerfile alone is not enough)
    • GitLab CI can trigger your analysis, then collect and store the results
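
To make reruns with different parameters traceable, the analysis script itself can take its parameters on the command line and save them with the output. The sketch below is one possible layout, not the example project's code; the parameter names `--learning-rate` and `--n-hidden`, the `results` directory, and the helper `run_id` are assumptions for illustration.

```python
import argparse
import hashlib
import json
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run one analysis configuration.")
    parser.add_argument("--learning-rate", type=float, default=0.1)
    parser.add_argument("--n-hidden", type=int, default=500)
    parser.add_argument("--output-dir", type=Path, default=Path("results"))
    return parser.parse_args(argv)


def run_id(params):
    """Short, deterministic identifier derived from the parameter values."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def main(argv=None):
    args = parse_args(argv)
    params = {"learning_rate": args.learning_rate, "n_hidden": args.n_hidden}
    out_dir = args.output_dir / run_id(params)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder: a real analysis would train the model and save its outputs here.
    (out_dir / "params.json").write_text(json.dumps(params, indent=2, sort_keys=True))
    return out_dir
```

Each parameter combination ends up in its own directory, for example via `main(["--n-hidden", "100"])`, and a CI job could then declare the output directory as an artifact so results and parameters are stored together.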

Thanks for your attention!


https://gitlab.com/sebp/devops-approach-to-research/