## Version control and software environments
-----------

*[2022 MUSES Collaboration Meeting](https://forum.musesframework.io/t/muses-collaboration-meeting-2022)*

T. Andrew Manning

National Center for Supercomputing Applications @ UIUC

---

## Outline
-----------

* What is version control? Why does everyone love Git?
* Why do we use services like GitLab and GitHub?
* What is a software environment? How do containers help us?
* What tools are available to help us develop code efficiently?
* What are data structures? What is YAML?

<aside class="notes">
This is an introductory-level talk. Some of you may learn almost nothing from this talk. Some of you will appreciate the refresher. Some of you will learn something from every slide. We are assuming you have done some programming but are not experts in software development concepts, tools, and practices. The goal is not to make you experts, but instead to introduce you to what you need to be a successful MUSES collaborator.
</aside>

---

## [What is version control?](https://git-scm.com/video/what-is-version-control)

* A version control system (VCS) does exactly what its name suggests: it gives you control over the versions of a file (document, code, any file).
* A modern VCS like Git offers much more than this basic functionality.
    * Revision history
    * Branching
    * Merging
    * Synchronizing
* There are far too many use cases for VCS to cover here. We will focus on why we use it in MUSES, and why virtually all software developers use it. In a nutshell:
    * **Collaboration**: Individuals can often get by without VCS, but when working on code with other people, it becomes essential.
    * **Interdependency**: The moment your software depends on someone else's software, or their software depends on yours, you need VCS!

<aside class="notes">
Watch some of the short videos linked in this slide. A VCS can be useful for historical review.
Instead of saving 50 copies of a file numbered by version, you "commit" a version of the file before changing it so that you can always revert it or compare it to an old version without the clutter. Plus, there is no worry that you will taint your historical record by modifying one of your "static copies". It's also much more space efficient.
</aside>

----

## [Why does everyone love Git?](https://git-scm.com/video/what-is-version-control)

* [**Git is "local-first"**](https://git-scm.com/). You can create a Git repo in a local folder in a second without any servers or internet connection.
* **Git is lightweight**. All Git needs is a `.git` subfolder in a directory. Creating branches and computing diffs are fast operations that typically require simple commands.

```bash
git init                     # create a repo in the current folder
git checkout -b my-branch    # create a new branch and switch to it
git add .                    # stage your changes
git commit -m 'Fixed the syntax error in the output format'
git diff my-branch main      # compare two branches
git checkout main
git merge my-branch          # merge your branch into main
git push origin main         # publish to a remote named "origin"
```

* **Git is fully decentralized**. Two or more *independent* Git repos can push and pull changes between each other without any central server being involved. You can use a repo hosting service like GitHub without your code being "locked-in".

----

## Why do we use services like [GitLab](https://gitlab.com/) and [GitHub](https://github.com/)?

* **Publishing**. Others can access your code and its static releases at any time from a "static" URL. You can apply access control to restrict who can view the code as desired.
* **Backups**. Git services provide off-site backups for your code to protect against data loss from local workstation failures.
* **Central source of truth**. Although Git is decentralized, your team needs an authoritative copy of the code to which members push and from which they pull.

----

## Why do we use services like [GitLab](https://gitlab.com/) and [GitHub](https://github.com/)?

* **Collaborative tools**.
    * **Issues**.
      Issues are a task management tool that lets you define tasks, assign people, and discuss solutions.
    * **Wiki**. Repos have an associated wiki that you can use to draft/publish documentation and share group knowledge.
* **Deployment services**.
    * **Container registry**. Provides a registry of Docker images that you and others can pull, for example when deploying in Kubernetes or running in Docker on your local workstation.
    * **Automation pipelines**. Automatic image builds and documentation rendering from source code upon commit.

----

## What is semantic versioning?

Semantic versioning, or [semver](http://semver.org), addresses this problem:

> In the world of software management there exists a dreaded place called "dependency hell". The bigger your system grows and the more packages you integrate into your software, the more likely you are to find yourself, one day, in this pit of despair.

Given a version number `MAJOR.MINOR.PATCH`, increment the:

* `MAJOR` version when you make incompatible API changes,
* `MINOR` version when you add functionality in a backwards compatible manner, and
* `PATCH` version when you make backwards compatible bug fixes.

<aside class="notes">
This
</aside>

---

## What is a software environment?

To execute even the simplest code, a computer relies on an entire "stack" of hardware and software, each layer of which has many versions and flavors:

* **Hardware processor architecture** (x86_32, x86_64, ARM, PPC64, MIPS)
* **Operating system** (macOS, Linux, Android, iOS, etc.)
* **Installed libraries** (Python packages, locally compiled C++ libraries, binaries installed by your OS package manager)
* **Environment variables**

There is a classic **"Works for me!"** problem when teams work together to build code, where one person can compile and run their code on their machine in their :snowflake: special snowflake development environment, but their teammate cannot.
Due to the complexity of the stack, you can waste *so much time* debugging this common problem.

----

## How do containers help us?

**Containers eliminate the differences between developers' software environments** by creating a consistent "bubble" in which code will run the same regardless of which operating system and software environment are running on the "host". [If you know what a virtual machine is, you can think of a container as a very lightweight virtual machine](https://www.docker.com/resources/what-container/).

Containers are good for more than just developer productivity. **Our MUSES modules need to run on the [Kubernetes cluster](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/) that hosts the online MUSES service**, and containers guarantee that they will run properly even in an environment that is very different from the developers' laptops and workstations.

----

## How do we use containers in MUSES?

We chose Docker as our container system in MUSES. Each independent module or other software component must have a `Dockerfile` that defines how to **build the image** for that module.

We have a common "base image" that several module teams are already using to make their lives easier:

https://gitlab.com/nsf-muses/common/-/blob/main/Dockerfile

Some examples of modules using the common base image:

https://gitlab.com/nsf-muses/module-cmf/cmf-solver/-/blob/main/Dockerfile

https://gitlab.com/nsf-muses/module-bqs-eos/module-bqs-eos/-/blob/main/Dockerfile

---

## What tools are available to help us develop code efficiently?
-----------

<div style="font-size: smaller;">
Editing files in Windows Notepad or using `vi`/`emacs` in a terminal is great, but there are alternatives that can make work much more efficient, even if you did receive a PhD in `vi` keybindings and `emacs` sequences...
</div>

* Integrated development environments (IDEs) like [Visual Studio Code](https://code.visualstudio.com/) or [Atom](https://atom.io/)
    * Extensions to provide syntax highlighting, code navigation, auto-formatting for any language
    * Built-in support for Git

[![](https://wiki.musesframework.io/uploads/65135112be0e89d0184679e04.png =600x)](https://code.visualstudio.com/)

----

## What tools are available to help us develop code efficiently?
-----------

Git GUIs like [Git Cola](https://git-cola.github.io/) and [GitHub Desktop](https://desktop.github.com/) eliminate some of the typos and awkwardness of Git CLI syntax, and provide visual aids when reviewing history and file diffs.

![](https://wiki.musesframework.io/uploads/65135112be0e89d0184679e06.png =400x)
![](https://wiki.musesframework.io/uploads/65135112be0e89d0184679e05.png =400x)

---

## What is a data structure?

* A rigorously defined structure that captures information, including both content and hierarchical relationships.
* There are many ways to represent the same information.
* A data structure can define a "data type" or "schema" (like a programming class) or it can capture a particular instance of a data type (like a programming object).
* Your data structures *will evolve*: [Semantic versioning and backwards compatibility](https://forum.musesframework.io/t/semantic-versioning-and-backwards-compatibility)

---

## What is YAML? Why are we using it?

* [YAML](https://yaml.org/): YAML Ain't Markup Language™
    > YAML is a human-friendly data serialization language for all programming languages.
* [What is YAML? A beginner's guide.](https://circleci.com/blog/what-is-yaml-a-beginner-s-guide/)
* **YAML is a universal translator for data**, and it is easy for humans to read and edit.
* Brainstorm a random example to demonstrate the design process.

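
As one possible starting point for that brainstorm, here is a minimal Python sketch of the schema-versus-instance idea. Everything in it is an assumption made for illustration: the `EosTable` class, its field names, and the values are hypothetical and are not a MUSES standard. To keep the sketch runnable without extra packages, the standard-library `json` module stands in for a YAML library; YAML 1.2 was designed so that JSON documents are also valid YAML, so the text it produces could be read by a YAML parser as well.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class EosTable:
    """Schema: the fields every instance must provide (hypothetical names)."""

    name: str
    schema_version: str  # semver string, so readers can detect incompatible changes
    temperatures_mev: List[float] = field(default_factory=list)


def serialize(table: EosTable) -> str:
    """Turn an instance into text that another program or language can read."""
    return json.dumps(asdict(table), indent=2)


def deserialize(text: str) -> EosTable:
    """Rebuild an instance from text; fails if a field is missing or unexpected."""
    return EosTable(**json.loads(text))


# Instance: one particular set of values that follows the schema.
example = EosTable(
    name="example-eos",
    schema_version="1.0.0",
    temperatures_mev=[0.0, 50.0, 100.0],
)

text = serialize(example)
restored = deserialize(text)

print(text)
print(restored == example)
```

The class plays the role of the "data type", `example` is one instance of it, and the serialized text is the representation that could be passed between modules written in different languages.
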
---

## Data structure examples

* [Concept proposal](https://forum.musesframework.io/t/uniform-dockerfile-base-image-and-yaml-parsing-approach/)
* [Constraints of EoS](https://forum.musesframework.io/t/empirical-and-theoretical-constraints-on-the-eos)
* [BQS EoS example](https://forum.musesframework.io/t/ci-meeting-2022-04-26/411/2)

---

## C++ shared libraries for MUSES modules

* In the [MUSES common code repo](https://gitlab.com/nsf-muses/common), we have baked the `yaml-cpp` library into [the base Docker image](https://gitlab.com/nsf-muses/common/-/blob/main/Dockerfile).
* Mauricio is working on some helper functions and classes that everyone can use for the common import/export mechanics.

---

## Conclusions
-----------

* Version control is essential to building the MUSES calculation engine and cyberinfrastructure. Git is our friend.
* Containers make collaborating on code more efficient and are necessary for our calculation engine to work.
* Use the programming tools that save you time and help you write better code.

----

## Thank you

Any more questions?

<style>
.reveal h3 {
  font-size: 4rem;
  font-weight: 500;
  margin-bottom: 6rem;
  //padding-bottom: 1rem;
  //border-bottom: 1px solid gray;
}
.reveal h4 {
  font-size: 3rem;
  font-weight: 500;
  margin-bottom: 5rem;
}
.reveal {
  font-size: 2.5rem;
}
.reveal a {
  text-decoration: underline;
}
/*
p,ul {
  padding-left: 10%;
  padding-right: 10%;
}
*/
</style>
{"title":"Version control and software environments","tags":"presentation","type":"slide","slideOptions":{"transition":"slide","theme":"simple","center":true},"notes":"# Version control and software environments\n\n* Speaker: Andrew Manning\n* Time: Tuesday, May 10, 13:30\n* Duration: 30 min\n"}