Version control and collaboration

On this page, you can follow my progress in learning about tools for version control, reproducible workflows and collaboration.

1       Introduction

Software developers have used version control systems for a long time. If the principle is extended to other file types, such as LaTeX files, documents, images and text files in .CSV format (comma-separated values), these systems become useful in many other fields, such as medicine, good laboratory practice and research. Similarly, software for reproducible workflows is emerging that makes documentation easy and allows data analysis and reporting to be automated.

The following software is promising: Stitch (an ETL service by Stitch Inc. that loads data from GitLab and MySQL and allows data analysis using R and Python), and the Invantive Query Tool, Invantive Control for Excel and Invantive Control for Word (which load data from GitLab using SQL and either connect to Microsoft Word or Excel or allow data analysis using SQL).

In addition, the KNIME platform (an Eclipse Rich Client Platform (RCP) application) allows you to create reproducible workflows that include, for example, Python or R scripts or ImageJ2 KNIME nodes. These interact with data (e.g. databases, text files in .CSV format, Excel sheets or Google Sheets) or automatically generate reproducible reports (see knitr/RMarkdown, knitpy). KNIME workflows can be put under version control with GitHub/GitLab or with the proprietary KNIME Server. You can install the KNIME Development SDK from Bio7 / Eclipse with the Eclipse plug-in Egit. Just follow the instructions and wait until everything has finished loading.

2       Using GitLab with Bio7 / Eclipse and Egit

GitLab.com provides unlimited private and public repositories for free. In terms of interaction with Egit, it behaves like GitHub.com (except that you create groups, subgroups and projects instead of simple repositories) and it offers some extra features.

2.1         First time setup

  1. On gitlab.com, create an account and a new online repository.
  2. Confirm that the Eclipse plug-in Egit, together with its Gitflow components, is installed. If not, install them.
  3. Follow the user guide. The following steps provide additional information on how to proceed. Make sure that the environment variable HOME is set to …Users/<UserProfile>.
  4. In Bio7 / Eclipse, left-click Preferences -> Team -> Git -> Configuration -> User Settings -> Add Entry. In the field “key”, type user.email and in the field “value”, enter the email address of your GitLab login. Add another entry: in the field “key”, type user.name and in the field “value”, enter your GitLab username. Left-click Apply.
  5. Left-click Preferences -> SSH2 -> key management -> Generate RSA key. Save the private key and note the password you enter to protect it. Copy the public key and save it as well.
  6. Log in to your GitLab account. Navigate to the “SSH Keys” tab in your “Profile Settings”. Paste your public key into the “Key” section and give it a relevant “Title”. Use an identifiable title like “Work Laptop - Windows 7” or “Home MacBook Pro 15”.
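
For readers who prefer a terminal, the first-time setup above can be approximated with plain git and OpenSSH commands. This is only a sketch of a command-line equivalent, not what Egit does internally. The function names, the key file path and the example values in the usage note are hypothetical.

```shell
# Hypothetical helper: store the identity that git records in every commit
# (the same two keys that step 4 enters in the Egit preferences).
# $1 = user name, $2 = email address
set_git_identity() {
    git config --global user.name "$1" || return 1
    git config --global user.email "$2"
}

# Hypothetical helper: create an RSA key pair protected by a passphrase
# (step 5) and print the public half, which is pasted into GitLab (step 6).
# $1 = key file path, $2 = passphrase, $3 = comment stored in the public key
# Passing the passphrase as an argument keeps this sketch non-interactive;
# leave out -N to be prompted for it instead.
make_ssh_key() {
    ssh-keygen -q -t rsa -f "$1" -N "$2" -C "$3" || return 1
    cat "$1.pub"
}
```

A call could look like `set_git_identity "Jane Doe" "jane@example.com"` followed by `make_ssh_key "$HOME/.ssh/id_rsa_gitlab" "my passphrase" "Work Laptop"`. The printed line starting with `ssh-rsa` is what goes into the “Key” field on GitLab.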

2.2         Repository setup

  1. In your GitLab account, create a new group (e.g. GitLabGroup1ByAuthor) and, if you like, a subgroup (e.g. Group1Subgroup1ByAuthor). Then create a new blank online project (call it e.g. Project1) and initialize it by selecting “Initialize repository with a README” (see guide).
  2. To create an additional remote branch for testing, open the GitLab project that was created in step 1 and left-click “Create new…” -> New branch. Type the name of the branch, e.g. “remote_mastertest”. Set “Create from” to “master” (= keep the default setting) and left-click “Create branch”. Then, in the same way, create a new branch “remote_dev2” from master. Finally, create a new branch “remote_dev2_test” from remote_dev2.
  3. In Bio7 / Eclipse, open the Git perspective and left-click “Clone repository”. Alternatively, in any perspective, left-click File -> Import -> Git -> Project from Git -> Next -> Clone URI -> Next. In the wizard, paste the URI of the online GitLab repository. Under target directory, unselect default and specify a folder under Users/<UserProfile>. Normally this is Users/<UserProfile>/git/. As protocol, select SSH and let the wizard set all other settings automatically. Left-click “Finish”, enter the password that protects the SSH key (see step 5 of section 2.1) and left-click OK.
  4. Open the Git perspective. Right-click on the local cloned repository created in step 3 and in the context menu left-click “Import project”. Select the repository created in step 3 and left-click Next. If the repository contains no project folder, select “General Project” -> Finish.
  5. Open the Resource perspective of Bio7 / Eclipse and in the Project Explorer view right-click on the project that was added in step 4. In the context menu, left-click Team -> Switch To -> New Branch -> Select… -> remote tracking -> origin/master -> OK. Type in the branch name, e.g. “local_test”. Repeat for a branch called e.g. “local_feature1”.
  6. In the Project Explorer view, right-click on the project and in the context menu left-click Team -> Switch To -> local_feature1.
  7. In the Project Explorer view, right-click on the root directory and in the context menu left-click New -> Folder and call it e.g. src. For Arduino sketches, it is recommended to add another folder with the name of the program inside the folder src/. Then right-click on that folder and in the context menu left-click New -> Project -> Arduino -> Arduino Sketch. In the wizard, under target directory, unselect default and left-click “Browse”. Navigate to the directory of the local clone of the GitLab repository that was created in step 3 (e.g. Users/<UserProfile>/git/Project1) and set it as the non-default target directory. Select the project with the name of the repository created in steps 3 and 4. Give the project the same name as the previously created directory. For other projects, the nature of a directory can be changed later.
  8. After the project has been created successfully, continue with development in your local environment.
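
On the command line, the clone and the two local branches from the steps above could be created roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and the clone URI and target directory are whatever you would paste into the Egit wizard.

```shell
# Hypothetical helper: clone a repository and create the two local branches
# from step 5, both starting at and tracking origin/master.
# $1 = clone URI of the online repository, $2 = target directory
clone_with_local_branches() {
    git clone "$1" "$2" || return 1
    git -C "$2" branch --track local_test origin/master || return 1
    git -C "$2" branch --track local_feature1 origin/master || return 1
    # Step 6: switch to the feature branch.
    git -C "$2" checkout local_feature1
}
```

Using `git -C` keeps the caller's working directory unchanged. Importing the clone as an Eclipse project (step 4) and creating the Arduino sketch (step 7) have no git equivalent and still happen in Bio7 / Eclipse.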

2.3         Local development

  1. In the Resource perspective, right-click on the project and in the context menu left-click Team -> Switch To -> local_test. Then right-click on the project again and in the context menu left-click Team -> Pull.
  2. Repeat step 1 for the local master branch and finally for local_feature1.
  3. With the local_test branch checked out, add folders and files to the project in the Project Explorer view and optionally modify the content of those files in the Editor view. Perform formatting, code analysis, compilation and tests. Left-click File -> Save All.
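
The switch-and-pull routine from the first two steps can be sketched as a small shell loop. The function name is hypothetical, and the `--ff-only` flag is a choice made for this sketch rather than something the Team -> Pull menu entry is known to use: it makes the pull stop instead of creating a merge commit when a local branch has diverged.

```shell
# Hypothetical helper: check out each named branch in turn and pull from its
# upstream branch. Run it inside the local clone.
# Arguments: the local branch names, in the order they should be updated.
update_branches() {
    for branch in "$@"; do
        git checkout "$branch" || return 1
        git pull --ff-only || return 1
    done
}
```

For the branches used on this page, the call would be `update_branches local_test master local_feature1`, which leaves local_feature1 checked out at the end.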

2.4         Development with Egit

  1. With the local_test branch checked out, right-click on the project in the Resource perspective. In the context menu, left-click Team -> Merge… -> local -> local_test -> OK -> OK. Then right-click on the project and in the context menu left-click Team -> Switch To -> local_test. Perform code analysis, compilation and tests. If everything works as expected, proceed.
  2. Switch to the branch local_feature1. Right-click on the project and in the context menu left-click Team -> Commit…. In the Git staging view of Gitflow that appears, select all files that were changed with a common goal and left-click “Add selected file to index” to stage these files. Then enter a commit message specifying the goal of the changes. Refer to the Egit guide, section “Working with Gitflow”, for details. Finally, left-click “Commit”.
  3. Right-click on the project and in the context menu left-click Team -> Push branch local_feature1 … In the branch field, delete the default entry master and type m for the remote master, or r for remote branches starting with r. Double-click on one of the remote test branches. Then left-click Preview -> Push. Test the remote test branch. If it behaves as expected, proceed.
  4. Repeat step 3 but select master (= remote master) or remote_dev2 as the target branch.
  5. Right-click on the project in the Resource perspective. In the context menu, left-click Team -> Switch To -> master.
  6. Go back to step 1 of section 2.3, or close Bio7 / Eclipse.
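
One possible command-line reading of the commit and push steps above is sketched below. It is not a transcript of what Egit runs. The function name is hypothetical, and it assumes the remote is called origin, as in a default clone.

```shell
# Hypothetical helper: stage the given files, commit them on local_feature1
# and push that branch to a remote branch with a different name.
# $1 = commit message, $2 = remote target branch, remaining arguments = files
commit_and_push() {
    message=$1
    target=$2
    shift 2
    git checkout local_feature1 || return 1
    # Equivalent of "Add selected file to index" in the Git staging view.
    git add -- "$@" || return 1
    git commit -m "$message" || return 1
    # <local>:<remote> refspec: update $target on origin from local_feature1.
    git push origin "local_feature1:$target"
}
```

A first call could be `commit_and_push "Describe the goal of the change" remote_mastertest src/Project1/Project1.ino` (the file path is only an example). Once the remote test branch behaves as expected, `git push origin local_feature1:master` corresponds to repeating the push with master as the target.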

For short-lived feature branches, the strategy before submitting a pull request is as follows: rebase interactively on a previous commit and edit the commits on the feature branch. Then check out master, fetch changes from upstream and rebase master on upstream master (pull --rebase). Create a m_feature1_PR branch based on master and test it. Then update master again, rebase the PR branch on master and create a pull request. Before merging, rebase the PR branch on the updated master once more, then merge the PR branch into upstream master.

For long-lived feature branches, rebasing and merging can become tedious. Repeatedly rebase onto the updated master as long as the feature branch is local only. Once a branch has been pushed for collaborative work, for opening a pull request or for use by others, incorporate changes from upstream by merging the updated master into the feature branch. This creates a less clean history but is fail-safe.

If a long-lived feature branch has not been updated for a long time, rebasing or merging can become difficult and/or result in an untidy history. You can clean up the history by creating a new branch based on upstream/master and then transferring all changes between master and the original feature branch to it. Extract the changes using git diff sha1 sha3 > diff (git log or git reflog, short for git log -g, help you find the commit IDs) and apply them using git apply diff. Alternatively, use git cherry-pick sha1^..sha3 to replay the commits from sha1 through sha3. If you have no merge commits in the feature branch and do not want to identify the first and last commit ID (SHA-1) manually, use the following syntax, where devel is the base branch and B is the feature branch:
git cherry-pick $(git log devel..B --pretty=format:"%h" | tail -1)^..$(git log B -n 1 --pretty=format:"%h")
As a shorthand, git rebase -m invokes cherry-pick repeatedly for each commit passed to it. However, it collapses history at merge commits, so you have to cherry-pick manually if the feature branch contains merge commits.
It can be easier to squash all commits in the feature branch into one before merging or cherry-picking the changes into master. Daira Hopwood posted a more advanced solution on stackoverflow.com.
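
The clean-up described above (a fresh branch on the updated base, then replaying the feature commits) could be wrapped as follows. This sketch passes the range base..feature directly to git cherry-pick instead of computing the first and last commit ID, which selects the same commits when the feature branch contains no merge commits. The function name and the branch names in the usage note are hypothetical.

```shell
# Hypothetical helper: rebuild a feature branch on top of a base branch by
# cherry-picking every commit that is on the feature branch but not on the base.
# $1 = base (e.g. upstream/master), $2 = feature branch, $3 = name of the new branch
# Assumes the feature branch contains no merge commits.
replay_on_new_branch() {
    # Stop early if the feature branch has no commits of its own.
    [ -n "$(git rev-list "$1..$2")" ] || return 1
    git checkout -b "$3" "$1" || return 1
    git cherry-pick "$1..$2"
}
```

For example, `replay_on_new_branch upstream/master local_feature1 local_feature1_clean` leaves the original branch untouched. If a commit does not apply cleanly, cherry-pick stops so the conflict can be resolved and the run resumed with `git cherry-pick --continue`.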
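
One way to do the squashing mentioned above is git merge --squash, which stages the combined changes of a branch without committing them. The sketch below is a minimal example of that variant, not the more advanced solution referred to above. The function name is hypothetical.

```shell
# Hypothetical helper: bring all changes of a feature branch onto the
# currently checked out branch as one single commit.
# $1 = feature branch, $2 = commit message
squash_into_current() {
    # Stages the combined changes; no commit is created yet.
    git merge --squash "$1" || return 1
    git commit -m "$2"
}
```

With master checked out, `squash_into_current local_feature1 "Add feature 1"` adds exactly one new commit to master. The individual commits remain available on the feature branch until it is deleted.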

When a new software version is ready for public testing, add a tag to the chosen commit and push it.
In the web interface of the host of the online repository, select the previously created tag and add a release note to create a release.
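
The tagging step could look like this on the command line. This is a sketch assuming an annotated tag and a remote called origin. The function name and the tag name in the usage note are hypothetical.

```shell
# Hypothetical helper: attach an annotated tag to a commit and push only that tag.
# $1 = tag name, $2 = commit to tag, $3 = tag message
tag_and_push() {
    git tag -a "$1" -m "$3" "$2" || return 1
    # The full ref name avoids ambiguity with a branch of the same name.
    git push origin "refs/tags/$1"
}
```

For example, `tag_and_push v0.1.0 HEAD "First version for public testing"` tags the current commit. The release note is then added to that tag in the web interface.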



Copyright © 2018-2020 DerAndere
