Using CWL and Toil to Wrap an Ad-hoc Astronomy Data Processing Pipeline

Blasting Students with Science: I was recently invited to give a workshop on reproducible scientific workflows to students as part of the Inter-university Institute for Data Intensive Astronomy’s (IDIA) “JEDI” programme. The overall purpose of the workshop was to introduce students from the African continent to a range of topics currently being tackled in the data science space, with a large focus on machine learning. This post details some of my experiences with preparing the original pipeline, wrapping it in CWL, and teaching people how to do the same...

June 27, 2018 · 15 min · Eugene de Beste

Understanding Ceph Placement Groups (TOO_MANY_PGS)

The Issue: My first foray into Ceph was at the end of last year. We had a small 72TB cluster split across 2 OSD nodes. I was tasked with upgrading the Ceph release running on the cluster from Jewel to Luminous, so that we could try out the new BlueStore storage backend, and with adding two more OSD nodes to the cluster, which brought us up to a humble 183TB...

March 14, 2018 · 6 min · Eugene de Beste

Removing CephFS from a Ceph Cluster (Luminous)

While upgrading the packages for the Ceph cluster at SANBI, I encountered an issue where the Ceph MDS daemon was causing the CephFS filesystem to become unresponsive and stuck in the active(laggy) state. I decided to strip down the CephFS deployment and reinstall it, since the existing one was only for testing (set up before my time) and I wanted to go through the process of setting it up from scratch myself. It was surprisingly difficult to find a simple procedure for removing an MDS, but after some digging I ended up using the following:...

March 13, 2018 · 1 min · Eugene de Beste