Category: Containers

Running a Docker Container as a WSL2 Distribution

Published 2021-04-14 by Kevin Feasel

Andrew Pruski has a wacky idea that just might work:

I’ve been playing around a bit with WSL2 and noticed that you can import TAR files into it to create your own custom distributions.
This means that we can export docker containers and run them as WSL distros!
So, let’s build a custom SQL Server 2019 docker image, run a container, and then import that container into WSL2…so that we have a custom distro running SQL Server 2019.

Read on to see how.
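
As a rough sketch of the general shape of that workflow (not Andrew's actual script), the two steps come down to `docker export` to write a container's filesystem to a TAR file, followed by `wsl --import` to register that TAR as a distribution. The container name, distro name, and Windows paths below are hypothetical placeholders, and the helper only builds the command lines rather than running them.

```python
def wsl_import_commands(container, distro, install_dir, tar_path):
    """Build the two commands: export a container, then import the TAR into WSL2.

    All four arguments are illustrative placeholders supplied by the caller.
    """
    # docker export writes the container's filesystem to a TAR archive.
    export_cmd = ["docker", "export", "--output", tar_path, container]
    # wsl --import registers the TAR as a new distribution stored in install_dir.
    import_cmd = ["wsl", "--import", distro, install_dir, tar_path, "--version", "2"]
    return [export_cmd, import_cmd]


if __name__ == "__main__":
    # Hypothetical names and paths, printed for inspection only.
    for cmd in wsl_import_commands(
        "sqlserver2019",
        "sqlserver2019",
        r"C:\wsl\sqlserver2019",
        r"C:\wsl\sqlserver2019.tar",
    ):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` on a machine with both Docker and WSL2 installed. The exported TAR contains only the filesystem, so anything the image's entrypoint would normally start has to be launched by hand inside the new distro.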


Deploying an Azure Arc Enabled Data Services Controller

Published 2021-04-06 by Kevin Feasel

Chris Adkin continues a series:

If you have been following this series, you should have:
– a basic understanding of Terraform
– a Kubernetes cluster that you can connect to using kubectl
– a basic understanding of Kubernetes services
– a working MetalLB load balancer
– a basic understanding of how storage works in the world of Kubernetes
– a Kubernetes storage solution in the form of PX Store; alternatively, you can use any solution (for the purposes of this series) that supports persistent volumes. However, to use the backup solution in part 9 of the series, you will need something that supports CSI

From here, Chris explains the importance of the data controller and then deploys one.


Deploying a Storage Solution to a Kubernetes Cluster

Published 2021-03-29 by Kevin Feasel

Chris Adkin continues a series:

Before we dive into deploying a storage solution to our Kubernetes cluster, we need to understand the basics of storage in the world of Kubernetes, which can appear to be both exotic and mysterious to the uninitiated. To dispel some confusion around Kubernetes and storage, the storage IO path is exactly the same as that with common garden vanilla variety Unix or Linux. The Kubernetes storage ecosystem introduces two extra things we need to concern ourselves with above and beyond conventional Unix/Linux storage. Firstly, there are some extra layers of abstraction between the physical storage and filesystems that pods use, what I like to refer to as . . .

Read the whole thing. And that was a particularly mean cut-off point on my part, if I do say so.


Deploying Azure Data Services via Terraform

Published 2021-03-16 by Kevin Feasel

Chris Adkin has two additional parts of a series. Part 3 shows us how to deploy a virtual machine on VMware:

To do this you require an Ubuntu virtual machine, I’ve tested this with Ubuntu 18.04 LTS and I will get around to testing it with Ubuntu 20.10 at some stage. If for example the virtual machine was created with a user called azuser, the deployment server should also have an azuser account under which all Terraform commands are executed.

Part 4 takes those VMs and sets up a Kubernetes cluster across them:

Whatever you do when deploying a Kubernetes cluster, somewhere along the line you have to use kubeadm. There is a wealth of material available on blog posts and on the internet in general in which people roll their own scripts using kubeadm. I often suspect that many of these efforts are the result of Kelsey Hightower’s Kubernetes the Hard Way. In this post we are emphatically going to do things the easy way, […]

And now we’re caught up on the series…for the moment, at least.


Deploying Azure Data Services via Terraform

Published 2021-03-15 by Kevin Feasel

Chris Adkin has started a series on deploying Azure Arc enabled Data Services. Part 1 serves as an introduction:

One of the most significant things to change the landscape for Azure data professionals will be general release of Azure Arc enabled Data Services. To provide an expedient means of experiencing all that Azure Arc has to offer, Microsoft has come up with Jumpstart – a collection of GitHub repos for deploying Arc in different scenarios. Last Christmas I had a few vacation days and took the opportunity to try out Jumpstart for Azure Arc enabled data services on AWS. AWS was my choice because it made a certain amount of sense to try out Azure Managed SQL Server instances and Postgres Hyperscale on a cloud that they are not natively available on. After all, the whole point of Azure Arc enabled Data Services is to bring Azure to you on your terms if for any reason you cannot use the Azure cloud.

Part 2 gives us an introduction to Terraform:

Before diving into what the various Terraform modules do that make up the Arc-PX-VMware-Faststart repo, I’m going to provide an introduction to Terraform in this blog post. Terraform comes from HashiCorp; it is a tool that works on the principle of infrastructure-as-code. Resources are specified in what are called configuration files using HashiCorp Configuration Language in a declarative manner, i.e. you state what you want and to the best of its ability Terraform attempts to create those resources for you. ‘Providers’ are used to create resources for particular types of entity; for example, you might use the local file, helm (the Kubernetes package manager), Azure, or VMware providers, etc. . . . Using providers requires plugins, most of which are provided by HashiCorp, but third parties can also write their own plugins.

Check out the first two posts in what promises to be an interesting series.


A Mini-Lab: SQL Server 2019 on Docker

Published 2021-03-10 by Kevin Feasel

Ayman El-Ghazali has a three-parter for us. Part one involves installing Docker for Desktop on Windows and creating a SQL Server container:

Recently, with the help of a colleague at work, I’ve started to dabble a little with containers. I had a customer that requested some specific code to be tested, and I realized that I didn’t have my own local instance of SQL running (always good to have a local one). I decided to try to make this process easier instead of going the traditional route of creating a Virtual Machine and also to help me learn a new technology. In this series of posts, I’m going to document my process of creating a Mini Data Lab for SQL Server on my desktop using Docker. It is intended to be for beginners and in no way is an article for best practices or production deployments.

Part two includes persistent storage and some of the other niceties of hosting a database in a container:

Let’s first take a look at the way I have my disk/folder structure laid out. Again, this is on my personal computer so it’s not a best practice for production and more suitable for development environments.
For each container, I’m creating a separate folder with the MSSQL paths that I need to put my databases, transaction log, and backup files on. Additionally, under the DockerMount folder I have a folder called sqldockershared (which I will put some shared content in later).

Part three is about configuration in existing containers and deploying a second container side-by-side:

For those who are curious about changing other instance-level properties, here is a list of configurable properties for SQL Server on Linux via Bash:
https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-mssql-conf?view=sql-server-ver15
Now we have a great foundation to create another container, so let us go and do that now. The code is similar to the previously created container with the exception of the file path for the data, log, and backup files and the port number. The SQLShare path will be the same so that we can run our initialization script from there.
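
To make the side-by-side idea concrete, here is a minimal sketch (not Ayman's actual commands) of how two `docker run` invocations might differ only in container name, host port, and host folders while sharing one common folder. The host paths, the container-side mount targets, the `/sqlshare` mount point, and the password are all hypothetical placeholders; the image name and the `ACCEPT_EULA` and `MSSQL_SA_PASSWORD` environment variables are the ones Microsoft documents for SQL Server on Linux containers.

```python
IMAGE = "mcr.microsoft.com/mssql/server:2019-latest"


def sql_container_command(name, host_port, host_root, shared_dir, sa_password):
    """Build a docker run command for one SQL Server 2019 container.

    host_root and shared_dir are hypothetical host folders; the container-side
    mount targets are illustrative and may not match the original posts.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--env", "ACCEPT_EULA=Y",
        "--env", f"MSSQL_SA_PASSWORD={sa_password}",
        # Each container listens on 1433 internally; only the host port differs.
        "--publish", f"{host_port}:1433",
        # Per-container folders for data, log, and backup files.
        "--volume", f"{host_root}/data:/var/opt/mssql/data",
        "--volume", f"{host_root}/log:/var/opt/mssql/log",
        "--volume", f"{host_root}/backup:/var/opt/mssql/backup",
        # One folder shared by every container, e.g. for initialization scripts.
        "--volume", f"{shared_dir}:/sqlshare",
        IMAGE,
    ]


if __name__ == "__main__":
    shared = "C:/DockerMount/sqldockershared"
    first = sql_container_command("sql1", 1433, "C:/DockerMount/sql1", shared, "<placeholder>")
    second = sql_container_command("sql2", 1434, "C:/DockerMount/sql2", shared, "<placeholder>")
    print(" ".join(first))
    print(" ".join(second))
```

Mounting a folder does not by itself change where SQL Server writes its files, so the default data, log, and backup locations would still need to be pointed at the mounted paths (for example with mssql-conf, as in the list linked above).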

I’m bought-in on containers. There are still some pains around containers for production databases, but “some pain” is a much better experience than a few years ago, when the answer to the question of whether you want to use containers in production for databases was “Are you mad?”


Using containerd as a Kubernetes Container Runtime

Published 2021-02-15 by Kevin Feasel

Anthony Nocentino does a switcheroo:

In this post, I’m going to show you how to install containerd as the container runtime in a Kubernetes cluster. I will also cover setting the cgroup driver for containerd to systemd, which is the preferred cgroup driver for Kubernetes. In Kubernetes version 1.20 Docker was deprecated and will be removed after 1.22. containerd is a CRI compatible container runtime and is one of the supported options you have as a container runtime in Kubernetes in this post-Docker Kubernetes world. I do want to call out that you can use containers created with Docker in containerd.

There are a few steps, but it’s not crazy by any stretch, and that’s part of why the post-Docker-in-Kubernetes world won’t be chaos.


Running an mlflow Server on Azure

Published 2021-01-08 by Kevin Feasel

Paul Hernandez configures mlflow on Azure using platform-as-a-service offerings:

It is indisputably true that mlflow came to make life a lot easier not only for data scientists but also for data engineers and architects, among others. There is a very helpful list of tutorials and examples in the official mlflow docs. You can just download it, open a console and start using it locally on your computer. This is the fastest way to get started. However, as soon as you progress and introduce mlflow in your team, or you want to use it extensively for yourself, some components should be deployed outside your laptop.
To exercise a deployment setup, and since I have Azure experience, I decided to provision a couple of resources in the cloud to deploy the model registry and store the data produced by the tracking server.

I concur on the power of mlflow.


Running Spark on Azure Kubernetes Service

Published 2020-12-16 by Kevin Feasel

Tsuyoshi Matsuzaki walks us through running Apache Spark on Azure Kubernetes Service:

Apache Spark officially includes Kubernetes support, and thereby you can run a Spark job on your own Kubernetes cluster. (See here for official document. Note that Kubernetes scheduler is currently experimental.)
Especially in Microsoft Azure, you can easily run Spark on cloud-managed Kubernetes, Azure Kubernetes Service (AKS).
In this post, I’ll show you a step-by-step tutorial for running Apache Spark on AKS. In this tutorial, artifacts such as source code, data, and container images are all protected by Azure credentials (keys).
Although managed services for Apache Spark, such as Azure Databricks, Azure Synapse Analytics, and Azure HDInsight, are the best place to run Spark workloads, you will get much flexibility by running workloads on managed Kubernetes (AKS), such as spot VM support, start/stop cluster, confidential computing (Intel SGX) support, and so on.

Read on to see how. Of these options, though, I’d probably choose Azure Databricks or Azure Synapse Analytics well before the others.


Changing a Kubernetes Cluster to containerd

Published 2020-12-04 by Kevin Feasel

Andrew Pruski wants to get ahead of the game:

DISCLAIMER – You’d never do this for a production cluster. For those clusters, you’d simply get rid of the existing nodes and bring new ones in on a rolling basis. This blog is just me mucking about with my Raspberry Pi cluster to see if the update can be done in-place without having to rebuild the nodes (as I really didn’t want to have to do that).

Check it out. In addition to the Twitter thread Andrew mentions, the Kubernetes group has a full blog post with more details.
