Category: Containers

Deploying Azure Data Services via Terraform

Chris Adkin has started a series on deploying Azure Arc enabled Data Services. Part 1 serves as an introduction:

One of the most significant things to change the landscape for Azure data professionals will be the general release of Azure Arc enabled Data Services. To provide an expedient means of experiencing all that Azure Arc has to offer, Microsoft has come up with Jumpstart – a collection of GitHub repos for deploying Arc in different scenarios. Last Christmas I had a few vacation days and took the opportunity to try out Jumpstart for Azure Arc enabled data services on AWS. AWS was my choice because it made a certain amount of sense to try out Azure Managed SQL Server instances and Postgres Hyperscale on a cloud that they are not natively available on. After all, the whole point of Azure Arc enabled Data Services is to bring Azure to you on your terms if for any reason you cannot use the Azure cloud.

Part 2 gives us an introduction to Terraform:

Before diving into what the various Terraform modules that make up the Arc-PX-VMware-Faststart repo do, I’m going to provide an introduction to Terraform in this blog post. Terraform comes from HashiCorp; it is a tool that works on the principle of infrastructure-as-code. Resources are specified in what are called configuration files using HashiCorp Configuration Language in a declarative manner, i.e. you state what you want and to the best of its ability Terraform attempts to create those resources for you. ‘Providers’ are used to create resources for particular types of entity; for example, you might use the local file, helm (the Kubernetes package manager), Azure, or VMware providers. Using providers requires plugins, most of which are provided by HashiCorp, but third parties can also write their own plugins.
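
To make the "state what you want" idea a little more concrete, here is a minimal Python sketch of the declarative pattern: compare a desired state against a current state and work out which actions would close the gap. This is only an illustration of the principle, not how Terraform is implemented, and the function name, resource names, and settings are all hypothetical.

```python
def plan(desired, current):
    """Return the actions needed to move `current` toward `desired`.

    Both arguments map a (hypothetical) resource name to its settings.
    """
    actions = []
    for name, config in sorted(desired.items()):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != config:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical states, purely for illustration.
desired = {
    "resource_group": {"location": "westeurope"},
    "k8s_cluster": {"node_count": 3},
}
current = {
    "resource_group": {"location": "westeurope"},
    "k8s_cluster": {"node_count": 2},
    "old_vm": {"size": "small"},
}

print(plan(desired, current))
```

The caller never says how to get from one state to the other; it only describes the end result, and the tool derives the steps.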

Check out the first two posts in what promises to be an interesting series.

A Mini-Lab: SQL Server 2019 on Docker

Ayman El-Ghazali has a three-parter for us. Part one involves installing Docker Desktop on Windows and creating a SQL Server container:

Recently, with the help of a colleague at work, I’ve started to dabble a little with containers. I had a customer that requested some specific code to be tested, and I realized that I didn’t have my own local instance of SQL running (always good to have a local one). I decided to try to make this process easier instead of going the traditional route of creating a Virtual Machine and also to help me learn a new technology. In this series of posts, I’m going to document my process of creating a Mini Data Lab for SQL Server on my desktop using Docker. It is intended to be for beginners and in no way is an article for best practices or production deployments.

Part two includes persistent storage and some of the other niceties of hosting a database in a container:

Let’s first take a look at the way I have my disk/folder structure laid out. Again, this is on my personal computer so it’s not a best practice for production and more suitable for development environments.

For each container, I’m creating a separate folder with the MSSQL paths that I need to put my databases, transaction log, and backup files on. Additionally, under the DockerMount folder I have a folder called sqldockershared (which I will put some shared content in later).

Part three is about configuration in existing containers and deploying a second container side-by-side:

For those who are curious about changing other instance-level properties, here is a list of configurable properties for SQL Server on Linux via Bash:
https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-mssql-conf?view=sql-server-ver15

Now we have a great foundation to create another container, so let us go and do that now. The code is similar to the previously created container with the exception of the file path for the data, log, and backup files and the port number. The SQLShare path will be the same so that we can run our initialization script from there.
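
As a rough sketch of the pattern described above (separate data, log, and backup folders per container, one shared folder, and a different host port for each container), the Python snippet below builds two `docker run` argument lists. It is not Ayman's actual script: the `C:\DockerMount` drive letter, the container names, the `/sqlshare` and `/var/opt/mssql/backup` mount targets, and the password placeholder are all assumptions made for illustration.

```python
from pathlib import PureWindowsPath

IMAGE = "mcr.microsoft.com/mssql/server:2019-latest"


def docker_run_args(name, host_port, sa_password, mount_root=r"C:\DockerMount"):
    """Build a `docker run` argument list for one SQL Server container.

    The host folder layout and the container-side mount points are
    assumptions for this sketch, not paths taken from the original posts.
    """
    root = PureWindowsPath(mount_root)
    container_dir = root / name
    volumes = [
        (container_dir / "data", "/var/opt/mssql/data"),
        (container_dir / "log", "/var/opt/mssql/log"),
        (container_dir / "backup", "/var/opt/mssql/backup"),
        # The shared folder is the same for every container.
        (root / "sqldockershared", "/sqlshare"),
    ]
    args = [
        "docker", "run", "--detach",
        "--name", name,
        "--env", "ACCEPT_EULA=Y",
        "--env", f"MSSQL_SA_PASSWORD={sa_password}",
        # SQL Server listens on 1433 inside the container; only the host port varies.
        "--publish", f"{host_port}:1433",
    ]
    for host_path, container_path in volumes:
        args += ["--volume", f"{host_path}:{container_path}"]
    args.append(IMAGE)
    return args


# Two side-by-side containers: different folders and host ports, same shared folder.
first = docker_run_args("sql1", 1433, "<your-sa-password>")
second = docker_run_args("sql2", 1434, "<your-sa-password>")
print(" ".join(first))
print(" ".join(second))
```

The snippet only prints the commands rather than running them, so you can inspect the mounts and ports before anything touches Docker.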

I’m bought-in on containers. There are still some pains around containers for production databases, but “some pain” is a much better experience than a few years ago, when the answer to the question of whether you want to use containers in production for databases was “Are you mad?”

Using containerd as a Kubernetes Container Runtime

Anthony Nocentino does a switcheroo:

In this post, I’m going to show you how to install containerd as the container runtime in a Kubernetes cluster. I will also cover setting the cgroup driver for containerd to systemd, which is the preferred cgroup driver for Kubernetes. In Kubernetes version 1.20 Docker was deprecated and will be removed after 1.22. containerd is a CRI-compatible container runtime and is one of the supported options you have as a container runtime in Kubernetes in this post-Docker Kubernetes world. I do want to call out that you can use containers created with Docker in containerd.
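
For a sense of what the cgroup driver change involves, here is a small Python sketch that flips the `SystemdCgroup` setting in the text of a containerd `config.toml`. It assumes a config along the lines of what `containerd config default` generates, with the setting under the runc options section of the CRI plugin; the exact section name varies by containerd version, and this is not necessarily the method Anthony uses in his post.

```python
import re

# Matches a line such as "  SystemdCgroup = false" and captures everything before the value.
_PATTERN = re.compile(r"(?m)^([ \t]*SystemdCgroup[ \t]*=[ \t]*)false[ \t]*$")


def enable_systemd_cgroup(config_text):
    """Return (updated_text, number_of_lines_changed) for containerd config text."""
    return _PATTERN.subn(r"\g<1>true", config_text)


# A trimmed-down sample of the relevant section (assumed layout).
sample = '''[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
'''

updated, count = enable_systemd_cgroup(sample)
print(updated)
```

On a real node you would apply the same change to the config file itself (commonly `/etc/containerd/config.toml`) and then restart containerd for it to take effect.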

There are a few steps, but it’s not crazy by any stretch, and that’s part of why the post-Docker-in-Kubernetes world won’t be chaos.

Running an mlflow Server on Azure

Paul Hernandez configures mlflow on Azure using platform-as-a-service offerings:

It is indisputably true that mlflow came to make life a lot easier, not only for data scientists but also for data engineers and architects, among others. There is a very helpful list of tutorials and examples in the official mlflow docs. You can just download it, open a console and start using it locally on your computer. This is the fastest way to get started. However, as soon as you progress and introduce mlflow in your team, or you want to use it extensively for yourself, some components should be deployed outside your laptop.

To exercise a deployment setup, and since I have Azure experience, I decided to provision a couple of resources in the cloud to deploy the model registry and store the data produced by the tracking server.

I concur on the power of mlflow.

Running Spark on Azure Kubernetes Service

Tsuyoshi Matsuzaki walks us through running Apache Spark on Azure Kubernetes Service:

Apache Spark officially includes Kubernetes support, and thereby you can run a Spark job on your own Kubernetes cluster. (See here for the official documentation. Note that the Kubernetes scheduler is currently experimental.)
In Microsoft Azure in particular, you can easily run Spark on cloud-managed Kubernetes, Azure Kubernetes Service (AKS).

In this post, I’ll show you a step-by-step tutorial for running Apache Spark on AKS. In this tutorial, artifacts such as source code, data, and container images are all protected by Azure credentials (keys).

Although managed services for Apache Spark, such as Azure Databricks, Azure Synapse Analytics, and Azure HDInsight, are the best places to run Spark workloads, you gain a lot of flexibility by running workloads on managed Kubernetes (AKS): spot VM support, start/stop cluster, confidential computing (Intel SGX) support, and so on.

Read on to see how. Of these options, though, I’d probably choose Azure Databricks or Azure Synapse Analytics well before the others.

Changing a Kubernetes Cluster to containerd

Andrew Pruski wants to get ahead of the game:

DISCLAIMER – You’d never do this for a production cluster. For those clusters, you’d simply get rid of the existing nodes and bring new ones in on a rolling basis. This blog is just me mucking about with my Raspberry Pi cluster to see if the update can be done in-place without having to rebuild the nodes (as I really didn’t want to have to do that).

Check it out. In addition to the Twitter thread Andrew mentions, the Kubernetes group has a full blog post with more details.

Finding the Physical Path of a SQL Server Backup on a Container

Jack Vamvas is looking for files in all the wrong places:

I’m migrating some SQL Server databases to OpenShift containers. The SQL Server is set up with persistent disk, with a dedicated persistent disk partition for the SQL Server default backup directory. I don’t have access to the underlying files via command line and can only use command line. How can I get the physical disk device, which will then allow me to create a RESTORE DATABASE statement pointing to the device?

Read on for the answer, including a T-SQL script to find where these files live.

Considerations Before using SQL Server on Containers

Joy George Kunjikkur wants you to slow your roll a little:

It is easy to get started on development and simple testing using SQL containers. It was discussed in the previous post. But before putting it into production and starting to develop real applications, we need to make sure of at least the things below.

Read on for those considerations. I think they are reasonable and generally agree with the bottom-line conclusion.
