Press "Enter" to skip to content

Category: Containers

PolyBase and Dockerized Hadoop

I have a solution to a problem which vexed me for quite some time:

Quite some time ago, I posted about PolyBase and the Hortonworks Data Platform 2.5 (and later) sandbox.

The summary of the problem is that data nodes in HDP 2.5 and later are on a Docker private network. For most cases, this works fine, but PolyBase expects publicly accessible data nodes by default—one of its performance enhancements with Hadoop was to have PolyBase scale-out group members interact directly with the Hadoop data nodes rather than having everything go through the NameNode and PolyBase control node.

Click through for the solution.

Comments closed

strace and SQL Server Containers

Anthony Nocentino tries using strace to diagnose SQL Server process activity in a container:

We’re attaching to an already running docker container running SQL. But what we get is an idle SQL Server process this is great if we have a running workload we want to analyze but my goal for all of this is to see how SQL Server starts up and this isn’t going to cut it.
 
My next attempt was to stop the sql19 container and quickly start the strace container but the strace container still missed events at the startup of the sql19 container. So I needed a better way.

Don’t worry—Anthony finds a better way.

Comments closed

Saving Data in Docker Containers

Anthony Nocentino has a three-part series on persisting SQL Server data in Docker containers. Part 1 takes us through volumes:

Let’s talk about how we can use Docker Volumes and SQL Server to persist data. If we want to run SQL Server in a container we will want to decouple our data from the container itself. Doing so will enable us to delete the container, replace it and start up a new one pointing at our existing data. When running SQL Server in a container will store data in /var/opt/mssql by default. When the container starts up for the first time it will put the system databases in that location and any user databases created will also be placed at this location by default. 

Part 2 looks at how volumes differ between the Linux and Mac/Windows versions of Docker:

So in my previous post, we discussed Docker Volumes and how they have a lifecycle independent of the container enabling us to service the container image independent of the data inside the container. Now let’s dig into Volumes a little bit more and learn where Docker actually stores that data on the underlying operating system.  

Part 3 ties it in with SQL Server:

Makes sense…we changed where SQL Server is reading/writing data. macOS doesn’t support a file mode called O_DIRECT which allows for unbuffered read/write access to the file opened using the open system call.  O_DIRECT is used by systems that manage their own file caching, like relational database management systems (RDBMS). So as SQL starts up and tries to open the master database with O_DIRECT the files can’t be opened because the macOS kernel doesn’t support this mode. And this is the reason why we have to have that Linux VM around. That Linux VM will support O_DIRECT option on the file opened. See more about this at the GitHub issue here.

Definitely worth getting a handle on this if you’re interested in containers.

Comments closed

Local Database Builds with Jenkins

Steve Jones continues a series on continuous integration, containers, and all that is good in life:

The only way to build a database project in SQL Server is with an actual SQL Server. In this case, I don’t have any code that would error on LocalDB, so I’ll just use that. I coudl specify my local SQL Server development database if I had the need.

This is a test build, so I also don’t need any SQL Compare options or other switches.

Getting code into source control and building continuous integration around it has become a lot easier over the past several years. Easy enough that you can work a simple system out in a day or two of experimentation.

Comments closed

Exposing Multiple Docker Ports

Steve Jones shows how to expose multiple ports when spinning up a container:

I was working with containers recently with Jenkins. I didn’t want the server process running on my machine all the time, but I did need to allow some communication. Jenkins uses 8080 by default, but agents need another port.

I figured there was a way to do this, and I found it on Stack Overflow, which is the perfect forum for a question like this. The answer?

You’ll need to click through for the answer.

Comments closed

Kubernetes on Windows

Elton Stoneman helps us get started with Kubernetes on Windows boxes:

Now you can take older .NET Framework apps and run them in Kubernetes, which is going to help you move them to the cloud and modernize the architecture. You start by running your old monolithic app in a Windows container, then you gradually break features out and run them in .NET Core on Linux containers.

Organizations have been taking that approach with Docker Swarm for a few years now. I cover it in my book Docker on Windows and in my Docker Windows Workshop. It’s a very successful way to do migrations – breaking up monoliths to get the benefits of cloud-native architecture, without a full-on rewrite project.

Now you can do those migrations with Kubernetes. That opens up some interesting new patterns, and the option of running containerized Windows workloads in a managed Kubernetes service in the cloud.

Elton’s not kidding about this support being new. I’m not sure I’d entrust it for my production work just yet, but I’m glad to see people working on the problem.

Comments closed

Deploying a Big Data Cluster

Mohammad Darab takes us through the Big Data Cluster deployment process using Azure Data Studio:

I’ve been “playing around” with Big Data Clusters for some time now and CTP 3.2 is way ahead when it comes to streamlining the BDC deployment process. You can check out my 4-part series on deploying BDC on AKS to see how cumbersome the process used to be. New in CTP 3.2, you can deploy a BDC on AKS (an existing cluster OR a new cluster) using an Azure Data Studio notebook. Let’s see how.

Click through for instructions. It was rather smart of Microsoft to release the instructions as a notebook.

Comments closed

Kafka Docker on Kubernetes

Bill Ward gives us a step-by-step set of instructions for installing Kafka Docker on Kubernetes:

In this ultimate guide I will give you a simple step-by-step tutorial on installing Kafka Docker on Kubernetes. This post includes a complete video walk-through.

There has been a lot of interest lately about deploying Kafka to a Kubernetes cluster. If you are wanting to take the deep dive yourself then you found the right article. Now that we have Kafka Docker, deploying a Kafka cluster to Kubernetes is a snap.

This makes it even easier to get started with Kafka in a development environment.

Comments closed

Spark and dotnet in a Single Container

Ed Elliott shows how you can combine Spark and .NET Core in a single Docker container:

This is quite new syntax in docker and you need at least docker 17.05 (client and daemon), after the images “FROM blah” you can specify a name “core” in this case, then later you can copy from the first image to the second using “–from=” on the “COPY” command.

In this dockerfile I have added Spark 2.4.3 and the default environment variables we need to get spark running, if you grab this dockerfile and run “docker build -t dotnet-spark .” you should get an images you can then run which includes the dependencies for dotnet as well as spark.

Ed includes all of the scripts needed to test this out, too.

Comments closed