
Category: Big Data Clusters

New in SQL Server Big Data Clusters

Daniel Coelho has an update on what’s available in SQL Server Big Data Clusters:

SQL Server Big Data Clusters (BDC) is a capability brought to market as part of the SQL Server 2019 release. Big Data Clusters extends SQL Server’s analytical capabilities beyond in-database processing of transactional and analytical workloads by uniting the SQL engine with Apache Spark and Apache Hadoop to create a single, secure, and unified data platform. It runs exclusively on Linux containers, orchestrated by Kubernetes, and can be deployed on multiple cloud providers or on-premises.

Today, we’re proud to announce the release of the latest cumulative update, CU13, for SQL Server Big Data Clusters which includes important changes and capabilities:

Updating to the most recent production-ready version of Spark (as of today) is a nice upgrade.


Updates to SQL Server Big Data Clusters

Rahul Ajmera fills us in on what they’ve been doing with SQL Server Big Data Clusters:

Today, we’re announcing the release of the latest cumulative update (CU9) for SQL Server Big Data Clusters, which includes important capabilities:

– Support to configure BDC post deployment.
– Improved experience for encryption at rest.
– Ability to install Python packages at Spark job submission time.
– Upgraded software versions for most of our OSS components (Grafana, Kibana, FluentBit, etc.) to ensure Big Data Clusters images are up to date with the latest enhancements and fixes.
– Miscellaneous improvements and bug fixes.

This announcement highlights some of the major improvements, provides additional context to better understand the design behind these capabilities, and points you to relevant resources to learn more and get started.

Click through for more detail on a few of the items.


Looking at BDC in Kubernetes with Lens

Mohammad Darab shows off a tool to monitor the Kubernetes cluster driving a Big Data Cluster:

I don’t recall how I came across this Kubernetes IDE called Lens, but all I know is it’s cool as heck! It connects to a Kubernetes cluster (using the kube config file) and gives you an in-depth view of all the different Kubernetes objects, their associated YAML files, health/metrics, etc. In this blog post I will show you how we can look into a Big Data Cluster’s Kubernetes infrastructure using Lens.

Click through for instructions on installation, as well as how to use the product.


Stopping and Starting an Azure Kubernetes Service Cluster

Mohammad Darab wants to save some cash (or at least Azure credits):

I remember when I first started deploying Big Data Clusters, they were on Azure Kubernetes Service utilizing the $200 credit for first-time sign-ups. By the time I got around to figuring out how to deploy the BDC, not only was my $200 credit gone, but I started to incur costs out of pocket.

If only there was a feature that would allow me to stop the VMs in AKS whenever I wasn’t using them. Well, I’m excited to share that Microsoft AKS (Azure Kubernetes Service) came out with a neat feature (currently in preview at the time of the publishing of this post) that allows you to stop and start your AKS cluster by running a simple command. Of course I had to try it out on BDCs and to my surprise it worked. Well, sort of. Let me explain…

Read on for more information, as well as current limitations.
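
For a rough idea of what that "simple command" looks like when scripted, here is a minimal Python sketch that wraps the Azure CLI's `az aks stop` and `az aks start` commands. The helper names and the resource group and cluster names are hypothetical, and the sketch assumes the Azure CLI is installed and logged in. It says nothing about how a Big Data Cluster behaves after a restart; Mohammad's post covers those limitations.

```python
import subprocess


def aks_power_command(action, resource_group, cluster_name):
    """Build the Azure CLI argument list that stops or starts an AKS cluster.

    The function name and parameters are illustrative, not part of any SDK.
    """
    if action not in ("stop", "start"):
        raise ValueError(f"action must be 'stop' or 'start', got {action!r}")
    return [
        "az", "aks", action,
        "--resource-group", resource_group,
        "--name", cluster_name,
    ]


def run_aks_power_command(action, resource_group, cluster_name):
    """Run the command; requires the Azure CLI on PATH and an active login."""
    subprocess.run(
        aks_power_command(action, resource_group, cluster_name),
        check=True,  # raise CalledProcessError if the CLI reports a failure
    )


if __name__ == "__main__":
    # Hypothetical names; print the command rather than running it.
    print(" ".join(aks_power_command("stop", "bdc-rg", "bdc-aks")))
```

Building the argument list separately from running it makes it easy to review the exact command before anything touches a real cluster.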


OpenShift and SQL Server Big Data Clusters

Chris Adkin explains why support for OpenShift is important for SQL Server Big Data Clusters:

One thing that should become immediately apparent when installing and administering an OpenShift cluster is that it is a lot more prescriptive and opinionated than vanilla Kubernetes. The simple reason for this is that OpenShift is intended to be deployed to environments that require enterprise-grade levels of hardening and security. For example, Red Hat mandates the operating system distributions you must use, to the extent that, when deploying a cluster on VMware, Red Hat’s documentation recommends the use of OVAs, compressed files containing installable virtual machines.

Read on for the full story.


Planning for a Big Data Cluster

Chris Adkin has started a series on SQL Server Big Data Clusters:

Proposing the idea of using virtual machines as Kubernetes cluster nodes to a Kubernetes purist is likely to be met with consternation. However, the different nodes in your cluster have different resource requirements. A master node can get away with as little as 2 GB of memory and 2 logical processors, whereas worker nodes require far more resources. A best practice is never to run applications on master nodes in production. The view of the world from a Kubernetes purist is that Kubernetes is designed to obviate the need for virtualization. But if you do go down the bare metal route, it’s unlikely that you are going to purchase blades or servers with 2 GB of memory and 2 CPU cores. At the very least, consider using virtual machines to host master nodes. For organizations that have standardized on a software-defined virtualized infrastructure, Kubernetes will run perfectly happily on this. Virtualization also provides the fastest means of rapidly provisioning environments: simply create a virtual machine template and base your cluster node hosts on it.

Click through for more guidance around what you need to know before you deploy a cluster.


Big Data Clusters and Fixed IP Addresses

Denny Cherry warns you about Big Data Clusters and keeping a particular IP address:

No problem, we just added in the correct IP range to the possible addresses for the vNet, added a new Subnet and moved the VMs over to the new subnet (which caused the VMs to reboot, but that was expected).

It turns out that BDC in SQL Server 2019 doesn’t like having the IPs changed for the AKS nodes. The problem stems from the fact that BDC is generating its certificates off of the IP address of the node, so if the IP address of the node changes (even if you are using DHCP for on-prem nodes and DHCP gives you a new IP address) your BDC won’t respond.

Read on for your three possible solutions.
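
Since the failure comes from node IP addresses drifting away from what they were at deployment time, one way to catch it early is to compare a saved snapshot of node IPs against the current ones. The Python sketch below is only an illustration of that comparison: the function name, node names, and addresses are all made up, and how you collect the two snapshots (for instance, from the INTERNAL-IP column of `kubectl get nodes -o wide`) is up to you. It detects the change; it does not fix the certificates.

```python
def changed_node_ips(recorded, current):
    """Return {node: (old_ip, new_ip)} for nodes whose IP no longer matches.

    `recorded` and `current` map node names to IP address strings.
    A node missing from `current` is reported with new_ip set to None.
    """
    return {
        name: (old_ip, current.get(name))
        for name, old_ip in recorded.items()
        if current.get(name) != old_ip
    }


if __name__ == "__main__":
    # Hypothetical snapshots taken at deployment time and after a subnet move.
    at_deploy = {"node-0": "10.0.0.4", "node-1": "10.0.0.5"}
    after_move = {"node-0": "10.1.0.4", "node-1": "10.0.0.5"}
    for node, (old, new) in changed_node_ips(at_deploy, after_move).items():
        print(f"{node}: {old} -> {new}")
```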


Upgrading a SQL Server Big Data Cluster

Mohammad Darab shows how to upgrade an existing Big Data Cluster:

The above scenario was updating a Big Data Cluster from a supported release. Microsoft officially supports BDCs starting from SQL Server 2019 GDR1. But what if you have a previous version of BDCs, say a CTP or release candidate? In that case you’ll have to back up any data you have, delete your cluster, uninstall azdata, install the updated azdata, and deploy your big data cluster anew. A little cumbersome, but that’s how it is. In fact, no one should be running an unsupported release of Big Data Clusters anyway!

Click through for the instructions.
