Category: Big Data Clusters

Big Data Clusters and Fixed IP Addresses

Denny Cherry warns you about Big Data Clusters and keeping a particular IP address:

No problem, we just added in the correct IP range to the possible addresses for the vNet, added a new Subnet and moved the VMs over to the new subnet (which caused the VMs to reboot, but that was expected).

It turns out that BDC in SQL Server 2019 doesn’t like having the IPs changed for the AKS nodes. The problem stems from the fact that BDC generates its certificates off of the IP address of the node, so if the IP address of the node changes (even if you are using DHCP for on-prem nodes and DHCP gives you a new IP address), your BDC won’t respond.

Read on for your three possible solutions.
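
Because the failure is triggered by a node's IP address changing, one way to catch it early is to compare the addresses recorded at deployment time with the current ones. The following is a minimal sketch, not something from Denny's post: the node names, the addresses, and the idea of keeping a recorded inventory are all assumptions made for illustration.

```python
def changed_node_ips(recorded, current):
    """Return {node: (old_ip, new_ip)} for every recorded node whose IP differs.

    `recorded` and `current` are plain dicts mapping a node name to its IP.
    A node missing from `current` is reported with a new IP of None.
    """
    changes = {}
    for node, old_ip in recorded.items():
        new_ip = current.get(node)
        if new_ip != old_ip:
            changes[node] = (old_ip, new_ip)
    return changes


# Hypothetical inventory: addresses noted when the cluster was deployed,
# and addresses observed after the VMs moved to a new subnet.
recorded = {"node-0": "10.0.0.4", "node-1": "10.0.0.5"}
current = {"node-0": "10.0.0.4", "node-1": "10.1.0.5"}

for node, (old_ip, new_ip) in changed_node_ips(recorded, current).items():
    print(f"{node}: {old_ip} -> {new_ip}")
```

Any node listed by this check would be a candidate for the certificate problem described in the excerpt.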

Upgrading a SQL Server Big Data Cluster

Mohammad Darab shows how to upgrade an existing Big Data Cluster:

The above scenario was updating a Big Data Cluster from a supported release. Microsoft officially supports BDCs starting from SQL Server 2019 GDR1. But what if you have a previous version of BDCs, say a CTP or release candidate? In that case you’ll have to back up any data you have, delete your cluster, uninstall azdata, install the updated azdata, and deploy your big data cluster anew. A little cumbersome, but that’s how it is. In fact, no one should be running an unsupported release of Big Data Clusters anyway!

Click through for the instructions.

Deploying a Big Data Cluster to a Multi-Node kubeadm Cluster

Mohammad Darab shows how we can deploy a SQL Server Big Data Cluster on a multi-node kubeadm cluster:

There are a few assumptions before we get started:

1. You have at least 3 virtual machines running with the minimum hardware requirements
2. All your virtual machines are running Ubuntu Server 16.04 and have OpenSSH installed
3. All the virtual machines have static IPs and are on the same subnet
4. All the virtual machines are updated and have been rebooted
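
The first and third assumptions (at least three machines, static IPs on one subnet) can be checked before starting. Here is a minimal sketch using Python's standard `ipaddress` module; the function name, the example addresses, and the `192.168.1.0/24` subnet are assumptions for illustration and do not come from Mohammad's post.

```python
import ipaddress


def check_vm_assumptions(addresses, subnet, minimum_vms=3):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    network = ipaddress.ip_network(subnet)
    problems = []

    # Assumption 1: at least three virtual machines.
    if len(addresses) < minimum_vms:
        problems.append(f"only {len(addresses)} VMs listed, need at least {minimum_vms}")

    # Static IPs should be unique across machines.
    if len(set(addresses)) != len(addresses):
        problems.append("duplicate IP addresses found")

    # Assumption 3: every machine is on the same subnet.
    for address in addresses:
        if ipaddress.ip_address(address) not in network:
            problems.append(f"{address} is outside {network}")

    return problems


# Hypothetical addresses for three kubeadm nodes.
vm_addresses = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
print(check_vm_assumptions(vm_addresses, "192.168.1.0/24"))
```

The check only validates the addresses you give it; it cannot tell whether they were assigned statically or by DHCP.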

Mohammad shows how to set up the cluster, configure Kubernetes, and then install Big Data Clusters. Definitely worth the read if you’re interested in building a Big Data Cluster on-premises.

Deploy a Big Data Cluster to a Single-Node kubeadm Cluster

Mohammad Darab shows how to build out a single-node Big Data Cluster on-premises:

This blog post will walk you through deploying a SQL Server Big Data Cluster on a single node Kubernetes cluster. You can install a Big Data Cluster on a physical machine or a virtual machine. Whatever option you choose must have the below minimum requirements:

– 8 CPUs
– 64 GB RAM
– 100 GB disk space
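
Those three minimums are easy to check up front. The sketch below is an illustration only, not part of Mohammad's instructions: the function names are made up, the RAM lookup uses `os.sysconf` and so assumes a Linux host, and treating "100 GB disk space" as free space on the root volume is also an assumption.

```python
import os
import shutil

# Minimums quoted in the excerpt above.
MIN_CPUS = 8
MIN_RAM_GB = 64
MIN_DISK_GB = 100


def unmet_requirements(cpus, ram_gb, disk_gb):
    """Return a list describing each minimum that the given machine does not meet."""
    unmet = []
    if cpus < MIN_CPUS:
        unmet.append(f"CPUs: {cpus} < {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        unmet.append(f"RAM: {ram_gb:.0f} GB < {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        unmet.append(f"Disk: {disk_gb:.0f} GB < {MIN_DISK_GB} GB")
    return unmet


def local_resources(path="/"):
    """Measure the local machine (Linux only, because of os.sysconf)."""
    cpus = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).free
    gib = 1024 ** 3
    return cpus, ram_bytes / gib, disk_bytes / gib


if __name__ == "__main__":
    problems = unmet_requirements(*local_resources())
    print("OK" if not problems else "\n".join(problems))
```

Note that a machine sold as having 64 GB of RAM usually reports slightly less to the operating system, so a strict comparison like this one may flag it; loosen the threshold if that matters for your environment.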

Read on for instructions, or check out Mohammad’s video on the topic.

Investigating the Big Data Cluster Data Pool

Mohammad Darab takes us through Big Data Cluster data pools:

Data pools enable the creation of scale-out data marts. Whether your data is being ingested from Spark jobs or SQL, it is stored in the data pool. Data is distributed across one or two SQL Server instances, so running queries against it is more efficient.

Whether the data is being ingested from IoT devices, Kafka, or another relational data source (like Oracle or Teradata), it is all stored in the data pool instances and is available as “data marts” for the consumer to work with. There is no need to go back out to the original data source each time you want to query the data. It is all available inside the data pool instances.

This lets you cache data brought in via PolyBase and spread it across a number of instances. That’s pretty powerful.
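
To make the idea of spreading rows across instances concrete, here is a toy sketch that deals rows out in round-robin order. It is purely illustrative: the function name and sample rows are invented, and it does not claim to reproduce how a data pool actually places data.

```python
from itertools import cycle


def distribute_round_robin(rows, instance_count):
    """Deal rows out to `instance_count` buckets in turn, like dealing cards."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    instances = [[] for _ in range(instance_count)]
    for row, target in zip(rows, cycle(range(instance_count))):
        instances[target].append(row)
    return instances


# Hypothetical rows ingested from some external source.
rows = ["row-1", "row-2", "row-3", "row-4", "row-5"]
for index, bucket in enumerate(distribute_round_robin(rows, 2)):
    print(f"instance {index}: {bucket}")
```

Each instance ends up holding roughly an equal share of the rows, which is what lets a query against the full set be split across instances.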

Deploying a Big Data Cluster with Azure Data Studio

Mohammad Darab shows how you can deploy a Big Data Cluster to Azure Kubernetes Service using Azure Data Studio:

A few months ago I posted a blog on deploying a BDC using the built-in ADS notebook. This blog post will go a bit deeper into deploying a Big Data Cluster on AKS (Azure Kubernetes Service) using Azure Data Studio (version 1.13.0). In addition, I’ll go over the pros and cons and dive deeper into the reasons why I recommend going with AKS for your Big Data Cluster deployments.

AKS does make it pretty easy. The toughest part for me was figuring out which instance types were supported—I tried a few which would save me money and they weren’t available. I do like that they added a check to view availability before completing the notebook; that wasn’t in the preview version.

Licensing for SQL Server Big Data Clusters

Mohammad Darab tackles the licensing question for Big Data Clusters:

One of the biggest questions I had when I first started diving into Big Data Clusters was, “What about licensing… how will that work?” With so many different instances running on the storage pool, data pool, and compute pool nodes, will licensing cost too much? The answer I got from Microsoft was that it will “be competitive”.

Well, with the general availability of SQL Server 2019 as of this week, Microsoft is making it way more financially attractive than I thought. Below is a summation of the SQL Server 2019 Licensing Guide for Big Data Clusters.

Click through for the explanation. It really is pretty simple, all things considered.

Running Big Data Clusters on VS Subscriptions

Kevin Chant has a few tips for people wanting to try out Big Data Clusters with their Visual Studio subscriptions to Azure:

In order to present the right results for various outcomes I attempted to deploy Big Data Clusters multiple times.

When I say multiple times, I mean the number of deployments easily went into double figures, because I was testing various virtual machine sizes in multiple regions.

Hence, I spent many hours testing and verifying the results in order to present them properly.

Read on to see Kevin’s notes and recommendations.

Using Azure Kubernetes Service for Big Data Clusters

Mohammad Darab explains why it’s a good idea to use Azure Kubernetes Service when building out a Big Data Cluster:

According to the Microsoft documentation, there are three ways to deploy a Big Data Cluster:

1. Minikube
2. Kubeadm
3. AKS

I’ll go into each and list the pros and cons.

Of course, if you have a great Kubernetes admin, on-prem is certainly a viable option, but AKS is definitely easier to get started with.
