Press "Enter" to skip to content

Category: Big Data Clusters

Deploying a Big Data Cluster with Azure Data Studio

Mohammad Darab shows how you can deploy a Big Data Cluster to Azure Kubernetes Service using Azure Data Studio:

A few months ago I posted a blog on deploying a BDC using the built-in ADS notebook. This blog post will go a bit deeper into deploying a Big Data Cluster on AKS (Azure Kubernetes Service) using Azure Data Studio (version 1.13.0). In addition, I’ll go over the pros and cons and dive deeper into the reasons why I recommend going with AKS for your Big Data Cluster deployments.

AKS does make it pretty easy. The toughest part for me was figuring out which instance types were supported—I tried a few which would save me money and they weren’t available. I do like that they added a check to view availability before completing the notebook; that wasn’t in the preview version.

Comments closed

Licensing for SQL Server Big Data Clusters

Mohammad Darab tackles the licensing question for Big Data Clusters:

One of the biggest questions I had when I first started diving into Big Data Clusters was, “What about licensing….how will that work?” With so many different instances running on the storage pool, data pool and compute pool nodes will licensing cost too much? The answer I got from Microsoft was that it will “be competitive”.

Well, with the general availability of SQL Server as of this week, Microsoft is making it way more financially attractive than I thought. Below is a summation of the SQL Server 2019 Licensing Guide for Big Data Clusters.

Click through for the explanation. It really is pretty simple, all things considered.

Comments closed

Running Big Data Clusters on VS Subscriptions

Kevin Chant has a few tips for people wanting to try out Big Data Clusters with their Visual Studio subscriptions to Azure:

In order to present the right results for various outcomes I attempted to deploy Big Data Clusters multiple times.

When I say multiple times, I mean the number of deployments easily went into double figures. Because I was testing deploying various virtual machine sizes in multiple regions.

Hence, I spent many hours testing and verifying the results in order to present them properly.

Read on to see Kevin’s notes and recommendations.

Comments closed

Using Azure Kubernetes Services for Big Data Clusters

Mohammad Darab explains why it’s a good idea to use Azure Kubernetes Service when building out a Big Data Cluster:

According to the Microsoft documentation, there are three ways to deploy a Big Data Cluster:

1. Minikube
2. Kubeadm
3. AKS

I’ll go into each and list the pros and cons.

Of course, if you have a great Kubernetes admin, on-prem is certainly a viable option, but AKS is definitely easier to get started with.

Comments closed

Creating Big Data Clusters with Azure Data Studio

Niels Berglund takes us through the creation of a Big Data Cluster by using Azure Data Studio to generate a notebook:

I wrote a blog post back in November 2018, about how to install and deploy SQL Server 2019 Big Data Cluster on Azure Kubernetes Service. Back then SQL Server 2019 Big Data Cluster was in private preview, (CTP 2.1 I believe), and you had to sign up, to get access to the “bits”. Well, you did not really get any “bits”; what you did get was access to Python deployment scripts.

Now, September 2019, the BDC is in public preview (you do not have to sign up), and it has reached Release Candidate (RC) status, RC 1. The install method has changed, or rather, in addition to installing via deployment scripts, you can now also install using Azure Data Studio deployment notebooks, and that is what this blog post is about.

Having gone through this myself, there’s quite a bit of reading involved in the setup, but they make the process pretty smooth. This also shows off one of the key benefits of notebooks: documentation and code together.

Comments closed

Develop BDC PySpark Jobs in Visual Studio Code

Jenny Jiang announces a new capability in Visual Studio Code:

With the Visual Studio Code extension, you can enjoy native Python programming experiences such as linting, debugging support, language service, and so on. You can run current linerun selected lines of code, or run all for your PY file. You can import and export a .ipynb notebook and perform a notebook like query including Run Cell, Run Above, or Run Below. You can also enjoy a notebook like interactive experience that includes your source code and markdown comments along with the running results and output. You can remove the unneeded sections, enter comments, or type additional code in the interactive results window. Moreover, you can visualize your results in a graphic format through a matplotlib like Jupyter Notebook. The integration with SQL Server 2019 Big Data Clusters empowers you to quickly submit a PySpark batch job to the big data cluster and monitor job progress.

This is rather useful for developers, though I greatly prefer the Azure Data Studio notebook interface.

Comments closed

SQL Server 2019 RC 1.1

Amit Banerjee announces a minor numeric change and a big update to SQL Server 2019 RC1:

In continuation with our announcement of SQL Server 2019 release candidate last week, we’re announcing that the release candidate refresh for SQL Server 2019 is now available to download. The release candidate now includes bits for Big Data Clusters in SQL Server 2019 in this refresh.

Back in July, we announced the preview of Big Data Clusters in SQL Server 2019 and since then we’ve seen our customers actively bringing their big data analytical workloads to SQL Server 2019 to operationalize their AI and machine learning projects.

Read on for more.

Comments closed

“Big” Data

Buck Woody explains that “Big Data” is just data:

A few years ago it was all the rage to talk about “Big Data”. Lots of descriptions of “Big Data” popped up, including the “V’s” (Variety, Velocity, Volume, etc.) that proved very helpful. I even have my own definition:

Big Data is any data you can’t process
in the time you want
with the systems you have

This post is quite reasonable in its depiction of the problem. I extend it a bit further than that and talk about difficulty of processing the data. Nonetheless, read Buck’s full thoughts and check out the Big Data Clusters workshop.

Comments closed

Deploying a Big Data Cluster

Mohammad Darab takes us through the Big Data Cluster deployment process using Azure Data Studio:

I’ve been “playing around” with Big Data Clusters for some time now and CTP 3.2 is way ahead when it comes to streamlining the BDC deployment process. You can check out my 4-part series on deploying BDC on AKS to see how cumbersome the process used to be. New in CTP 3.2, you can deploy a BDC on AKS (an existing cluster OR a new cluster) using an Azure Data Studio notebook. Let’s see how.

Click through for instructions. It was rather smart of Microsoft to release the instructions as a notebook.

Comments closed

Azul Java in SQL Server 2019

Travis Wright announces support for Azul Systems’ Java distribution in SQL Server 2019:

In September 2018, Microsoft announced a new partnership with Azul Systems, a leading Java open source contributor and distributor. This partnership allows for all Azure customers to use Azul’s Zulu for Azure – Enterprise distribution of Java for free with support jointly provided by Microsoft and Azul. That’s right – supported for free.

Today, we are announcing that we have extended that partnership to cover SQL Server. Starting in the SQL Server 2019 community technology preview (CTP) 3.2 that was released today, we are including Azul System’s Zulu Embedded right out of the box for all scenarios where Java is used in SQL Server – in PolyBase, Apache Spark, Java extensibility, and more. There is no additional cost beyond what you pay for SQL Server.

This is interesting. We’ll have to see if the CTP 3.2 installation doesn’t ask for JDK 1.8 anymore and just installs the Azul Systems version.

Comments closed