Press "Enter" to skip to content

Month: November 2018

Creating A SQL Server 2019 Big Data Cluster On Azure

Niels Berglund walks us through the setup for SQL Server 2019 Big Data Clusters:

If you, like me, are a SQL Server guy, you are probably quite familiar with installing SQL Server instances by mounting an ISO file, and running setup. Well, you can forget all that when you deploy a SQL Server 2019 Big Data Cluster. The setup is all done via Python utilities, and various Docker images pulled from a private repository. So, you need Python3. On my box I have Python 3.5, and – according to Microsoft – version 3.7 also works. Make you that you have your Python installation on the path.

When you deploy you use a Python utility: mssqlctl. To download mssqlctl, you need Python’s package management system pip installed. During installation you also need a Python HTTP library: Requests. If you do not have it you need to install it:

python -m pip install requests

This isn’t available to the general public quite yet, but when it is publicly available (or if you are part of the Early Access Program), the instructions are nice and clear.

Comments closed

Installing Kubernetes On-Prem

Anthony Nocentino shows us how to install Kubernetes on-prem:

Kubernetes is a distributed system, you will be creating a cluster which will have a master node that is in charge of all operations in your cluster. In this walkthrough we’ll create three workers which will run our applications. This cluster topology is, by no means, production ready. If you’re looking for production cluster builds check out Kubernetes documentation. Here and here. The primary components that need high availability in a Kubernetes cluster are the API Server which controls the state of the cluster and the etcddatabase which stores the persistent state of the cluster. You can learn more about Kubernetes cluster components here.

In our demonstration here, the master is where the API Server, etcd, and the other control plan functions will live. The workers, will be joined to the cluster and run our application workloads.

This is an area I need to focus on, given my almost total lack of knowledge in the world of container orchestration.

Comments closed

Views And Derived Tables In SQL Server 2019 Graph

Shreya Verma shows examples of using views and derived tables in SQL Server 2019’s graph database functionality:

We will be further expanding the graph database capabilities with several new features. In this blog we will discuss one of those features that is now available for public preview in Azure SQL Database and SQL Server 2019 CTP2.1: use of derived tables and views on graph tables in MATCH queries.

Graph queries on Azure SQL Database now support using view and derived table aliases in the MATCH syntax. To use these aliases in MATCH, the views and derived tables must be created either on a node or edge table which may or may not have some filters on it or a set of node or edge tables combined together using the UNION ALL operator. The ability to use derived table and view aliases in MATCH queries, could be very useful in scenarios where you are looking to query heterogeneous entities or heterogeneous connections between two or more entities in your graph.

It’s good to see the product team expand on what they released in 2017, getting the graph product closer to production-quality.

Comments closed

Kaggle-Maintained Data

Noah Daniels announces Maintained by Kaggle data sets:

The “Maintained by Kaggle” badge means that Kaggle is now and will continue to actively maintain that dataset. This includes regular updates to descriptions and metadata, quicker response rates in discussion, and accurate current data from the source. Our goal is to create seamless workflows that allow everyone to do data science on Kaggle and be confident in the data they work with.

They have several data sets available from different open data projects for several cities, as well as NOAA and the World Bank.  If you’re looking for data sets to play with, this is a good option.

Comments closed

Faster Scalar Functions In SQL Server 2019

Brent Ozar looks at improvements the SQL Server team has made to scalar functions in 2019:

My database has to be in 2019 compat mode to enable Froid, the function-inlining magic. Run the same query again, and the metrics are wildly different:

  • Runtime: 4 seconds

  • CPU time: 4 seconds

  • Logical reads: 3,247,991 (which still sounds bad, but bear with me)

My bias tells me that I still want to avoid scalar functions, but it’s no longer the automatic deal-killer it once was.

Comments closed

The Basics Of Kubernetes

Chris Adkin gives us a rundown on Kubernetes:

With the announcement of SQL Server 2019 big data clusters at Ignite, Kubernetes (often abbreviated to K8s) now stands front and center as part of Microsoft’s data platform vision. The obvious inference being that this is something that the Microsoft data platform community is going to show an increased interest in. The post aims to provide some context around:

  • why container orchestration is required

  • how Kubernetes is architected

  • the basics of working with Kubernetes

  • and why embracing open source software should be approached in an eyes wide open manner

Kubernetes is another technology which is useful to learn and can be helpful down the line.

Comments closed

The Table Spool Operator In SQL Server

Hugo Kornelis digs into table spools:

The Table Spool operator is one of the four spool operators that SQL Server supports. It retains a copy of all data it reads in a worktable (in tempdb) and can then later return extra copies of these rows without having to call its child operators to produce them again. These copies can be made available in the same part of the execution plans, or in another part.

Table Spool is probably the most basic of the spool operators. The Index Spool operator is very similar to it, but indexes its data to allow it to return only a subset of the stored rows. The Row Count Spool operator is optimized for specific cases where the rows to be returned are empty. And the Window Spool operator is used to support the ROWS and RANGE specifications of windowing functions.

Typical use cases of a Table Spool are: to reproduce the same input multiple times without having to re-execute its child nodes (e.g. in the inner input of a Nested Loops); to make the same input available in multiple branches of an execution plan (e.g. in wide update plans); or to ensure that an original copy of the data is available after an insert, update, or delete operator changes the base data (“Halloween protection”).

Click through for a great deal more detail.

Comments closed

Accelerated Database Recovery In SQL Server 2019

Frank Gill notes an exciting new feature in SQL Server 2019:

“Any sufficiently advanced technology is indistinguishable from magic.” -Arthur C. Clarke

In this morning’s keynote session at PASS Summit 2018, public preview of a new feature in Azure SQL Database and SQL Server 2019 called Accelerated Database Recovery (ADR) was announced.  This changes the way that SQL Server handles recovery of a SQL Server instance on start up.

This looks really good for large databases, where recovery can sometimes be measured in hours.

Comments closed

Azure Data Studio November Release

Alan Yu announces this month’s Azure Data Studio update:

In November’s version of the monthly release blog, the emphasis was on fixing customer issues and adding and improving existing extensions.

This includes:

Read on for the details.  This product is getting closer and closer to a state where it can be a daily driver.

Comments closed

Explaining Neural Networks With H2O

Shirin Glander explains some of the concepts behind neural networks using H2O as a guide:

Before, when describing the simple perceptron, I said that a result is calculated in a neuron, e.g. by summing up all the incoming data multiplied by weights. However, this has one big disadvantage: such an approach would only enable our neural net to learn linearrelationships between data. In order to be able to learn (you can also say approximate) any mathematical problem – no matter how complex – we use activation functions. Activation functions normalize the output of a neuron, e.g. to values between -1 and 1, (Tanh), 0 and 1 (Sigmoid) or by setting negative values to 0 (Rectified Linear Units, ReLU). In H2O we can choose between Tanh, Tanh with Dropout, Rectifier (default), Rectifier with Dropout, Maxout and Maxout with Dropout. Let’s choose Rectifier with Dropout. Dropout is used to improve the generalizability of neural nets by randomly setting a given proportion of nodes to 0. The dropout rate in H2O is specified with two arguments: hidden_dropout_ratios, which per default sets 50% of hidden (more on that in a minute) nodes to 0. Here, I want to reduce that proportion to 20% but let’s talk about hidden layers and hidden nodes first. In addition to hidden dropout, H2O let’s us specify a dropout for the input layer with input_dropout_ratio. This argument is deactivated by default and this is how we will leave it.

Read the whole thing and, if you understand German, check out the video as well.

Comments closed