2018-02-20 – Curated SQL

Visual Studio Code In Anaconda 5.1

Microsoft and Python data science platform vendor Anaconda have extended their partnership by adding the software giant’s code editor to the latest Anaconda distribution.

The addition of Microsoft’s Visual Studio Code (VS Code) expands its support for the latest release of the Python data science platform, Anaconda 5.1. The Python platform has attracted more than 4.5 million users running the programming language on Windows, Mac and Linux.

Along with editing and debugging features, the partners said the cross-platform code editor includes custom features for Anaconda users. For example, a Python extension customizes VS Code for the Python development environment.

Read on for more information.

Streaming ETL In Practice Using KSQL

Published 2018-02-20 by Kevin Feasel

Robin Moffatt builds an example of streaming ETL using Oracle, GoldenGate, and Kafka:

So in this post I’m going to show an example of what streaming ETL looks like in practice. I’m replacing batch extracts with event streams, and batch transformation with in-flight transformation of these event streams. We’ll take a stream of data from a transactional system built on Oracle, transform it, and stream it into Elasticsearch to land the results, but your choice of datastore is up to you: with Kafka’s Connect API you can stream the data to almost anywhere! Using KSQL we’ll see how to filter streams of events in real-time from a database, how to join between events from two database tables, and how to create rolling aggregates on this data.

It’s a very useful example.
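
To make the three operations in the quote concrete, here is a minimal sketch in plain Python generators. It is not KSQL and not Robin's code: the order events, the customer table, the field names, and the 60-second window are all hypothetical, and the join is a simple stream-to-lookup-table enrichment rather than a join between two change streams.

```python
from collections import defaultdict, deque

# Hypothetical sample data: a stream of order events and a customer lookup table.
ORDERS = [
    {"ts": 0, "customer_id": 1, "amount": 50},
    {"ts": 10, "customer_id": 2, "amount": 5},
    {"ts": 20, "customer_id": 1, "amount": 70},
    {"ts": 100, "customer_id": 1, "amount": 30},
]
CUSTOMERS = {1: {"name": "Ann"}, 2: {"name": "Bob"}}


def filter_events(events, predicate):
    """Yield only the events that satisfy the predicate, one at a time."""
    for event in events:
        if predicate(event):
            yield event


def join_with_table(events, table, key):
    """Enrich each event with the matching row from a lookup table."""
    for event in events:
        row = table.get(event[key])
        if row is not None:
            yield {**event, **row}


def rolling_totals(events, key, value, window_seconds):
    """Yield (key, total) per event, summing values in the trailing window."""
    windows = defaultdict(deque)
    for event in events:
        window = windows[event[key]]
        window.append((event["ts"], event[value]))
        # Drop entries that have fallen out of the trailing window.
        while window and window[0][0] <= event["ts"] - window_seconds:
            window.popleft()
        yield event[key], sum(amount for _, amount in window)


def run_pipeline(orders, customers):
    """Filter small orders out, attach customer names, then aggregate."""
    large = filter_events(orders, lambda e: e["amount"] >= 10)
    enriched = join_with_table(large, customers, "customer_id")
    return list(rolling_totals(enriched, "name", "amount", window_seconds=60))
```

Each stage consumes events as they arrive instead of waiting for a batch, which is the shift the post describes. KSQL expresses the same ideas declaratively over Kafka topics.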

Automating HDF Cluster Deployment

Published 2018-02-20 by Kevin Feasel

Ali Bajwa has a how-to guide for automating HDF 3.1 cluster deployment on AWS:

The release of HDF 3.1 brings a significant number of improvements to HDF: Apache NiFi 1.5, Kafka 1.0, plus the new NiFi Registry. In addition, there were improvements to the Storm, Streaming Analytics Manager, and Schema Registry components. This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy HDF clusters to either single-node or development/demo environments in 5 easy steps. To quickly set up a single-node cluster, a prebuilt AMI is available for AWS, as well as a script that automates these steps, so you can deploy the cluster in a few commands.

Click through for the installation guide.

SSAS Query Analyzer

Published 2018-02-20 by Kevin Feasel

Chris Webb reviews Analysis Services Query Analyzer:

Last week a new, free tool for analysing the performance of MDX queries on SSAS Multidimensional was released: Analysis Services Query Analyzer. You can get all the details and download it here:

https://ssasqueryanalyzer.github.io/

…and here’s a post on LinkedIn by one of the authors, Francesco De Chirico, explaining why he decided to build it:

https://www.linkedin.com/pulse/asqa-10-released-francesco-de-chirico/

I’ve played around with it a bit and I’m very impressed – it’s a really sophisticated and powerful tool, and one I’m going to spend some time learning because I’m sure it will be very useful to me.

Read on for the rest of Chris’s review, including product screenshots.

Installing Jupyter Notebook Kernels

Published 2018-02-20 by Kevin Feasel

Nigel Meakins continues his Jupyter series by showing how to install various kernels:

Jupyter-Scala

This can be downloaded from here. Unzip and run the jupyter-scala.ps1 script on Windows using elevated permissions in order to install.

The kernel files will end up in <UserProfileDir>\AppData\Roaming\jupyter\kernels\scala-develop and the kernel will appear in Jupyter with the default name of ‘Scala (develop)’. You can of course change this in the respective kernel.json file.

Click through to see how to install a few other kernels with various levels of configuration.
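
As a rough illustration of the kernel.json tweak mentioned in the quote, the sketch below rewrites the `display_name` field, which is the label Jupyter shows for a kernel. The helper names `rename_kernel` and `demo` are hypothetical, and the demo runs against a temporary directory rather than a real kernel install.

```python
import json
import tempfile
from pathlib import Path


def rename_kernel(kernel_dir, new_name):
    """Set display_name in <kernel_dir>/kernel.json and return the old name."""
    spec_path = Path(kernel_dir) / "kernel.json"
    spec = json.loads(spec_path.read_text(encoding="utf-8"))
    old_name = spec.get("display_name")
    spec["display_name"] = new_name
    spec_path.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return old_name


def demo():
    """Exercise rename_kernel on a throwaway kernel.json (made-up contents)."""
    with tempfile.TemporaryDirectory() as kernel_dir:
        spec_path = Path(kernel_dir) / "kernel.json"
        sample = {"display_name": "Scala (develop)", "language": "scala"}
        spec_path.write_text(json.dumps(sample), encoding="utf-8")
        old = rename_kernel(kernel_dir, "Scala")
        new = json.loads(spec_path.read_text(encoding="utf-8"))["display_name"]
        return old, new
```

On Windows, the real directory from the quote could be built as `Path(os.environ["APPDATA"]) / "jupyter" / "kernels" / "scala-develop"`, assuming `%APPDATA%` points at the usual AppData\Roaming folder.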

Changing Int To Bigint

Published 2018-02-20 by Kevin Feasel

Danny Kruge shows one way to change a table’s identity column from integer to bigint:

The table was around 500GB with over 900 million rows. Based on the average number of inserts a day on that table, I estimated that we had eight months before inserts on that table would grind to a halt. This was an order entry table, subject to round-the-clock inserts due to customer activity. Any downtime to make the conversion to BIGINT was going to have to be minimal.

This article describes how I planned and executed a change from an INT to a BIGINT data type, replicating the process I used in a step by step guide for the AdventureWorks database. The technique creates a new copy of the table, with a BIGINT datatype, on a separate SQL Server instance, then uses object level recovery to move it into the production database.

There’s a way to do this without any downtime, though the trigger logic gets a little more complex and it does take longer.
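
The "eight months" estimate in the quote comes down to simple arithmetic: how many identity values remain, divided by how fast they are consumed. Here is a minimal sketch of that calculation. The current identity value and insert rate in the usage note are made up for illustration and are not Danny's figures, and the function assumes an identity increment of 1.

```python
INT_MAX = 2**31 - 1      # largest SQL Server INT value: 2,147,483,647
BIGINT_MAX = 2**63 - 1   # largest SQL Server BIGINT value


def days_until_exhaustion(current_identity, inserts_per_day, max_value=INT_MAX):
    """Whole days left before an identity column (increment 1) hits max_value."""
    if inserts_per_day <= 0:
        raise ValueError("inserts_per_day must be positive")
    return (max_value - current_identity) // inserts_per_day
```

With a hypothetical current identity of 1,900,000,000 and one million inserts a day, `days_until_exhaustion(1_900_000_000, 1_000_000)` gives 247 days, or roughly eight months. Note that the current identity value can run well ahead of the row count when rows have been deleted or inserts rolled back, so it is worth checking the identity value itself rather than counting rows.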

Looking Up Managers In AD Using PowerShell

Published 2018-02-20 by Kevin Feasel

Jana Sattainathan shows how to use PowerShell to look up a group of Active Directory users’ managers:

Today, I received a request to find the manager for a whole bunch of users. This was a list of names (not UserIds) in an Excel worksheet.

It is not actually that complex to do it:

Locate the AD user based on the name
Check the Manager property
Look up AD again for Manager to get the name

Click through for the script. This does, of course, assume that the information is already in Active Directory somewhere.
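
The three steps in the quote can be sketched without touching a real directory. The version below is not Jana's PowerShell script: it uses a small in-memory list as a stand-in for Active Directory, and every entry, name, and helper function is hypothetical. The one detail borrowed from AD is that a user's manager attribute holds the manager's distinguished name, which is why a second lookup is needed.

```python
# Hypothetical stand-in for Active Directory: each entry has a display name,
# a distinguished name (dn), and the dn of the user's manager (or None).
DIRECTORY = [
    {"name": "Pat Lee", "dn": "CN=Pat Lee,OU=Staff,DC=example,DC=com",
     "manager": "CN=Sam Roy,OU=Staff,DC=example,DC=com"},
    {"name": "Sam Roy", "dn": "CN=Sam Roy,OU=Staff,DC=example,DC=com",
     "manager": None},
]


def find_user_by_name(directory, name):
    """Step 1: locate the user entry by display name (case-insensitive)."""
    for entry in directory:
        if entry["name"].lower() == name.lower():
            return entry
    return None


def find_user_by_dn(directory, dn):
    """Step 3 helper: locate an entry by its distinguished name."""
    for entry in directory:
        if entry["dn"] == dn:
            return entry
    return None


def manager_name(directory, name):
    """Return the manager's display name, or None if it cannot be resolved."""
    user = find_user_by_name(directory, name)
    if user is None or not user.get("manager"):
        return None  # Step 2: no such user, or no manager recorded
    manager = find_user_by_dn(directory, user["manager"])
    return manager["name"] if manager else None


def managers_for(directory, names):
    """Map each requested name to its manager's name (None when unknown)."""
    return {name: manager_name(directory, name) for name in names}
```

Returning `None` for unresolved names keeps a bulk lookup going when the spreadsheet contains a name the directory does not recognize.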

Changing The Default Filegroup

Published 2018-02-20 by Kevin Feasel

Kenneth Fisher shows how you can change the default filegroup:

You know you can have multiple filegroups right? You might have a separate filegroup for the data (the clustered index & heaps) and another for the indexes (non-clustered indexes). Or maybe you want to separate your data tables from the system tables. There are any number of reasons why you might want to have multiple filegroups, however, there will always be a primary filegroup and it will always be the default if you don’t specify otherwise. Right? Wrong.

I’ve never seen a way to remove primary or to move the system objects in it. However, you can change the default filegroup.

Having a separate filegroup for your tables and another for indexes (or splitting things up some other way) can help get a database back online faster, as you can restore the system tables first and then restore filegroups as needed.
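
For reference, changing the default filegroup is a single `ALTER DATABASE ... MODIFY FILEGROUP ... DEFAULT` statement in T-SQL. The sketch below only builds that statement as a string. The helper names and the `Sales` and `UserData` names in the usage note are hypothetical, and actually running the statement against a server is left out.

```python
def quote_name(identifier):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def set_default_filegroup_sql(database, filegroup):
    """Build the T-SQL that makes `filegroup` the default for `database`."""
    return (
        f"ALTER DATABASE {quote_name(database)} "
        f"MODIFY FILEGROUP {quote_name(filegroup)} DEFAULT;"
    )


# Query to confirm which filegroup is currently the default in a database.
CHECK_DEFAULT_SQL = "SELECT name FROM sys.filegroups WHERE is_default = 1;"
```

Calling `set_default_filegroup_sql("Sales", "UserData")` produces `ALTER DATABASE [Sales] MODIFY FILEGROUP [UserData] DEFAULT;`. New tables and indexes created without an explicit filegroup then land in `UserData` instead of primary.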

Using Group-Managed Service Accounts With SQL Server

Published 2018-02-20 by Kevin Feasel

Wayne Sheffield has a post on using gMSA with SQL Server:

A gMSA is a sMSA [standalone managed service account] that can be used across multiple devices, and where the Active Directory (AD) controls the password. PowerShell is used to configure a gMSA on the AD. The specific computers that it is allowed to be used on are configured using some more PowerShell commands. The AD will automatically update the password for the gMSA at the specified interval – without requiring a restart of the service! Because the AD automatically manages the password, nobody knows what the password is.

Not all services support a gMSA – but SQL Server does. During a SQL Server installation you can specify the gMSA account. The SQL Server Configuration Manager (SSCM) tool can be used to change an existing SQL Server instance to use a gMSA. After entering the gMSA account you simply do not enter a password. The server automatically retrieves the password from the AD.

This is a nice way of improving service account security in a scenario where, for example, you can’t or don’t want to use virtual service accounts.
