Author: Kevin Feasel

Data Modeling In Cassandra

Published 2018-10-25 by Kevin Feasel

Charmy Garg walks us through some of the basics of modeling tables in Cassandra:

Two basic goals in Cassandra which we should keep in mind:

Spread data evenly around the cluster – You want every node in the cluster to have roughly the same amount of data. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key.
Minimize the number of partitions read – Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible. Why is this important? [Each partition may reside on a different node. The coordinator will generally need to issue separate commands to separate nodes for each partition you request. This adds a lot of overhead and increases the variation in latency. Furthermore, even on a single node, it’s more expensive to read from multiple partitions than from a single one due to the way rows are stored.]

Charmy also has a couple of pitfalls that people used to the relational database model may hit.

Comments closed

Enhancements To Polybase In SQL Server 2019

Published 2018-10-25 by Kevin Feasel

Rajendra Gupta has a multi-part series on Polybase enhancements with SQL Server 2019. Part one covers installation of SQL Server 2019 and Azure Data Studio:

You need to install Oracle JRE 7 update 51 or higher to install Polybase. If it is not installed, you will get below error message while checking the rules for installation.

To fix this error, go to ‘Java SE Runtime Environment 8 Downloads‘ and download Java SE Runtime Environment 8u191E. Double click on the setup file to install it.

Part two shows us how to install Oracle Express Edition and query it via SQL Server:

As discussed, so far below are the requirements to access Oracle database using PolyBase with Azure Data Studio

SQL Server 2019 preview 4
Azure Data Studio with SQL Server 2019 extension
Oracle Data Source
Polybase services should be running with SQL Server database services

Part three is forthcoming, as Rajendra mentions at the end of part 2.

Comments closed

Tying The Power BI Threads Together

Published 2018-10-25 by Kevin Feasel

Eugene Meidinger has a corkboard with a bunch of pushpins connecting photographs and newspaper articles together with string:

Part of that announcement was them talking about the Common Data Service. When I first heard about CDS months ago, I was again confused. It sounded like some weird semantic layer for the data in Dynamics CRM. Maybe useful if your data lives in Dynamics 365, otherwise who the heck cares.

Oooooh boy was I wrong. Microsoft is aiming for something much, much more ambitious than an awkward pseudo-database layer for people who don’t like SQL. They are aiming for a common shape for all of your business data. They want to want to create a lingua franca for all of your business data, no matter where it is. Especially if it’s hiding in Salesforce.

Now, do I expect them to succeed? I’m not sure. I’ve learned the hard way that every business is a unique snowflake, even two business in exact same industry. But if anyone can do it, Microsoft has a good shot. They’ve been buying up CRM / ERP solutions for decades.

There’s some good stuff in here, including the realization that Power BI is not strictly intended for database developers.

Comments closed

Azure SQL Managed Instance Prerequisites

Published 2018-10-25 by Kevin Feasel

Frank Gill has started a series on Azure SQL Managed Instances and has two posts up already. First, an introduction:

The drawbacks of Azure SQL Database make it difficult to migrate existing applications, because of the number of application changes required. Azure SQL Database is designed to be used for new development in Azure and for multi-tenant environments, where each tenant requires their own copy of a database.

The benefits of SQL Server on an Azure VM make it much easier to migrate an existing application to Azure. However, the VMs underlying the application still have to be managed by the client. This fails to take advantage of the management of resources in Azure, and uses Azure as a VM host.

A third option, Azure SQL Managed Instance, was released at the beginning of October 2018. Managed Instance combines the best of the previous options. With Managed Instance, the infrastructure is fully managed and the majority of the SQL Server feature set is available. The full list of differences between a traditional install of SQL Server and Managed Instance can be found here. A number of the most dramatic differences are listed below.

Then a post covering pre-requisites:

Before creating an Azure SQL Managed Instance, a number of prerequisite resources must be provisioned. These are:

An Azure Virtual Network
A dedicated subnet for Managed Instances
A route table

It looks like this is part of a longer series Frank is building out, so stay tuned.

Comments closed

Creating A Panel For Slicers In Power BI

Published 2018-10-25 by Kevin Feasel

Matt Allington shows us how to create a collapsable panel in Power BI:

There is nothing worse than having a Power BI report that has 50% of the space taken up with slicers. When this happens, you only get half the page to visualise the actual data. But on the flip side, if you don’t have the slicers it can be harder for the report users to filter the data they want to see. Many users don’t like using the built in filter pane on the right hand side. All is not lost – there is a great way that you can have the best of both worlds by creating a collapsible slicer pane that you can show and hide on demand.

Now I didn’t invent this concept – I learnt it from looking at what others have done, such as Amanda Cofsky, Miguel Myers, Mike and Seth from http://powerbi.tips and also Adam and Patrick from GuyInACube. There are lots of great resources out there to learn tricks like this, so you should check those out.

You can see one simple interpretation of this solution below. The user can hide and collapse the slicer pane by using the arrow keys (#1 and #2 below).

Click through for the demo.

Comments closed

Visualizing A Correlation Matrix With corrplot

Published 2018-10-24 by Kevin Feasel

Kristian Larsen demonstrates the corrplot package in R:

First we need to read the packages into the R library. For descriptive statistics of the dataset we use the skimr package and for visualization of correlation matrix we use the corrplot package. We will work with windspeed dataset from the bReeze package:
# Read packages into R library
library(bReeze)
library(corrplot)
library(skimr)

Click through for the demo.

Comments closed

Getting The Right R Version For Packages

Published 2018-10-24 by Kevin Feasel

Colin Gillespie shows a couple methods for figuring out the minimum version of R needed for a set of packages:

In R, there is a handy function called available.packages() that returns a matrix of details corresponding to packages currently available at one or more repositories. Unfortunately, the format isn’t initially amenable to manipulation. For example, consider the readr package
readr_desc = available.packages() %>%
  as_tibble() %>%
  filter(Package == "readr")
I immediately converted the data to a tibble, as that

changed the rownames to a proper column
changed the matrix to a data frame/tibble, which made selecting easier

There’s a good use of R functionality to delve into package requirements, as well as a script to try it out yourself.

Comments closed

What’s New With Docker For Windows Server 2019

Published 2018-10-24 by Kevin Feasel

Elton Stoneman walks us through several additions to Docker support on Windows Server 2019:

5. Volume mounts have usable directory paths

Docker volumes are how you separate storage from the lifecycle of your containers. You attach a volume to a container, and it surfaces as a directory in the container’s filesystem. Your app writes to C:\jenkins (or whatever path you mount) and the data actually gets stored in the volume, which could be storage on the Docker host – like a RAID array on the server – or a separate storage unit in the datacenter, or a cloud storage service.

The mount inside the container should be transparent to the app, but actually in Windows Server 2016 the implementation used symlink directories, and that caused a few problems.

Elton notes that Docker support on Windows is now approaching that of Linux, so check out some of the gaps that have been filled with the latest server release.

Comments closed

Reading Error Logs Outside Of SQL Server

Published 2018-10-24 by Kevin Feasel

Kenneth Fisher shows us where error logs are located and how to read them outside of SQL Server:

Quick and easy post today. Hopefully you’ve opened the error log on a SQL instance. However, what happens if you don’t have the log viewer in SSMS? In fact, the instance won’t come up at all so you really need to see what went wrong.

Fortunately, the error logs in SQL are just text files, even though they don’t actually have that extension. The trick is knowing where they are.

Read on for the answer.

Comments closed

Azure Data Studio October Release

Published 2018-10-24 by Kevin Feasel

Alan Yu announces the October release of Azure Data Studio:

As announced at Microsoft Ignite, one of the most exciting extensions to share in our September GA release was the release of the SQL Server 2019 Preview extension. If you were following the blog announcements, starting with SQL Server 2019 preview, SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS Docker containers running on Kubernetes.

These components are running side by side to enable you to read, write, and process big data from Transact-SQL or Spark. SQL Server big data clusters allow you to easily combine and analyze your high-value relational data with high-volume big data. To learn about all the excitement of SQL Server Big Data Clusters, follow the documentation here.

These experiences are built as an extension to Azure Data Studio. We can go into full depth about all the great capabilities this extension includes, but deep-diving into any one of these features can be a full blog post itself. Here is a high-level summary of these features, and then you can see a full demo of the features below.

There’s plenty more in here as well.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31