Kevin Feasel – Page 861

On Pie Charts

Published 2020-05-15 by Kevin Feasel

Elizabeth Ricks dives into the properly-maligned pie chart:

I want to use a pie: how do I make a good one?
When using pie charts, here are a few quick design tips to make them more effective.

The best way to make a good pie chart is not to make a pie chart…

Elizabeth does a good job of being less dogmatic than I am on the topic while still explaining most of why I dislike pie charts so.

Comments closed

Choosing a Recommender

Published 2020-05-15 by Kevin Feasel

Tori Tompkins walks us through four methods for generating models for recommendation systems:

One of the most popular approaches to recommendation problems is Collaborative Filtering (CF).
This approach is based on the assumption that users who have agreed in the past, also tend to agree in the future. For example, if Alice likes Star Wars, Lord of the Rings and Harry Potter and Bob likes Stars Wars and Lord of the Rings too. Then it is likely that Bob would also enjoy Harry Potter. The collaborative in Collaborative Filtering is in relation to looking at the way that you multiple users interact with the same data and share the same commonalities.

Read on for a comparison of these four systems as well as a handy chart to help you figure out which system might work best for you.

Comments closed

Planning for a Big Data Cluster

Published 2020-05-15 by Kevin Feasel

Chris Adkin has started a series on SQL Server Big Data Clusters:

Proposing the idea of using virtual machines as Kubernetes cluster nodes to a Kubernetes purist is likely to be met with consternation. However, the different nodes in your cluster have different resource requirements. A master node can get away with as little as 2 GB of memory and 2 logical processors, worker nodes require much more resources. A best practice is never to run applications on master nodes in production. The view of the world from a Kubernetes purist, is that Kubernetes is designed to obviate the need for virtualization. Consider that you do go down the bare metal route, its unlikely that you are going to purchase blades or servers with 2 GB of memory and 2 CPU cores. At the very least consider the use of virtual machines to host master nodes on. For organizations that have standardized on a software defined virtualized infrastructure, Kubernetes will run perfectly happy on this. Also for the rapid provisioning of environments – virtualization provides the fastest means of doing this – simply create yourself a virtual machine template and base your cluster node hosts on this.

Click through for more guidance around what you need to know before you deploy a cluster.

Comments closed

Paging in Power Query

Published 2020-05-15 by Kevin Feasel

Imke Feldmann explains how you can perform paging with Power Query:

When you use an API with a paging mechanism (like the example from this blogpost), you’ll might work with a field that contains the address for the next page. You can use this to walk through the available chunks until you reach the last element. That last element in the pagination will not contain a next-field or that field will be null.

There is a bit of nuance to this, so click through and learn how.

Comments closed

Accessing Azure from an Android Device

Published 2020-05-15 by Kevin Feasel

Kenneth Fisher had a dream:

The other day I had a random thought.
I wonder if there is a version of Azure Portal for Android.
Turns out there is!

This looks most useful for quick observations of machines and services, such as after receiving an alert. But as Kenneth points out, hooking a phone up to a monitor, keyboard, and mouse (which, at least on newer Samsung models, is definitely an option) means that you can hit that cloud shell and do most of what you need.

Comments closed

When Arrow Widths Mislead

Published 2020-05-15 by Kevin Feasel

Hugo Kornelis provides one example of misleading arrow widths in execution plans:

In my previous post, I talked about the Actual Number of Rows and Estimated Number of Rows properties, their visual representation in execution plans, and the most important ways in which this can be used.
But life is not always perfect, and Microsoft likes to remind us of that. Sometimes, these properties report values in weird and confusing ways. Sometimes the arrow width, as the visual representation in the execution plan, misleads us.
Let’s look at one such case.

Read the whole thing and vote for Hugo’s Feedback item to improve this experience.

Comments closed

Creating Charts with Azure Data Studio

Published 2020-05-15 by Kevin Feasel

Rajendra Gupta walks us through chart creation with Azure Data Studio:

Usually, we use to extract the data from a SQL database, copy it in Microsoft excel and creates the required Chart from it. We can also use various tools such as SQL Server Reporting Service ( SSRS), Power BI to import data and create charts, visuals from it directly. These tools work fine; however, it requires additional steps to install these tools, have intermediate knowledge of it. You might require to do this with different data set, and every time, you cannot create a separate visual using Power BI or SSRS. In this type of requirement, the most common useful tool is Microsoft Excel. You can also use PowerShell, but it again requires you to have PowerShell script knowledge. You can go through the article How to create charts from SQL Server data using PowerShell to create charts from PowerShell.
In this article, we will explore creating charts from SQL Server data without exporting it to separate tools Microsoft Excel, SSRS or Power BI.

There is some nice functionality available for quick analysis, though I’m disappointed that I can’t choose which column(s) to include in the visual—it looks like it simply includes them all. SandDance does, though its style precludes certain types of visuals like line charts.

Comments closed

Internal Storage of Numeric Values

Published 2020-05-15 by Kevin Feasel

Randolph West continues a series on how SQL Server stores values:

As we know from before, integers are whole numbers, or numbers with no fractions (i.e. no decimal places). This is going to be in the test later, so pay attention. In other words, the numbers 0 through 9 are integers, but a floating point or decimal / numeric value is not an integer. As soon as you add decimal places, it stops being an integer even if the fraction equates to zero.
Inside the storage engine, integers are mostly signed values (they can have negative values), and each integer data type has a fixed size. The exception is TINYINT which only has positive values. Like many other data types, integer types are stored byte-reversed (known as little-endian).

Click through for some good information from Randolph.

Comments closed

Spark UDFs and Error Handling

Published 2020-05-14 by Kevin Feasel

Bipin Patwardhan takes us through an error-handling scenario when writing a Spark User-Defined Function:

A couple of weeks ago, at my work place, I wrote a metadata-driven data validation framework for Spark. After the initial euphoria of having created the framework in Scala/Spark and Python/Spark, I started reviewing the framework. During the review, I noted that the User Defined Functions (UDF) I had written were prone to throw an error in certain situations.
I then explored various options to make the UDFs fail-safe.

These are like any other code: you want it to be as robust to failure as you can get it (or at least robust enough at the margin).

Comments closed

Bot-Building with ksqlDB

Published 2020-05-14 by Kevin Feasel

Robin Moffatt has an interesting project for us:

But what if you didn’t need any datastore other than Kafka itself? What if you could ingest, filter, enrich, aggregate, and query data with just Kafka? With ksqlDB we can do just this, and I want to show you exactly how.
We’re going to build a simple system that captures Wi-Fi packets, processes them, and serves up on-demand information about the devices connecting to Wi-Fi. The “secret sauce” here is ksqlDB’s ability to build stateful aggregates that can be directly accessed using pull queries. This is going to power a very simple bot for the messaging platform Telegram, which takes a unique device name as input and returns statistics about its Wi-Fi probe activities to the user:

Click through for the tutorial.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Author: Kevin Feasel