
Day: February 2, 2021

Reporting on Correlation Analysis in R

Petr Baranovskiy continues a series on correlation analysis using R:

This is the second part of the Correlation Analysis in R series. In this post, I will provide an overview of some of the packages and functions used to perform correlation analysis in R, and will then address reporting and visualizing correlations as text, tables, and correlation matrices in online and print publications.

Read the whole thing.

Comments closed

K-Means and K-Medoids Clustering

Niti Sharma explains two clustering algorithms:

K-means and k-medoids are partitional clustering methods: they start from a specified number of groups and then iteratively reallocate objects among those groups.

The algorithm works by first segregating all the points into an already selected number of clusters. The process is carried out by measuring the distance between the point and the center of each cluster. And because k-means can function only in Euclidean space, the functionality of the algorithm is limited. Despite the shortcomings the algorithm possesses, k-means is still one of the most powerful tools used in clustering. Its applications are widely seen in multiple fields: physical sciences, natural language processing (NLP), and healthcare.

k-means is a fairly common algorithm, but you hear less about k-medoids—it’s the more robust alternative to k-means.
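To make the robustness point concrete, here is a minimal sketch in plain Python of the one step where the two algorithms differ: how a cluster's center is chosen. It is not code from the linked article. The function names and the sample points are made up for illustration, and a real implementation would repeat the assignment and update steps until the clusters stop changing.

```python
import math

def mean_center(points):
    """k-means style center: the coordinate-wise average of the points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def medoid(points):
    """k-medoids style center: the actual data point with the smallest
    total Euclidean distance to every point in the cluster."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Hypothetical cluster: four points close together plus one far outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (50.0, 50.0)]

centroid = mean_center(cluster)   # dragged toward the outlier
center_medoid = medoid(cluster)   # stays on one of the tightly grouped points

print("mean center:", centroid)
print("medoid:", center_medoid)
```

The mean is pulled well away from the four nearby points by the single outlier, while the medoid has to be one of the observed points and so remains inside the tight group. This sketch uses Euclidean distance for both, but a medoid can be computed from any pairwise dissimilarity, which is the other reason k-medoids is often described as more flexible.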


The Production-Readiness of Azure Synapse Analytics

Paul Andrew casts some harsh light:

While I completely share and actually like Microsoft’s vision of an analytics resource…

“that brings together data integration, enterprise data warehousing and big data analytics”

… the marketing, hype and technical implementation have resulted in a lot of confusion and disappointment.

So, to answer the title of this blog post directly. My opinion, as I write on 29th January 2021, is: No, Azure Synapse Analytics is not ready. Sorry Microsoft, but you’ve had long enough. I can’t hold back the questions and demands from customers anymore on why Synapse still isn’t included in my architecture diagrams.

Paul raises many good points, and the positive takeaway is that these are fixable issues. But as of today, they are definitely things you want to consider before jumping in.


Combining Azure Synapse Analytics and Azure Purview

Wolfgang Strasser shows how we can integrate Azure Synapse Analytics with Azure Purview:

In the past months I had the chance to play with and build solutions based on Azure Synapse Analytics and Azure Purview.

Azure Synapse (my Synapse blog entries) as the foundation for a solid platform to store, analyze and build data solutions and Azure Purview (my Purview blog posts) as the data governance and data catalog solution in Azure.

During the writing of my latest blog post (What’s new in Azure Synapse Analytics?), I found a very interesting entry in the update feature list: Azure Purview Integration.

Read on to see how.


Changing IP Addresses in an Availability Group

Sreekanth Bandarla is ready to make a change:

In this blog post, let’s see how to change all the IP addresses involved in a typical Always On availability group configuration. In my setup, I have an AG with two replicas and a listener. See below to get an idea of my current environment on which I am going to change all the underlying IP addresses.

Click through for a step-by-step process, as well as a few things to remember.


Importing Graph Data into SQL Server

Louis Davidson takes us through an interesting problem:

The problem was, if I wanted to recreate this graph in data, I had to type in a bunch of SQL statements (something I generally enjoy to a certain point, but one of my sample files covers the geography of Disney World, and it would take a very long time to manually type that into a database as it took quite a while just to do one section of the park).

So I went hunting for a tool to do this for me, but ended right back with yEd. The default file type when you save in yEd is GraphML, which is basically some pretty complex XML that was well beyond my capabilities using XML in SQL or PowerShell. Realistically I don’t care that much about anything other than just the nodes and edges, and what I found was that you can save graphs in the tool in a format named Trivial Graph Format (TGF).

Click through to see it in action.
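For a sense of why TGF is so much easier to handle than GraphML, here is a rough sketch of a parser in Python. This is not Louis's approach, which loads the file into SQL Server. It only assumes the usual TGF layout: one node per line (an id followed by an optional label), a line containing just `#`, then one edge per line (source id, target id, optional label). The function name and the sample graph are invented for illustration.

```python
def parse_tgf(text):
    """Parse Trivial Graph Format text into (nodes, edges).

    nodes: dict mapping node id -> label ("" when no label is given)
    edges: list of (source id, target id, label) tuples
    """
    nodes, edges = {}, []
    in_edges = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "#":
            in_edges = True  # everything after the separator is an edge
            continue
        if in_edges:
            parts = line.split(None, 2)  # source, target, rest-of-line label
            if len(parts) < 2:
                raise ValueError(f"edge line needs two node ids: {line!r}")
            label = parts[2] if len(parts) > 2 else ""
            edges.append((parts[0], parts[1], label))
        else:
            parts = line.split(None, 1)  # id, rest-of-line label
            nodes[parts[0]] = parts[1] if len(parts) > 1 else ""
    return nodes, edges

# Made-up sample graph, not taken from the linked post.
sample = """1 Entrance Plaza
2 North Path
3 Lake
#
1 2 walkway
2 3
"""

nodes, edges = parse_tgf(sample)
print(nodes)
print(edges)
```

From here the two collections could be written out as rows for a node table and an edge table, which is roughly the shape of data the graph import needs.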


Power Query Folding Indicators

Matthew Roche points out a nice addition to Power Query:

Because of the performance benefit that query folding provides, experienced query authors are typically very careful to ensure that their queries take advantage of the capabilities of their data sources, and that they fold as many operations as possible. But for less experienced query authors, telling what steps will fold and which will not has not always been simple…

Until now.

Read on for more information. I saw this for the first time in a recent presentation and was pleasantly surprised at how well it works.
