Month: April 2017

Case-Insensitive Power Query Sorts

Cedric Charlier points out the comparisonCriteria parameter of Table.Sort in Power Query:

Have you ever tried to sort a table based on a text field? The result surprises most people. The M language has a specific implementation of the sort engine for text, in which uppercase letters are always ordered before lowercase letters. This means that Z always comes before a. In the example below, Fishing Rod is sorted before Fishing net.

The classical trick to escape this weird behavior is to create a new column containing the uppercase version of the text, then configure the sort operation on this newly created column. This is a two-step approach (three steps, if you count removing the new column). There's nothing wrong with this, except that it obfuscates the code, and I hate that.

Click through to learn a more elegant way of sorting.
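Cedric's post shows the M version. As a language-neutral illustration, here is a Python sketch (not Power Query code; the product names and prices are made up) contrasting the default ordinal sort, the helper-column trick, and handing the sort a case-insensitive key directly. Python's default string comparison is also by code point, so it reproduces the Fishing Rod / Fishing net surprise.

```python
# Hypothetical rows standing in for a Power Query table.
rows = [
    {"Product": "Fishing Rod", "Price": 25},
    {"Product": "Fishing net", "Price": 12},
    {"Product": "anchor", "Price": 40},
    {"Product": "Boat", "Price": 900},
]

# 1. Default ordinal sort: compares code points, so every uppercase
#    letter (A-Z) sorts before every lowercase letter (a-z).
ordinal = sorted(rows, key=lambda r: r["Product"])

# 2. The "classical trick": add an uppercase helper column,
#    sort on it, then drop it again (three steps).
with_helper = [dict(r, SortKey=r["Product"].upper()) for r in rows]
with_helper.sort(key=lambda r: r["SortKey"])
two_step = [{k: v for k, v in r.items() if k != "SortKey"} for r in with_helper]

# 3. One step: give the sort a case-insensitive key instead of
#    materializing a new column.
one_step = sorted(rows, key=lambda r: r["Product"].casefold())

print([r["Product"] for r in ordinal])   # ['Boat', 'Fishing Rod', 'Fishing net', 'anchor']
print([r["Product"] for r in one_step])  # ['anchor', 'Boat', 'Fishing net', 'Fishing Rod']
```

Both the helper-column route and the key-function route produce the same order; the second simply keeps the case-folding inside the sort call rather than spreading it across extra table steps.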

T-SQL Tuesday Roundup

Koen Verbeeck has his roundup of this month’s T-SQL Tuesday:

I asked the SQL Server community to write about their experience/opinion about the changing world we live in and how it impacts their daily job. The response was overwhelming: we had 30 participants in this month's blog party!

Here’s an overview of everyone who participated. Take your time to read their stories, as they are very insightful, interesting or just plain fun to read.

There’s a lot of reading this month, with a well above-average turnout.

CPU Hot-Add And NUMA

Frank Denneman discusses VMware NUMA behavior when you hot-add more CPUs:

But what happens when the VM is configured with fewer vCPUs than the core count of the physical CPU package and CPU Hot-Add is enabled? Will there be a performance impact? The answer is no. The VPD configured for the VM fits inside a NUMA node, so the CPU scheduler and the NUMA scheduler optimize memory operations. It’s all about memory locality. Let’s use an application workload test to determine the behavior of the VMkernel CPU scheduler.

For this test, I’ve installed DVD Store 3.0 and run some test loads on the MS-SQL server. To determine the baseline, I’ve logged in to the ESXi host via an SSH session and executed the command: sched-stats -t numa-pnode. This command shows the CPU and memory configuration of each NUMA node in the system. This screenshot shows that the system is only running the ESXi operating system; hardly any memory is consumed. TotalMem indicates the total amount of physical memory in the NUMA node in KB. FreeMem indicates the amount of free physical memory in the NUMA node in KB.

Interesting reading.
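As a rough illustration of the sizing rule in the quote (a VM whose vCPU count does not exceed the cores in one physical package fits in a single NUMA node), here is a simplified Python sketch. It ignores hyper-threading and any advanced NUMA settings; the helper names, core counts, and memory figures are hypothetical, not output parsed from sched-stats. The memory helper turns the TotalMem and FreeMem values (both in KB) into a utilization percentage.

```python
import math


def numa_nodes_spanned(vcpus: int, cores_per_node: int) -> int:
    """Simplified model: physical NUMA nodes needed to give each vCPU its own core.

    Hypothetical helper; ignores hyper-threading and advanced NUMA settings.
    """
    if vcpus < 1 or cores_per_node < 1:
        raise ValueError("vcpus and cores_per_node must be positive")
    return math.ceil(vcpus / cores_per_node)


def fits_in_one_node(vcpus: int, cores_per_node: int) -> bool:
    # True when the whole VM can be scheduled on one node, keeping memory local.
    return numa_nodes_spanned(vcpus, cores_per_node) == 1


def used_memory_pct(total_kb: int, free_kb: int) -> float:
    """Share of a node's physical memory in use, from TotalMem and FreeMem (KB)."""
    return 100.0 * (total_kb - free_kb) / total_kb


# Hypothetical host with 10 cores per CPU package.
print(fits_in_one_node(8, 10))    # True: 8 vCPUs fit in one node
print(numa_nodes_spanned(12, 10)) # 2: a 12-vCPU VM spans two nodes
```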

Logistic Regression With R

Raghavan Madabusi runs through a sample logistic regression:

Input Variables: These variables are also called predictors or independent variables.

  • Customer Demographics (Gender and Senior citizenship)
  • Billing Information (Monthly and Annual charges, Payment method)
  • Product Services (Multiple line, Online security, Streaming TV, Streaming Movies, and so on)
  • Customer relationship variables (Tenure and Contract period)

Output Variables: These variables are also called response or dependent variables. Since the output variable (Churn value) is binary, taking the value “0” or “1”, this is a classification problem in supervised machine learning.
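The post does its modelling in R. To show what a logistic regression actually fits, here is a from-scratch Python sketch using batch gradient descent on the mean log-loss. The two features (tenure in years and a month-to-month contract flag) and all eight rows are made-up toy values, not the post's dataset, and the learning rate and epoch count are arbitrary choices that happen to suit this small example.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Estimated probability that the customer churns (class 1).
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: [tenure_years, month_to_month (1/0)] -> churned (1/0).
X = [[0.5, 1], [1.0, 1], [0.2, 1], [4.0, 0], [5.0, 0], [3.0, 0], [0.8, 1], [6.0, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]
w, b = fit_logistic(X, y)
```

Predicting class 1 whenever `predict_proba` exceeds 0.5 turns the fitted probabilities into the binary churn classification the quote describes.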

One of the interesting things in this post was the use of missmap, a function in the Amelia package that plots where values are missing.
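missmap draws a grid of observations against variables with the missing cells highlighted. A plain-Python, text-only analogue of that idea (not Amelia's API; the helper names and customer rows are hypothetical) is a per-column missing share plus one line per row marking the gaps:

```python
import math


def is_missing(value) -> bool:
    # Treat None and float NaN as missing (the closest Python stand-ins for R's NA).
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_share(rows, columns):
    """Fraction of rows with a missing value, per column."""
    return {c: sum(is_missing(row.get(c)) for row in rows) / len(rows) for c in columns}


def text_missmap(rows, columns):
    """One string per row: '#' where a value is missing, '.' where it is present."""
    return ["".join("#" if is_missing(row.get(c)) else "." for c in columns) for row in rows]


customers = [
    {"gender": "Female", "tenure": 1, "total_charges": 29.85},
    {"gender": "Male", "tenure": 34, "total_charges": None},
    {"gender": None, "tenure": 2, "total_charges": float("nan")},
]
cols = ["gender", "tenure", "total_charges"]

for line in text_missmap(customers, cols):
    print(line)
```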

PowerShell Difficulties

Dave Mason shares some difficulties he has had grokking PowerShell:

The developer in me thinks this is nuts. Run the same few lines of code twice, with no changes in between, and get different outputs? Madness!

Here’s another example. Nothing too complex here: I connect to an instance of SQL Server, run SELECT CURRENT_TIMESTAMP, and show the returned value in the output window. (There’s a fixable issue here that I would go on to discover later. But hold that thought for now.)

Even when you’re conceptually familiar with a language, its particular foibles can expose all sorts of behavior that seems strange to newcomers.

Pipeline Architecture With Kafka

Alexandra Wang describes how Pandora Media uses Apache Kafka and Kafka Connect in its real-time ad-serving data pipeline:

Our ad server publishes billions of messages per day to Kafka. We soon realized that writing a proprietary Kafka consumer able to handle that amount of data with the desired offset management logic would be non-trivial, especially when requiring exactly-once-delivery semantics. We found that the Kafka Connect API paired with the HDFS connector developed by Confluent would be perfect for our use case.

We’ve also found it painful not having a central authority for data structures that can share schemas across all services and applications. Without a central registry for message schemas, data serialization and deserialization across a variety of applications is troublesome, and the pipeline is fragile when schemas evolve. We found that Schema Registry is a great solution for this problem.

To address the above two problems, we integrated the Kafka Connect API and Schema Registry into our Kafka-centered data pipeline.
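The offset-management problem in the quote is easiest to see in miniature. With ordinary consumer offset commits, a crash between writing output and committing the offset means those records are delivered again. The following hypothetical Python sketch (not the Kafka Connect or HDFS connector API) shows one way a sink can absorb that: store each record's offset in the same place, and in the same step, as the record, then skip anything at or below the highest stored offset, so each offset lands in storage exactly once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OffsetTrackingSink:
    """Hypothetical sink that keeps offsets alongside the data it writes."""

    # partition -> list of (offset, value) actually stored
    storage: Dict[int, List[Tuple[int, str]]] = field(default_factory=dict)
    # partition -> highest offset stored so far
    committed: Dict[int, int] = field(default_factory=dict)

    def resume_offset(self, partition: int) -> int:
        """Offset a restarted consumer should seek to for this partition."""
        return self.committed.get(partition, -1) + 1

    def write(self, partition: int, offset: int, value: str) -> bool:
        """Store a record unless its offset was already stored; return True if written."""
        if offset <= self.committed.get(partition, -1):
            return False  # redelivered after a crash or rebalance: skip the duplicate
        # In a real system these two updates must be one atomic step
        # (for example, a single transaction or an atomic file rename).
        self.storage.setdefault(partition, []).append((offset, value))
        self.committed[partition] = offset
        return True


sink = OffsetTrackingSink()
for off in range(3):  # first delivery: offsets 0, 1, 2
    sink.write(0, off, f"impression-{off}")
# Simulated redelivery after a crash: offsets 1 and 2 arrive again, plus new offset 3.
replayed = [sink.write(0, off, f"impression-{off}") for off in (1, 2, 3)]
```

Getting that atomicity and the restart logic right for every partition is the "non-trivial" part the quote refers to, which is why reusing an existing connector was attractive.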

Well worth reading, especially the difficulties that they’ve had during maintenance periods and in lower environments.
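Confluent Schema Registry is a real service with its own REST API; the class below is not that API. It is a toy, in-memory Python model (all names hypothetical) of the two ideas the quote relies on: every schema gets an ID that producers and consumers share, and a new version is accepted only if it passes a compatibility check. The check is a simplified form of backward compatibility that ignores type promotion: a reader using the new schema must be able to read data written with the old one, so any added field needs a default and shared fields must keep their type.

```python
from typing import Dict, List

Schema = Dict[str, dict]  # field name -> {"type": ..., optional "default": ...}


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Simplified rule: new readers can decode data written with `old`."""
    for name, spec in new.items():
        if name in old:
            if old[name]["type"] != spec["type"]:
                return False  # changing a shared field's type breaks old data
        elif "default" not in spec:
            return False  # old records lack this field and there is nothing to fill in
    return True  # fields dropped from `new` are simply ignored when reading


class SchemaRegistry:
    """Toy stand-in for a schema registry (hypothetical API)."""

    def __init__(self) -> None:
        self._schemas: Dict[int, Schema] = {}
        self._subjects: Dict[str, List[int]] = {}

    def register(self, subject: str, schema: Schema) -> int:
        versions = self._subjects.setdefault(subject, [])
        for schema_id in versions:  # identical schema: reuse its ID
            if self._schemas[schema_id] == schema:
                return schema_id
        if versions and not is_backward_compatible(self._schemas[versions[-1]], schema):
            raise ValueError(f"schema for {subject!r} is not backward compatible")
        schema_id = len(self._schemas) + 1
        self._schemas[schema_id] = schema
        versions.append(schema_id)
        return schema_id

    def get(self, schema_id: int) -> Schema:
        return self._schemas[schema_id]


registry = SchemaRegistry()
v1 = {"ad_id": {"type": "string"}, "ts": {"type": "long"}}
v2 = {**v1, "campaign": {"type": "string", "default": ""}}  # added field has a default
v3 = {**v2, "region": {"type": "string"}}                   # added field, no default
id1 = registry.register("ad-events-value", v1)
id2 = registry.register("ad-events-value", v2)
```

Registering `v3` under the same subject raises `ValueError`, which is the kind of early failure that keeps a schema change from silently breaking downstream consumers.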

Community Localization For Crossplatform Tools

Mona Nasr and Andy Gonzalez are looking for tool translation support:

The community has completed translations of the VS Code SQL Server extension into six languages: Brazilian Portuguese, French, Japanese, Italian, Russian, and Spanish.

We still need help with other languages. If you know anyone with language expertise, refer them to the Team Page.

Your contributions are valuable and will help us improve the product in your languages. We hope to continue working with the community in future projects.

Hit up the Team Page link to learn more about how to contribute.

Linear Regression In SQL

Phil Factor shows how to generate a quick linear regression using SQL, PowerShell, and Gnuplot:

It looks a bit like someone has fired a shotgun at a wall, but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here, so we can be fairly confident in plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It’s good to know that this is possible, but I’d switch to R or Python long before reaching this point.
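For comparison, the textbook least-squares formulas for slope, intercept, and Pearson correlation fit in a few lines of Python. This is a sketch of the standard formulas, not a translation of Phil Factor's queries; the sums it uses (n, Σx, Σy, Σx², Σy², Σxy) are the same aggregates a single SQL SELECT could compute.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope, intercept, and Pearson correlation r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    cov_term = n * sxy - sx * sy  # n^2 times the (population) covariance
    var_x = n * sxx - sx * sx     # n^2 times the variance of x
    var_y = n * syy - sy * sy     # n^2 times the variance of y
    if var_x == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = cov_term / var_x
    intercept = (sy - slope * sx) / n
    # r is undefined when every y is the same.
    r = cov_term / math.sqrt(var_x * var_y) if var_y else math.nan
    return slope, intercept, r


print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 1.0)
```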

Power BI Row-Level Security

Steve Hughes has some resources on implementing row-level security in Power BI:

Row-level security is the ability to filter content based on a user's role. There are two primary ways to implement row-level security in Power BI: through Power BI itself or using SSAS. In Power BI Desktop, you can create roles based on DAX filters, which affect what users see in the various assets in Power BI.

For this to work, you will need to deploy to a Workspace where users only have read permissions. If the members of the group associated with the Workspace have edit permissions, row-level security in Power BI will be ignored.

Read on for more details as well as a set of how-to links.
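To make the role and permission logic concrete, here is a hypothetical Python model of it (not Power BI's API). Each role's filter is a plain predicate standing in for a DAX filter such as [Region] = "West"; users with edit permission bypass filtering, as the quote warns. Two behaviors are assumptions of this sketch rather than statements from the post: a read-only user in no role sees nothing, and a user in several roles sees the union of their filters.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def visible_rows(
    rows: List[Row],
    user: str,
    role_filters: Dict[str, Callable[[Row], bool]],
    user_roles: Dict[str, List[str]],
    can_edit: bool,
) -> List[Row]:
    """Rows a user would see under a simplified row-level security model."""
    if can_edit:
        # Per the quoted post: edit permissions mean RLS is ignored.
        return list(rows)
    roles = user_roles.get(user, [])
    # Assumption: no roles -> no rows; several roles -> union of their filters.
    return [row for row in rows if any(role_filters[role](row) for role in roles)]


# Hypothetical data, roles, and users.
sales = [
    {"Region": "West", "Amount": 100},
    {"Region": "East", "Amount": 250},
    {"Region": "West", "Amount": 75},
]
filters = {
    "WestSales": lambda row: row["Region"] == "West",
    "EastSales": lambda row: row["Region"] == "East",
}
members = {"alice@contoso.com": ["WestSales"], "carol@contoso.com": ["WestSales", "EastSales"]}

print(visible_rows(sales, "alice@contoso.com", filters, members, can_edit=False))
```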