
Author: Kevin Feasel

Microsoft’s R Roadmap

David Smith has a review of Microsoft’s R roadmap, focusing on Azure:

The post references this guide to the machine learning services in Azure, along with their supported languages. Services that currently support R include Azure Machine Learning Studio, SQL Server Microsoft Machine Learning Service, Microsoft Machine Learning Server, Azure Data Science Virtual Machine, Azure Databricks, and more.

David links to this strategy post:

The R and Python programming languages are primary citizens for data science on the Azure AI Platform. These are the most common languages for performing data preparation, transformation, training and operationalization of machine learning models; the core components for one’s digital transformation leveraging AI. Yet they are fundamentally different in many aspects, directly affecting not only deployed solutions’ IT architectures but also corporate strategies for developer skills and product supportability.
 
This series of articles is designed to help you understand the options your company and customers have to support and evolve their R strategy.

It’s good to see some of this out in the open for planning purposes.


Chaos Engineering and KubeInvaders

Andrew Pruski wants to play a game:

KubeInvaders allows you to play Space Invaders in order to kill pods in Kubernetes and watch new pods be created (this actually might be my favourite github repo of all time).

I demo SQL Server running in Kubernetes a lot so really wanted to get this working in my Azure Kubernetes Service cluster. Here’s how you get this up and running.

I got to see Andrew show it off at SQL Saturday Cork and it was as fun as you’d expect.


Drillthrough from Power BI to SSRS

Paul Turley shows how you can drill through from a Power BI dashboard into an SSRS report:

This recipe primarily involves Power BI report design techniques. I’m not going to get into the details of Power BI report design but will cover the basics with a partially-completed report to get you started. If you are less-experienced with Power BI you can use this as an example for future report projects.

The sample database and files will be available in the forthcoming book: SQL Server Reporting Services Paginated Report Recipes, 2nd Edition (working title).

These instructions are provided as an example but refer to files that will be available when the book is published. Please contact me in the comments with questions and feedback.

You can’t get the files just yet, but you can see what Paul does to get this working.


Debugging DAX Variables

Imke Feldmann has a lengthy Power Query script to help debug issues with DAX variables:

When you’re dealing with a beast like DAX you can use any help there is, right? So here I show you how you can debug DAX variables that contain tables, or show the results of multiple variables at once, so you can easily compare them with each other and spot the reason for problems fast.

Please note that currently only comma-separated DAX code is supported.

Click through for a demo as well as a video.


MAXDOP Configuration on Installation

Randolph West notes a change with SQL Server 2019:

SQL Server 2019 is still in preview as I write this, but I wanted to point out a new feature that Microsoft has added to SQL Server Setup, on the Windows version.

On the Database Engine Configuration screen are two new tabs, called MaxDOP and Memory. These are both new configuration options for SQL Server 2019. This post will specifically look at the MaxDOP tab, and we’ll look at Memory next week.

I like that they’re adding these things to initial setup; that makes it easier for people to remember that yeah, MAXDOP is important.


Using data.table to Add Aggregate Values to Data Frames

John Mount shows how you can combine := and by in the data.table package to add a new column with the results of an aggregation in R:

The “by” signals we are doing a per-group calculation, and the “:=” signals to land the results in the original data.table. This sort of window function is incredibly useful in computing things such as what fraction of a group’s mass is in each row.
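As a rough illustration of the idea in the quote, here is a minimal sketch in plain Python rather than data.table. It computes a per-group total and then writes each row's share of that total back onto the original rows. The sample data and the names `rows`, `add_group_fraction`, and `fraction` are made up for illustration; in data.table itself the same idea would look something like `d[, fraction := mass / sum(mass), by = group]`.

```python
from collections import defaultdict

# Hypothetical sample data: one dict per row, with a group key and a mass.
rows = [
    {"group": "a", "mass": 1.0},
    {"group": "a", "mass": 3.0},
    {"group": "b", "mass": 2.0},
    {"group": "b", "mass": 2.0},
    {"group": "b", "mass": 4.0},
]


def add_group_fraction(rows, key="group", value="mass", out="fraction"):
    """Add each row's share of its group's total, keeping every original row."""
    # Pass 1: the per-group aggregate (the role "by" plays in data.table).
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    # Pass 2: land the result on the original rows (the role ":=" plays).
    for row in rows:
        row[out] = row[value] / totals[row[key]]
    return rows


add_group_fraction(rows)
fractions = [row["fraction"] for row in rows]
print(fractions)
```

The row count never changes: unlike a plain group-by, the aggregate is repeated on every row of its group, which is what makes the per-row fraction easy to compute.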

It’s worth reading up on data.table if you aren’t familiar with the great things it can do.


Troubleshooting Kafka Listeners

Robin Moffatt has some tips for configuring listeners in Kafka:

Apache Kafka® is a distributed system. Data is read from and written to the leader for a given partition, which could be on any of the brokers in a cluster. When a client (producer/consumer) starts, it will request metadata about which broker is the leader for a partition, and it can do this from any broker. The metadata returned will include the endpoints available for the Leader broker for that partition, and the client will then use those endpoints to connect to the broker to read/write data as required.

It’s these endpoints that cause people trouble. On a single machine, running bare metal (no VMs, no Docker), everything might be the hostname (or just localhost), and it’s easy. But once you move into more complex networking setups and multiple nodes, you have to pay more attention to it.
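To make the metadata round trip concrete, here is a toy model in Python. It does not use any Kafka client library, and every name in it (`Broker`, `BROKERS`, `LEADERS`, `fetch_metadata`, the hostnames, the `orders` topic) is hypothetical. It only sketches the behavior described above: the client may ask any broker, and the answer is whatever endpoint the leader advertises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broker:
    broker_id: int
    advertised_host: str  # the address the broker tells clients to use
    advertised_port: int


# Hypothetical two-broker cluster; hostnames and ports are made up.
BROKERS = {
    0: Broker(0, "localhost", 9092),  # only correct for clients on the same machine
    1: Broker(1, "kafka1.example.internal", 9092),
}

# Hypothetical mapping of (topic, partition) to the leader's broker id.
LEADERS = {("orders", 0): 0, ("orders", 1): 1}


def fetch_metadata(bootstrap_broker_id, topic, partition):
    """Ask any known broker; the reply is the leader's advertised endpoint."""
    if bootstrap_broker_id not in BROKERS:
        raise KeyError(f"unknown bootstrap broker {bootstrap_broker_id}")
    leader = BROKERS[LEADERS[(topic, partition)]]
    return leader.advertised_host, leader.advertised_port


def resolves_to_client_itself(host):
    """True when the endpoint would point a remote client back at itself."""
    return host in {"localhost", "127.0.0.1"}


# The client bootstraps through broker 1 but is sent to broker 0's endpoint.
endpoint = fetch_metadata(1, "orders", 0)
print(endpoint, "loops back to client:", resolves_to_client_itself(endpoint[0]))
```

In this sketch the bootstrap connection succeeds, yet the follow-up connection would go to the client's own machine, which is the kind of failure that shows up once the broker and the client no longer share a host.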

Click through for more tips.


Performing Row-Wise Operations with pmap

Sebastian Sauer shows how you can use pmap in the purrr library to perform row-wise aggregations:

Rowwise operations are quite frequent in data analysis. The R language environment is particularly strong in column-wise operations. This is due to technical reasons: data frames are internally built as column-by-column structures, so column-wise operations are simple and rowwise operations are more difficult.

This post looks at a rather general way to compute rowwise statistics. Of course, numerous ways exist and there are quite a few tutorials around, notably by Jenny Bryan and by Emil Hvitfeldt, to name a few.
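The general shape of a row-wise operation over column-oriented data can be sketched in Python. This is an analogy, not purrr's pmap: the sample columns and the helper `row_wise` are hypothetical, and `zip` stands in for the step of walking several columns in parallel so that a function sees one row's values at a time.

```python
from statistics import mean

# Hypothetical column-oriented data: one list per column, as in a data frame.
columns = {
    "x": [1, 4, 7],
    "y": [2, 5, 8],
    "z": [3, 6, 9],
}


def row_wise(func, columns):
    """Apply func to each row's values; zip walks the columns in parallel."""
    return [func(values) for values in zip(*columns.values())]


row_means = row_wise(mean, columns)
row_maxes = row_wise(max, columns)
print(row_means, row_maxes)
```

Because the data is stored by column, each row has to be assembled before the function can run, which is the extra work the quote alludes to.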

The ideal solution is to have your data be properly columnar, but if you’re in a pinch, it’s good to know that you can do this.


Unexpected Results with ANY Aggregate

Paul White points out a couple of odd scenarios with the ANY aggregate in SQL Server:

The execution plan erroneously computes separate ANY aggregates for the c2 and c3 columns, ignoring nulls. Each aggregate independently returns the first non-null value it encounters, giving a result where the values for c2 and c3 come from different source rows. This is not what the original SQL query specification requested.

The same wrong result can be produced with or without the clustered index by adding an OPTION (HASH GROUP) hint to produce a plan with an Eager Hash Aggregate instead of a Stream Aggregate.
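The difference between the two behaviors can be shown with a small Python sketch. It does not reproduce Paul's query or SQL Server's plan. The rows and both helper functions are made up, with `None` standing in for NULL, and the "expected" version simply assumes that all column values should come from a single source row.

```python
# Hypothetical rows for one group: (c1, c2, c3). None plays the role of NULL.
rows = [
    (1, None, 30),
    (1, 20, None),
    (1, 21, 31),
]


def any_row(rows):
    """Assumed intent: pick one source row and take every column from it."""
    return rows[0]


def per_column_first_non_null(rows):
    """Behavior described in the quote: aggregate each column independently,
    skipping NULLs, so values may come from different rows."""

    def first_non_null(values):
        return next((v for v in values if v is not None), None)

    c1, c2, c3 = zip(*rows)
    return (first_non_null(c1), first_non_null(c2), first_non_null(c3))


expected = any_row(rows)
observed = per_column_first_non_null(rows)
# The independently aggregated result matches no row that actually exists.
mixed = observed not in rows
print(expected, observed, mixed)
```

Here the per-column version returns a combination of c2 and c3 that never occurs together in the source data, which is why the result counts as wrong rather than merely arbitrary.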

Click through for the scenarios. Paul has also reported the second scenario as a bug.


Joining Lists of Values in T-SQL

Jason Brimhall shows how you can build a list of values using the table value constructor and join to it:

The table value constructor is basically like a virtual table not too different from a CTE or a subquery (in that they are all virtual tables of sorts). The table value constructor however can be combined with either of those other types and is a set of row expressions that get put into this virtual table in a single DML statement.
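A small runnable sketch of the idea, using Python's built-in sqlite3 module rather than SQL Server, so the syntax differs slightly. The `orders` table, the `status_lookup` name, and the sample values are all hypothetical. SQLite names the columns through a CTE header, whereas T-SQL would typically write the list inline as `JOIN (VALUES (1, 'Open'), (2, 'Shipped')) AS s(status_code, label)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table to join against.
conn.execute("CREATE TABLE orders (order_id INTEGER, status_code INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(101, 1), (102, 2), (103, 1), (104, 3)],
)

# The VALUES list acts as a small virtual table that exists only in this query.
query = """
WITH status_lookup(status_code, label) AS (
    VALUES (1, 'Open'), (2, 'Shipped')
)
SELECT o.order_id, s.label
FROM orders AS o
JOIN status_lookup AS s ON s.status_code = o.status_code
ORDER BY o.order_id
"""

result = conn.execute(query).fetchall()
conn.close()
print(result)
```

Order 104 drops out because its status code has no match in the inline list, the same as it would with an inner join against a real lookup table.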

It’s one of the nicer things SQL Server 2008 gave us.
