Month: May 2019

Creating R Visuals in Power BI

Dave Mason walks us through displaying an R-based visual in Power BI:

The R engine isn’t included with the installation of Power BI desktop. I won’t go into detail on this, so just know you’d need to install that separately. I had already installed the R component as part of Machine Learning Services for SQL Server 2017. I also had RStudio installed. Within Power BI desktop, take a moment to click File | Options and settings | Options to open the Options page. Then click R scripting in the list of Global Options. Here you’ll see options to set the R home directory and the desired R IDE.

Click through for the demo.

Plan Cache Sizes

Erin Stellato shares information on plan cache sizes:

If you have an ad hoc workload, you’ll often hit the max number of entries before you hit the space limit, particularly if you have the optimize for ad hoc workloads server option enabled, which stores the plan stub for an ad hoc query on initial execution, rather than the entire plan (the plan stub consumes less space, so this is an attempt to reduce bloat).

Click through for the numbers and a couple of options you have around plan cache size.
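
Not from Erin's post, but if you want to poke at your own plan cache while you read, a rough sketch along these lines (the DMV and the server option she mentions are standard; everything else is just illustrative) will get you started:

-- Plan cache size and entry counts by cache object type. With "optimize for ad hoc
-- workloads" enabled, single-use ad hoc queries show up as Compiled Plan Stub entries.
SELECT
    cp.objtype,
    cp.cacheobjtype,
    COUNT(*) AS plan_count,
    SUM(cp.size_in_bytes) / 1024 / 1024 AS size_mb
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.objtype, cp.cacheobjtype
ORDER BY size_mb DESC;

-- The server option in question; it is an advanced option, hence the first step.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'optimize for ad hoc workloads', 1;
RECONFIGURE;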

Methods for Rewriting SQL Queries

Bert Wagner puts together a list of 12 techniques for tuning SQL queries:

6. DISTINCT with few unique values
Using the DISTINCT operator is not always the fastest way to return the unique values in a dataset. In particular, Paul White uses recursive CTEs to return distinct values on large datasets with relatively few unique values. This is a great example of solving a problem using a very creative solution.

Click through for the full list as well as a video demonstration.
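
As a taste of technique number 6, here is a minimal sketch of the recursive CTE approach Paul White popularized, written against a hypothetical dbo.SomeTable with an index on SomeColumn; Bert's post and video cover the real details.

-- Instead of DISTINCT, each recursion seeks the next value above the previous one,
-- so with a suitable index the work scales with the number of distinct values
-- rather than the number of rows.
WITH DistinctValues AS
(
    -- Anchor: the lowest value in the column
    SELECT MIN(t.SomeColumn) AS SomeColumn
    FROM dbo.SomeTable AS t

    UNION ALL

    -- Recursive step: aggregates and TOP are not allowed here,
    -- so ROW_NUMBER picks out the next-lowest value instead.
    SELECT nxt.SomeColumn
    FROM
    (
        SELECT
            t.SomeColumn,
            ROW_NUMBER() OVER (ORDER BY t.SomeColumn) AS rn
        FROM dbo.SomeTable AS t
        JOIN DistinctValues AS prev
            ON t.SomeColumn > prev.SomeColumn
    ) AS nxt
    WHERE nxt.rn = 1
)
SELECT dv.SomeColumn
FROM DistinctValues AS dv
WHERE dv.SomeColumn IS NOT NULL
OPTION (MAXRECURSION 0);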

More on Index Fragmentation

Tibor Karaszi revises and extends some remarks on index fragmentation:

In my last blog post, I wanted to focus on the sequential vs random I/O aspect and how that part should be more or less irrelevant with modern hardware. So I did a test that did a full scan (following the linked list of an index) to see if I could notice any performance difference on my SSD. I couldn’t.

That isn’t the end of the story, it turns out. Another aspect is how the data is brought into memory. You might know that SQL Server can do “read ahead”, meaning it does larger reads per I/O instead of single-page I/O. I.e., fewer but larger I/O operations. This sounds fine, but what happens when we have fragmentation?

Read on for a situation in which fragmentation does matter.
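
If you want to see the two pieces Tibor mentions on your own system, a quick sketch (the table name here is hypothetical) is to check fragmentation with the usual DMF and then watch the read-ahead counts that STATISTICS IO reports during a scan:

-- Logical fragmentation and page counts; LIMITED mode skips the leaf level,
-- so it is the cheapest way to get these numbers.
SELECT
    i.name AS index_name,
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.SomeTable'), NULL, NULL, N'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;

-- A full scan with STATISTICS IO on; the Messages output reports
-- "read-ahead reads" separately from logical and physical reads.
SET STATISTICS IO ON;
SELECT COUNT_BIG(*) FROM dbo.SomeTable;
SET STATISTICS IO OFF;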

Event-Driven Microservices

Saeed Barghi gives us an overview of what event-driven microservices are:

Modern microservices are all about making systems event-driven: instead of making remote requests and waiting for the response (services and components calling each other and telling each other what to do), we can send notifications to related microservices when an event occurs.

These events are facts about the business: for example, an ATM or online transaction, a new log entry, or a customer registering for a new mobile plan. They are the data points collected by organizations that make up their datasets. The good thing is, we can store these events in the very same infrastructure that we use to broadcast them: Apache Kafka. The better thing is that we can even process them in the same infrastructure with stream processing applications. This means our applications and systems are linked via this central data pipeline, which is capable of real-time data broadcast and processing, and all data sources are shared via it.

Read the whole thing.

Cross-Validation in a Picture

Stephanie Glen shows us cross-validation in one picture:

Cross Validation explained in one simple picture. The method shown here is k-fold cross validation, where data is split into k folds (in this example, 5 folds). Blue balls represent training data; 1/k (i.e. 1/5) balls are held back for model testing.

Monte Carlo cross validation works the same way, except that the balls would be chosen with replacement. In other words, it would be possible for a ball to appear in more than one sample.

You’ll have to click through for the picture.

The structure() Function in R

Tomaz Kastrun takes us through the structure() function in R:

The structure() function is a simple yet powerful function that describes a given object with given attributes. It is part of the base R library, so there is no need to load any additional library. And since the function was part of the S language, it has been in the base library from the earliest versions, making it backward and forward compatible.

Read on to see how you can create a matrix or data frame using this function and additional details you can save.

Self-Joins Versus Key Lookups

Erik Darling takes us through an interesting scenario:

Like most tricks, this has a specific use case, but can be quite effective when you spot it.

I’m going to assume you have a vague understanding of parameter sniffing with stored procedures going into this. If you don’t, the post may not make a lot of sense.

Or, heck, maybe it’ll give you a vague understanding of parameter sniffing in stored procedures.

This was new to me.

Power BI Performance Analyzer

Marco Russo takes us through the Power BI Performance Analyzer:

The Power BI Performance Analyzer is a feature included in the May 2019 release of Power BI Desktop that simplifies the way you can collect the DAX queries generated by Power BI. You can use DAX Studio to capture them (as described in Capturing Power BI queries using DAX Studio), but the Performance Analyzer integrated in Power BI is simpler and provides a few insights about the time consumed in other activities, such as the rendering time of any visuals.

You can enable the Power BI Performance Analyzer by clicking the Performance Analyzer checkbox in the View ribbon of Power BI Desktop.

Read the whole thing.
