Visualizing A Correlation Matrix With corrplot

Kristian Larsen demonstrates the corrplot package in R:

First we need to read the packages into the R library. For descriptive statistics of the dataset we use the skimr package and for visualization of correlation matrix we use the corrplot package. We will work with windspeed dataset from the bReeze package:

# Read packages into R library

Click through for the demo.

Getting The Right R Version For Packages

Colin Gillespie shows a couple methods for figuring out the minimum version of R needed for a set of packages:

In R, there is a handy function called available.packages() that returns a matrix of details corresponding to packages currently available at one or more repositories. Unfortunately, the format isn’t initially amenable to manipulation. For example, consider the readr package

readr_desc = available.packages() %>% as_tibble() %>% filter(Package == "readr")

I immediately converted the data to a tibble, as that

  • changed the rownames to a proper column

  • changed the matrix to a data frame/tibble, which made selecting easier

There’s a good use of R functionality to delve into package requirements, as well as a script to try it out yourself.

What’s New With Docker For Windows Server 2019

Elton Stoneman walks us through several additions to Docker support on Windows Server 2019:

5. Volume mounts have usable directory paths

Docker volumes are how you separate storage from the lifecycle of your containers. You attach a volume to a container, and it surfaces as a directory in the container’s filesystem. Your app writes to C:\jenkins (or whatever path you mount) and the data actually gets stored in the volume, which could be storage on the Docker host – like a RAID array on the server – or a separate storage unit in the datacenter, or a cloud storage service.

The mount inside the container should be transparent to the app, but actually in Windows Server 2016 the implementation used symlink directories, and that caused a few problems.

Elton notes that Docker support on Windows is now approaching that of Linux, so check out some of the gaps that have been filled with the latest server release.

Reading Error Logs Outside Of SQL Server

Kenneth Fisher shows us where error logs are located and how to read them outside of SQL Server:

Quick and easy post today. Hopefully you’ve opened the error log on a SQL instance. However, what happens if you don’t have the log viewer in SSMS? In fact, the instance won’t come up at all so you really need to see what went wrong.

Fortunately, the error logs in SQL are just text files, even though they don’t actually have that extension. The trick is knowing where they are.

Read on for the answer.

Azure Data Studio October Release

Alan Yu announces the October release of Azure Data Studio:

As announced at Microsoft Ignite, one of the most exciting extensions to share in our September GA release was the release of the SQL Server 2019 Preview extension. If you were following the blog announcements, starting with SQL Server 2019 preview, SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS Docker containers running on Kubernetes.

These components are running side by side to enable you to read, write, and process big data from Transact-SQL or Spark. SQL Server big data clusters allow you to easily combine and analyze your high-value relational data with high-volume big data. To learn about all the excitement of SQL Server Big Data Clusters, follow the documentation here.

These experiences are built as an extension to Azure Data Studio. We can go into full depth about all the great capabilities this extension includes, but deep-diving into any one of these features can be a full blog post itself. Here is a high-level summary of these features, and then you can see a full demo of the features below.

There’s plenty more in here as well.

Beautiful Deadlock Graphs And Tying RIDs Back To Object Names

Josh Simar shares a deadlock graph which I have entitled The Pit Of Despair:

I can’t make heads or tails of that but I can tell you that seems like a really bad brawl for resources. It’s like a Jerry Springer show with a few extras thrown in. Since I knew that my graph wasn’t going to be helpful in this instance I went to the actual xml and tried to figure out how I could tune this to make it better in the future. I needed to know exactly where the issue was so the waitresource pointer is a good place to start.

You will see many blog articles on how to find SQL wait resources when the resource type is a key, a page, or an object (I suggest Kendra Little’s blog post) There is however a noticeable glut on articles explaining RID (a RID is a key on a table with no clustered index). I finally found how to tie a RID to an actual resource name but it was used for corruption so the details were a bit hazy at first.

Click through for this work of database art as well as a script which links RIDs back to specific object names.

Fuzzy Matching In Power BI

Reza Rad looks at a preview feature in Power BI to perform fuzzy matching:

Fuzzy Merge is a way of joining two tables together, but not on exact matching criteria, but on the similarity threshold. If you want to learn what is the Merge operation itself and the difference of that with Append, read my blog post here. If you want to learn more details about what is Merge and the different types of join or merge, read my other blog post here. Merge or Join is simply the act of combining two tables with different structures, but with link/join columns, to access columns from one of the tables in the other one.

To use Merge operation on the “source” query, You can click on the Merge Queries as New option in the Home tab of Power Query Editor window.

This kind of functionality was in SQL Server Integration Services as well but suffered from a huge scaling problem, where the component worked pretty well with small numbers of records, but once you got into the 100K+ range, everything started to fall apart.  I’d be interested to see where that limit is in Power BI.

Understanding Query Optimizer Timeouts

Joseph Pilov answers frequently asked questions about SQL Server’s query optimizer when it times out:

What Is Optimizer Timeout?

SQL Server uses a cost-based query optimizer. Therefore, it selects a query plan with the lowest cost after it has built and examined multiple query plans. One of the objectives of the SQL Server query optimizer (QO) is to spend a “reasonable time” in query optimization as compared to query execution. Therefore, QO has a built-in threshold of tasks to consider before it stops the optimization process. If this threshold is reached before QO has considered most, if not all, possible plans then it has reached the Optimizer TimeOut limit. An event is reported in the query plan as Time Out under “Reason For Early Termination of Statement Optimization.” It’s important to understand that this threshold isn’t based on clock time but on number of possibilities considered. In current SQL QO versions, over a half million possibilities are considered before a time out is reached.

Optimizer timeout is designed in Microsoft SQL Server and in many cases encountering it is not a factor affecting query performance. However, in some cases the SQL query plan choice may be affected by optimizer timeout and thus performance could be impacted. When you encounter such issues, if you understand optimizer timeout mechanism and how complex queries can be affected in SQL Server, it can help you to better troubleshoot and improve your performance issue.

Read the whole thing.


October 2018
« Sep Nov »