Press "Enter" to skip to content

Curated SQL Posts

Animated Dot Plots In R

John MacKintosh shows how to create dot plots in ggplot2, and then he uses gganimate to turn it into an animated plot:

If you want to follow along (go on) then you should head over to Neil’s site, download the excel file and take a look at the “how to” guide on the same page. Existing R users are already likely to be shuddering at all the manual manipulation required.

For the first attempt, I followed Neil’s approach pretty closely, resulting in a lot of code to sort and group, although ggplot2 made the actual plotting much simpler. I shared my very first attempt ( produce with barely any ggplot2 code) which was quite good, but there were a few issues – the ins/ outs being coloured blue instead of grey, and overplotting of several points.

Click through for code and explanation.  H/T R-Bloggers

Comments closed

Power BI Dashboard Sharing

Reza Rad covers one method of sharing Power BI content with users:

What dashboard sharing as the name of it explains is based on a dashboard. You can only share a dashboard with this method, not a report. Consider that you have a dashboard like below screenshot, and you want to share it. There is a share link at the top right corner of the dashboard.

Dashboard sharing have very few options to set and is very simple to configure. You just need to add the email address of people whom you want to share this report. You can also write a message for them to know that this report is shared with them.

Click through for more information.  Note that this is a paid feature.

Comments closed

Splitting Large Files Out In SQL Server

Tracy Boggiano has a script which splits out large files in a filegroup into a smaller set of files:

The solution I offer allows you to break your files into any size you want by rebuilding your indexes. You will need some disk space for it to create the new files while it runs the process then it will drop the large file.  This will also take up some space in your transaction log so if you not running your transaction log backups frequently enough you could have a lot of disk space taken up by that so watch out for that. All the code can be downloaded from my github repository here.

Read on for an explanation of the entire process.

Comments closed

Using DATEADD Instead Of DATEDIFF

Michael J. Swart points out a bit of trickery with DATEDIFF:

I assumed that the DATEDIFF function I wrote worked this way: Subtract the two dates to get a timespan value and then return the number of seconds (rounded somehow) in that timespan.

But that’s not how it works. The docs for DATEDIFF say:

“Returns the count (signed integer) of the specified datepart boundaries crossed between the specified startdate and enddate.”

There’s no rounding involved. It just counts the ticks on the clock that are heard during a given timespan.

Read the whole thing.

Comments closed

Logging Perfmon Data

Raul Gonzalez has started a new series around getting perfmon data into SQL Server.  First up is logging perfmon counters:

Here you have the template I have used to create my Data Collector, you just need to write it down to a XML file and change some of the counters which are related to SQL Server.

When I say MSSQL$MSSQL2016, that is because this counter refer to a named instance called MSSQL2016. If that was the default instance, it’d be just “SQL Server”.

Example: <Counter>\SQL Server:Buffer Manager\Page life expectancy</Counter>

Once you adjust it, you’re good to go.

Click through for a sample data collector set and some instructions on logging counter values.

Comments closed

Hierarchical Clustering

Chaitanya Sagar explains hierarchical clustering with examples in R:

Hope now you have a better understanding of clustering algorithms than what you started with. We discussed about Divisive and Agglomerative clustering techniques and four linkage methods namely, Single, Complete, Average and Ward’s method. Next, we implemented the discussed techniques in R using a numeric dataset. Note that we didn’t have any categorical variable in the dataset we used. You need to treat the categorical variables in order to incorporate them into a clustering algorithm. Lastly, we discussed a couple of plots to visualise the clusters/groups formed. Note here that we have assumed value of ‘k’ (number of clusters) is known. However, this is not always the case. There are a number of heuristics and rules-of-thumb for picking number of clusters. A given heuristic will work better on some datasets than others. It’s best to take advantage of domain knowledge to help set the number of clusters, if that’s possible. Otherwise, try a variety of heuristics, and perhaps a few different values of k.

There’s a lot to pick out of this post, but you’re able to walk through it step by step.  H/T R-Bloggers

Comments closed

The Triumph Of Functional Programming

Amanda LeClair and Michael Facemire have a new report on functional programming:

The customer-facing software development world is outgrowing stateful, object-oriented (OO) development. The bar for great, intuitive customer experience has been raised by ambient, conversation-driven user interfaces, like through Amazon Alexa. Functional programming allows enterprises to take better advantage of compute power to deliver those experiences at scale; better flexibility for delivering the right output; and a more efficient way of delivering customer value. FP also reduces regression defects in code, simplifies code creation and maintenance, and allows for greater code reuse.

Just as object-oriented programming (OOP) emerged as the solution to the limitations of procedural programming at the dawn of the internet boom in the mid-’90s, FP is emerging as the solution to the limitations of OOP today. The shift is already underway– 53% of global developers reported that at least some teams in their companies are practicing functional programming and are planning to expand their usage.

Alexey Sommer notes that functional programming has been sneaking into C# bit by bit for well over a decade:

Retrospective

C# 1.0 Visual Studio 2002

C# 1.1 Visual Studio 2003 – #line, pragma, xml doc comments

C# 2.0 Visual Studio 2005 – Generics, Anonymous methods, iterators/yield, static classes

C# 3.0 Visual Studio 2008 – LINQ, Lambda Expressions, Implicit typing, Extension methods

C# 4.0 Visual Studio 2010 – dynamic, Optional parameters and named arguments

C# 5.0 Visual Studio 2012 – async/await, Caller Information, some breaking changes

C# 6.0 Visual Studio 2015 – Null-conditional operators, String Interpolation

C# 7.0 Visual Studio 2017 – Tuples, Pattern matching, Local functions

I strongly believe that if you are a database developer and need to pick up a non-SQL programming language, functional languages will be a lot easier for you to get than object-oriented languages.  Many of the principles line up much smoother with functional languages, as you can most clearly see with the relationship between Scala and Spark.

Comments closed

SQL Server Feedback In A Post-Connect World

Koen Verbeeck touts the new SQL Server feedback site:

After years of having to deal with Connect – the feedback platform of Microsoft – it is announced a successor has been found: feedback.azure.com. It’s not all about Azure, the link goes to the relevant portion of SQL Server. I’m glad for this change, as Connect could sometimes be a little … quirky. Especially the search function didn’t work properly. The new feedback site is based on UserVoice and it’s really easy to submit feedback. People submitting ideas for Power BI will be very familiar with the format. There are a couple of drawbacks:

  • You cannot specify many details (none to be exact, or you have to list them in the descriptions). OS version, SQL Server version, bitness, et cetera. On the other hand it makes the process of entering feedback a lot faster.

  • You cannot mark a feedback item as private so that only Microsoft can see it. This means it’s not exactly the place to dump your production data to show how a bug is bugging you (haha).

I’m not sure how much of an improvement this is, but at least it does serve the Power BI team well.

Comments closed