Press "Enter" to skip to content

Day: December 20, 2017

Hierarchical Clustering

Chaitanya Sagar explains hierarchical clustering with examples in R:

Hope now you have a better understanding of clustering algorithms than what you started with. We discussed about Divisive and Agglomerative clustering techniques and four linkage methods namely, Single, Complete, Average and Ward’s method. Next, we implemented the discussed techniques in R using a numeric dataset. Note that we didn’t have any categorical variable in the dataset we used. You need to treat the categorical variables in order to incorporate them into a clustering algorithm. Lastly, we discussed a couple of plots to visualise the clusters/groups formed. Note here that we have assumed value of ‘k’ (number of clusters) is known. However, this is not always the case. There are a number of heuristics and rules-of-thumb for picking number of clusters. A given heuristic will work better on some datasets than others. It’s best to take advantage of domain knowledge to help set the number of clusters, if that’s possible. Otherwise, try a variety of heuristics, and perhaps a few different values of k.

There’s a lot to pick out of this post, but you’re able to walk through it step by step.  H/T R-Bloggers

Comments closed

The Triumph Of Functional Programming

Amanda LeClair and Michael Facemire have a new report on functional programming:

The customer-facing software development world is outgrowing stateful, object-oriented (OO) development. The bar for great, intuitive customer experience has been raised by ambient, conversation-driven user interfaces, like through Amazon Alexa. Functional programming allows enterprises to take better advantage of compute power to deliver those experiences at scale; better flexibility for delivering the right output; and a more efficient way of delivering customer value. FP also reduces regression defects in code, simplifies code creation and maintenance, and allows for greater code reuse.

Just as object-oriented programming (OOP) emerged as the solution to the limitations of procedural programming at the dawn of the internet boom in the mid-’90s, FP is emerging as the solution to the limitations of OOP today. The shift is already underway– 53% of global developers reported that at least some teams in their companies are practicing functional programming and are planning to expand their usage.

Alexey Sommer notes that functional programming has been sneaking into C# bit by bit for well over a decade:

Retrospective

C# 1.0 Visual Studio 2002

C# 1.1 Visual Studio 2003 – #line, pragma, xml doc comments

C# 2.0 Visual Studio 2005 – Generics, Anonymous methods, iterators/yield, static classes

C# 3.0 Visual Studio 2008 – LINQ, Lambda Expressions, Implicit typing, Extension methods

C# 4.0 Visual Studio 2010 – dynamic, Optional parameters and named arguments

C# 5.0 Visual Studio 2012 – async/await, Caller Information, some breaking changes

C# 6.0 Visual Studio 2015 – Null-conditional operators, String Interpolation

C# 7.0 Visual Studio 2017 – Tuples, Pattern matching, Local functions

I strongly believe that if you are a database developer and need to pick up a non-SQL programming language, functional languages will be a lot easier for you to get than object-oriented languages.  Many of the principles line up much smoother with functional languages, as you can most clearly see with the relationship between Scala and Spark.

Comments closed

SQL Server Feedback In A Post-Connect World

Koen Verbeeck touts the new SQL Server feedback site:

After years of having to deal with Connect – the feedback platform of Microsoft – it is announced a successor has been found: feedback.azure.com. It’s not all about Azure, the link goes to the relevant portion of SQL Server. I’m glad for this change, as Connect could sometimes be a little … quirky. Especially the search function didn’t work properly. The new feedback site is based on UserVoice and it’s really easy to submit feedback. People submitting ideas for Power BI will be very familiar with the format. There are a couple of drawbacks:

  • You cannot specify many details (none to be exact, or you have to list them in the descriptions). OS version, SQL Server version, bitness, et cetera. On the other hand it makes the process of entering feedback a lot faster.

  • You cannot mark a feedback item as private so that only Microsoft can see it. This means it’s not exactly the place to dump your production data to show how a bug is bugging you (haha).

I’m not sure how much of an improvement this is, but at least it does serve the Power BI team well.

Comments closed

Reusing U-SQL Scripts

Matthew Hicks shows how to use Powershell to parameterize U-SQL scripts:

You can use this feature either via Azure Cloud Shell or on a Windows machine with Azure PowerShell installed.

When submitting, simply construct a hashtable of U-SQL variable names to values and pass it in using the -ScriptParameter cmdlet parameter. The .NET type of each value in the hashtable is used when defining the variable in U-SQL.

Supported types include:

byte, sbyte, int, uint (or uint32), long, ulong (or uint64), float, double, decimal, short (or int16), ushort (or uint16), char, string, DateTime, bool, Guid, or byte[]

Read on for an example of the process.

Comments closed

Setting Up A Test Lab Domain Controller

David Fowler has a new series on building a test lab, starting with a domain controller:

One of the most useful tools to the DBA when we need to test new features, recreate a fault that we’ve seen in production or just want to see ‘what if…?’ is a test lab.

Some of you are going to be lucky enough to have a few servers kicking around or a chunk of the virtual environment that you can build a test lab in but not all of us do.  In this series of posts I’m going to look at how we can build up a fully functioning test lab consisting of a domain and clustered SQL Servers on our desktop PC.  Now, although I’m going to be building this environment on my desktop, the main steps will be the same if you’ve got separate hardware for this so may still be relevant.

So, in this series we’re going to build a virtual test lab that’s going to consist of a domain controller and a couple of SQL Servers in a Windows Failover Cluster, hosting an Availability Group.

Read on for a step-by-step guide using Virtualbox to build these VMs.

Comments closed

New SQL Operations Studio

Alan Yu announces the December release of SQL Operations Studio:

Download SQL Operations Studio and review the Release Notes to get started.

SQL Operations Studio is a data management tool that enables you to work with SQL Server, Azure SQL DB and SQL DW from Windows, macOS and Linux. To learn more, visit our GitHub.

While it’s downloading, check out the new bits and also Marek Masko’s guide on SQL Operations Studio:

If you find Go to Definition feature very useful, but you would like to have a more temporary view of objects source code, then no problem! SQL Ops Studio delivers second functionality which gives you the possibility to check object definition. This time, the source code is displayed in the same editor tab.

Marek’s post went up before the December release, so one big point (about not being able to see actual execution plans) is now fixed.

Comments closed

Error Handling In Powershell In SQL Agent Jobs

Ben Miller has a couple of tips when executing Powershell within SQL Agent jobs:

What you don’t see is the way you have the job step succeed or fail. When using most commands in modules, you may find that errors still cause the step to fail because of the way they report the failure (some kind of throw or a Stop condition outside your control). So if you want things to fail that normally would show red on the screen but things would continue, remember that the default ErrorAction is Continue, so even though you get an error, PowerShell will just continue.

Read on for more.

Comments closed

Fragmentation Can Affect Execution Plans

Jonathan Kehayias explains how index fragmentation can potentially affect execution plans:

Index fragmentation removal and prevention has long been a part of normal database maintenance operations, not only in SQL Server, but across many platforms. Index fragmentation affects performance for a lot of reasons, and most people talk about the effects of random small blocks of I/O that can happen physically to disk based storage as something to be avoided. The general concern around index fragmentation is that it affects the performance of scans through limiting the size of read-ahead I/Os. It’s based on this limited understanding of the problems that index fragmentation cause that some people have begun circulating the idea that index fragmentation doesn’t matter with Solid State Storage devices (SSDs) and that you can just ignore index fragmentation going forward.

However, that is not the case for a number of reasons. This article will explain and demonstrate one of those reasons: that index fragmentation can adversely impact execution plan choice for queries. This occurs because index fragmentation generally leads to an index having more pages (these extra pages come from page split operations, as described in this post on this site), and so the use of that index is deemed to have a higher cost by SQL Server’s query optimizer.

Let’s look at an example.

Check out the example, but definitely read the comments as there are some good conversations in there.

Comments closed