Press "Enter" to skip to content

Curated SQL Posts

“Let” Expressions In M

Chris Webb explains “let” expressions  in M:

In the M language a let expression consists of two sections. After the let comes a list of variables, each of which has a name and an expression associated with it. In the previous example there are three variables: step1, step2 and step3. Variables can refer to other variables; here, step3 refers to both step1 and step2. Variables can be used to store values of any type: numbers, text, dates, or even more complex types like records, lists or tables; here, all three variables return numbers. The Query Editor is usually clever enough to display these variables as steps in your query and so displays then in the Applied Steps pane on the right-hand side of the screen

It’s a look at one of the fundamentals of an interesting language.

Comments closed

Thread-Local Storage

Ewald Cress digs into fundamentals:

As a teaser for where this is heading, I’ll reframe the problem as classic SQL Server examples. Firstly, when a latch wait occurs somewhere in the bowels of a LatchBase subclass instance, how does that latch method know to track the wait against an instance of a Worker, or make it known to the world that it is holding up that Worker? And secondly, at a much higher abstraction level, when a task executes a user query and needs to access a table, how does the access methods code know what security principal to do security checks against? We are taking the first steps towards answering these questions here.

I enjoy Ewald’s explanations because when I’m done, I really feel like I have a clue of what’s going on.  It all fades away as soon as I look away from the screen, but that’s on me, not him.

Comments closed

History Of Apache Storm

Taylor Goetz gives a history of Storm up to release 1.0:

Storm was originally created by Nathan Marz while he was at Backtype (later acquired by Twitter) working on analytics products based on historical and real-time analysis of the Twitter firehose. Nathan envisioned Storm as a replacement for the real-time component that was based on a cumbersome and brittle system of distributed queues and workers. Storm introduced the concept of the “stream” as a distributed abstraction for data in motion, as well as a fault tolerance and reliability model that was difficult, if not impossible, to achieve with a traditional queues and workers architecture.

Nathan open sourced Storm to GitHub on September 19th, 2011 during his talk at Strange Loop, and it quickly became the most watched JVM project on GitHub. Production deployments soon followed, and the Storm development community rapidly expanded.

Storm is an exciting technology in that it’s a key driver in making Hadoop more than just a batch processing framework.

Comments closed

Machine Learning Skepticism

Julia Evans gives reasons to tamp down expectations with machine learning:

When explaining what machine learning is, I’m giving the example of predicting the country someone lives in from their first name. So John might be American and Johannes might be German.

In this case, it’s really easy to imagine what data you might want to do a good job at this — just get the first names and current countries of every person in the world! Then count up which countries Julias live in (Canada? The US? Germany?), pick the most likely one, and you’re done!

This is a super simple modelling process, but I think it’s a good illustration — if you don’t include any data from China when training your computer to recognize names, it’s not going to get any Chinese names right!

Machine learning projects are like any other development projects, with more complex algorithms.  There’s no magic and there’s a lot of perspiration (hopefully figuratively rather than literally) involved in getting a program which behaves correctly.

Comments closed

Template Replacement

Andy Mallon shows SSMS template replacements:

In the above example, there’s not much value-add by using the template replacement. It’s probably easier to just use @variables and highlight-replace.

The template replacement really shines when you have examples where you’d otherwise need to use dynamic SQL. If you have object names or database names that need replacement, this is a great answer. If you work in a multi-tenant hosting environment, and a client name is part of the DB name, this can make your life a lot easier.

Templates work great with auto-replace (a feature several third-party toolkits include).  My favorite auto-replace that I’ve created is “die” which asks for an schema and procedure name and generates the DROP PROCEDURE script.  Naturally, I also have diet (table), diev (view), and dief (function).

Comments closed

BACPACs In Azure

Mike Fal uses Powershell to import and export Azure SQL Database data:

What makes this black voodoo magic work? Is this some proprietary technique Microsoft has snuck in on us? Surprisingly, this is a bit of technology that have existed for sometime now as part of SQL Server Data Tools called BACPACs. A BACPAC is essentially a logical backup of a database, storing the schema and data as SQL statements.

This differs from a typical SQL Server backup, which stores your database pages directly in a binary format. Because of this, native backups are smaller and can be made/restored faster. However, they are more rigid, as you can only restore a native backup in specific scenarios. A logical backup, since it is a series of SQL statements, can be more flexible.

Mike’s going to follow up with a way to take advantage of this to migrate normal SQL Server databases, so that should be interesting as well.

Comments closed

Installing SQL Server R Services Packages

Julie Koesmarno shows how to install an R package on a SQL Server 2016 instance which has SQL Server R Services installed:

When you start playing with R in SQL Server, sooner or later you would need to install some packages, for example ggplot2. You may run into a problem that sounds like this “Error in library(“ggplot2”) : there is no package called ‘ggplot2’“.

The following script is used in the iris_demo.sql (SQLServer2016CTP3Samples\Advanced Analytics\iris_demo.sql), and would cause a missing library error if you don’t have the packages installed on SQL Server R Services yet.

Julie shows two methods, one a Good Idea and the other a Bad(?) Idea.

Comments closed

MDM Is Hard

Knut Juergensen gives an overview of Master Data Management:

The sad reality in many companies is that there is no MDM, or that it exists but is implemented and managed poorly. Often, this is due to lack of managerial-level understanding of its real value and, subsequently, a lack of investment.

I’ll recount some of the problems that we encountered with our MDM system, at least partly due to this lack of understanding and investment from management. Although the example is specific to engineering manufacturing, I know that similar fundamental flaws affect other MDM systems in other environments.

The primary master data in this case comprises the parts and products used in our assembly lines, which are provided and created by our in-house and external design engineers. A core issue with our MDM system is the source of this master data.

Knut gives a good explanation of what MDM is, how it works, and then an example of how it doesn’t work.  Read the whole thing.

Comments closed

Faster Extended Events Reader

The CSS SQL Server Engineers note that with SQL Server 2016, we’ll get faster Extended Events readers:

SQL Server 2016 improves the XEvent Linq reader scalability and performance.    The XEvent UI in SQL Server Management Studio uses the XEvent Linq reader to process the events for display.   Careful study of the XEvent Linq reader revealed opportunities for scalability and performance improvements.

I don’t know if this will push anyone in the direction of using Extended Events who isn’t already doing so, but I like the performance improvement here.

Comments closed

POCs As A Problem

Bill Vorhies argues that data science proofs of concept fall short of the mark:

If you do a quick read through of some of the Gartner or O’Reilly studies you’ll quickly see that a lack of executive sponsorship is one of the major barriers to adoption.  So isn’t the POC a good way to get the attention of the C-level?  Yes and no.

If as we described above it leads to the adoption of a series of stand alone ‘technology projects’, then no.  If it was really necessary to start with little firecracker POCs to demonstrate the explosive strategic value of becoming data-driven, then maybe so.

Here’s a simple change of mindset (borrowed from John Weathington referenced above) that instead of focusing on Proof of Concept, we should instead create projects to demonstrate Proof of Value.  By focusing on value we change the orientation so that any projects are aligned with value to the company.  In other words, they are aligned with the company’s strategic objectives.

This is an interesting argument which goes against my inclinations.  Check it out.

Comments closed