Press "Enter" to skip to content

Curated SQL Posts

Monads and Monoids and Functors

Anmol Sarna explains the concept of a monad:

In functional programming, a monad is a design pattern that allows structuring programs generically while automating away boilerplate code needed by the program logic.

To simplify the above definition a bit more, we can think of monads as wrappers: you just take an object and wrap it with a monad.

Let’s just be clear on one thing: a monad is not a class or a trait, nor is it dedicated solely to the Scala language. It is a concept related to functional programming.

The post also includes a few examples in Scala; the sketch below gives a rough flavor of the idea.
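
As a minimal illustration of the wrapper idea (mine, not Anmol’s, and purely a sketch), here is a toy Maybe monad in Scala. Scala’s built-in Option already plays exactly this role; the names below are hypothetical.

// A toy Maybe monad: wraps a value that may be absent.
sealed trait Maybe[+A] {
  // flatMap is the monadic "bind": unwrap, apply f, re-wrap.
  def flatMap[B](f: A => Maybe[B]): Maybe[B] = this match {
    case Just(a) => f(a)
    case Empty   => Empty
  }
  def map[B](f: A => B): Maybe[B] = flatMap(a => Just(f(a)))
}
final case class Just[A](value: A) extends Maybe[A]
case object Empty extends Maybe[Nothing]

object Maybe {
  // "unit"/"pure": take a plain value and wrap it in the monad.
  def pure[A](a: A): Maybe[A] = Just(a)
}

// Chaining just works; any Empty short-circuits the rest of the
// pipeline, which is the boilerplate the monad automates away.
val result = Maybe.pure(21).flatMap(x => Just(x * 2)) // Just(42)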


CosmosDB Continuation Tokens

Hasan Savran walks us through the idea of a continuation token in CosmosDB:

In CosmosDB, the TOP option is required and its default value is 100. You can change the default value by sending a different value in the request header “x-ms-max-item-count”. If you have 40,000 rows in your Orders table and run the same query in CosmosDB, you will get 100 rows (documents) rather than 40,000 rows (documents). CosmosDB returns all kinds of metadata with the data. You can find this metadata in the response headers. One of those headers is “x-ms-continuation”, and it is responsible for the rest of the rows of your query. If you would like to get the next set of results, you can take the “x-ms-continuation” value from the response headers and attach it to your next request to get the next set of rows. The CosmosDB SDK does this automatically for you. The SDK checks for the x-ms-continuation value when you check the HasMoreResults property. If this property is true, that means CosmosDB returned a continuation token.

I have fanciful notions of SQL Server offering something similar: think of a grid built from a query, where the engine returns the first 50 rows from the result set and stores the rest off in tempdb somewhere, with the “continuation token” (which might just be the object’s full name in tempdb) used to fetch the next page, and auto-trashing after a certain amount of time.
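
To make the loop concrete, here is a rough Scala sketch of the pagination pattern Hasan describes. The executeQuery helper is hypothetical (standing in for the actual REST call or SDK method); x-ms-continuation is the real response header.

import scala.annotation.tailrec

// One page of results plus the continuation token, if the server returned one.
case class Page(rows: Seq[String], continuation: Option[String])

// Hypothetical helper: issues the query, passing the previous
// x-ms-continuation value (if any) in the request headers.
def executeQuery(sql: String, continuation: Option[String]): Page = ???

@tailrec
def fetchAll(sql: String, continuation: Option[String] = None,
             acc: Seq[String] = Seq.empty): Seq[String] = {
  val page = executeQuery(sql, continuation)
  val rows = acc ++ page.rows
  page.continuation match {
    case Some(_) => fetchAll(sql, page.continuation, rows) // more pages remain
    case None    => rows                                   // no token: done
  }
}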


Window Functions with IGNORE NULLs

Lukas Eder walks us through a bit of functionality I wish we had in SQL Server:

On each row, the VALUE column should either contain the actual value or the “last_value” preceding the current row, ignoring all the nulls. Note that I deliberately worded this requirement in precise English. We can now translate that sentence directly to SQL:

last_value (t.value) ignore nulls over (order by d.value_date)

Since we have added an ORDER BY clause to the window function, the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW applies, which colloquially means “all the preceding rows”. (Technically, that’s not accurate. It means all rows with values less than or equal to the value of the current row – see Kim Berg Hansen’s comment)

Only a few database products have this, and SQL Server is not one of them.
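
Spark is one engine that does support the idea, via an ignoreNulls flag on its window aggregates. A minimal sketch in the DataFrame API, assuming a hypothetical df with value and value_date columns:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.last

val w = Window.orderBy("value_date")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// last(..., ignoreNulls = true) carries the most recent non-null value forward.
val filled = df.withColumn("value_filled", last("value", ignoreNulls = true).over(w))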


Aggregate Pushdown with GROUP BY

Paul White takes us through several performance improvements around aggregate pushdown:

SQL Server 2016 introduced serial batch mode processing and aggregate pushdown. When pushdown is successful, aggregation is performed within the Columnstore Scan operator itself, possibly operating directly on compressed data, and taking advantage of SIMD CPU instructions.

The performance improvements possible with aggregate pushdown can be very substantial. The documentation lists some of the conditions required to achieve pushdown, but there are cases where the lack of ‘locally aggregated rows’ cannot be fully explained from those details alone.

This article covers additional factors that affect aggregate pushdown for GROUP BY queries only. Scalar aggregate pushdown (aggregation without a GROUP BY clause), filter pushdown, and expression pushdown may be covered in a future post.

Read the whole thing.


Orphaned Users in SQL Server

Dave Bland walks us through one way to fix an orphaned user:

In my many years of working as a DBA, I have encountered many disabled logins. However, I have never really encountered what looks to be a disabled database user account. I didn’t even think it was possible to disable a user account in a SQL Server database. I checked the user account properties just to make sure I was correct. Sure enough, there is no option to disable a user account. This turned out to be a simple case of looks being deceiving.

You can also use the sp_change_users_login procedure to fix orphaned users, though that procedure is deprecated; ALTER USER … WITH LOGIN is the modern replacement.


Controlling Power BI Visual Visibility

Matt Allington shows how we can take one Power BI visual and use it to control the visibility status of another visual:

I have written a few articles in the past that toy with the ideas of changing visibility and text colour based on selection. I started to wonder if it was possible to make a visual appear (or not) based on a selection from the user. There is no out-of-the-box way to do that today. It is possible to use bookmarks to show and hide an object, but the user must click a specific button to do this. I want the user to be able to interact with a report and see (or not see) a chart based on some valid selection across the report. Microsoft is already working on building expression-based formatting across the breadth of Power BI; however, as of now, the only item you can change is the header in a chart.

Hopefully this gets better over time.


Processing Fixed-Width Files with Spark

Subhasish Guha shows how you can read a fixed-width file with Apache Spark:

A fixed width file is a very common flat file format when working with SAP, Mainframe, and Web Logs. Converting the data into a dataframe using metadata is always a challenge for Spark Developers. This particular article talks about all kinds of typical scenarios that a developer might face while working with a fixed width file. This solution is generic to any fixed width file and very easy to implement. This also takes care of the Tail Safe Stack as the RDD gets into the foldLeft operator.

It’s a little more complicated than with R, where stringr can handle fixed-width formats. But it’s not bad.
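
For a rough sense of the general approach (a sketch under an assumed column layout, not the article’s exact code), you can read the file as plain text and slice fields out by position:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring, trim}

val spark = SparkSession.builder.appName("FixedWidth").getOrCreate()

// Hypothetical layout: order_id in bytes 1-8, customer in 9-28, amount in 29-38.
val raw = spark.read.text("/data/orders.txt") // path is an assumption

val parsed = raw.select(
  trim(substring(col("value"), 1, 8)).as("order_id"),
  trim(substring(col("value"), 9, 20)).as("customer"),
  trim(substring(col("value"), 29, 10)).cast("double").as("amount")
)
parsed.show()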


Sentiment Analysis with Spark on Qubole

Jonathan Day, et al., have a tutorial on using Qubole to build a sentiment analysis model:

This post covers the use of Qubole, Zeppelin, PySpark, and H2O PySparkling to develop a sentiment analysis model capable of providing real-time alerts on customer product reviews. In particular, this model allows users to monitor any natural language text (such as social media posts or Amazon reviews) and receive alerts when customers post extremely positive (high sentiment) or extremely negative (low sentiment) comments about their products.

In addition to introducing the frameworks used, we will also discuss the concepts of embedding spaces, sentiment analysis, deep neural networks, grid search, stop words, data visualization, and data preparation.
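
The post itself works in PySpark with H2O; as a hedged illustration of what such a pipeline generally looks like, here is a minimal Spark ML version in Scala (the reviews DataFrame and its text/label columns are assumptions):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, StopWordsRemover, Tokenizer}

// Tokenize the raw text, drop stop words, hash tokens into features,
// and fit a simple classifier on the labeled sentiment.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val remover   = new StopWordsRemover().setInputCol("words").setOutputCol("filtered")
val tf        = new HashingTF().setInputCol("filtered").setOutputCol("features")
val lr        = new LogisticRegression().setLabelCol("label")

val pipeline = new Pipeline().setStages(Array(tokenizer, remover, tf, lr))
val model    = pipeline.fit(reviews) // reviews: hypothetical labeled DataFrame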

Click through for the demo.


Running Spark MLlib to Feed Power BI

Brad Llewellyn shows how you can take Spark MLlib results and feed them into Power BI:

MLlib is one of the primary extensions of Spark, along with Spark SQL, Spark Streaming and GraphX.  It is a machine learning framework built from the ground up to be massively scalable and operate within Spark.  This makes it an excellent choice for machine learning applications that need to crunch extremely large amounts of data.  You can read more about Spark MLlib here.

In order to leverage Spark MLlib, we obviously need a way to execute Spark code.  In our minds, there’s no better tool for this than Azure Databricks.  In the previous post, we covered the creation of an Azure Databricks environment.  We’re going to reuse that environment for this post as well.  We’ll also use the same dataset that we’ve been using, which contains information about individual customers.  This dataset was originally designed to predict Income based on a number of factors.  However, we left the income out of this dataset a few posts back for reasons that were important then.  So, we’re actually going to use this dataset to predict “Hours Per Week” instead.

Check it out. And Brad’s not joking when he says the resulting model is terrible. But that’s okay, because it was never about the model.
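
For flavor, a regression of that shape in Spark ML might look something like the sketch below; the customers DataFrame and its feature columns are assumptions, not Brad’s actual code:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// Assemble assumed numeric features into a vector, then fit a
// linear model predicting hours worked per week.
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "education_num", "capital_gain"))
  .setOutputCol("features")

val lr = new LinearRegression()
  .setLabelCol("hours_per_week")
  .setFeaturesCol("features")

val model = lr.fit(assembler.transform(customers))
println(s"RMSE: ${model.summary.rootMeanSquaredError}")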
