Curated SQL Posts

Using mssql-cli

Prashanth Jayaram shows how to install and use the mssql-cli client:

Switching to the editor mode is pretty simple and straightforward. At the bottom of the screen, we can see the help bar, which guides us through switching between the available editor modes. The options available for instant switching are multi-line mode, activated by pressing F3, and Emacs mode, activated by pressing F4.

To run a query in multi-line mode, terminate it with a semicolon and then press Enter to execute it.

Use the same keys as mentioned above to turn the editor modes on and off: F3 for multi-line mode and F4 for Emacs mode.

If you’re big on command-line interfaces, you’ll probably enjoy this client.
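
If you want to try it out, a minimal session might look something like the sketch below. The server name, login, and database are illustrative assumptions; the F3/F4 toggles from the quote apply once you are at the prompt.

# Install the client (it ships as a Python package)
pip install mssql-cli

# Connect, sqlcmd-style: -S server, -U login, -d database
mssql-cli -S localhost -U sa -d AdventureWorks

# At the prompt, press F3 to enter multi-line mode, write the query,
# terminate it with a semicolon, and press Enter to run it:
SELECT name, create_date
FROM sys.databases;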

Gaps And Islands With DAX

Philip Seamark answers one of the classic gaps and islands problems with DAX:

A recent post on the Power BI community website asked if it was possible to compress a group of numbers into text that described the sequential ranges contained within the numbers. This might be a group of values such as 1, 2, 3, 4, 7, 8, 9, 12, 13 (note there are gaps), with the expected result grouping the numbers that run in a sequence together to produce text like “1-4, 7-9, 12-13”. Essentially, the task is to identify the gaps when creating the text. This seemed like an interesting challenge, and here is how I solved it using DAX.

Read on for the solution, which is conceptually very similar to the T-SQL solution but a bit different in implementation.
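
For comparison (this is not Philip’s DAX, just a sketch of the same grouping idea), the classic trick is to subtract each value’s position from the value itself: that difference is constant within every consecutive run, so it works as a grouping key. A minimal Python illustration:

from itertools import groupby

def compress_ranges(values):
    """Turn a sorted sequence like [1, 2, 3, 4, 7, 8, 9, 12, 13] into '1-4, 7-9, 12-13'."""
    parts = []
    # value - index is constant within each consecutive run ("island")
    for _, run in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        island = [value for _, value in run]
        parts.append(str(island[0]) if len(island) == 1 else f"{island[0]}-{island[-1]}")
    return ", ".join(parts)

print(compress_ranges([1, 2, 3, 4, 7, 8, 9, 12, 13]))   # prints: 1-4, 7-9, 12-13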

Creating Azure SQL Database Managed Instances Via ARM Templates

Jovan Popovic shows us how to build a Managed Instance of Azure SQL Database using PowerShell and an ARM template:

Values that you need to change in this request are:

  • name – name of your Azure SQL Managed Instance (don’t include domain).

  • properties/administratorLogin – SQL login that will be used to connect to the instance.

  • properties/subnetId – Azure identifier of the subnet where the Azure SQL Managed Instance should be placed. Make sure that you properly configure the network for the Azure SQL Managed Instance.

  • location – one of the valid locations for Azure data centers, for example “westcentralus”.

  • sku/name: GP_Gen4 or GP_Gen5

  • properties/vCores: Number of cores that should be assigned to your instance. Values can be 8, 16, or 24 if you select GP_Gen4 sku name, or 8, 16, 24, 32, or 40 if you select GP_Gen5.

  • properties/storageSizeInGB: Maximum storage space for your instance. It should be a multiple of 32 GB.

  • properties/licenceType: Choose LicenceIncluded if you don’t have a SQL Server on-premises licence that you want to use, or BasePrice if you have an on-premises licence and want the discount for it.

  • tags (optional) – key:value pairs that you can use to categorize the instance.

Click through for the template and a quick PowerShell script which shows how to use it.
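
To make the list concrete, here is a rough sketch of how those values could sit in the resource portion of the template. The shape simply mirrors the fields listed above; the placeholder values, the administrator password property, and the exact resource schema are assumptions, so defer to the template in Jovan’s post.

{
  "type": "Microsoft.Sql/managedInstances",
  "name": "my-managed-instance",
  "location": "westcentralus",
  "sku": { "name": "GP_Gen5" },
  "properties": {
    "administratorLogin": "miadmin",
    "administratorLoginPassword": "<strong password>",
    "subnetId": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<subnet-name>",
    "vCores": 16,
    "storageSizeInGB": 256,
    "licenceType": "BasePrice"
  },
  "tags": { "environment": "dev" }
}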

The Elitist Shuffle And Recommenders

Rodrigo Agundez shows us a way of displaying fresh recommendations without retraining the recommender system:

Suppose you have 10,000 items in total that can be recommended to your user. You run the recommendation system over all the items, and those 10,000 items get ranked in order of relevance of the content.

The application shows 5 items on the entry screen. The first time the user opens the application after the re-scoring process, the top 5 ranked items are shown. It is decided that from now on (based on user control groups, investigation, A/B testing, etc.), until the next re-scoring process, the entry screen should not be the same every time and should remain relevant for the user.

Based on an investigation by the data scientist, it turns out that somewhat relevant items appear up to item 100. The idea, then, is to shuffle those 100 items such that the top 5 items shown are still relevant but not always the same.

Click through for an example in Python and how it compares favorably to a couple other shuffling algorithms.
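
This is not Rodrigo’s elitist shuffle itself, just a minimal sketch of the underlying idea: sample the top-ranked items without replacement, with weights that decay by rank, so highly ranked items usually stay near the front while the exact order varies from visit to visit. The decay exponent and item names are assumptions for illustration.

import numpy as np

def rank_biased_shuffle(items, inertia=2.0, seed=None):
    """Shuffle items so that earlier (higher-ranked) items tend to stay near the front.

    items   -- list ordered from most to least relevant
    inertia -- larger values preserve the original order more strictly
    """
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, len(items) + 1)
    weights = 1.0 / ranks ** inertia       # weight decays with rank
    weights /= weights.sum()               # normalize to a probability distribution
    # Weighted sampling without replacement yields a permutation biased toward the top
    order = rng.choice(len(items), size=len(items), replace=False, p=weights)
    return [items[i] for i in order]

# The top 100 "somewhat relevant" items; only the first 5 are displayed
candidates = [f"item_{i}" for i in range(1, 101)]
print(rank_biased_shuffle(candidates, seed=42)[:5])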

Scaling Kafka With Consumer Groups

Suhita Goswami explains how to use consumer groups to scale processing from Apache Kafka:

Kafka builds on the publish-subscribe model with the advantages of a message queuing system. It achieves this with:

  • the use of consumer groups
  • message retention by brokers

When consumers join a group and subscribe to a topic, only one consumer from the group actually consumes each message from the topic. The messages are also retained by the brokers in their topic partitions, unlike traditional message queues.

Multiple consumer groups can read from the same set of topics, and at different times, catering to different logical application domains. Thus, Kafka provides both the advantage of high scalability via consumers belonging to the same consumer group and the ability to serve multiple independent downstream applications simultaneously.

Consumer groups are a great solution to the problem of long-running consumers when items to process are independent and can run concurrently.
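
As a rough illustration (not from Suhita’s post), here is a minimal consumer sketch using the kafka-python client. Run several copies with the same group_id and the topic’s partitions are divided among them, so each message is processed by exactly one member of the group; the topic name, group name, and broker address are assumptions.

from kafka import KafkaConsumer

# Every process started with the same group_id joins the same consumer group;
# Kafka assigns each topic partition to exactly one member of that group.
consumer = KafkaConsumer(
    "orders",                            # topic name (assumed)
    group_id="order-processors",         # consumer group (assumed)
    bootstrap_servers=["localhost:9092"],
    auto_offset_reset="earliest",
    value_deserializer=lambda v: v.decode("utf-8"),
)

for message in consumer:
    # Each record lands on only one consumer in the group
    print(f"partition={message.partition} offset={message.offset} value={message.value}")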

Selecting All Columns But One In Postgres

Lukas Eder shows off a BigQuery feature which you can partially implement in Postgres:

In BigQuery syntax, we could now simply write

SELECT * EXCEPT rk
FROM (...) t
WHERE rk = 1
ORDER BY first_name, last_name

Which is really quite convenient! We want to project everything, except this one column. But none of the more popular SQL databases support this syntax.

Luckily, in PostgreSQL, we can use a workaround: Nested records:

SELECT (a).*, (f).* -- Unnesting the records again
FROM (
  SELECT 
    a, -- Nesting the actor table
    f, -- Nesting the film table
    RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk
  FROM film f
  JOIN film_actor fa USING (film_id)
  JOIN actor a USING (actor_id)
) t
WHERE rk = 1
ORDER BY (a).first_name, (a).last_name;

Notice how we’re no longer projecting A.* and F.* inside of the derived table T, but instead, the entire table (record). In the outer query, we have to use some slightly different syntax to unnest the record again (e.g. (A).FIRST_NAME), and we’re done.

Read the whole thing. Lukas has a workaround for SQL Server, but I’d really like to see SELECT * EXCEPT [something] be viable syntax. This is something I’d want to use more for ad hoc diagnostic queries: I have one scenario where most columns on a table are narrow, but there’s a big VARBINARY(MAX) (for good reason, I promise) that I almost never want to see in diagnostic queries. I use a third-party SSMS plugin to populate all the columns and remove the one I don’t want, but it’d be nice to specify it the other way around because that’s so much faster to type.

Deleting Analysis Services Tabular Partitions Using C#

Chris Koester shows how to delete SQL Server Analysis Services Tabular partitions using the .NET library:

This post shows how to delete SSAS Tabular partitions with C#. This is an improvement over a PowerShell script I previously wrote about, Delete SSAS Partitions with TMSL and PowerShell, because it is more dynamic and doesn’t require a manually-authored TMSL script.

Specifically, the code below deletes all partitions that contain a specified suffix, which is useful for quickly deleting all partitions used for development purposes. I’ve worked with SSAS models that required several development partitions in order to prevent the workspace database from growing too large. Deleting them manually after a deployment to a server would be tedious, and it would be easy to forget one and end up processing duplicate data.

Click through for the code.

mssql-cli Update

Alan Yu announces an update to mssql-cli:

GDPR compliance

As many of us are aware, GDPR is approaching, and we made some updates. In the past, file history stored entire T-SQL queries; if a query contained any secrets or passwords, the tool wasn’t smart enough to scrub those out. This is no longer the case: file history has been updated so that it no longer stores secrets or passwords.

In addition, we have added 24-hour rotation of UserID when we collect telemetry.

Read on for other improvements.
