Curated SQL – Page 1065 – A Fine Slice Of SQL Server

Writing SQL Against Cosmos DB

Published 2019-08-09 by Kevin Feasel

Hasan Savran shows how to query Cosmos DB documents with SQL:

CosmosDB’s SQL API uses T-SQL like language to query the database. It looks like T-SQL language. You still need to use SELECT, FROM, WHERE clause to retrieve the data you are looking for. It looks similar but it works different because the data structure is different. CosmosDB is not a relational database that means you cannot join other containers.

Let’s start with SELECT. You can use * to select all properties but just like T-SQL you shouldn’t, pick only what you need. Charge of a Request Units depends on how much data your queries retrieves from containers. ORMs like to select everything and that’s one of the reasons DBAs don’t like them. I know couple of ORM supports CosmosDB. Think twice if you want to use an ORM with CosmosDB. Don’t let an ORM to generate a query for CosmosDB. You will literally pay for it!

Feasel’s Law in force, with a bonus of “ORMs are expensive.”

Comments closed

Guiding Your Users with Power BI

Published 2019-08-09 by Kevin Feasel

Marc Lelijveld continues a series on storytelling with Power BI:

Another thing to think about before building visuals, is the click path. In order to make sure we’re really telling the story, we need to get our click path in a logical order. This is most important for both, declarative and exploratory storytelling.
Maybe you want to provide all the information in your visuals right away. Sometimes I see things like this happening when people add a lot of visuals on only one page. In my experience it is better to make your report interactive and work with multiple pages. Spread the story you want to tell out over multiple pages. This is what Will Thompson also has done as well in this report for the Gartner Bakeoff in 2018.

This sort of planning helps you out in the long run.

Comments closed

Statistics and Multiple Single-Column Indexes

Published 2019-08-09 by Kevin Feasel

Erik Darling is fusing together queries like Dr. Frankenstein in his lab:

You may have noticed that both queries get pretty bad estimates. You might even be thinking about leaving me a comment to update stats.
The thing is that I created these indexes, which means they get stats built with a full scan, and it’s a demo database where nothing changes.
We just get unfortunate histograms, in this case. If I create very specific filtered statistics, both plans perform a key lookup.

I snipped this part because it’s so ingrained in DBAs that when performance is bad, updating statistics is a panacea.

Comments closed

Relating Nonclustered Indexes to the Clustered Index

Published 2019-08-09 by Kevin Feasel

Erin Stellato takes us through a bit of indexing strategy:

In the sp_SQLskills_helpindex output you’ll notice that OrderLineID is the only column in columns_in_tree, and both OrderLineID and OrderID_and_OrderLineID are in columns_in_leaf.
Remember that when you identify a nonclustered index as unique, it will not add the clustering key to the tree level, it just adds it to the leaf level. But if the nonclustered index is not declared as unique, then SQL Server adds the clustering key to the tree level to uniquely identify the row.

This post deserves a careful reading.

Comments closed

SSIS Design Preferences

Published 2019-08-09 by Kevin Feasel

Meagan Longoria systematizes a set of preferences regarding Integration Services package and ETL process design:

– Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.
– Whatever you use to create tables, include primary keys, foreign keys, and indexes with your table definitions. Provide explicit constraint names to simplify database comparisons. You can disable your foreign keys, but they need to be there to provide that metadata.
– Separate your final dimensional/reporting tables from audit tables and staging tables. This can be done with separate schemas or even separate databases.

People have added some more thoughts in the comments as well.

Comments closed

Monetizing Performance Tuning in the Cloud

Published 2019-08-09 by Kevin Feasel

Emanuele Meazzo points out a benefit from being in the cloud—performance tuning has a direct monetary tie:

A single step up for a single DB can cost from 10% to 25% more than the previous sizing, which translates in 15K or more when dealing with higher specs databases; remember that this is for a single scale up, on a single DB, on a single year.
It’s easy to understand that if your database and/or queries are not well tuned because “scaling will take care of it eventually”, then, you’ll be secretly bleeding money.

We can quantify performance improvements in terms of resource savings, but tying that to the specific business benefits can be hard because that hardware is a capital expenditure and so much of tuning benefits are “we avoided this worse case” rather than “we actively scaled something down.”

Comments closed

Getting Max Column Value from Power Query

Published 2019-08-09 by Kevin Feasel

Chris Webb hits us with techniques to get the maximum value in a column using Power Query and M:

In this first part I’m going to set up the scenario and show you what I found out from my own experimentation. The really mind-blowing tricks shown to me by the ever-helpful Curt Hagenlocher of the Power Query dev team will be covered in part 2.
Let’s say you have a large csv file which contains a numeric column and you want to get the maximum value from that column. In this case I’m going to use the 2018 Price Paid data from the UK Land Registry available here. This csv file contains 1021215 rows, one for each property transaction in England and Wales in 2018; the second column in this file contains the the price paid for the property, so the aim here is to get the maximum price paid for all property transactions in 2018.

Read the whole thing.

Comments closed

KSQL UDFs

Published 2019-08-08 by Kevin Feasel

Mitch Seymour takes us through user-defined functions in Kafka’s flavor of SQL:

One of KSQL’s most powerful features is allowing users to build their own KSQL functions for processing real-time streams of data. These functions can be invoked on individual messages (user-defined functions or UDFs) or used to perform aggregations on groups of messages (user-defined aggregate functions or UDAFs).
The previous blog post How to Build a UDF and/or UDAF in KSQL 5.0 discussed some key steps for building and deploying a custom KSQL UDF/UDAF. Now with Confluent Platform 5.3.0, creating custom KSQL functions is even easier when you leverage Maven, a tool for building and managing dependencies in Java projects.

Read on to see just how easy it is.

Comments closed

Text Analysis from Google Sheets

Published 2019-08-08 by Kevin Feasel

Federico Pascual shows how you can use MonkeyLearn to perform text analysis (including sentiment analysis and categorization) from a Google Sheets spreadsheet:

Carrying out a customer survey, for example, can be useful to obtain crucial insights into the overall customer experience of your clients. But the data obtained from these surveys can be incredibly difficult to process, even after you’ve added all the results to a spreadsheet and especially if you receive a high volume of responses.
How do you process this information, then? Should you read the answers one by one? What if you want to know what people are saying about your brand on social media?

Click through for a demo.

Comments closed

Running Powershell Against your Azure SQL DBs

Published 2019-08-08 by Kevin Feasel

Joey D’Antoni has a script which finds all of your Azure SQL Database instances and runs a Powershell script against each one in turn:

This code is pretty simple and looks for an Azure SQL Server in each resource group, and then looks for the databases that aren’t master on each server. In this example I’m setting the storage account for Azure Threat Detection, but you could do anything you wanted in that last loop.

Click through for the script. Things like parallelism and error handling are exercises for the reader.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Curated SQL Posts