Press "Enter" to skip to content

Day: March 7, 2022

Building posexplode() in the Serverless SQL Pool

Jovan Popovic rides to the rescue with JSON:

The array cells are pivoted and returned as simple scalar columns. Now you can simply use WHERE or GROUP BY clauses to filter or summarize information by array element values. Another very useful piece of information might be the index of every element (generated as the pos column).

Spark enables you to use the posexplode() function on every array cell. The posexplode() function will transform a single array element into a set of rows where each row represents one value in the array and the index of that array element. As a result, one row whose array contains three elements will be transformed into three rows containing scalar cells. This flattened/normalized representation is much easier to analyze.

Once the array is flattened and normalized, you can easily analyze the data and find how many people know SQL or Java.

Read on to see how you can implement the equivalent of POSEXPLODE() using OPENJSON() in the Azure Synapse Analytics serverless SQL pool.
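To see the shape of the trick, here’s a minimal sketch (the sample JSON and column aliases are mine, not Jovan’s): when OPENJSON() is pointed at an array, its key column carries the zero-based index of each element, playing the role of posexplode()’s pos, while value carries the element itself.

    DECLARE @json nvarchar(max) = N'{"name":"Mila","skills":["SQL","Java","Python"]}';

    -- OPENJSON over an array returns one row per element:
    -- [key] is the zero-based position (posexplode's "pos"), [value] is the element
    SELECT j.[key] AS pos, j.[value] AS skill
    FROM OPENJSON(@json, '$.skills') AS j;

From there, WHERE and GROUP BY on skill work just as they would against any scalar column.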


Writing Extended Events to InfluxDB

Gianluca Sartori’s speaking my language:

The TIG software stack (Telegraf, InfluxDB, Grafana) is a very powerful combination of software tools that can help you collect, store and analyze data that has a time attribute. In particular, InfluxDB is a time series database, built with sharding, partitioning and retention policies in mind. It is absolutely fantastic for storing telemetry data, like performance counters from SQL Server or other software products.

In order to store data in InfluxDB, you can use Telegraf, a data collection agent that takes care of extracting telemetry data from the object to observe and uploading it to the InfluxDB database. Telegraf is built around the concept of plugins: each object to observe has its own plugin, and it’s not surprising at all to find a specialized plugin for SQL Server.

Click through for more details and how to set it up.
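As a rough sketch of that plugin model (connection strings, tokens, and names below are placeholders, and the exact options vary by Telegraf version), a telegraf.conf pairing the SQL Server input plugin with an InfluxDB output looks something like this:

    # Input: collect telemetry (performance counters, waits, etc.) from SQL Server
    [[inputs.sqlserver]]
      servers = ["Server=localhost;Port=1433;User Id=telegraf;Password=<password>;app name=telegraf;"]
      database_type = "SQLServer"

    # Output: write the collected points to an InfluxDB bucket
    [[outputs.influxdb_v2]]
      urls = ["http://localhost:8086"]
      token = "$INFLUX_TOKEN"
      organization = "myorg"
      bucket = "sqlserver"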


Executing SQL Statements in Azure Data Factory

Abhishek Narain announces a pretty nice improvement to Azure Data Factory and Synapse Pipelines:

We are introducing a Script activity in pipelines that provides the ability to execute single or multiple SQL statements.

Using the Script activity, you can execute common operations with Data Manipulation Language (DML) and Data Definition Language (DDL). DML statements like SELECT, INSERT, UPDATE, and DELETE let users retrieve, insert, modify, and delete data in the database. DDL statements like CREATE, ALTER, and DROP allow a database manager to create, modify, and remove database objects such as tables, indexes, and users.

Be sure to read the limitations at the bottom, however.
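As a trivial illustration of what “single or multiple SQL statements” buys you (all object names here are invented), a Script activity could now run a mixed DDL-and-DML batch like this in one step:

    -- DDL: create an audit table if it's missing
    IF OBJECT_ID('dbo.LoadAudit') IS NULL
        CREATE TABLE dbo.LoadAudit (RunId int, LoadedAt datetime2);

    -- DML: record this pipeline run
    INSERT INTO dbo.LoadAudit (RunId, LoadedAt)
    VALUES (1, SYSUTCDATETIME());

    -- DML: prune audit rows older than 30 days
    DELETE FROM dbo.LoadAudit
    WHERE LoadedAt < DATEADD(DAY, -30, SYSUTCDATETIME());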


Building a SQL Server Inventory via Powershell

Lee Markum wants to figure out where all the servers are:

You’re a data professional and you’ve been given the keys to a new SQL Server environment. You know you need to build a SQL Server inventory so you know what is in your environment, but how do you get that information?

One of the things I have talked about in other posts is how to create a SQL Server inventory. I’ve discussed using the MAP Toolkit and building your own inventory database using T-SQL. Today, we’ll see another way to generate a list of SQL Servers in your environment.

Click through for that method. In the past, I’ve used nmap (with permission, of course) to figure out all the SQL Server instances in my environment. Fun times.
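Lee walks through his own approach in the post; purely as one more illustration of the genre (and not necessarily his method), the dbatools PowerShell module can also sweep a domain or IP range for instances:

    # Assumes the dbatools module: Install-Module dbatools
    Import-Module dbatools

    # Search Active Directory for SQL Server instances and keep
    # the basics you'd want in an inventory table
    Find-DbaInstance -DiscoveryType Domain |
        Select-Object ComputerName, InstanceName, Port, Availability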


Processing Data: Aggregate and Join Operators

Erik Darling continues a series on understanding query plans. First up, aggregate operators:

Aggregates can be useful for all sorts of things in a query plan, and can show up in many different forms.

It would be tough to cover all of them in a single post, but what I’d like to do is help all you nice folks out there understand some of their finer points.

Then we have join operators:

Anyone who tells you there are only three types of joins in SQL Server isn’t your friend.

Okay, maybe that’s harsh. Maybe they’re just getting you prepared for the bonne promenade through all the many different faces a join can wear in your query plans.

Maybe they have a great explanation for Grace Hash Joins in their back pocket that they’re waiting to throw in your face like a bunch of glitter.

Do read both.
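If you want to watch those different faces show up for yourself, one quick (if heavy-handed) way is to force each physical join type with a join hint and compare the actual plans; the tables here are placeholders:

    -- Force each physical join type, then compare the operators in the plans
    SELECT u.Id, p.Title
    FROM dbo.Users AS u
    INNER HASH JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id;

    SELECT u.Id, p.Title
    FROM dbo.Users AS u
    INNER MERGE JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id;

    SELECT u.Id, p.Title
    FROM dbo.Users AS u
    INNER LOOP JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id;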


Running a SQL Assessment from Powershell

David Alcock performs a bit of assessment:

My last post covered the SQL Assessment extension in Azure Data Studio, and in this article I’m going to look at running the SQL Assessment API using PowerShell instead. Whilst in ADS we can export assessment results to HTML or into SQL via a script, the PowerShell option is a bit more flexible: we can export straight to a SQL table and run the assessment across multiple instances.

I’ll follow the PowerShell instructions from the SQL Assessment API web page; this is the code that will run the assessment on my localhost instance and also write the results to a table (Assessment.Results) in the SQLAssessmentDemo database. The -Force parameter will create the objects if they don’t already exist:

Read on for that code as well as a few tweaks you can make to make your life easier.
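For reference, the published pattern on the SQL Assessment API page (using the SqlServer PowerShell module) looks roughly like this, matching the localhost-to-Assessment.Results flow David describes:

    # Assumes the SqlServer module: Install-Module SqlServer
    Import-Module SqlServer

    # Assess localhost and persist the flattened results to
    # SQLAssessmentDemo.Assessment.Results; -Force creates the
    # missing objects if they don't already exist
    Get-SqlInstance -ServerInstance 'localhost' |
        Invoke-SqlAssessment -FlattenOutput |
        Write-SqlTableData -ServerInstance 'localhost' `
            -DatabaseName SQLAssessmentDemo `
            -SchemaName Assessment -TableName Results -Force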


DirectQuery and SQL Query Limitations

Chris Webb lays out the limits:

A few days ago I was tagged on a thread on Twitter: my old pal Mim was upset to find that he couldn’t use a SQL query with a Common Table Expression as the source of a table in DirectQuery mode. He worked out why without my help, but I thought it was worth writing up an explanation for other people who might be struggling with the same problem.

That’s a fairly frustrating limitation, even if you have control of the database you’re querying.
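The gist, as I understand Chris’s explanation: in DirectQuery mode the engine wraps your native query in a derived table, and T-SQL won’t accept a WITH clause inside a subquery. Rewriting the CTE as a derived table (sketched below with invented names) keeps the same logic in a nestable form:

    -- Breaks in DirectQuery: the WITH clause can't survive being
    -- wrapped in the outer SELECT that gets generated
    WITH Recent AS
    (
        SELECT OrderId, Amount FROM dbo.Orders WHERE OrderDate >= '20220101'
    )
    SELECT * FROM Recent;

    -- Works: the same logic as a derived table nests cleanly
    SELECT r.OrderId, r.Amount
    FROM
    (
        SELECT OrderId, Amount FROM dbo.Orders WHERE OrderDate >= '20220101'
    ) AS r;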
