SQL Scotsman is working on a very interesting series on statistics and the different cardinality estimators. So far, this is a three-part series. Part one is an overview:
A few of those assumptions changed in the new SQL Server 2014/2016 CE, namely:
Independence becomes Correlation: In the absence of existing multi-column statistics, the legacy CE views the distribution of data contained across different columns as uncorrelated with one another. This assumption of independence often does not reflect the reality of a typical SQL Server database schema, where implied correlations do actually exist. The new CE uses an increased correlation assumption for multiple predicates and an exponential backoff algorithm to derive cardinality estimates (a quick sketch follows below).
Simple Join Containment becomes Base Join Containment: Under the legacy CE, the assumption is that non-join predicates are somehow correlated, which is called “Simple Containment”. For the new Cardinality Estimator, these non-join predicates are assumed to be independent (called “Base Containment”), which can translate into a reduced row estimate for the join. At a high level, the new CE derives the join selectivity from the base-table histograms without first scaling them down by the associated filter predicates; the selectivity of the non-join filters is applied afterward.
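To make the backoff change concrete, here is a minimal sketch; the table, columns, and literals are hypothetical, and the comments describe how each model combines the predicate selectivities.

-- Hypothetical two-predicate query; City and PostCode are strongly correlated.
SELECT PersonID, City, PostCode
FROM dbo.Person
WHERE City = 'Aberdeen'
  AND PostCode = 'AB10 1AA';

-- Legacy CE (independence):      estimate = table rows * sel(City) * sel(PostCode)
-- New CE (exponential backoff):  estimate = table rows * sel1 * SQRT(sel2)
-- where sel1 is the most selective predicate; a third predicate would contribute its
-- fourth root, a fourth its eighth root, so correlated filters no longer multiply
-- the estimate down as aggressively.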
When To Use Trace Flag 9481
Query Scope: You’ve moved (migrated/upgraded) to SQL Server 2014/2016, your databases are at compatibility level 120/130 and using the new CE, and your workload is performing well overall, but there are a few regressions where a small number of queries actually perform worse. Use Trace Flag 9481 on a per-query basis as a temporary measure until you can tune or rewrite the query so it performs well without the hint.
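For reference, the per-query form is the QUERYTRACEON hint; the query below is a made-up illustration rather than one from the article.

-- Compile just this statement under the legacy (pre-2014) cardinality estimator.
SELECT OrderID, CustomerID, OrderDate
FROM dbo.SalesOrders
WHERE OrderDate >= '2016-01-01'
OPTION (QUERYTRACEON 9481);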
The problem with lowering the database compatibility level is that you can’t leverage the new engine functionality available under the latest compatibility level.
This problem was solved in SQL Server 2016 with the introduction of Database Scoped Configurations, which gives you the ability to make several database-level configuration changes for properties that were previously configured at the instance level. In particular, the LEGACY_CARDINALITY_ESTIMATION database scoped configuration allows you to set the cardinality estimation model independent of the database compatibility level. This option allows you to leverage all new functionality provided with compatibility level 130 but still use the legacy CE on the off chance that the latest CE causes severe query regressions across your workload.
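The database-scoped switch itself is a one-liner, run in the context of the database in question:

-- Keep compatibility level 130 but use the legacy cardinality estimator for this database.
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

-- Switch back to the new CE once any regressions have been tuned away.
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = OFF;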
The article on statistics is quite long for a blog post and a great read. I’m looking forward to reading more.
This post contains a list of various methods that can be used to process (i.e., load data into) an Azure AS tabular model. As you will see, not much has changed from the regular on-premises version (which is a very good thing, as it softens the learning curve).
Read on if you’re looking at putting an Analysis Services model into Azure.
The DATA_SOURCE and FILE_FORMAT options are easy: pick your external data source and external file format of choice.
The last major section deals with rejection. We’re going from a semi-structured system to a structured system, and sometimes there are bad rows in our data, as there are no strict checks of structure before inserting records. The Hadoop mindset is that there are two places in which you can perform data quality checks: in the original client (pushing data into HDFS) and in any clients reading data from HDFS. To make things simpler for us, the PolyBase engine will outright reject any records which do not adhere to the quality standards you define when you create the table. For example, let’s say that we have an Age column for each of our players, and that each age is an integer. If the first row of our file has headers, then the first row will literally read “Age” and conversion to integer will fail. PolyBase rejects this row (removing it from the result set stream) and increments a rejection counter. What happens next depends upon the reject options.
Creating an external table is pretty easy once you have the foundation prepared.
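As a rough sketch of how those pieces fit together (all object names here are made up, and the external data source and file format are assumed to exist already):

-- Hypothetical external table over delimited player files.
CREATE EXTERNAL TABLE dbo.Players
(
    FirstName NVARCHAR(50),
    LastName  NVARCHAR(50),
    Age       INT
)
WITH
(
    LOCATION = '/baseball/players/',   -- path on the external system
    DATA_SOURCE = HDFSDataSource,      -- external data source created beforehand
    FILE_FORMAT = CsvFileFormat,       -- external file format created beforehand
    REJECT_TYPE = VALUE,               -- fail the query after an absolute number of bad rows
    REJECT_VALUE = 5                   -- e.g. a header row that fails INT conversion counts as one rejection
);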
Welcome to this month’s T-SQL Tuesday Round-Up! A few weeks ago, I sent out a call for bloggers and must say that I’m utterly blown away by the response. A whopping FORTY bloggers responded last week with contributions for Growing New Speakers! Four – zero! You people are all amazing!!!
There’s a lot to read here. If you’ve ever thought about speaking, give it a try; there are 40 people trying to convince you this month.
But what about connecting remotely? This isn’t going to be much use if we can’t remotely connect!
Actually connecting remotely is the same as connecting to a named instance. You just use the server’s IP address (not the container’s private IP) and the non-default port that we specified when creating the container (remember to allow access to the port in the firewall).
Containers are great, though I do have trouble wrapping my head around containerized databases and have had struggles getting containerized Hadoop to work the way I want.
DBCC OPTIMIZER_WHATIF can be used to pull down your resources or augment them. Often the differences in the execution plans have to do with parallelism and memory grants. This is an example of an execution plan running on an underpowered development machine.
This is a good tool to help figure out what an execution plan probably would look like in production when your test environment is much smaller.
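DBCC OPTIMIZER_WHATIF is undocumented and unsupported, so treat the following as a sketch of the commonly blogged usage rather than official syntax:

-- Pretend, for this session's plan compilation, that the hardware looks like production.
DBCC OPTIMIZER_WHATIF(CPUs, 32);
DBCC OPTIMIZER_WHATIF(MemoryMBs, 262144);   -- roughly 256 GB

-- Generate estimated plans for the queries of interest here, then put things back.
DBCC OPTIMIZER_WHATIF(ResetAll);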
You’re not running as yourself, even though that’s the account you signed into SSMS as.
You’re not running under the server account that SQL or SQL Launchpad run under.
You’re running as a new account created when you installed SQL Server R Services (In-Database) for the purpose of running R code.
John also looks at a couple ways of showing which user is running this code and notes that this solves his file share issue.
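A quick way to confirm this for yourself is to have the R script report the account it runs under; this is a minimal sketch, not John’s exact code.

-- Echo back the OS user name that the external R runtime is executing as.
EXEC sp_execute_external_script
    @language = N'R',
    @script   = N'OutputDataSet <- data.frame(RunningAs = Sys.info()[["user"]]);'
WITH RESULT SETS ((RunningAs NVARCHAR(128)));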
I like using PoSh for some tasks, especially when I don’t have an easy way to do something in SSMS or want to run a task across a variety of instances. In this case, as I glanced through the September updates, I found a good one.
I don’t love the mixed naming, but I’ll get used to it, and I do love the autocomplete in PoSh.
Steve has lots of screenshots walking you through this function.
The SELECT statement returned 3104 records, exactly 4 shy of the 3108 I would have expected (777 * 4 = 3108). In each case, the missing row was the first, meaning when I search for LastName = ‘Turgeon’ (the first player in my data set), I get zero rows. When I search for another second baseman in the set, I get back four rows, exactly as I would have expected.
What’s really interesting is the result I get back from Wireshark when I run a query without pushdown: it does actually return the row for Casey Turgeon.
This isn’t an ideal scenario, but it did seem to be consistent in my limited testing.
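If you want to run the same comparison yourself, PolyBase exposes query hints to force the predicate evaluation to either side; the table below is the same hypothetical one sketched earlier.

-- Evaluate the filter on the external (Hadoop) side via a MapReduce job.
SELECT FirstName, LastName
FROM dbo.Players
WHERE LastName = 'Turgeon'
OPTION (FORCE EXTERNALPUSHDOWN);

-- Pull the data back and evaluate the filter inside SQL Server instead.
SELECT FirstName, LastName
FROM dbo.Players
WHERE LastName = 'Turgeon'
OPTION (DISABLE EXTERNALPUSHDOWN);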
Cardinality aggregation is used to count distinct values in a data set. For example, if you want to know the number of IPs used in your system, you can use this aggregation on an IP field and then count the results.
Despite its usefulness, cardinality can also be a touchy Elasticsearch feature to use. Performing a unique count on a field with a multitude of possible values when configuring a visualization, for example, can bring Elasticsearch to a halt.
Most of it comes down to writing good queries. But if you don’t know what good Elasticsearch queries look like, read on.