Press "Enter" to skip to content

Curated SQL Posts

SparkSQL CONCAT vs T-SQL CONCAT

Bill Fellows has a public service announcement:

The concat function is super handy in the database world but be aware that the SQL Server one is way better because it solves two problems. It combines everything into a string and it does not require NULL checking. In the before times, one had to down cast to a n/var/char type as well as check for NULL before appending strings via the plus sign.

The point of difference is so important that Bill busted out the marquee HTML tag. Which now leads me to wonder, was marquee or blink the bigger evil in the mid-to-late ’90s web?

Comments closed

Port Scanning for SQL Server

David Fowler performs one of the early steps of a penetration test:

Since witnessing a rather nasty cyber attack around a year ago, I’ve been thinking quite a bit about security. Do we really know how secure our SQL Servers are? Penetration testing is a great way to find out where our weaknesses and vulnerabilities are. Ideally you probably want to be getting regular pen tests conducted by external companies (although in my experience, some are better than others. I’ve known some who argue totally pointless issues and miss glaring holes which I know exist, but that’s a whole different story) but wouldn’t it be useful if we could conduct some of these tests ourselves?

In this series of posts, I’m going to try to knock together a little pen testing toolbox so that we can hopefully find some of these vulnerabilities. I’m no pen testing expert and this is never going to replace getting a professional pen tester in to test your setup but it might go some way to helping us understand some of our vulnerabilities and identify them.

Click through to see what David did, as well as an alert which helped pick out this port scanning operation.

Comments closed

Shortcuts Beat Duplication in Microsoft Fabric

Reza Rad makes a recommendation:

Microsoft Fabric uses OneLake as the single logical layer for storage management across the Fabric. OneLake offers Domains and workspaces to work with Fabric Items, and amid all those structures, you may require a table or file from one Domain to be used in another. OneLake’s Shortcut feature will give you this option without duplicating the data. In this article, I will explain what Shortcuts are and their importance in building an analytics solution.

Click through for a video and an article.

Comments closed

Unit Testing Dynamic SQL

Jay Robinson lays out a pattern:

Dynamic SQL (aka Ad Hoc SQL) is SQL code that is generated at runtime. It’s quite common. Nearly every system I’ve supported in the past 30 years uses it to some degree, some more than others.

It can also be a particularly nasty pain point in a lot of systems. It can be a security vulnerability. It can be difficult to troubleshoot. It can be difficult to document. And it can produce some wickedly bad results.

Click through for Jay’s process as well as recommendations and an example. It’s certainly worth thinking about.

Comments closed

Upgrading SQL Server Cloud VMs

Brent Ozar recommends you check your cloud provider’s VM listings:

If you’ve been in Azure or Amazon for a few years, you’re probably on old, slow hardware.

In the last 3 weeks, I’ve had two clients who’d both been early cloud adopters. When they’d migrated to the cloud, they both used Azure Ev3 VMs – at the time, a good choice for SQL Server due to its relatively high amount of memory. When the Ev3 VM types were announced in 2017, they were hosted on Intel Broadwell and Haswell processors with 2.3-2.4GHz processing speed.

Also, even if you’re locked into a 1-year or 3-year deal, I know that at least Azure is usually willing to switch your VM class registration if you contact your support person. I’m not positive if AWS does the same but it wouldn’t shock me.

Comments closed

Exporting Dynamics 365 Data into Delta Lake via Synapse Link

Jose Mendes performs a data migration:

It’s fair to say there have been some considerable changes in the Azure landscape over recent years.

This blog will show you how to configure Synapse Link to export D365 data in the Delta Lake format – an open-source data and transaction storage file format used in Lakehouse implementations.

Before you start considering using this approach, you will need to ensure you meet the following prerequisites (Microsoft documentation).

Read on for those prerequisites as well as a step-by-step guide on how to do it.

Comments closed

Data Exploration in R with dplyr

Adrian Tam continues a series on R:

When you are working on a data science project, the data is often tabular structured. You can use the built-in data table to handle such data in R. You can also use the famous library dplyr instead to benefit from its rich toolset. In this post, you will learn how dplyr can help you explore and manipulate tabular data. In particular, you will learn:

  • How to handle a data frame
  • How to perform some common operations on a data frame

I like dplyr a lot for its “functional flow”—you pipe outputs of one function to be inputs of the next function, so the chain makes a lot of sense. If you want high performance, though, it’s often not the best choice—that’s usually data.table.

Comments closed

Pairs Plots in Base R

Steven Sanderson shows how we can create a pairs plot using the pairs() function in R:

A pairs plot, also known as a scatterplot matrix, is a grid of scatterplots that displays pairwise relationships between multiple variables in a dataset. Each cell in the grid represents the relationship between two variables, and the diagonal cells display histograms or kernel density plots of individual variables. Pairs plots are incredibly versatile, helping us to identify patterns, correlations, and potential outliers in our data.

Click through for one example, how to interpret it, and how to customize the outputs.

Comments closed

ggplot2 in Python Notebooks

John Mount runs R in Python with rpy2:

For an article on A/B testing that I am preparing, I asked my partner Dr. Nina Zumel if she could do me a favor and write some code to produce the diagrams. She prepared an excellent parameterized diagram generator. However being the author of the book Practical Data Science with R, she built it in R using ggplot2. This would be great, except the A/B testing article is being developed in Python, as it targets programmers familiar with Python.

As the production of the diagrams is not part of the proposed article, I decided to use the rpy2 package to integrate the R diagrams directly into the new worksheet. Alternatively, I could translate her code into Python using one of: Seaborn objectsplotnineggpy, or others. The large number of options is evidence of how influential Leland Wilkinson’s grammar of graphics (gg) is.

Click through to see how you can execute R code within the context of Python, similar to how you can use the reticulate package to execute Python code in the context of R.

Comments closed

Creating an Image Classification Model in Oracle OCI Vision

Brendan Tierney separates the cats and the dogs:

In this post, I’ll build on the previous work on preparing data, to using this dataset as input to building a Custom AI Vision model. In the previous post, the dataset was labelled into images containing Cats and Dogs. The following steps takes you through creating the Customer AI Vision model and to test this model using some different images of Cats.

This post is part four of a series (first part, second part, third part) on custom image classification in Oracle.

Comments closed