Press "Enter" to skip to content

Curated SQL Posts

Fun with Hash Tables in Powershell

Robert Cain continues a series on Powershell data structures:

Hash tables are powerful, flexible items. In some languages, hash tables are also called dictionaries. In this post we’ll start with a few basics, then get into some fun things you can do with hash tables.

A quick note, some documentation calls them a hash table, others read hashtable, one word. For this post I’ll use the two word hash table, but it’s the same thing no matter what documentation or blog you read.

Click through for plenty of examples of what you can do with them.

Comments closed

Finding Bad (Worse?) NOLOCK Statements across Instances

Aaron Bertrand powers up for about six episodes straight, but the results are amazing:

In Part 1 of this series, I showed how to use a Visitor pattern to walk through one or more T-SQL statements to identify a problematic pattern where NOLOCK hints are applied to the target of an update or delete. The method in my initial examples was very manual, though, and won’t scale if this problem might be widespread. We need to be able to automate collecting a potentially large number of statements across an entire environment, and then try to eliminate false positives without manual intervention.

Read on to see how you can take what Aaron wrote last time and make it scalable.

Comments closed

Executing T-SQL with a Proxy Account

Tom Collins answers a question:

I have some t-sql code added to a job step on a SQL Server Agent job. The problem is I need to run the code as RUNAS . I though of executing the job with a proxy account – so progressed with the Credential & proxy set up.    But I still can’t view the Proxy\Credential in the RunAs list . Is there a way around this problem?

Read on to learn why and for the answer.

Comments closed

Installing Kubernetes on Docker Desktop

Joy George Kunjikkur takes another stab at Kubernetes:

This is 3rd or 4th time, I am learning Kubernetes (hereafter mostly refer as K8s short form) hands-on sessions. Every time I learn the kubectl command and its options, I forget as there were no chances to apply in the day job. Another mistake I did all those times was missing to post learning to this blog.

Hope this time I will get a chance to use it in the day job and not miss posting the Kubernetes learning on to this blog.

This post is very basic. The aim is to get started with the Kubernetes development environment using Docker Desktop. Below are the steps at a high level to get started. Detailed steps with videos are available on the internet.

Read on for some installation tips, including how to install the Kubernetes dashboard for a UI experience.

Comments closed

An Overview of Azure Purview

Angela Henry gives us an overview of Azure Purview:

Organizations are amassing more data than ever, yet it is getting more difficult for their employees to find that data and use it with confidence. What if there was a solution out there that not only told us what data sources we have, but could tell us how those data sources should be used, and who the stewards/producers of that data are? What if it could allow us to classify our data, and provided us insights into what our entire data estate looked like? It might sound like data nirvana, but it just might be possible with the newest Platform as a Service (PaaS) offering from Microsoft, Azure Purview.

In the first part of the series, Angela covers the basics and pricing, so check it out.

Comments closed

Working with Trees of Data in R

Martin Stingl shows off the data.tree package:

Lately I tried to visualize an hierarchy with Tableau Desktop. The problem was that the hierarchy had a variable depth because it was tree-based. Each row had an id and a parent_id. Normally hierarchies in Tableau are defined by pulling some fields together, such as product categoryproduct group and product id.

Handling tree-based hierarchies seems to be a lot more complex. I found a plugin at https://github.com/tableau/extension-hierarchy-navigator-sandboxed but this only works online.

So I asked myself how I can handle this using R. I found the R-package data.tree at https://github.com/gluc/data.tree. I want to describe how I use this package to preprocess my data.

Read on to see how this works and how you can turn a classical data representation of a tree (ID and parent ID) into a flattened structure with a fixed number of levels. H/T R-Bloggers.

Comments closed

Caching Function Results in an R Package

Maelle Salmon and Cristophe Dervieux show us ways to cache results of function calls using R:

Caching means that if you call a function several times with the exact same input, the function is only actually run the first time. The result is stored in a cache of some sort (more practical details later!). Every other time the function is called with the same input, the result is retrieved from the cache unless invalidated. You will often think of caching as something valid in only one R session, but we’ll see it can be persistent across sessions via storage on disk.

As a quick note, this makes sense when writing functions, which are expressions without side effects. If you have side effects, caching might not give you what you expect.

Comments closed

Orchestrating ML Pipelines with Amazon Managed Workflows for Airflow

Juston Leto, et al, show off MLOps capabilities in AWS:

The ability to scale machine learning operations (MLOps) at an enterprise is quickly becoming a competitive advantage in the modern economy. When firms started dabbling in ML, only the highest priority use cases were the focus. Businesses are now demanding more from ML practitioners: more intelligent features, delivered faster, and continually maintained over time. An effective MLOps strategy requires a unified platform that can orchestrate and automate complex data processing and ML tasks, and integrates with the latest tooling to best complete those tasks.

This post demonstrates the value of using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate an ML pipeline using the popular XGBoost (eXtreme Gradient Boosting) algorithm. For more advanced and comprehensive MLOps capabilities, including a purpose-built model orchestration framework and a continuous integration and continuous delivery (CI/CD) service for ML, readers are encouraged to check out Amazon SageMaker Pipelines.

Read on for a step-by-step tutorial on the process.

Comments closed

Tools for Data Modeling

Rayis Imayev shares information on a few database modeling tools:

A picture is worth a thousand words, the same way a visual database schema is better than a database model communicated by a multitude of data scripting text objects.

Someone may spend a significant amount of time trying to describe all the database tables’ attributes, constraints and relationships between tables with words while a visual Entity Relationship Diagram (ERD) may only take a few minutes (or seconds) to tell the same story.

There are different and similar tools available to help you to create these visual ERD artifacts. The more visual appeal those tools may demonstrate, the fewer efforts would be required to add new information by keyboard-typing. Visual drag-drop experience will prevail, and only to type explicit list of attributes/types/etc. of your data model entities, that’s where your keyboard-typing skills will still be necessary.

Click through for reviews of several tools, ranging from free to expensive.

Comments closed

Analytic Window Functions in SQL Server

Aveek Das takes a look at a few analytic window functions available since SQL Server 2012:

Since the introduction of SQL Server 2012, the analytic functions were added to the SQL Server database engine. Any version of SQL Server after SQL Server 2012 can execute analytic queries on it. These functions are used to calculate an aggregated value from the dataset but are based on a specific set of rows instead of the entire dataset. As compared to aggregate functions like SUM, COUNT, AVG, etc. which return scalar records, these functions can return multiple records based on the conditions. The most common examples of using these functions are to find moving averages, running totals, etc. SQL Server supports the following analytic functions.

1. CUME_DIST – Find the cumulative distribution of a numerical column

2. FIRST_VALUE – Finds the first value of a column from the group and prints the same for each row

3. LAST_VALUE – Finds the last value of a column from the group and prints the same for each row

4. LAG – Reads values after the specified number of rows for a column

5. LEAD – Reads values before the specified number of rows for a column

Click through for examples of how each works.

Comments closed