June 2024 – Page 7 – Curated SQL

Excel File Handling in R and Python

Published 2024-06-17 by Kevin Feasel

If you often work with Excel files and are looking to streamline your data import and export processes, R and Python offer some powerful packages to help you. Here, I’ll introduce you to some essential tools in both R and Python that will make handling Excel files a breeze.

Read on for three options in R and two in Python.

Comments closed

Bringing SQL Server Data into Microsoft Fabric

Published 2024-06-17 by Kevin Feasel

Nikola Ilic shows us the current options:

Options, options, options…Having the possibility to perform a certain task in multiple different ways is usually a great “problem” to have, although very often not each option is equally effective. And, Microsoft Fabric is all about “options”…You want to ingest the data? No problem, you can use notebooks, pipelines, Dataflows, or T-SQL. Data transformation needed? No worries at all – again, you may leverage notebooks, T-SQL, Dataflows…Data processing, you asked? Lakehouse (Spark), Warehouse (SQL), Real-Time Intelligence (KQL), Power BI…The choice is yours again.

In a nutshell, almost every single task in Microsoft Fabric can be completed in multiple ways, and there is no “right” or “wrong” tool, as long as it gets the job done (of course, as efficiently as possible).

Nikola lays out two pre-requisites and then shows us two options we can currently use, and three potential options we currently cannot use.

Comments closed

Low-Downtime Migration Techniques from SQL Server 2017 to 2022

Published 2024-06-17 by Kevin Feasel

Yohei Kawajiri describes three techniques for performing a SQL Server migration:

It is possible to configure a SQL Server Always On availability group with a primary replica running on SQL Server 2017 and a secondary replica running on SQL Server 2022, but there are important considerations and limitations to keep in mind:

Backward Compatibility: SQL Server supports having replicas on different versions, but the primary replica must be on an older version than or equal to the secondary replicas. Therefore, having SQL Server 2017 as the primary and SQL Server 2022 as the secondary is valid.

Database Upgrade Path: When you decide to upgrade the primary replica to a newer version, you need to follow a specific upgrade path to ensure minimal downtime and data integrity.

Building an availability group? Yeah, makes a lot of sense. Performing log shipping? Sure, I could see that working. Database mirroring? I did not expect to read that one, mostly because it’s been deprecated for a decade.

Comments closed

Task Scheduling in PostgreSQL with pg_cron and pg_timetable

Published 2024-06-17 by Kevin Feasel

Radim Marek compares two extensions:

Working with PostgreSQL, and virtually any database system, extends far beyond merely inserting and retrieving data. Many application and business processes, maintenance tasks, reporting, and orchestration tasks require the integration of a job scheduler. While third-party tools can drive automation, you can also automate the execution of predefined tasks directly within the database environment. Although system-level cron might be a starting point, the power of the database system lies in its ability to store all the necessary information alongside your data/schema. In this article, we will explore pg_cron and pg_timetable as two distinct PostgreSQL-specific tools for scheduled task automation.

Read on to learn more about each.

Comments closed

Windows Local Admins and sysadmin in SQL Server

Published 2024-06-17 by Kevin Feasel

Jeff Iannucci continues a series on security:

If you have been reading along with our series of “30 SQL Server Security Checks in 30 days” posts, you’ve probably noticed a theme for a lot of these posts, where we recommend reviewing which principals have CONTROL SERVER permissions or are in the sysadmin role.

Full disclosure: I hope you aren’t tired of that yet, because there will be more of those posts.

However, today I wanted to turn your attention towards what might be considered potential shadow members of the sysadmin role. These are the members of the server’s local Windows Administrators group.

Read on for Jeff’s explanation.

Comments closed

Querying Deadlocks in Azure SQL DB

Published 2024-06-17 by Kevin Feasel

Josephine Bush wants to find the deadlocks:

A couple of weeks ago, a developer came to me and wanted to know how to figure out what was causing a deadlock. I honestly didn’t know where to look or if this was even being captured in Azure SQL DB already. It turns out that Microsoft has you covered with deadlock tracking. At least for a period of time. It looks like you can go back about a month, maybe.

Read on to see how you can find this information in Azure SQL DB. If you’re working in on-prem SQL Server and you don’t have any tooling set up, you can find some deadlocks in the system health extended event.

Comments closed

Blocking from Async Stats Updates

Published 2024-06-17 by Kevin Feasel

Tom Zika diagnoses an issue:

I recently encountered an issue where an index rebuild set to wait_at_low_priority ended up blocking an asynchronous statistics update. This interaction led to a large blocking chain where queries were waiting on the async stats update and started to timeout.

Read on for an explanation of all of the players involved, then a demo, and finally two solutions.

Comments closed

An Introduction to healthyR

Published 2024-06-14 by Kevin Feasel

Steven Sanderson covers a package:

This article will introduce you to the healthyR package. healthyR is a package that provides functions for analyzing and visualizing health-related data. It is designed to make it easier for health professionals and researchers to work with health data in R. It is an experimental package that is still under active development, so some functions may change in the future along with the package structure and scope.

Unfortunately, the package needs some love and attention. Which I am trying to give it. Given that information, I will be updating the package to include more functions and improve the existing ones. I will also be updating the documentation and adding more examples to help users get started with the package.

So let’s get started!

Read on for that overview, including an explanation of why the package exists and several examples of how to use it.

Comments closed

An Intro to Vetiver in R

Published 2024-06-14 by Kevin Feasel

Colin Gillespie introduces an R package for MLOps:

Most R users are familiar with the classic workflow popularised by R for Data Science. Data scientists begin by importing and cleaning the data, then iteratively transform, model, and visualise it. Visualisation drives the modeling process, which in turn prompts new visualisations, and periodically, they summarise their work and report results.

Click through for a demonstration of how to create and deploy a model using vetiver.

Comments closed

Dynamic Warehouse and Lakehouse Connections in Data Pipelines

Published 2024-06-14 by Kevin Feasel

Koen Verbeeck doesn’t want to hard-code the connection string:

When you develop data pipelines in Microsoft Fabric (the Azure Data Factory equivalent in Fabric, not to be confused with deployment pipelines), you will most likely have some activities with a connection to a warehouse, a lakehouse or a KQL database (for the remainder of the blog post I’ll talk about a warehouse, but it can be any of those three data stores). For example, in a Script, Lookup, or Copy activity. When you deploy your data pipeline to another workspace – using, you might’ve guessed it, deployment pipelines – the pipeline itself is copied to the other workspace. E.g., we deploy a pipeline from the development workspace to the test workspace.

Read on to see what this means for warehouse connections and how you can work around the existing messiness.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Month: June 2024