Press "Enter" to skip to content

Author: Kevin Feasel

Executing via Proxy in SSIS

Andy Brownsword submits a series of requests through a proxy:

When executing packages for SSIS the default option would be to use the SQL Server Agent service account. We might not want to share an account between our services and Integration Services packages due to security risks.

Let’s take a common example: Suppose we have a package which reads from a file share. If permissions aren’t set up correctly this can fail. Even if access is corrected, the service account can cache that failure. The only way to resolve may be to restart the service. This isn’t something we’d want to be doing on production systems.

Read on to see how you can use a proxy to bypass this problem.

Comments closed

An Overview of Data Types in R

Steven Sanderson talks data types:

Imagine your data as a diverse collection of individuals. Some might be numbers (like age or weight), while others might be text (like names or addresses). These different categories are called data types, and R recognizes several key ones:

Click through for that list. It’s a bit different from what you’d expect if you come at this from a SQL or C-based programming language background. But they all make good sense when you remember that R is a domain-specific language for statistics, so it’s going to emphasize the things that make the most sense for statisticians and data scientists.

Comments closed

dataConvergeDefinition and DirectQuery Partitions

Chris Webb talks about hybrid tables:

Hybrid tables – tables which contain both Import mode and DirectQuery mode partitions to hold data from different time periods – have been around for a while. They are useful in cases where your historic data doesn’t change but your most recent data changes very frequently and you need to reflect those changes in your reports; you can also have “reverse hybrid tables” where the latest data is in Import mode but your historic data (which may not be queried often but still needs to be available) is in DirectQuery mode. Up to now they had a problem though: even when you were querying data that was in the Import mode partition, Power BI still sent a SQL query to the DirectQuery partition and that could hurt performance. That problem is now solved with the new dataCoverageDefinition property on the DirectQuery partition.

Read on to see what dataCoverageDefinition does.

Comments closed

Installing or Upgrading SQL Server on a Workstation

Aaron Bertrand shares some advice:

Lately, I’ve seen many people struggling to upgrade their workstation to the latest version of SQL Server. The main source of the problem is usually the web installer/wrapper, which can fail for a variety of reasons that aren’t always made clear by the error messages. Today, I’m going to walk through one way you can approach a workstation upgrade and avoid these web installer issues altogether. I’ll focus on SQL Server 2022, but most of the information applies to any modern major version. Most of the information applies if you’re installing a brand-new instance, too.

Click through for several good tips and practices.

Comments closed

Accessibility in UI Design

Benedict Ampea-Badu shares some tips:

In this last segment of the series, we’ll delve into the crucial elements of navigation and flow, exploring further the part of web accessibility that ensures users can effortlessly navigate online content. Additionally, we’ll shine a spotlight on the indispensable practice of regular testing, an ongoing commitment that goes beyond compliance, providing invaluable insights into real user experiences and fostering continuous improvement.

Ensuring a seamless online experience for everyone and facilitating smooth navigation boils down to adhering to the fundamentals of web accessibility. It boils down to carefully planning how websites are laid out, making it easy for users to explore different things effortlessly. When developers grasp how users interact and tackle possible issues, they go beyond just ticking boxes on a checklist. Instead, they create websites that truly put users first, considering how people actually use and prefer to browse.

This is particularly about accessibility in web applications but many of the same principles apply to things like Power BI reports.

Comments closed

Improving Aurora Postgres Performance by Lowering random_page_cost

Shayon Mukherjee does some performance tuning:

Recently I have been working with some queries in Postgres where I noticed either it has decided not to use an index and perform a sequential scan, or it decided to use an alternative index over a composite partial index. This was quite puzzling, especially when you know there are indexes in the system that can perform these queries faster.

So what gives?

After some research, I stumbled upon random_page_cost (ref).

Read on to learn more about what this is, how you can change it, and a warning before you go wild in changing it.

Comments closed

Using Databricks System Tables

Dustin Vannoy has a primer on system tables in Databricks:

Monitoring is important, so I’ve covered the topic a few times in the past. I’ve talked about collecting your Spark application logs and Spark metrics. These are a good way to track what is happening and what is going wrong as your code runs. In the video related to this post I focus on a different side of monitoring. The evolving capabilities offered by Databricks System Tables. I have some sample queries and links to help you get started and begin to get value from system tables. This will need to be updated (I’ll try) as new tables go into public preview status. So let’s discuss the questions I had when I first started researching this feature:
1) What do the Databricks system tables offer me for monitoring?
2) How much does this overlap with the application logs and metrics?

Click through for a video and a walkthrough.

Comments closed

Supporting Replication in dbatools

Jess Pomfret makes an announcement:

Replication, everyone loves to hate it, but it’s been our most requested feature as far as adding commands to dbatools. For ages we’ve said ‘sounds great’ and ‘we would love that’, but when we started looking into it the energy soon fizzled away, due to it’s dependency on RMO – Replication Management Objects, as opposed to SMO – SQL Server Management Objects, things weren’t as easy as we had hoped.

That said, it’s now there as of version 2.1.1. It supports snapshot, transactional, and merge replication. No peer-to-peer, but is anyone actually surprised at that?

Comments closed

Looping in Python

Jack Wallen has us spinning in circles:

What is a loop? Other than a fruity breakfast cereal, a loop is a portion of code that repeats a set number of times until a desired process is complete.

Here’s an example of a loop that many can relate to. You go to the ATM, insert your card, and start with your transaction. First, you check your balance, then you remove money from your account. The loop keeps running with each transaction until you tell the ATM you are finished.

Read on for loop examples, though for is definitely much more common and popular than while. I’d also recommend learning a bit about list comprehensions, which are a way of turning a for loop into a one-line operation: [f(x) for x in my_list] is a simple example: for each row in my_list, perform function f() with that row’s value as the input. Then, take all of the results of f(x) and stick them together in a new list.

Comments closed

Clustered Index Rebuilds and Transaction Log Space

Chad Callihan talks about a potential problem:

Of all the problems that can cause a database transaction log to fill up, perhaps one lesser-known cause is a large clustered index rebuild. Let’s look at a scenario where this issue could cause a headache.

Click through for the scenario. I will say that I disagree with reorganizing the index under pretty much any circumstances. Jeff Moden, in a long and deep but extremely interesting presentation he gave to us at TriPASS three years ago, laid out why that is. I do agree that rebuilding a clustered index should be uncommon and may be a sign of a data modeling problem.

The other thing to note is that this problem is bigger when you rebuild online and your data drive is the same as your log drive. When you rebuild an index online, SQL Server needs enough space to make a copy of the data, so the rule of thumb is that you’ll have 2x the normal disk space requirements during online rebuild. It’s not necessarily exactly 2x because of fragmentation, compression, and other factors, but that’s why it’s a rule of thumb and not a “guarantee or your money back” type of thing.

Comments closed