Press "Enter" to skip to content

Category: Uncategorized

Thoughts on Certification

Eugene Meidinger is certifiable:

This being a complex topic, I thought I’d lay out the various factors to give a more comprehensive answer than you can easily fit in a tweet.

So the first two questions we need to answer are “Why do certs exist?” and “Why do people take them?”. Without these, we can’t give a good answer to whether you should take them. Certifications often exist for reasons that have nothing to do with your personal best interest. It is necessary to understand that fact.

Giving the economist’s spin, certifications are imperfect signals of reputation. When you know nothing else about a candidate, business partner, vendor, or ranting homeless person on the street, that cert can let you update your priors about the person. The exclusivity of the certification goes a long way in building credence: the MCM (or MCSM) has such a positive reputation even years after its cancellation because it was so difficult an exam that the only way a person could pass is if that person really knew the topic extremely well. By contrast, the old MCSE certifications from the early 2000s were a joke because anybody could memorize a brain dump, spit out answers, and get a cert.

The economist in me also says that certifications tend to be a net drain because you’re spending time on an imperfect signal when there are probably better imperfect signals out there. Your blog, YouTube/Twitch channel (assuming you’re not just playing Slay the Spire all day), and GitHub repo are going to tell me more about your interests and technical capabilities.

Read what Eugene has to say. I think we agree on the broad strokes but I’m probably more in the “not worth it” camp than he is with the exception of cases where it’s necessary to land a business contract (e.g., needing to be a Microsoft Gold Partner).

1 Comment

Disable Lightweight Pooling

Randolph West explains why enabling lightweight pooling in SQL Server is almost always a bad idea:

When can I enable lightweight pooling then?
Don’t. But if you must, these are the conditions under which Microsoft suggests it may be useful:
– Large multi-processor servers are in use.
– All servers are running at or near maximum capacity.
– A lot of context switching occurs (greater than 20,000 per second).

We can measure context switching with a performance counter in Performance Monitor on Windows, so the last two items on this list can be monitored. Use the Context Switches/sec counter to track context switches.

I’m sure there were a few customers who benefited from this, but I’ve neither seen nor heard of a case where it did actually help.

Comments closed

Dealing With Massive Database Counts

Edward Harrison describes some of the issues you run into when working on a SQL Server instance with thousands of databases:

One of the headaches we frequently have to deal with is the excessive amount of time it takes for SQL Servers with the discussed design to go through any planned patches. We have seen this while carrying out some planned maintenance for one of our customers and the work simply comprised failing over a SQL Server instance, from the active node, over onto the passive node so that the former active node could be patched. The passive node had had its SQL Server patched earlier in the day while the active node was still on the older patch level.

This SQL Server had 10,000 databases and all that we needed to do was to simply click the buttons to failover the SQL Server and then wait for it come online on the other node. In the end, this failover took nearly four hours to complete because upgrade scripts had to be applied to each of the 10,000 databases.

Even a couple thousand databases on an instance can slow things down considerably. Consider an operation which takes 10 milliseconds like laying out a database in the object explorer in SSMS. With 10K databases, that 10ms activity (note that I’m making up the number) would take 100 seconds. And these are often blocking tasks, meaning you’re sitting there spinning your wheels for 100 seconds before you can do the thing you wanted to do. The pain is real. And for server activities, most administrative functions (backups, CHECKDB, index rebuilding, etc.) is done one at a time, meaning you have a lot of time spent on this stuff.

Comments closed

Saving To Excel From Azure Data Studio

Bob Pusateri shows us how you can export to Excel from Azure Data Studio:

In SQL Server Management Studio, there’s no single-step way to save a result set to Excel. Most commonly I will just copy/paste a result set into a spreadsheet, but depending on the size of the result set and the types of data involved, that doesn’t always play nicely.

But Azure Data Studio does it WAY better, trust me. If you want that result set in a spreadsheet, just save it as one and poof – you have an Excel file!

Considering that Excel is the most popular BI tool, it makes sense to support it.

Comments closed

Things To Know About Databricks UAP

Kara Annanie has five things you should know about the Databricks Unified Analytics Platform:

4.     A Spark Dataframe is not the same as a Pandas/R Dataframe
Spark Dataframes are specifically designed to use distributed memory to perform operations across a cluster whereas Pandas/R Dataframes can only run on one computer. This means that you need to use a Spark Dataframe to realize the benefits of the cluster when coding in Python or R within Databricks.

This is a nice set of tips for people getting started with Spark and Databricks.

Comments closed

Updating Power BI Reports Based On Website Changes

Kasper de Jonge shows how you can scrape a webpage using an M function and check for specific updates:

A quick hack today. Got this question from someone who needed to be able to update a report and show users that something had changed. This is easy when you have access to a database and can add data to it but in this case that was not possible.
So I came up with a hacky (and great :P) way to do this. and wanted to share it in case it came handy in your box of tricks :).

It is a little bit hacky, but much less so on a website with a last updated date visible someplace.

Comments closed

When SLEEP_TASK Waits Are Important

Ginger Keys gives an example where the SLEEP_TASK wait indicates a performance problem:

Ordinarily SLEEP_TASK is a nonspecific wait type in SQL Server which occurs when a task sleeps while waiting for a generic event to occur, according to Microsoft documentation.  This wait type can usually be safely ignored, however on some occasions it can happen when a script does not execute completely or hangs up for long periods of time.

The SLEEP_TASK wait means that a thread is waiting on a resource or waiting for some event to occur, and could indicate background task scheduling, a query plan exchange operator that isn’t tracked by CXPACKET, or it could be a hashing operation that spills to tempdb.

Read the whole thing. For a bit more information, check out the SQLskills description of this wait type.

Comments closed

SOS_SCHEDULER_YIELD And Hypervisors

Jonathan Kehayias has an interesting article on generating SOS_SCHEDULER_YIELD waits due to CPU over-subscription on the host:

The test environment that I used for this is a portable lab I’ve used for demos for VM content over the last eight years teaching our Immersion Events. The ESX host has 4 cores and 8GB RAM and hosts three virtual machines, a 4vCPU SQL Server with 4GB RAM, and two 2vCPU Windows Server VM’s with 2GB RAM that are used strictly to run Geekbench to produce load.  Within SQL Server, I a reproducible workload that drives parallelism and is repeatable consistently, that I have also used for years in teaching classes.
For the tests, I first ran a baseline where the SQL Server VM is the only machine executing any tasks/workload at all, but the other VMs are powered on and just sitting there on the hose.  This establishes a base metric in the host and had 1% RDY time average during the workload execution, after which I collected the wait stats for SOS_SCHEDULER_YIELD.

From there, Jonathan starts cranking up the load on the application servers and sees what it does to SQL Server ready time. This is a great reason not to over-subscribe on CPUs on mission-critical hosts.

Comments closed

Curated SQL Returns Wednesday

In honor of almost everybody being out of the office today and tomorrow, Curated SQL returns for its Boxing Day extravaganza, which looks exactly like any other day to me, but my producer promises me it will be extravagant.

Comments closed

What’s New With Kafka 2.1

Stephane Maarek updates us on the goings-on with Apache Kafka:

Kafka 2.1 is quite a special upgrade because you cannot downgrade due to a schema change in the consumer offsets topics. Otherwise the procedure to upgrade Kafka is still the same as before, see: https://kafka.apache.org/documentation/#upgrade

One of the big changes is support for Java 11. It’s a shame that Spark currently doesn’t support versions past 8.

Comments closed