Press "Enter" to skip to content

Author: Kevin Feasel

Thoughts on Certification

Eugene Meidinger is certifiable:

This being a complex topic, I thought I’d lay out the various factors to give a more comprehensive answer than you can easily fit in a tweet.

So the first two questions we need to answer are “Why do certs exist?” and “Why do people take them?”. Without these, we can’t give a good answer to whether you should take them. Certifications often exist for reasons that have nothing to do with your personal best interest. It is necessary to understand that fact.

Giving the economist’s spin, certifications are imperfect signals of reputation. When you know nothing else about a candidate, business partner, vendor, or ranting homeless person on the street, that cert can let you update your priors about the person. The exclusivity of the certification goes a long way in building credence: the MCM (or MCSM) has such a positive reputation even years after its cancellation because it was so difficult an exam that the only way a person could pass is if that person really knew the topic extremely well. By contrast, the old MCSE certifications from the early 2000s were a joke because anybody could memorize a brain dump, spit out answers, and get a cert.

The economist in me also says that certifications tend to be a net drain because you’re spending time on an imperfect signal when there are probably better imperfect signals out there. Your blog, YouTube/Twitch channel (assuming you’re not just playing Slay the Spire all day), and GitHub repo are going to tell me more about your interests and technical capabilities.

Read what Eugene has to say. I think we agree on the broad strokes but I’m probably more in the “not worth it” camp than he is with the exception of cases where it’s necessary to land a business contract (e.g., needing to be a Microsoft Gold Partner).

1 Comment

Investigating THREADPOOL Waits

Erik Darling has an interesting query against sys.dm_exec_query_stats to help you determine what might cause your THREADPOOL wait problems:

When THREADPOOL strikes, even the best monitoring tools can have a bunch of blank spots hanging around in them.

If you’re on SQL Server 2016 or better, there are some helpful columns in sys.dm_exec_query_stats.

Erik also has advice to help you out if you are running into these waits.

Comments closed

Baselining SQL Server with the First Responder Kit

Ajay Dwivedi has a GitHub project showing a method to collect baseline measures for SQL Server using Brent Ozar’s First Responder Kit:

With sp_BlitzFirst & sp_WhoIsActive in a SQL Agent job with scheduled execution for every 10-15 minutes, you can look back in time in terms of What was running, its execution stats, file stats, wait stats and Perfmon counters. This would help you to answer anybody as why your server was slow at a particular point in time.

Click through for the post and check out the GitHub repo as well. Baselining is extremely important for proper administration.

Comments closed

Converting One Column Into Multiple With Power Query

Imke Feldmann shows us how to pivot a single column into a fixed number of columns using Power Query:

The demand to unstacking a column into a table is not rare (see here for example: PowerBIForum  ) . Also if you copy a table from a post in the Power BI community forum  to the enter-data-section in Power BI, it will show up as such a one-column-table.

Note that this is different from the Entity-Attribute-Value model, as there’s no entity or attribute—it’s just values.

Comments closed

Sample Spark-Submit Config Settings

Leela Prasad shares a few sample configuration settings for Spark-Submit jobs:

Before going further let’s discuss on the below parameters which I have given for a Job.
spark.executor.cores=5 
spark.executor.instances=3
spark.executor.memory=20g
spark.driver.memory=5g 
spark.dynamicAllocation.enabled=true 
spark.dynamicAllocation.maxExecutors=10 

Click through to see what these do and why Leela chose these settings. The Spark documentation has the full list of settings but it’s good to hear explanations from practitioners.

Comments closed

Datasets In Spark

Ayush Hooda explains the differences between DataFrames and Datasets in Apache Spark:

The Datasets API provides the benefits of RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL’s optimized execution engine. You can define Dataset objects and then manipulate them using functional transformations (map, flatMap, filter, and so on) similar to an RDD. The benefits are that, unlike RDDs, these transformations are now applied on a structured and strongly typed distributed collection that allows Spark to leverage Spark SQL’s execution engine for optimization.

Read on for more details and a few examples of how to operate DataFrames and Datasets.

Comments closed

Restoring Databases From Azure

John Morehouse shows how we can restore a database from Azure Blob Storage:

So how do you restore from Azure storage? You do so from an URL.  Let’s take a look!

When you backup a database to Azure, there are two types of blobs that can be utilized, namely page and block blobs.   Due to price and flexibly, it is recommended to use block blobs.  However, depending on which type you used to perform the backup will dictate how the restores are performed.  Both methods require the use a credential, so that information will need to be known before being able to restore from Azure.

Click through for examples using both page blobs and block blobs.

Comments closed

Azure Data Factory Data Flows

Marlon Ribunal shows how we can perform some amount of data transformation in an Azure Data Factory V2 data flow:

Azure Data Factory (ADF) offers a convenient cloud-based platform for orchestrating data from and to on-premise, on-cloud, and hybrid sources and destinations. But it is not a full Extract, Transform, and Load (ETL) tool. For those who are well-versed with SQL Server Integration Services (SSIS), ADF would be the Control Flow portion.

You can scale out your SSIS implementation in Azure. In fact, there are two (2) options to do this: SSIS On-Premise using the SSIS runtime hosted by SQL Server or On Azure using the Azure-SSIS Integration Runtime.

Azure Data Factory is not quite an ETL tool as SSIS is. There is that transformation gap that needs to be filled for ADF to become a true On-Cloud ETL Tool. The second iteration of ADF in V2 is closing the transformation gap with the introduction of Data Flow.

Despite it not being nearly as complete as SSIS, there are useful data transformations available in Azure Data Factory, as Marlon shows.

Comments closed

Preventing SQL Server Startup With A Simple INI File

Solomon Rutzky is a month early with this:

In the event shown directly above, towards the bottom, in the final “<Data>” element that starts with “\\?\C:\ProgramData...“, that entry does point to a folder containing a Report.wer file. It is a plain text containing a bunch of error dump info, but nothing that would seem to indicate where to even start looking to fix this. And, nothing useful for searching on, at least not as far as my searching around revealed.

Conclusion
There you have it: a nearly untraceable way to prevent SQL Server from starting.

Read on to see how what Solomon did.

Comments closed