Press "Enter" to skip to content

Author: Kevin Feasel

Formatting with RegEx in SQL Server

Shane O’Neill has a problem:

This is a contrived example but I was given a script that got the “Discipline”, “DocumentVersion”, “DocumentNumber”, “SectionNumber”, and “SectionName” out of the above.

And while it works, I hate that formatting. Everything is all squashed and shoved together.

No, thanks. Let’s see if we can make this more presentable.

Shane has a regular expression. Now Shane has two problems.

In all seriousness, regular expressions are extremely powerful in the right scenario. Shane mentions being okay with keeping them out of the database engine, and I’m usually alright with that, but there are cases where they’re really helpful, like figuring out whether a particular input is valid. One example I have on a project is finding legitimate codes (like ISBNs), where a regex solves the problem easily but my source data is abysmal. I can use the SQL# regular expression functions to drop into the CLR and figure out whether a value is any good, something I would have a lot more trouble doing in T-SQL alone.
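To make that concrete, here is roughly what such a validity check looks like. This is a minimal sketch which assumes SQL# is installed; the RegEx_IsMatch signature is written from memory and the table is hypothetical, so check the SQL# documentation before borrowing it.

-- Flag rows whose ISBN-10 has a plausible shape (a regex can check the
-- shape, though not the check digit). The dbo.SourceBooks table is
-- hypothetical, and the RegEx_IsMatch signature below is from memory.
SELECT
    s.ISBN,
    SQL#.RegEx_IsMatch(
        REPLACE(s.ISBN, '-', ''),   -- strip hyphens before validating
        N'^[0-9]{9}[0-9Xx]$',       -- nine digits plus a digit or X check character
        1,                          -- start matching at the first character
        N''                         -- no special regex options (parameter type assumed)
    ) AS IsPlausibleIsbn
FROM dbo.SourceBooks AS s;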

Monitoring Entity Framework

Grant Fritchey loves Entity Framework:

Yes, Entity Framework will improve your job quality and reduce stress in your life.

With one caveat: it gets used correctly.

That’s the hard part, right? There is a ton of technology that makes things better, if used correctly. There are all sorts of programs that make your life easier, if used correctly. Yet all of these, used incorrectly, can make your life hell.

One nit that I’ve always had with Entity Framework is that it’s very difficult to tell what part of the code the call was coming from. You really have no idea. So when my friend, Chris Woodruff, asked me on Twitter what would be the best way to monitor TagWith queries in Entity Framework, well, first, I had to go look up what TagWith was, then I got real excited, because, hey, here’s a solution.

That “I love Entity Framework” is the lead-in to a one-act play of mine featuring people with pitchforks, tar, and feathers. Nevertheless, Grant shows us how we can tag code in C# and capture that data in Extended Events. I’d read it, but I’m too busy sharpening my pitchfork.
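To make the idea concrete, here is a rough sketch of the sort of Extended Events session that can catch those tags. TagWith(N'Get top customers') in EF Core prepends the tag as a comment to the generated SQL, so the session filters on that comment text. This is my own minimal version rather than Grant’s exact session, and the session name, tag text, and file target are all hypothetical.

-- Capture EF Core queries carrying a specific TagWith() tag. EF sends its
-- queries via sp_executesql, so rpc_completed is the event to watch.
CREATE EVENT SESSION TaggedQueries ON SERVER
ADD EVENT sqlserver.rpc_completed
(
    ACTION (sqlserver.client_app_name, sqlserver.sql_text)
    -- Match the comment that TagWith() injects at the top of the statement.
    WHERE (sqlserver.like_i_sql_unicode_string(statement, N'%Get top customers%'))
)
ADD TARGET package0.event_file (SET filename = N'TaggedQueries');
GO

ALTER EVENT SESSION TaggedQueries ON SERVER STATE = START;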

An Overview of dbatools with Jess and Bert

Bert Wagner has a new video available:

dbatools is one of the coolest community projects I’ve seen – it is amazing how many commands are available to help make managing your SQL Server instances a breeze.

This week I had the opportunity to learn how to use dbatools to automate backups, change recovery models, and discover additional dbatools commands from dbatools contributor Jess Pomfret.

Jess Pomfret then goes into more detail on the commands in the video:

The final tip I had for Bert was how to use Find-DbaCommand to help him find the commands he needed to complete his tasks.

A lot of the commands have tags, which is a good way to find anything relating to compression.

That was a nice collaboration.

Automated Query Capture With Logic Apps

Arun Sirpal shows how we can use Azure Logic Apps to automate periodic capture of running queries in Azure SQL Database:

Have you ever wanted to capture the T-SQL, waits, session IDs (etc.) at a specific time for Azure SQL Database? Sure, there are a few ways to do this. Extended Events comes to mind, but I wanted to do something different.

For this blog post I decided to use Brent Ozar’s famous sp_BlitzWho stored procedure (in expert mode) coupled with Azure Logic Apps. At a high level it is simple: at a specific time, trigger the execution of the sp_BlitzWho stored procedure and save the output for later querying.

Click through to see how to set this up.
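If you want a feel for the database side before clicking through, the call the Logic App fires on a schedule looks something like the following. This is a minimal sketch; the parameter and column names are written from memory rather than taken from Arun’s post, and the logging database and table are hypothetical, so verify everything against the current First Responder Kit.

-- Snapshot currently-running queries in expert mode and persist the results.
EXEC dbo.sp_BlitzWho
    @ExpertMode = 1,                     -- include waits, blocking, and resource columns
    @OutputDatabaseName = N'DBAtools',   -- hypothetical logging database
    @OutputSchemaName = N'dbo',
    @OutputTableName = N'BlitzWhoLog';   -- snapshots accumulate here

-- Later, review the captured snapshots (CheckDate is the column name as I recall it).
SELECT *
FROM DBAtools.dbo.BlitzWhoLog
ORDER BY CheckDate DESC;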

Creating A Big Data Cluster

Chris Adkin continues a series on big data clusters in SQL Server 2019:

This post will focus on creating a big data cluster so that you can get up and running as fast as possible; as such, the storage type used will be ephemeral, which is perfectly acceptable for “kicking the tyres.” For production-grade installations, integration with a production-grade storage platform is required via a storage plugin. Before we create our cluster, with the assumption that we are doing this with an on-premises infrastructure, the following prerequisites need to be met:

Read the whole thing, but wait until part 4 before putting anything valuable in it.

Gartner Advanced Analytics Magic Quadrant Updates

William Vorhies summarizes the changes to the Gartner Advanced Analytics magic quadrant:

The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out and once again there are big changes in the leaderboard. Say what you will about our profession, but as a platform developer you certainly can’t rest on your laurels. Some traditional leaders have fallen (SAS, KNIME, H2O.ai, IBM) and some challengers have risen (Alteryx, TIBCO, RapidMiner).

Databricks is making a big push and there’s more movement than usual in this year’s chart. Check it out.

Lessons Learned From A Kafka Streams Implementation

Rishi Dhanaraj provides us with some lessons learned from implementing Kafka Streams to read data from Cassandra and MongoDB and write the results back into MongoDB:

This Python script ran on a single machine and dates from the early days of the company. However, the script didn’t scale, since it could not run in a distributed manner. As a result, the Python job ended up flapping—crashing and restarting regularly in production, depending on the load it needed to process.

Second, the Python script puts read pressure on MongoDB and Cassandra, because it has to query the databases for each batch of walk-ins and Zenreach Messages. MongoDB and Cassandra are our primary databases for serving customer read queries. So we wanted to remove the additional read pressure added by this job, which currently competes for resources with our customers.

For these reasons, we wanted to move to a streaming solution—specifically, Kafka Streams. We already switched to Kafka Streams for walk-in detection, which my teammate Eugen Feller explained in a previous post.

Click through for a review of the architecture and some tips if you want to do this yourself.

Overwriting Data In Use With Databricks

Piotr Starczynski shows us how we can read data from a table, transform it, and write it back to the same file:

Recently I ran into an interesting problem in Databricks (non-Delta). I tried to read data from a table (a table on top of a file), slightly transform it, and write it back to the same location that I had been reading from. An attempt to execute code like that would manifest with the exception: “org.apache.spark.sql.AnalysisException: Cannot insert overwrite into table that is also being read from”

Let’s try to answer the question of how to write into a table (dataframe) that we are reading from, as this might be a common use case.

This problem is trivial, but it is very confusing if we do not understand how queries are processed in Spark.

Click through for the answer. I’m a little squeamish about doing this because my expectation is for data to flow from one source to another; feeding the data back to the initial source feels strange, like running a load of clothes through the washer and dryer and then dumping them back into the hamper with the remainder of the dirty clothes.

Hiding Future Dates In DAX Measures

Marco Russo shows how we can define year-to-date measures which don’t include values for incomplete months:

The rows between September 2009 and December 2009 should not be visible. The goal here is to display a blank value in these out-of-range, “future” months.

A similar issue exists for the year-over-year calculation (YOY). Even though the measure tries to show a blank value in case of missing values in the current or previous year, the amounts for August 2009 and for CY 2009 might be considered wrong.

The answer is certainly not trivial but it does make for a much nicer display.

Tricky Interview Questions

Jason Brimhall talks about trick questions at interviews:

Anybody who has interviewed for a job has most likely run into the trick question. Some interviewers like to throw out multiple trick questions, all in an effort to trip up the candidate and get the candidate to doubt themselves. Sure, there can be some benefit to throwing out a trick question or four. One such benefit would be to see how the candidate performs under pressure (see them squirm).

The downside to throwing out trick questions, in my opinion, would be that you can turn a serious candidate into an uninterested candidate. So, when throwing out the tricks, tread carefully.

Incidentally, I don’t think his example question was that tricky, in that there are good reasons to do what he shows. I have one question I like to ask during phone screens which is in a similar vein. I won’t share the question for obvious reasons, but answering it requires a reasonable amount of knowledge of the product and a little bit of cleverness.

On the whole, my interview philosophy is to ask questions which directly relate to the job at hand. If the job involves a lot of warehousing and ETL work with SSIS, ask questions about columnstore indexes, tuning SSIS packages, and the types of red flags to look for in packages. I’ve found that people who really don’t know what they’re doing sort themselves out easily enough if you ask relevant questions.
