Author: Kevin Feasel

Structuring Databricks Notebooks

Paul Andrew has put together a basic structure for Databricks notebooks using titles, markdown, and widgets:

For me, one of the hardest parts of developing anything is when you need to pick up and rework code that has been created by someone else. That said, my preferred Notebook structure shown below is not about technical performance or anything complicated. This is simply for ease of sharing and understanding, as well as some initial documentation for work done.

In my example I created a Scala Notebook, but this could of course apply to any flavour.

This makes good use of markdown capabilities without being too heavy. I like it. The same general principles apply if you’re putting together Jupyter notebooks outside of Databricks.

Comments closed

Editing Rows in SSMS–Behind the Scenes

Randolph West explains what happens when you use the editor built into SQL Server Management Studio to modify rows directly:

A customer recently brought up an interesting thesis, that if you edit a table’s values using SQL Server Management Studio (SSMS) using the edit feature, that the table is dropped and recreated in the background when you commit the changes.

This is false, but there had to be a good reason why they were under this misapprehension.

Read on to understand what does happen and where the customer might have gotten this idea.

The Problems with RDPing into SQL Server Hosts

Emanuele Meazzo enumerates the issues with using RDP to connect directly to a server hosting SQL Server (or any other database system):

When there are SERIOUS performance problems, i.e. your instance is totally pinned as far as resources go, the only way to log into the instance to do something about it is using DAC; even if you are on the same machine, if you’re trying to log in, as usual, it won’t make any difference.
Enable Remote DAC and save yourself from future troubles, now.

There are several problems. It is better to run client tools from a client machine and use something like SSH or PowerShell remoting to do things against the server itself.

Testing SQL Server Storage Performance

Brent Ozar walks us through CrystalDiskMark 7 to check whether storage speed is up to snuff:

The Peak Performance + Mix setting runs a pretty cool mix of tests that will push your storage hard. Note that I don’t try to get CDM to replicate exactly how SQL Server does IO: I’m just trying to get a quick 5-minute idea of whether my storage is hot or not.

Click through for the demo. Mind you, this is something you want to do before setting up SQL Server…

Beginner’s Guide to Azure Data Factory

Cathrine Wilhelmsen has started a new series:

Azure Data Factory = Azure Data Factory v2

This means that today, when I talk about “Azure Data Factory”, I refer to “Azure Data Factory v2” and skip the “v2” part of the name. I mostly pretend that Azure Data Factory v1 doesn’t exist anymore 🙂

That’s something we all do.

This first post is a quick “What is this product?” intro, giving us a basis for later posts.

KSQL to ksqlDB

Jay Kreps announces a new name for KSQL:

Today marks a new release of KSQL, one so significant that we’re giving it a new name: ksqlDB. Like KSQL, ksqlDB remains freely available and community licensed, and you can get the code directly on GitHub. I’ll first share about what we’ve added in this release, then talk about why I think it is so important and explain the new naming.

There are two new major features we’re adding: pull queries and connector management.

This looks really interesting.

Visualizing Kafka Data Using D3

Mihalis Tsoukalos extracts, explores, and visualizes data (with D3) from a Kafka topic:

Now that you have your data in JSON format, you will use D3.js in order to visualize it. As JavaScript code is embedded in HTML files, the final version of the D3.js code can be found in visualize-spatial.html, which contains the following code:

D3 is extremely powerful, though that power comes with a fairly steep learning curve.

Failing SQL Server 2019 Installation

Paul Randal walks through an installation error with SQL Server 2019:

I have a brand new Windows 10 laptop that I use solely for teaching, as the HDMI and SVGA connectors on my main laptop somehow have bad connections to the motherboard after a few years of traveling around the world. On the new laptop I have SQL Server 2017, and now that SQL Server 2019 RTM has shipped, and I’ve finished teaching for the year, I set out to install 2019 side-by-side with 2017 so I can move my teaching environment to 2019 and work on demos of the new features pertinent to what I teach.

Thinking this would be a smooth process, I kicked off the install wizard, went through it, hit go, and walked away. I came back half an hour later to see that all install steps had failed.

Read on for Paul’s solution.

Resuming Index Operations but Using Different Options

John Morehouse has an interesting use case for resumable index operations:

Documentation on ALTER INDEX provides which options we can set when resuming a rebuild or creation operation:

<resumable_index_option> ::=
 {
    MAXDOP = max_degree_of_parallelism
    | MAX_DURATION = <time> [ MINUTES ]
    | <low_priority_lock_wait>
 }

<low_priority_lock_wait> ::=
    WAIT_AT_LOW_PRIORITY ( MAX_DURATION = <time> [ MINUTES ] ,
                           ABORT_AFTER_WAIT = { NONE | SELF | BLOCKERS } )

This means that we can change the MAXDOP, MAX_DURATION, and WAIT_AT_LOW_PRIORITY.

I’m going to gather that this was not necessarily the original intent, but it’s pretty nice, as it means that you can resume with fewer cores and lower priority during the day, but more cores and higher priority after hours.
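
To make that day-versus-night idea concrete, here is a minimal Python sketch that builds the ALTER INDEX ... RESUME statement with different options depending on the time of day. It only produces the T-SQL text and does not connect to SQL Server. The function names, the 08:00 to 18:00 business-hours window, the specific MAXDOP and duration values, and the index and table names are all assumptions for illustration, not something from John's post.

```python
from datetime import time

VALID_ABORT_ACTIONS = {"NONE", "SELF", "BLOCKERS"}


def build_resume_statement(index_name, table_name, maxdop, max_duration_minutes,
                           lock_wait_minutes=None, abort_after_wait="NONE"):
    """Build an ALTER INDEX ... RESUME statement from the resumable options.

    Hypothetical helper: names are interpolated as-is, so pass trusted input only.
    """
    if abort_after_wait not in VALID_ABORT_ACTIONS:
        raise ValueError(f"Unknown ABORT_AFTER_WAIT action: {abort_after_wait}")

    options = [
        f"MAXDOP = {maxdop}",
        f"MAX_DURATION = {max_duration_minutes} MINUTES",
    ]
    # WAIT_AT_LOW_PRIORITY is optional; include it only when a wait is requested.
    if lock_wait_minutes is not None:
        options.append(
            "WAIT_AT_LOW_PRIORITY ("
            f"MAX_DURATION = {lock_wait_minutes} MINUTES, "
            f"ABORT_AFTER_WAIT = {abort_after_wait})"
        )
    return (f"ALTER INDEX {index_name} ON {table_name} "
            f"RESUME WITH ({', '.join(options)});")


def resume_statement_for(now, index_name, table_name):
    """Pick gentle options during assumed business hours, aggressive ones otherwise."""
    business_hours = time(8, 0) <= now < time(18, 0)  # assumed window
    if business_hours:
        # Daytime: one core, short run, and give up our own lock wait quickly.
        return build_resume_statement(index_name, table_name, maxdop=1,
                                      max_duration_minutes=60,
                                      lock_wait_minutes=5,
                                      abort_after_wait="SELF")
    # After hours: more cores and a longer run.
    return build_resume_statement(index_name, table_name, maxdop=8,
                                  max_duration_minutes=240)


if __name__ == "__main__":
    print(resume_statement_for(time(10, 0), "IX_Orders_Date", "dbo.Orders"))
    print(resume_statement_for(time(23, 0), "IX_Orders_Date", "dbo.Orders"))
```

A scheduled job could pause the operation at the boundary of the window and then run whichever statement the helper returns, so the same rebuild picks up where it left off with a different resource profile.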

Memory-Optimized Tables and Error Log Entries

Shaun J. Stuart points out impoliteness on the part of In-Memory OLTP:

It’s nice that they are labeled with [INFO], so you can be fairly sure they aren’t a major issue, but they still annoyingly fill up the log with information that is of no use to anyone outside of Microsoft. It would be nice if you could disable these messages but, to my knowledge, you cannot.

These are the types of error messages which should, by default, not write to the error log. My real bugbear is “Hey, we successfully backed up the transaction log!” You should not need a trace flag to turn that off; you should need one to turn it on for diagnostic purposes.
