Author: Kevin Feasel

Using Temporal Tables for Created and Updated Timestamps

Daniel Hutmacher has an interesting use case for temporal tables:

You have a table that you want to add “created” and “updated” timestamp columns to, but you can’t update the application code to update those columns. In the bad old days, you had to write a trigger to do the hard work for you. Triggers introduce additional complexity and potentially even a performance impact.

So here’s a nicer way to do it, trigger-free.

Click through for the solution, as well as a warning from Daniel.
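
To sketch the idea (my example here, not necessarily Daniel's exact implementation): the engine maintains the period columns of a system-versioned temporal table on every write, so the current row's ROW START column works as an "updated" timestamp, and the earliest version's ROW START gives you "created":

    -- A minimal sketch; table and column names are illustrative.
    CREATE TABLE dbo.Widget
    (
        WidgetID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        Name      nvarchar(100) NOT NULL,
        -- Maintained automatically by SQL Server on every INSERT and UPDATE
        ValidFrom datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo   datetime2(7) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WidgetHistory));

    -- "Created" is the earliest version's start time; "Updated" is the latest.
    SELECT WidgetID,
           MIN(ValidFrom) AS Created,
           MAX(ValidFrom) AS Updated
    FROM dbo.Widget FOR SYSTEM_TIME ALL
    GROUP BY WidgetID;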

Rounding Times in SQL Server

Steve Stedman has the low-down on time rounding in SQL Server:

One thing that I end up having to search for regularly is rounding of dates and times in Transact-SQL. Having looked this up too many times, I finally realized that it is time for me to do my own blog post for it.

First off, what’s the difference between rounding and truncating in these examples? Rounding rounds to the closest minute, so 10:00:31 is rounded up to 10:01:00, and 10:00:29 is rounded down to 10:00:00. With truncation, it simply changes the truncated portion to zeroes, so 10:00:31 gets truncated down to 10:00:00, and so does 10:00:59. Sometimes you may want rounding, and sometimes you may want truncation (floor) for your specific needs.

After having used date_trunc() in Postgres, I’d really like something similar in SQL Server.
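
In the meantime, the classic DATEADD/DATEDIFF idiom covers both cases; here's a minimal sketch at minute granularity. (SQL Server 2022 did later add a DATETRUNC() function, which works much like Postgres's date_trunc().)

    DECLARE @t datetime = '2021-06-15 10:00:31';

    SELECT
        -- Truncate (floor) to the minute: 10:00:31 -> 10:00:00
        DATEADD(MINUTE, DATEDIFF(MINUTE, 0, @t), 0) AS TruncatedToMinute,
        -- Round to the nearest minute by adding 30 seconds, then truncating:
        -- 10:00:31 -> 10:01:00
        DATEADD(MINUTE, DATEDIFF(MINUTE, 0, DATEADD(SECOND, 30, @t)), 0) AS RoundedToMinute;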

Starting SQL: a Video Series

Erik Darling wraps up a slew of videos:

Over the last month, I’ve given away all my beginner SQL Server training content. I hope you’ve enjoyed it, and maybe even learned a thing or two.

After this, I’ll be getting back to my regular blogging.

There are a lot of videos to check out, and right now, Erik is offering a big discount on his advanced training, so go, go, go.

Generating Mock Data for SQL Server

Chad Callihan has a few options for creating fake data:

It’s easy enough to create a handful of records for testing in SQL Server. What if you want 100 rows or 1000 rows? What if you want data that looks more legitimate compared to gibberish? In this post, we’ll look at different ways to generate mock data.

One of the trickiest things about creating mock data is getting the distributions right. For example, ABS(CHECKSUM(NEWID())) is great (as is RAND(CHECKSUM(NEWID()))), but the results follow a uniform distribution because of the nature of checksums and random number generators. This makes charting numeric values look unnatural. Here’s an example I put together of generating data from a normal distribution. It does take more effort, but if you’re generating this fake data to show it to users in tools like Power BI or Tableau, having data follow reasonable distributions is a good thing. That is, use whatever distribution makes sense for the particular data element: uniform, normal, Pareto (power law), gamma, etc.
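
Here's a minimal sketch of one way to do that, using the Box-Muller transform (the mean of 100 and standard deviation of 15 below are arbitrary):

    -- Two uniform random values per row combine into one normally
    -- distributed value; RAND(CHECKSUM(NEWID())) reseeds per row.
    SELECT TOP (1000)
        100.0 + 15.0 * SQRT(-2.0 * LOG(1.0 - u.u1)) * COS(2.0 * PI() * u.u2) AS MockValue
    FROM sys.all_objects AS o
    CROSS APPLY (SELECT RAND(CHECKSUM(NEWID())) AS u1,
                        RAND(CHECKSUM(NEWID())) AS u2) AS u;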

Monitoring SQL Server on Linux with Telegraf, InfluxDB, and Grafana

Amit Khandelwal extends a solution for SQL Server on Windows:

In this blog, we will look at how we configure near real-time monitoring of SQL Server on Linux and containers with the Telegraf, InfluxDB, and Grafana stack. This is built along similar lines to the Azure SQLDB and Managed Instance solutions already published by my colleague Denzil Ribeiro. You can refer to those blogs to learn more about Telegraf, InfluxDB, and Grafana.

Click through for the quick version, and then the step-by-step process.
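
The wiring, roughly: Telegraf polls SQL Server's DMVs, ships the metrics to InfluxDB, and Grafana charts them. A minimal sketch of the Telegraf side, assuming the stock sqlserver input plugin (credentials and endpoints are placeholders):

    # telegraf.conf
    [[inputs.sqlserver]]
      servers = ["Server=localhost;Port=1433;User Id=telegraf;Password=<secret>;app name=telegraf;"]

    [[outputs.influxdb]]
      urls = ["http://localhost:8086"]
      database = "sqlserver"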

Updating SQL Server Container Memory Limits

Andrew Pruski doesn’t have time to restart containers:

When running multiple SQL Server containers on a Docker host, we should always set CPU and memory limits for each container (see the flags for memory and cpus here). This helps prevent the noisy-neighbour situation, where one container takes all the host’s resources and starves the other containers.

But what if we forget to set those limits? Well, no worries…we can update them on the fly!

Click through to see how you can change the memory limits on a running container.
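
For example (a sketch; the container name is a placeholder):

    # Raise the memory limit on a running container; when increasing --memory,
    # --memory-swap must be raised to at least the same value
    docker update --memory=4g --memory-swap=4g sql1

    # Confirm the new limit (in bytes)
    docker inspect sql1 --format '{{.HostConfig.Memory}}'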

How Spark Determines Task Numbers and Parallelism

The Hadoop in Real World team explains how the Spark engine decides how many tasks to create for a job and how many can run in parallel:

In this post we will see how Spark decides the number of tasks in a job and how many of those tasks execute in parallel.

Let’s see how Spark decides on the number of tasks with the below set of instructions.

[… instructions]

Let’s also assume dataset_X has 10 partitions and dataset_Y has 5 partitions.

Click through for the full explanation.

Helpful Tools for Apache Kafka Developers

Dave Klein has a few tools to make working with Apache Kafka a little easier:

We like to save the best for last, but this tool is too good to wait. So, we’ll start off by covering kafkacat.

kafkacat is a fast and flexible command line Kafka producer, consumer, and more. Magnus Edenhill, the author of the librdkafka C/C++ library for Kafka, developed it. kafkacat is great for quickly producing and consuming data to and from a topic. In fact, the same command will do both, depending on the context. Check this out:
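
The snippet itself doesn't survive the excerpt, but here's a minimal sketch of the idea (broker and topic are placeholders): kafkacat produces when stdin is a pipe and consumes when run interactively, and the -P and -C flags make the mode explicit.

    # Produce: piped stdin puts kafkacat in producer mode (-P makes it explicit)
    echo "hello" | kafkacat -b localhost:9092 -t test-topic -P

    # Consume the same topic from the beginning (-C makes it explicit)
    kafkacat -b localhost:9092 -t test-topic -C -o beginning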

Read on for more information on this tool, as well as several others.

Performance Tips when Working with Large Datasets in R

Mira Celine Klein continues a series on performance tuning R code:

Whether your dataset is “large” depends not only on the number of rows, but also on the method you are going to use. It’s easy to compute the mean or sum of as many as 10,000 numbers, but a nonlinear regression with many variables can already take some time with a sample size of 1,000.

Sometimes it may help to parallelize (see part 3 of the series). But with large datasets, you can use parallelization only up to the point where working memory becomes the limiting factor. In addition, there may be tasks that cannot be parallelized at all. In these cases, the strategies from part 2 of this series may be helpful, and there are some more ways:

Click through for four options.

Shrinking Convolutional Neural Networks for TinyML

Pete Warden writes up a tip:

A colleague recently asked for more details on an approach I recommended, but which she hadn’t seen any documentation for. I realized that it was something I’d learned from talking to model builders at Google, and I wasn’t sure there was anything written up, so in the spirit of leaving a trail of breadcrumbs for anyone coming after, I thought I should put it into a quick blog post.

The summary is that if you have MaxPool or AveragePool after a convolutional layer in a network, and you’re targeting a resource-constrained system like a microcontroller, you should try removing them entirely and replacing them with a stride in the convolution instead. This has two main benefits, but to explain them it’s easiest to diagram out the network before and after.

Click through for the full explanation.
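
To put rough numbers on those benefits (my arithmetic, not Pete's): a 3×3 convolution with 8 input and 8 output channels over a 32×32 feature map costs 32 × 32 × 3 × 3 × 8 × 8 ≈ 590K multiply-accumulates, and a 2×2 pool after it still leaves a 32×32×8 intermediate activation buffer (8 KB at int8) to store. The same convolution with a stride of 2 produces the 16×16 output directly, costing 16 × 16 × 3 × 3 × 8 × 8 ≈ 147K multiply-accumulates (about a 4x compute saving), and the full-resolution intermediate buffer goes away entirely.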
