Press "Enter" to skip to content

Author: Kevin Feasel

Multiple Slicers and AND Logic

Stephanie Bruno embraces the healing power of AND:

When using slicers in Power BI reports, multiple selections filter data with OR logic. For example, if you have a slicer with products and your visuals are displaying the total number of invoices, then when “bicycles” and “helmets” are selected in the products slicer, your visual will show the number of invoices that include bicycles OR helmets. But what if you instead need it to show only the number of invoices that include bicycles AND helmets? Read on to find out how you can do just that with DAX.

Read on for the solution.
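The general shape of the pattern (a sketch using hypothetical Sales and Product tables and columns, not necessarily Stephanie’s exact measure) is to count only those invoices whose distinct product count under the slicer’s filter matches the number of selected products:

    Invoices with All Selected Products =
    VAR SelectedProducts = DISTINCTCOUNT ( 'Product'[ProductName] )
    RETURN
        COUNTROWS (
            FILTER (
                VALUES ( Sales[InvoiceID] ),
                // products on this invoice that survive the slicer filter
                CALCULATE ( DISTINCTCOUNT ( Sales[ProductKey] ) ) = SelectedProducts
            )
        )

An invoice passes the filter only if it contains every selected product, which converts the slicer’s default OR behavior into AND.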

Working with Serverless and Dedicated SQL Pools in Azure Synapse Analytics

Igor Stanko takes us through both dedicated and serverless SQL Pools in Azure Synapse Analytics:

Both serverless and dedicated SQL pools can be used within the same Synapse workspace, providing the flexibility to choose one or both options to cost-effectively manage your SQL analytics workloads. With Azure Synapse, you can use T-SQL to directly query data within a data lake for rapid data exploration and take advantage of the full capabilities of a data warehouse for more predictable and mission-critical workloads. With both query options available, you can choose the most cost-effective option for each of your use cases, resulting in cost savings across your business.

This post explores two consumption choices for performing analytics with Synapse SQL (serverless and dedicated SQL pools) and examines the power and flexibility Azure Synapse provides when both are used to execute T-SQL workloads. In addition, we will explore options to control cost when using both models.

Click through for details, including hints on minimizing costs.
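As a taste of the serverless side, ad hoc exploration against the lake looks something like this (the storage account and path are placeholders):

    -- Serverless SQL pool: query Parquet files in the data lake directly,
    -- paying per data processed rather than for provisioned capacity
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;

The same T-SQL skills carry over to a dedicated pool, where you would instead load that data into distributed tables for predictable, mission-critical workloads.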

Using Scala in a Databricks Notebook

Tomaz Kastrun takes a look at the original Spark language:

Let us start with Databricks datasets, which are available within every workspace and are here mainly for test purposes. This is nothing new; both Python and R come with sample datasets. For example, the Iris dataset is available with the Base R engine and the Seaborn Python package. The same goes for Databricks: sample datasets can be found in the /databricks-datasets folder.

Click through for the walkthrough and introduction to Scala as it relates to Apache Spark.
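If you want to poke around before reading, something like this works in a Scala notebook cell (the specific CSV path is one of the sample files Databricks has shipped, so treat it as illustrative):

    // List the sample datasets that ship with every Databricks workspace
    display(dbutils.fs.ls("/databricks-datasets"))

    // Read one of the sample CSV files into a DataFrame
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/databricks-datasets/samples/population-vs-price/data_geo.csv")

    df.show(5)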

More ETL Antipatterns

Tim Mitchell continues a series on ETL anti-patterns. First up is lazy metadata:

Metadata management in ETL processes can be challenging. When exchanging data between systems and across different mediums (unstructured data, flat files, XML, or relational data), the definition of data types can get blurry. Some sources – including unstructured data and flat files – have little if any metadata configuration. On the other end of the spectrum is relational (RDBMS) data, which strictly defines data types and constraints.

Because of the challenges of this exchange process, many developers opt for an approach that I refer to as lazy metadata. In this design pattern, the ETL pipeline is configured very generically, using little if any metadata definition beyond that which is absolutely necessary for data movement.

Read on to see if you suffer from lazy metadata. If so, ask your doctor if Metatol is right for you.
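As a quick illustration of the difference (with hypothetical table names), compare a lazily-typed staging table against one that declares its metadata up front:

    -- Lazy metadata: everything lands as free-form text, deferring all
    -- type problems to downstream consumers
    CREATE TABLE staging.Orders_Lazy (
        OrderID     NVARCHAR(MAX),
        OrderDate   NVARCHAR(MAX),
        Amount      NVARCHAR(MAX)
    );

    -- Explicit metadata: bad data fails fast, at the boundary
    CREATE TABLE staging.Orders (
        OrderID     INT            NOT NULL,
        OrderDate   DATE           NOT NULL,
        Amount      DECIMAL(19, 4) NOT NULL
    );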

Next, the lack of error handling logic:

Even the most robust extract-transform-load process will fail at some point. Even if there are no flaws in the ETL code, there are factors beyond the control of that process – network, authentication, and DNS, to name a few – that could break a load. When building ETL load logic, one must think not only about what the successful payload should look like, but also about what will happen if any component of that load fails.

This is infuriating in how common it is, even among people who know better. Err, please don’t look at my ETL processes…
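At minimum, each load step deserves the sort of guard sketched below (a T-SQL example with hypothetical table names; orchestration tools offer their own equivalents):

    BEGIN TRY
        BEGIN TRANSACTION;

        INSERT INTO dbo.FactInvoices (InvoiceID, InvoiceDate, Amount)
        SELECT InvoiceID, InvoiceDate, Amount
        FROM staging.Invoices;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        -- Log enough detail to diagnose and re-run the failed load
        INSERT INTO dbo.ETLErrorLog (ErrorNumber, ErrorMessage, FailedAtUtc)
        VALUES (ERROR_NUMBER(), ERROR_MESSAGE(), SYSUTCDATETIME());

        THROW;  -- re-raise so the calling orchestrator sees the failure
    END CATCH;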

Finally, Tim reminds us that ETL logic is source code:

In most data projects, building the extract-transform-load (ETL) logic takes a significant amount of time. Enterprise ETL processes must do several things well: retrieve enough data to satisfy the business needs, apply any needed transformations to that data, and load it to the destination(s) without interruption to any other business processes. The work that goes into building and validating that ETL logic can be significant, making the resulting code a very valuable asset to the enterprise.

However, in my travels I’ve discovered that there’s a lot of ETL code that doesn’t get the kind of care it deserves. Failing to treat ETL logic as source code can be a costly and time-consuming mistake.

I think much of this comes from SSIS building out giant XML files which were essentially binaries. BIML went a long way toward allowing us to treat ETL with SSIS as a proper language and store code in source control, but even if you’re hand-editing SSIS packages, I agree with Tim completely.

Apache Spark Basics in Azure Synapse Analytics

Euan Garden shows off some Apache Spark functionality in Azure Synapse Analytics:

Apache Spark has been a long-time favorite tool amongst data engineers and data scientists; it is well known for handling large scale data processing and complex machine learning workloads.

Azure Synapse Analytics offers a fully managed and integrated Apache Spark experience. By leveraging Apache Spark in Azure Synapse, you can benefit from integrated security, fully managed provisioning, and tight coupling to other Azure services, such as SQL databases (dedicated and serverless), Azure Key Vault, ADLS Gen2, and Azure Blob Storage, as well as fast-starting, high-performance compute instances.

Click through for the demo.
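The experience in a Synapse notebook will feel familiar to anyone who has used Spark elsewhere; a minimal Scala sketch (with placeholder storage paths and column names) might read Parquet files from ADLS Gen2 and aggregate:

    import org.apache.spark.sql.functions.col

    // Read Parquet files from an ADLS Gen2 account linked to the workspace
    val sales = spark.read.parquet(
      "abfss://data@<storageaccount>.dfs.core.windows.net/sales/")

    // A quick aggregation to confirm the data looks sensible
    sales.groupBy("ProductCategory")
      .count()
      .orderBy(col("count").desc)
      .show(10)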

TF-IDF in .NET for Spark, Updated

Ed Elliott has been busy:

Apache Spark has had a machine learning API for quite some time and this has been partially implemented in .NET for Apache Spark.

In this post we will look at how we can use the Apache Spark ML API from .NET. This is the second version of this post; the first was written before version 1 of .NET for Apache Spark, when a vital piece of the implementation was missing, which meant that although we could build the model in .NET, we couldn’t actually use it. The necessary functionality is now available, so I am updating the post. To see the previous version, go to: https://the.agilesql.club/2020/07/tf-idf-in-.net-for-apache-spark-using-spark-ml/

Read on for more information, as well as a call to action.
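For context, the underlying Spark ML pipeline that the .NET bindings wrap looks like this in Scala (adapted from the standard Spark ML feature-extraction pattern, with made-up sample sentences):

    import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

    val docs = spark.createDataFrame(Seq(
      (0.0, "spark is a distributed engine"),
      (1.0, "tf idf weighs terms by rarity")
    )).toDF("label", "sentence")

    // Split sentences into words, then hash words into term-frequency vectors
    val words = new Tokenizer().setInputCol("sentence").setOutputCol("words").transform(docs)
    val tf = new HashingTF().setInputCol("words").setOutputCol("rawFeatures")
      .setNumFeatures(1024).transform(words)

    // Fit IDF over the corpus and rescale the term frequencies
    val idfModel = new IDF().setInputCol("rawFeatures").setOutputCol("features").fit(tf)
    idfModel.transform(tf).select("label", "features").show(truncate = false)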

Small Multiples in Power BI

Chris Webb takes us through a new feature in Power BI:

While the long-awaited small multiples feature that previewed in the December 2020 release is an obvious boost to Power BI’s data visualisation capabilities, did you know that you can use it to improve report performance too?

Earlier this year I wrote blog posts showing how you can improve report performance by showing the same amount of data in fewer visuals (for example by replacing several cards with a single table) and how the number of visuals on a page affects report performance even if they aren’t displaying any data; several other people have written similar posts too. Small multiples are just another way you can replace several visuals with a single visual that displays the same data.

I liked this feature for the visualization improvements, but if you can throw in performance improvements as well, I’m sold.

The Importance of Composite Models

Paul Turley lays out the significance of composite models in Power BI:

There have been many attempts by Microsoft and other vendors to create a data modelling architecture that provides for fast access to cached data, direct access to live data, and scaled-out connections to established data models. Remember ROLAP and HOLAP storage in multidimensional cubes? These were great concepts with significant trade-off limitations. No other vendor has anything like this. Way back in the day, Microsoft jumped on the Ralph Kimball bandwagon to promote the idea that a company should have “one version of the truth” exposed through its data warehouse and cubes or semantic data models. They met customer demand and gave us a BI tool that, in order to bring data together from multiple sources, makes it easy to create a lot of data silos. Arguably, there are design patterns to minimize data duplication, but to use governed datasets, self-service report designers are limited to connecting to large, central models that might only be authored and managed by IT. This new feature can restore balance to the force and bring us back to “one version of the truth” again.

Read on for Paul’s early thoughts on the feature.
