Press "Enter" to skip to content

Curated SQL Posts

Scaling Anomaly Detection With Kafka And Cassandra

Paul Brebner has started a series on anomaly detection using Kafka and Cassandra, starting with an introduction:

Let’s look at the application domain in more detail. In the previous blog series on Kongo, a Kafka focussed IoT logistics application, we persisted business “violations” to Cassandra for future use using Kafka Connect. For example, we could have used the data in Cassandra to check and certify that a delivery was free of violations across its complete storage and transportation chain.

An appropriate scenario for a Platform application involving Kafka and Cassandra has the following characteristics:

  1. Large volumes of streaming data is ingested into Kafka (at variable rates)

  2. Data is sent to Cassandra for long term persistence

  3. Streams processing is triggered by the incoming events in real-time

  4. Historic data is requested from Cassandra

  5. Historic data is retrieved from Cassandra

  6. Historic data is processed, and

  7. A result is produced.

It looks like he’s focusing on changepoint detection, which is one of several good techniques for generalized anomaly detection.  I’ll be interested in following this series.

Comments closed

Cloning And Columnstore Statistics

Niko Neugebauer points out a fix in SQL Server 2019:

I have a huge love for the DBCC CLONEDATABASE command – it has been made available (backported) to every SQL Server version starting with SQL Server 2012, since the original release in SQL Server 2014, while being constantly improved in the Service Packs and Cumulative Updates.

This blog post is focusing on the Database Cloning improvement in the SQL Server 2019 that is already available in the public CTP 2.0 – the possibility of the automated statistics extraction for the Columnstore Indexes.
WHY ?
Well, there was quite a significant problem with the Columnstore Indexes previously – the statistics for them were not extracted into the cloned database, unless you did created the statistics in the most recent step before Database cloning.

Click through for more details and a comparison between SQL Server versions.

Comments closed

Hybrid Columnstore And B+ Tree Designs

Adrian Colyer reviews a Microsoft paper on the combination of columnstore and B+ tree indexes on a single table:

The authors conducted a series of microbenchmarks as follows:

  • scans with single predicates with varying selectivity to study the trade-off between the range scan of a B+ tree vs a columnstore scan

  • sort and group-by queries to study the benefit of the sort order supported by B+ trees (columnstores in SQL Server are not sorted).

  • update statements with varying numbers of updated rows to analyze the cost of updating the different index types

  • mixed workloads with different combinations of reads and updates

It’s interesting to read an academic paper covering the topic, particularly when you can confirm that it works well in practice too.

Comments closed

Finding Trace Flag Usage With dbachecks

Rob Sewell points out an addition to dbachecks:

This will show you
  • the UniqueTag which will enable you to run only that check if you wish
  • AllTags which shows which tags will include that check
  • Config will show you which configuration items can be set for this check

The trace flag checks require the app.sqlinstance configuration which is the list of SQL instances that the checks will run against. You can also specify the instances as a parameter for Invoke-DbCheck as well.

Click through for an example.

Comments closed

Graph Additions In SQL Server 2019

Shreya Verma announces one of the new additions to graph database support in SQL Server 2019:

SQL Server 2017 and Azure SQL Database introduced native graph database capabilities used to model many-to-many relationships. The first implementation of SQL Graph introduced support for nodes to represent entities, edges to represent relationships and a new MATCH predicate to support graph pattern matching and traversal.

We will be further expanding the graph database capabilities with several new features. In this blog we will discuss one of these features that is now available for public preview in SQL Server 2019Edge Constraints on Graph Edge Tables.

In the first release of SQL Graph, an edge could connect any node to any other node in the database. With Edge Constraints users can enforce specific semantics on the edge tables. The constraints also help in maintaining data integrity. This post describes how you can create and use edge constraints in a graph database. We will use the following  graph schema created in the WideWorldImporters database for the samples discussed here.

I know that SQL Server 2017 was a bit underwhelming for graph database work, so I will be interested in seeing how much of the gap they cover in this release.

Comments closed

Best Practices For Tabular Models

Ginger Grant has started a new series on best practices for Analysis Services Tabular models:

Data Type Selection

The data type selected will impact the physical storage used, not the compression of the models in memory.  It is important whenever possible to reduce the cardinality of the data in order to be able to sort the data effectively.  When storing decimal numbers, unless you need many significant digits, store the data as Currency as it will take less space in physical storage than decimal.

Click through for additional tips.

Comments closed

Thoughts On UTF-8 Encoding In SQL Server 2019

Solomon Rutzky digs into UTF-8 support in SQL Server 2019 and has found a few bugs:

Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:

  1. can be used …

    1. as a database-level default Collation
    2. as a column-level Collation
    3. by appending “_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “_SC” in their name, or being of level 140 or newer)
    4. with only the CHAR and VARCHAR
  2. (implied) have no effect on NCHAR and NVARCHAR data (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents

  3. “This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)

Solomon takes his normal, thorough approach to the problem and finds several issues.

Comments closed

Visualizing Stock Data With Lares

Bernando Lares shows off some stuff his lares can do around visualizing time series data:

The overall idea of these functions is to visualize your stocks and portfolio’s performance with a just a few lines of simple code. I’ve created individual functions for each of the calculations and plots, and some other functions that gather all of them into a single list of objects for further use.

On the other hand, the lares package is “my personal library used to automate and speed my everyday work on Analysis and Machine Learning tasks”. I am more than happy to share it with you for your personal use. Feel free to install, use, and comment on any of its code and functionalities and I’ll happy to help you with it. I have previously shared other uses of the library in other posts which might also interest you: Visualizing ML Results (binary)Visualizing ML Results (continuous)and AutoML to understand datasets.

  • NOTE 1: The following post was written by a non-economist or professional investor. I am open to your comments and technical corrections if needed. Glad to learn as always!

  • NOTE 2: I will be using the less customizable functions in this post so we can focus more on the outputs than in the coding part; but once again, feel free to use the functions and dive into the library to understand or change them!

  • NOTE 3: All currency units are USD ($).

It does seem to be easy to use for this scenario.

Comments closed

Basic Linux For The SQL DBA

Kellyn Pot’Vin-Gorman continues her series on getting SQL Server DBAs ramped up on Linux:

Let’s begin with discussing WHY it’s not a good idea to be root on a Linux host unless absolutely necessary to perform a specific task. Ask any DBA for DB Owner or SA privileges, and you will most likely receive an absolute “No” for the response. DBAs need to have the same respect for the host their database runs on. Windows hosts have significantly hardened user security by introducing enhancements and unique application users to enforce similar standards at the enterprise server level, and Linux has always been this way. To be perfectly blunt, the Docker image with SQL Server running as root is a choice that shows lacking investigation to what privileges are REQUIRED to run, manage and support an enterprise database. This is not how we’d want to implement it for customer use.

Unlike a Windows OS, the Linux kernel is exposed to the OS layer. There isn’t a registry that requires a reboot or has a safety mechanism to refuse deletion or write to files secured by the registry or library files. Linux ASSUMED if you are root or if you have permissions to a file/directory, you KNOW what you’re doing. Due to this, it’s even more important to have the least amount of privileges to perform any task required.

Proper deployment would have a unique MSSQL Linux login owning the SQL Server installation and a DBAGroup as the group vs. the current configuration of ROOT:ROOT owning everything. With all the enhancements to security, this is one area that as DBAs, we should request to have adhered to. Our databases should run as a unique user owning the bin files and database processes.

Much of this post is walking us through some basics of security, but it also includes helpful built-in commands unrelated to security, like df to view free disk space.

Comments closed

Troubleshooting KSQL

Robin Moffatt walks us through a few scenarios where KSQL queries aren’t returning any data:

Probably the most common question in the Confluent Community Slack group’s #ksql channel is:

Why isn’t my KSQL query returning data?

That is, you’ve run a CREATE STREAM, but when you go to query it…

ksql> SELECT * FROM MY_FIRST_KSQL_STREAM;

…nothing happens. And because KSQL queries are continuous, your KSQL session appears to “hang.” That’s because KSQL is continuing to wait for any new messages to show you. So if your run a KSQL SELECT and get no results back, what could be the reasons for that?

Robin gives us five reasons why this might be.

Comments closed