Press "Enter" to skip to content

Author: Kevin Feasel

How DynamoDB Indexing Works

Shubham Agarwal explains how indexing works within DynamoDB:

Global secondary index in DynamoDb – An index with a partition key and a sort key that can be different from the base table. A global secondary index is very helpful when you need to query your data without primary key.

  •  The primary key of a global secondary index can be partition key or composite  (partition key and sort key).

  • Global secondary indexes can be created at the same time that you create a table. You can also add a new global secondary index to an existing table, or delete an existing global secondary index

  • A global secondary index lets you query over the entire table, across all partitions.

  • The index partition key and sort key (if present) can be any base table attributes of type string, number, or binary.

  • With global secondary index queries or scans, you can only request the attributes that are projected into the index. DynamoDB will not fetch any attributes from the table.

  • There are no size restrictions for global secondary indexes.

Click through to learn more about these as well as local secondary indexes.

Comments closed

Auto-Generated Filtered Stats

Dmitry Piliugin shows one way that we can optimize a query whose main problem is poor row estimates:

A model variation is a new concept in the cardinality estimation framework 2014, that allows easily turn on and off some model assumptions and cardinality estimation algorithms. Model variations are based on a mechanism of pluggable heuristics and may be used in special cases. I think they are left for Microsoft support to be able to address some client’s CE issues pointwise.

Today we are going to view some interesting model variation, that creates filtered statistics on-the-fly. I should give a disclaimer here.

Warning: All the information below is presented for purely educational and curiosity purposes. This is completely undocumented and unsupported and should not ever be used in production systems unless Microsoft support will recommend you. More to the point, the usage of this model variation may affect the overall server performance in a negative way. This should be used for experiments and in the test environment only.

It’s interesting reading, though do heed that warning.  This also isn’t a quick operation (seeing as how the database engine is creating filtered statistics), so it’s not a first-best choice.  But worth keeping your back pocket.

Comments closed

Plotting From SQL Server Machine Learning Services

Marlon Ribunal has a quick demo showing how to generate a ggplot2 plot using SQL Server Machine Learning Services:

Let’s install the package if it hasn’t been installed yet. The easiest way to do that is to run RGUI.exe that came with your SQL Server 2017 In-Database Machine Learning installation. You can find it here:

C:\Program Files\MSSQL14.MSSQLSERVER\R_SERVICES\bin\x64

Take note that you need to run the executable as Administrator. Also, if you’ve installed the R engine prior to your SQL Server 2017 In-Database Machine Learning with R, you have to explicitly tell the R package installer where you want your package installed.

> install.packages("ggplot2", lib="C:\\Program Files\\Microsoft SQL Server\\MSSQL14.MSSQLSERVER\\R_SERVICES\\library", dep = TRUE)

dep = TRUE tells the installer to install dependencies. ggplot2 depends on a lot of other packages. You can check dependencies using MiniCRAN.

Another option for installation is to bootstrap install via T-SQL:  you can execute external scripts which run install.packages() directly rather than using RGUI, if that makes more sense with your deployment process.

Comments closed

Stored Procedure To Build Filegroups And Files

Tracy Boggiano has a lengthy script which dynamically builds out files and filegroups for you:

Here are the six different ways you can run this procedure to files and filegroups for your database:

  1. Set up partitions with a set size for each file

  2. Set up partitions based on the size of a table

  3. By monthly date range with a set size for each file

  4. By monthly date range with sized based on the size of a table

  5. One file with a set size

  6. One file based on the size of a table

She then has examples for each call, so check it out.

Comments closed

The Value Of Spinlocks

Jeremiah Peschka walks us through the concepts behind spinlocks:

Here’s our fist stab at a spinlock, badly written in C:

// lock_value is an integer that's shared between multiple threads
while (lock_value != 0) {
  // spin
}

lock_value = 1;

do_something();

lock_value = 0;

The general idea here is correct – we have some lock_value and we only want to allow processes into the critical section of code if the another process doesn’t “hold the lock.” Holding the lock means that lock_value is set to a non-zero value.

There are few things that make this code bad. These things, coincidentally, are part of what make concurrent programming difficult.

Spinlocks are a critical part of maintaining internal consistency, but they can also accidentally ruin performance on your system if done wrong.  Read the whole thing.

Comments closed

“Failed To Delete Database” In Azure

Arun Sirpal walks us through a reason why you might not be able to delete an Azure SQL Database:

Failed to delete the database: TestDB. ErrorCode: 400 ErrorMessage: Database ‘TestDB’ cannot be deleted because it is used as a sync metadata database which still contains sync groups and/or sync agents.

After some digging around I remembered that I tried to setup a data sync group where this database was involved in bi-directional synchronization with another Azure SQL Database. The other database was deleted a long time ago and clicking on the sync option within the portal confirmed that nothing was actually ever connected or ready.

Arun walks us through the wrong way before getting us down the right path.  I recommend the latter.

Comments closed

Searching For Valid Characters For T-SQL Regular Identifiers

Solomon Rutzky is on a mission to find a national treasure:

Quite often these types of things are not that easy. Yes, it is very tempting to assume that the limited test is good enough and that we did indeed find the exact list of characters (plus we would need to add in the four extra characters: at sign (@), dollar sign ($), number sign (#), and underscore (_)). However, based on my experiences, it seems that more often than not, doing an exhaustive test results in a slightly different answer that invalidates the previous conclusion (which was based on the limited test). So, while it does take more time to do more extensive testing, it seems like we have little choice if we truly want to know how these things actually work.

What that means is, at the very least, we need to get the complete list of characters accepted by SQL Server for non-delimited identifiers to make sure that the totals match the number of code points returned by the searches done in Step 1.

This post is an interesting dive into the oddities of Unicode, but leaves us on a cliffhanger.  Also, full-crazy Nicolas Cage beats mullet-wearing Tom Hanks any day.

1 Comment

Contrasting Plotly And Seaborn

Natasha Sharma contrasts the Seaborn and Plotly libraries for visualizing data:

It was important to use a library which can provide easy and high-class interactivity. Before embedding the plots into my website code, I tested a few different libraries like Matplotlib and Seaborn in order to visualize the results and to see how different they can look. After few trials, I came across Plotly library and found it valuable for my project because of its inbuilt functionality which gives user a high class interactivity.

In this post, I am going to compare Seaborn and Plotly using – Bar Chart and Heatmap diagram. I will be using Breast cancer dataset to visualize these plots. But before jumping into the comparison, the dataset I used needed preprocessing like data cleaning so, I followed steps.

In this case, the contrast is mostly lines of code versus visual quality; read on for more.

Comments closed

Visualizing Geo-Spatial Data In R

Carson Sievert shows off the plotly library:

You might be wondering, “What can plotly offer over other interactive mapping packages such as leafletmapviewmapedit, etc?”. One big feature is the linked brushing framework, which works best when linking plotly together with other plotly graphs (i.e., only a subset of brushing features are supported when linking to other crosstalk-compatible htmlwidgets). Another is the ability to leverage the plotly.js API to make efficient updates in shiny apps via plotlyProxy(). Speaking of efficiency, plotly.js keeps on improving the performance of their WebGL-based rendering, so I recommend trying plot_ly() (with toWebGL()) and/or plot_mapbox() if you have lots of graphical elements to render. Also, by having a consistent interface between these various mapping approaches, it’s much quicker and easier to switch from one approach to another when you need to leverage a different set of strengths and weaknesses.

Plotly’s on my list of things I’ll eventually get to one of these days.  H/T R-Bloggers

Comments closed

Building Custom Widgets In SQL Operations Studio

Prashanth Jayaram shows how to build a custom widget in SQL Operations Studio, using database growth metrics as an example:

In this article, you will learn the following:

  • Introduction to SQL Operations Studio

  • How to run a custom SQL query and view it as a chart

  • How to use default and custom widgets

  • Various panes and options in SSOS

  • Explain the different chart options

  • Create a custom insight

  • Details to define a custom widget step by step

  • And more…

Building these dashboard widgets is pretty easy, and Prashanth shows it step by step.

Comments closed