
Curated SQL Posts

Lambda Architecture

Sebastiao Correia discusses Lambda architecture:

The batch layer stores all the data with no constraint on the schema. The schema-on-read is built in the batch views in the serving layer. Creating schema-on-read views requires algorithms to parse the data from the batch layer and convert it into a readable form. This allows input data to evolve freely, as there is no constraint on its structure. But then, the algorithm that builds the view is responsible for managing the structural change in order to still deliver the same view as expected.

This shows a coupling between the data and the algorithms used for serving the data. Focusing on data quality is therefore not enough, and we may ask the question of the algorithm quality. As the system lives and evolves, the algorithms may become more and more complex. These algorithms must not be regarded as black boxes; a clear understanding of what they are doing is important if we want good data governance. Moreover, during the batch view creation, data quality transformations could be done so as to provide data of better quality to the consumer of the views.

Lambda is an interesting architectural concept, as it tries to solve the age-old “fast or accurate?” problem with “both.”  Get your fast estimates streamed through a speed layer, and have your accurate, slow calculations handled by the batch layer and exposed through the serving layer.  Definitely check out this article.
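To make the moving parts concrete, here is a minimal sketch of the idea, not anything taken from the article. The event shapes, the `user`/`user_id` field drift, and every function and class name are assumptions made up for illustration. The parsing function is the schema-on-read step that has to absorb structural change, and the query merges the slow batch view with the fast speed-layer delta.

```python
from collections import Counter

# Hypothetical raw events in the batch layer. The schema drifted over time:
# older records use "user", newer ones use "user_id" and add "referrer".
MASTER_DATASET = [
    {"user": "ann", "page": "/home"},
    {"user": "bob", "page": "/home"},
    {"user_id": "ann", "page": "/docs", "referrer": "search"},
]

def parse_event(raw):
    """Schema-on-read: normalize either record shape to (user, page)."""
    user = raw.get("user_id", raw.get("user"))
    return user, raw["page"]

def build_batch_view(records):
    """Slow, accurate recomputation of page counts over the whole dataset."""
    return Counter(parse_event(r)[1] for r in records)

class SpeedLayer:
    """Fast, incremental counts for events the last batch run has not seen."""

    def __init__(self):
        self.realtime_view = Counter()

    def ingest(self, raw):
        self.realtime_view[parse_event(raw)[1]] += 1

def query_page_views(batch_view, speed_layer, page):
    """Serving-time merge: batch result plus the recent delta."""
    return batch_view[page] + speed_layer.realtime_view[page]

batch_view = build_batch_view(MASTER_DATASET)
speed = SpeedLayer()
speed.ingest({"user_id": "cat", "page": "/home"})
print(query_page_views(batch_view, speed, "/home"))  # 3
```

Note that the view stays the same only because `parse_event` knows about both record shapes, which is exactly the data-to-algorithm coupling the excerpt describes.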


CDH Update

Cloudera reports that CDH 5.7 includes a large number of changes to Hue, the web-based Hive UI:

Single-page app: The initial page loads very quickly and asynchronously fetches the list of tables, table statistics, data sample, and partition list. Subsequent navigation clicks will trigger only 1 or 2 calls to the server, instead of reloading all the page resources again. As an added bonus, the browser history now works on all the pages.

These are some nice changes.  I still don’t think a web app replaces quality tooling (like Management Studio), but if a web app is what you have, it should at least be nice.


Cycle Your Logs

Andrea Allred shows how to cycle error logs to prevent them from growing out of hand:

What is a good size?  I usually try to get it to roll over around 10 MB.  I use a monitoring tool and when the large error log alert is triggered, I have it run sp_cycle_errorlog for me so mine always stay a healthy size.  You don’t need fancy tools to do this though.  If you know about how fast your logs grow, you can set up a SQL Agent job to run it on a schedule to keep your logs healthy.

How many logs should I keep? This is completely up to you, but since I keep my logs so small, I try to keep 15 of them.  Why so many? I do it so I can go back and see issues further back if needed. You can adjust the amount you keep by right clicking on SQL Server Logs in SSMS and selecting “Configure”

I’d personally prefer to keep more logs (at least 32-45 days’ worth), but that’s going to depend upon the environment.
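The size, schedule, and retention numbers are tied together by simple arithmetic. As a rough sketch, assuming you have measured an average growth rate (the 2.5 MB/day figure and both function names are made up for illustration), you can estimate how often a scheduled job should call sp_cycle_errorlog and how many logs you would need to keep to cover a retention window:

```python
import math

def cycle_interval_days(growth_mb_per_day, target_size_mb=10):
    """Days between cycles so each log ends up at or under the target size."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    # Never schedule more often than daily in this simple model.
    return max(1, math.floor(target_size_mb / growth_mb_per_day))

def logs_to_keep(retention_days, interval_days):
    """Number of logs needed to cover the retention window."""
    return math.ceil(retention_days / interval_days)

interval = cycle_interval_days(growth_mb_per_day=2.5)  # 10 / 2.5 = 4 days
print(interval, logs_to_keep(45, interval))  # 4 12
```

With those assumed numbers, cycling every 4 days and keeping 12 logs would cover 45 days. A server that logs more heavily would need either more logs or larger ones to reach the same window.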


Max And Min Decimal Values

Robert Davis gives us the formula for the max and min decimal values given a scale and precision:

Unfortunately, the mathematical approach has flaws. First of all, Power(10, 38) exceeds the range of any numerical data type in SQL Server. There is no way to store or work with this value in calculations. Secondly, once you try to raise 10 to a power greater than 30, you start seeing floating point calculations (the values are approximate). For example, select Power(Cast(10 as decimal(38, 0)), 31) — casting as decimal(38, 0) because it exceeds int or bigint — yields 9999999999999999600000000000000. That’s clearly an approximated value and is not going to work for calculations where we’re expecting a precise value. So, that leaves the hacky version I didn’t want to do because I just cannot count on the mathematical approach working. Here is the base query using the hacky calculation

Read on for the hacky version.
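For reference, the largest value of a decimal(p, s) is p nines with the decimal point placed s digits from the right, that is, (10^p - 1) / 10^s, and the smallest is its negation. The sketch below is not Robert’s query. It is a Python illustration (the function name is my own) that builds the value digit by digit so that no floating point arithmetic is involved, which is the same reason a digit-based approach is attractive in T-SQL.

```python
from decimal import Decimal

def decimal_bounds(precision, scale):
    """Return (min, max) for a decimal(precision, scale) value."""
    if not (1 <= precision <= 38 and 0 <= scale <= precision):
        raise ValueError("expected 1 <= precision <= 38 and 0 <= scale <= precision")
    # Build the value from its digits: `precision` nines, exponent -scale.
    # The tuple constructor is exact and ignores the context precision.
    largest = Decimal((0, (9,) * precision, -scale))
    # copy_negate() flips the sign without rounding; unary minus would
    # round to the context precision (28 digits by default).
    return largest.copy_negate(), largest

print(decimal_bounds(5, 2))       # (Decimal('-999.99'), Decimal('999.99'))
print(decimal_bounds(38, 0)[1])   # 38 nines

# Floating point cannot hold 10^31 exactly, which mirrors the problem
# described in the excerpt.
print(int(10.0 ** 31) == 10 ** 31)  # False
```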


Bidirectional Cross-Filtering And M

Chris Webb shows how to create a report with a single slicer that can show data for the current day, week, month, and year:

The way to achieve this is not all that different from the calculated column approach, but it requires a separate table to model the many-to-many relationship between all the required relative period selections and the dates in them, as well as the use of bidirectional cross-filtering between tables (which I blogged about here). The data model I used for this report looks like this

Be sure to read the comments to make sure you don’t get into a scenario in which a user can select multiple periods and get duplicated data.
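To see why the bridge table is many-to-many, and why multi-select causes duplicates, here is a small sketch of how such a table could be generated. Chris builds his in M; this is a Python illustration only, and the period names, the to-date definitions, and the Monday week start are all assumptions of mine rather than details from his post.

```python
from datetime import date, timedelta

def relative_period_rows(today):
    """Yield (period_name, date) pairs; a date may belong to several periods."""
    week_start = today - timedelta(days=today.weekday())  # Monday, by assumption
    period_starts = {
        "Today": today,
        "Current Week To Date": week_start,
        "Current Month To Date": today.replace(day=1),
        "Current Year To Date": today.replace(month=1, day=1),
    }
    for name, start in period_starts.items():
        day = start
        while day <= today:
            yield name, day
            day += timedelta(days=1)

rows = list(relative_period_rows(date(2016, 4, 13)))
print(len(rows))  # 121 rows: 1 + 3 + 13 + 104
print(sum(1 for _, d in rows if d == date(2016, 4, 13)))  # 4
```

The slicer sits on the period name, and the bidirectional relationship pushes the selected dates through to the date table. Because the same date shows up under several periods (four times for the current day here), selecting more than one period at once is what produces the duplicated data.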


Undocked Query Windows

Michael Swart notes that undocked query windows now feel all grown up:

The March 2016 Refresh (13.0.13000.55 Changelog) updates SSMS to use the new Visual Studio 2015 shell. Part of that change means that undocked windows are now top-level windows.

Top level windows are windows without parents so the undocked window is not a child window of the main SSMS window (but it is part of the same process). And so it gets its own space in the task bar, and participates in alt+tab when you switch between windows.

Also these undocked windows can be a collection of query windows.

One reason I rarely used child windows is that I’d undock something, switch to a browser tab underneath, and then switch back and watch the undocked window pop over my browser tab.  This sounds like a good improvement.


Columnstore Replication

Niko Neugebauer notes that certain columnstore index types are now supported for replication in SQL Server 2016:

At this point you can notice that, of our 3 tables, only 2 are available for transactional replication; the disk-based table with a Clustered Columnstore Index is not available for replication.
This means that there are no improvements since SQL Server 2014 for the DWH/BI scenarios in this direction, and this is definitely sad.
Well, we can always go in a different direction, as in the case of Availability Groups in SQL Server 2014, where readable secondaries are supported only for Nonclustered Columnstore Indexes. In SQL Server 2016 we can use a Nonclustered Columnstore Index even on all columns if needed and get the principal improvements of the Batch Execution Mode.

Notice here that even though we can select the InMemory tables with Clustered Columnstore, there are a couple of additional important settings that need to be configured to make things function. So clicking through the GUI Wizard will not set things up correctly by default.

The short answer, Niko states, is that you can only replicate nonclustered columnstore indexes at this time.  I can see some use for replicating clustered columnstore tables (warehouse scale-out scenarios, perhaps), but it wasn’t at the top of my columnstore improvement list.
