Press "Enter" to skip to content

Author: Kevin Feasel

Where Columnar Databases Struggle

Teo Lachev makes a good point regarding columnar databases:

A large company uses the SAP HANA ERP system. Users requires real-time access to transactional data. To avoid performance degradation, SLT replication (trigger-based change data capture) replicates data to another SAP HANA system that is used solely for reporting. The problem is that the more detailed the report gets and the more columns it has, the slower it gets and SAP HANA throws out of memory exceptions.

SAP HANA is an in-memory columnar database like Tabular. So, it stores data in columns, not rows. Columnar databases are primarily designed for analytical reports which typically have a few columns (sales by customer, product, date), but can potentially aggregate large datasets. As the reporting grain lowers and more columns are added (order number, order line item, customer name, phone number, etc.), a columnar database has to cross-join more and more columns. This is not efficient and performance quickly degrades irrespective that storage is fast. SSAS Tabular and Power BI are no different. SAP HANA complicates the issue further by preventing direct access to tables and requiring “analytical” views that join tables and potentially nest other views.

Read the whole thing.  Teo has a great point:  there are trade-offs between different data platform technologies, and choosing the right one is important.

Comments closed

Azure Data Lake Store: Now On Blob Storage

James Serra announces Azure Data Lake Store Gen2:

Big news!  The next generation of Azure Data Lake Store (ADLS) has arrived.  See the official announcement.

In short, ADLS Gen2 is the combination of the current ADLS (now called Gen1) and Blob storage.  Gen2 is built on Blob storage.  By GA, ADLS Gen2 will have all the features of both, which means it will have features such as limitless storage capacity, support all Blob tiers (Hot, Cool, and Archive), the new lifecycle management feature, Azure Active Directory integration, hierarchical file system, and read-access geo-redundant storage.

A Gen2 capability is what is called “multi-modal” which means customers can use either Blob object store APIs or the new Gen2 file system APIs.  The key here is that both blob and file system semantics are now supported over the same data.

One very interesting thing to me is that Gen2 pricing is half of Gen1.

Comments closed

Reporting Services And Availability Groups

Thomas Rushton has a good rant worked up:

Yes, you may have an availability group – well done – and you may have installed SSRS on both servers. But you’ve only set up the reporting application to point to one of those? And you’ve given the link https://<<Listener_Name>>/reports out to the users? Head/desk. I told you at the time that SSRS doesn’t play nicely with AGs. [Nearly misposted as SSRS doesn’t play nicely with SSRS, which, while valid, isn’t the point here…]

Here’s what you need to do to fix this / make sure it doesn’t happen:

Click through to learn what you need to do to make sure there are no problems.

Comments closed

Testing Network Speed Between Azure Regions

Dave Bermingham has some quick inter-region tests for Azure network performance:

This is the question I asked myself today and of course I couldn’t find this documented anywhere. I’m assuming there is no guarantee and it probably depends on current utilization, etc. If I’m wrong, someone please point me to the documentation that states the available speed. I primarily looked here and here.

So I set up two Windows 2016 D4s v3 instances, one in Central US and one in East US 2, which are paired regions.

If you don’t know what peering is, it essentially lets you to easily connect two different Azure virtual networks. Peering is very easy to setup, just make sure you configure it from both Virtual Networks, I made that mistake at first. Once it is configured properly it will look something like this.

Read on for Dave’s results.

Comments closed

Running SSMS As A Different Windows Account

Jason Brimhall shows a couple ways of running SQL Server Management Studio with different Active Directory credentials:

One of the tasks I find myself doing on a fairly regular basis is running SSMS as a different Windows User. The two biggest use cases for this are: a) to test an account to prove that it is working (or not) and has the appropriate level of access, and b) to use SSMS to connect to a Domain SQL Server from a computer in a different domain (or not on the domain).

In addition to needing to do these tasks for myself, I find that I need to show somebody else how to do the same thing on a fairly consistent basis. Considering the finite keystrokes we all have (which I referenced here), it is time for me to “document” how to do this task.

The “/netonly” command line parameter is one I’ve occasionally forgotten to my inevitable chagrin.

Comments closed

Gathering Punctuation With tidytext

Julia Silge uses the tidytext package to compare works of literature in terms of punctuation usage:

Commas are the PUNCTUATION WINNER, except in Anne of Green Gables and Ulysses, where periods win out. These two novels are dramatically different from each other in other ways, though, and Ulysses is an outlier overall with almost no spoken dialogue via quotation marks and an unusual use of colons to semicolons. Exclamation marks are used relatively more in Wuthering Heights and Alice in Wonderland!

Exploring text in these kinds of ways is so fun, and tools for this type of text mining are developing so fast. You can incorporate information like this into modeling or statistical analysis; Mike Kearney has a package called textfeatures that lets you directly extract info such as the number of commas or number of exclamation marks from text. Let me know if you have any questions!

Yet more proof that Ulysses was an awful book.

Comments closed

Converting XML To R Dataframes

Joachim Zuckarelli announces a new package:

The new R package flatxml provides functions to easily deal with XML files. When parsing an XML document fxml_importXMLFlat produces a special dataframe that is ‘flat’ by its very nature but contains all necessary information about the hierarchical structure of the underlying XML document (for details on the dataframe see the reference for the fxml_importXMLFlat function). flatxml offers a set of functions to work with this dataframe.

Apart from representing the XML document in a dataframe structure, there is yet another way in which flatxml relates to dataframes: the fxml_toDataFrame function can be used to extract data from an XML document into a dataframe, e.g. to work on the data with statistical functions. Because in this case there is no need to represent the XML document structure as such (it’s all about the data contained in the document), there is no representation of the hierarchical structure of the document any more, it’s just a normal dataframe.

Very interesting.  I’ve struggled a bit more with the xml2 package than I’d care to admit, so I might give this one a try.  H/T R-bloggers

Comments closed

Executing Python Code With SQL Server

Chris Hyde has started a series on executing external scripts with SQL Server ML Services:

We’ll start off with a simple step-by-step introduction to the sp_execute_external_script stored procedure.  This is the glue that enables us to integrate SQL Server with the Python engine by sending the output of a T-SQL query over to Python and getting a result set back. .  For example, we could develop a stored procedure to be used as a data set in an SSRS report that returns statistical data produced by a Python library such as SciPy.  In this post we’ll introduce the procedure and how to pass a simple data set into and back out of Python, and we’ll get into manipulating that data in a future post.

If you’d like to follow along with me, you’ll need to make sure that you’re running SQL Server 2017 or later, and that Machine Learning Services has been installed with the Python option checked.  Don’t worry if you installed both R and Python, as they play quite nicely together.  The SQL Server Launchpad service should be running, and you should make sure that the ability to execute the sp_execute_external_script procedure is enabled by running the following code:

Chris’s talks on ML Services (either R or Python) were great and I expect this series to be as well.

Comments closed

Exploding Rows With Tally Tables

Riley Major shows how to use a tally table to turn one row into several rows:

Imagine a simple table containing contact information. It has two places for phone numbers (Phone1 and Phone2). Let’s say you wanted to split those off into a related phone numbers table so you could support many more numbers for a single contact. As part of that process, you’d need to create two rows for each existing contact– one for each phone number.

One way to get two phone numbers is to handle each phone number as a separate process, combining a list of all contacts and their first number with those same contacts and their second number. So you’d SELECT everything from the table once for the first phone number and then UNION it with another SELECT. This works, but if you have a large list of columns or a complex set of JOINs, you end up repeating a lot of code.

Instead, what if you simply, magically double the rows and then pick and choose the columns you wanted from each of the rows?

Tally tables are a great “Get out of a bad design (relatively) cheap” tool.  They’re not something I use on a daily basis, but they’ve made life easier for me on dozens of occasions.

Comments closed

Finding The Max Number Of Sequences You Can Have In SQL Server

Jon Shaulis looks into how many sequences you can have on a SQL Server instance:

I thought this was an interesting question, but it makes sense to have some concern about it. If you are using Sequences over identity inserts, this typically means you need more control over how those numbers are handled from creation to archive. So what if you need a lot of different Sequences stored on your server to handle your needs? Will you run out of Sequence objects that you can create?

This information is not intuitively simple to find, this is one of those times where you need to read various articles and connect the dots. Let’s start simple

This is one of those cases where Swart’s 10% Rule comes into play.

Comments closed