Press "Enter" to skip to content

Curated SQL Posts

Architecting Semi-Structured Data Solutions

James Serra gives four architectural scenarios for handling large quantities of semi-structured data:

An evolution of the three previous scenarios that provides multiple options for the various technologies.  Data may be harmonized and analyzed in the data lake or moved out to a EDW when more quality and performance is needed, or when users simply want control.  ELT is usually used instead of ETL (see Difference between ETL and ELT).  The goal of this scenario is to support any future data needs no matter what the variety, volume, or velocity of the data.

Hub-and-spoke should be your ultimate goal.  See Why use a data lake? for more details on the various tools and technologies that can be used for the modern data warehouse.

Check it out for a high-level architectural view of contemporary warehousing choices.  I prefer having both systems in play:  the EDW answers known business questions and gives you back report information relatively quickly; whereas the Hadoop cluster allows you to do spelunking, data cleansing, and answer unanticipated business questions.

Comments closed

Analyze Transaction Log Hierarchy

Paul Randal has written code to determine how much log an operation has generated:

Over the weekend there was a discussion on the MVP distribution list about the sys.dm_tran_database_transactions DMV and how one cannot use it to accurately determine how much log an operation has generated because it doesn’t provide a roll-up of the sub-transaction metrics to the outer transaction. This makes the output somewhat non-intuitive.

The discussion prompted me to write some code I’ve been meaning to do since 2012, when SQL Server 2012 introduced a field in LOP_BEGIN_XACT log records that tracks the transaction ID of the parent transaction, allowing the hierarchy of transactions to be investigated.

The actual code is at the bottom of the article, and is available in a zip file here.

It provides two stored procs, sp_SQLskillsAnalyzeLog and sp_SQLskillsAnalyzeLogInner, with the former making use of the latter, and the latter calling itself recursively.

I’d consider this squarely in the advanced troubleshooting realm.  Definitely read the whole thing and Paul’s code.

Comments closed

The Joy Of Trello

Chrissy LeMaire praises Trello:

The SQLPS Trello board worked out so well, that when the topic of getting overdue some improvements into SSMS was brought up, Microsoft suggested that setting up another board for SQL Server Management Studio might work well too.

If you use either of these products, I strongly suggest you join us and give Microsoft your feedback. They actively participate on both boards which is just amazing.

I’ve never been a huge Trello fan, but I do admit that I’m looking for a good tool for personal planning, so maybe I’ll give it another try.

Comments closed

Fibonacci Series Calculation

Daniel Hutmacher shows a few methods for calculating Fibonacci sequences in T-SQL:

There’s a really nice mathematical way that I found on Stack Overflow, using the golden ratio (from Wikipedia). And it’s actually set-based. I don’t have the requisite mathematical skills to evaluate the correctness or precision. Note that it returns afloat value result, the upside of which is that you can calculate much higher values. The downside is the loss of precision that comes with float.

This is a bit of a brain teaser but if you learn the techniques, they can definitely come in handy in the future.  I like the third solution because it’s a reminder that writing code is just as much about understanding the domain as it is understanding the syntax.

Comments closed

DocumentDB Updates

Jen Stirrup brings news on DocumentDB updates:

Let’s have a look at the biggest news: DocumentDB now has protocol support for MongoDB. From the industry perspective, this is great news. MongoDB is one of the most easily-recognised NoSQLs databases. Now that DocumentDB can be a great supporting act for MongoDB, it means that architects have a broader range of tools to support business needs in the enterprise data architecture, whilst increasing business capability using the Azure cloud. How does it work? Well, or MongoDB at the wire protocol level, which means that the existing MongoDB drivers will function against DocumentDB. For IT departments, it means that enterprises can persist data in DocumentDB behind the scenes. With it, it brings the reliability, stability and resilience of the Azure cloud. It also means that these technologies are accessible for small to medium enterprises in a way that they can afford, backed up with the stability and support from Microsoft Azure that they may find difficult to service on their own.

DocumentDB has a long way to go before it catches up to MongoDB, but these are improvements that close the gap a little bit.

Comments closed

Introducing Timing Delays

Andrea Allred introduces a timing delay to solve a log growth problem:

We have a lovely Availability Group that holds A LOT of data that is broken into partitions.  We have 42 partitions and they are usually moving information around daily between them.   The index rebuilds on them were making our logs HUGE because the the Availability Group was taking too long to catch up, we tried both Synchronous and Asynchronous mode.  We would see all kinds of errors.  We were doing horrible things like auto shrinking our transaction log after the indexing finished and  ignoring alarms during the time the database was rebuilding.  We had requested more and more space from our storage team and sometimes the job wouldn’t even finish because it ran out of space.  Our first idea was to split out the index rebuilds so that we could do one partition at a time.

I’d call this a bit of a hack, but if it’s documented and it does the job better than any known alternative, it’s a good decision.

Comments closed

Power BI With Mapping

Dustin Ryan shows how to use the Bing Maps API in Power BI visualizations:

The good news is that Power Query in Power BI is flexible enough for us to take advantage of the Bing Maps API so that we can lookup the missing pieces of information we need. In this blog post, I’m going to show you how you can use the Bing Maps API to look up an address based on a latitude and longitude or use a street address to find a latitude and longitude for the location.

I love how map visualizations nowadays are so easy.

Comments closed

Compatibility Level 130 With Legacy Cardinality

Jos de Bruijn notes that there is a way to use the legacy cardinality estimator even when you are in Compatibility Level 130 (new in SQL Server 2016):

In SQL Server 2016 there are a lot of goodies under Compatibility Level 130, such as performance improvements with Columnstore and In-Memory OLTP and all the Query Optimizer enhancements done over the years under Trace Flag 4199. So we are faced with a situation where there are workloads that can benefit from some enhancements that come with Compatibility Level 130, but that cannot use the new CE.

The solution is to use one of the new database-scoped configuration options, namely the Legacy Cardinality Estimation option. You can enable this using the following ALTER DATABASE command:

I’ve had a very positive experience with the new cardinality estimator, but I certainly appreciate the option being there just in case.

Comments closed

Power BI Updates

Julie Smith points out new Power BI updates and she is excited about them:

Today, Microsoft announced the April Power BI Desktop updates and they include something I and my colleagues at Innovative Architects have been long awaiting: formatting for table reports!

While it was understandable that initially Power BI wanted to focus on visualizations first and foremost, the display for pure tabular data in Power BI was lackluster. There was no control over any of the following:

  1. Font size

  2. Colors –of any element in a table, font, background color, title, totals. Nada

  3. Borders

I am simultaneously impressed that the Power BI team can so consistently push out crazy numbers of updates and glad that I don’t personally need to keep up on all of the changes…

Comments closed

Row-Level Security With Power BI

Koen Verbeeck discusses dynamic, row-level security using Power BI:

Apparently, in PBI the domain is used, not the email. Good to know if row level security is implemented in PBI someday (note: currently it’s only available in the Power BI Service).

OK, now let’s upload this report to the Power BI Service so we can implement the RLS. When the report is opened, it seems  that my carefully crafted measures are ignored. Maybe Power BI ignores the USERNAME() function as long as there is no RLS defined.

I like the fact that you can test by role, making it a lot easier to see what others would see under those circumstances and finding bugs with security implementation.  If you have any non-public information and you use Power BI, this is a must-read.

Comments closed