Thinking Post-DRAM

Joe Chang argues that we may benefit more from a hardware architecture which uses lower-latency, lower-capacity RAM:

There are different types of SRAM. High-performance SRAM has 6 transistors, 6T. Intel may use 8T Intel Labs at ISSCC 2012 or even 10T for low power? (see real world tech NTV). It would seem that SRAM should be about six times less dense than DRAM, depending on the number of transistors in SRAM, and the size of the capacitor in DRAM.

There is a Micron slide in Micro 48 Keynote III that says SRAM does not scale on manufacturing process as well as DRAM. Instead of 6:1, or 0.67Gbit SRAM at the same die size as 4Gbit DRAM, it might be 40:1, implying 100Mbit in equal area? Another source says 100:1 might be appropriate.

Eye-balling the Intel Broadwell 10-core (LCC) die, the L3 cache is 50mm2, listed as 25MB. It includes tags and ECC on both data and tags? There could be 240Mb or more in the 25MB L3? Then 1G could fit in a 250mm2 die, plus area for the signals going off-die.

There is a lot of depth in this blog post.

The Importance Of Auditing

Louis Davidson has a parable about database design and systems auditing:

This brings me to my data question. If an order is processed in a store, but the expected data is not created, did that order ever occur?

Very often, the staff of a business are very focused on pleasing the customer, making sure they get their product, but due to software limitations, may not end up not capturing information about every sale in a satisfactory manner. Most of the blame I have seen lies in software that doesn’t meet the requirements of a customer, making capturing desired details tedious to achieve when the process is in the norm. Very often the excuse programmers give is that too much work of the work to build a system would need to be done for the atypical cases, but requirements are requirements, and it is generally essential that every action that occurs in a business is captured as data.

Read on for more.  My conjoined twin case is, how much information do we have about why users give up?  For example, if you have a three-part form, how many users get through part one, part two, and part three?  There’s some natural level of attrition, but if you see an abnormally low follow-through rate, that might indicate a bug or major issue.  Auditing is hard work, as you have to hit both sides of the problem at the same time.

Where Azure Analysis Services Fits

Melissa Coates explains where Azure Analysis Services fits in common BI architectures:

(2) Data Sources

  • From a single source such as a data warehouse. This is the most traditional path for BI development, and still has a very valid place in many BI/analytics deployments. This scenario puts the work of data integration on the ETL process into the data warehouse, which is the most appropriate place.

  • Directly from various systems.  This can be done, but works well only in specific cases – it definitely won’t work well if there are a lot of highly normalized tables, or if there’s not a straightforward way to relate the disparate data together. Trying to go directly to the source systems & skip an intermediary data warehouse puts the “integration” burden on the data source view in Analysis Services, so plan for plenty of time testing if you’re going to try this route (i.e., it can be much harder, not easier). Note that this option only makes sense if the data is stored in Analysis Services because it needs to be related together somehow (i.e., DirectQuery mode, discussed next in #3, with > 1 data source won’t work if a user tries to combine data sources because the data is not inherently related).

If you’re thinking about Azure Analysis Services, this post is a good one.

Understanding HTAP

James Serra explains what Hybrid Transactional and Analytical Processing means:

HTAP is used to describe the capability of a single database that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) for the purpose of real-time operational intelligence processing.  The term was created by Gartner in 2014.

In the SQL Server world you can think of it as: In-memory analytics (columnstore) + in-memory OLTP = real-time operational analytics.  Microsoft supports this in SQL Server 2016 (see SQL Server 2016 real-time operational analytics).

I’m not completely sold on HTAP yet, particularly once you get to high-scale OLTP systems doing hundreds of thousands of transactions per second.  That said, there’s always more and more pressure to get data available for analytics faster and faster.

SQL As A Declarative Language

Lukas Eder discusses one benefit to a declarative language like SQL:

It’s simple. Both the set-builder notation, and the SQL language (and in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed to other, more complex expressions, without necessarily executing them.

Remember the imperative approach? We tell the machine exactly what to do:

  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection

What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse that to list all positive integers? Or, all positive integers less than 100? Etc.

It may be my innate contrarian curmudgeonliness, but I am moving more and more toward the idea that the easiest way to deal with data is a combination of SQL and functional programming languages, leaving OO out of the picture.

SQLite With Powershell

Phil Factor combines SQLLite, Powershell, and SQL Server:

 Although I dearly love using SQL Server, I wouldn’t use it in every circumstance; there are times, for example, when just isn’t necessary to use a Server-based RDBMS for a data-driven application. The open-source SQLite is arguably the most popular and well-tried-and-tested database ever. It is probably in your phone, and used by your browser. Your iTunes will use it. Most single-user applications that need to handle data will use SQLite because it is so reliable and easy to install.

It is specifically designed as a zero-configuration, embedded, relational database with full ACID compliance, and a good simple dialect of SQL92. The SQLite library accesses its storage files directly, using a single library, written in C, which contains the entire database system. Creating a SQLite database instance is as easy as opening a simple cross-platform file that contains the entire database instance. It requires no administration.

There’s a lot going on in this interesting article; I recommend giving it a read.

Optimizing Large Documents For Space

Raul Gonzalez drops a 2 TB table’s size in half:

So at work, I’d say space matters, and in order to optimize our storage requirements it’s very important to know about SQL Server internals, specially the Storage Engine, which happens to be one of my favorite topics of study.

In my quest to release some space I got to this database, just one table which is 165M of XML documents stored as NVARCHAR(MAX).

It was interesting walking through the process.  Some part of me wonders if it’s a bit complex for the next maintainer to handle, but saving a terabyte of disk space is worth a few extra pages of documentation…

OLAP On Hadoop

Tim Spann discusses OLAP options on the Hadoop stack:

Apache Kylin

For an introduction to this interesting Hadoop project, check out this article.   Apache Kylin originally from eBay, is a Distributed Analytics Engine that provides SQL and OLAP access to Hadoop datasets utilizing Hive and HBase.   It can use called through SparkSQL as well making for a very useful project.   This project let’s you work with PowerBI, Tableau and Excel with more tool support coming soon.    You can doMOLAP cubes and support many users with fast queries over billions of rows.   Apache Kylin provides JDBC and ODBC drivers.

There are a few interesting options here.


Kenneth Fisher discusses synchronous versus asynchronous in programming terms:

Synchronous – Code that runs one one line at a time. Each line of code is completed before the next one starts. If an external call is made then it is completed before the next line of code runs.

Asynchronous – Code that is launched and runs separately from the initial code. If a SQL job is launched from inside a batch of code (using sp_start_job for example) then the job is running in parallel (at the same time as) to the remainder of the batch of code.

Understanding which operations are synchronous versus asynchronous, and which operations are blocking versus non-blocking versus semi-blocking, will do wonders for improving application performance.


Anders Pedersen shares an easily-avoidable tale of woe:

ETL.  Spec said only Address Line 1 is needed to be loaded, so the developers only bring that line in (plus name, city etc.).  Fast forward 8 years, I get a request on my desk: “Please add Address Line 2 to import, and all tables.  Oh, and we need historical data for previously loaded files.  And for all address types”.

No normalization in this database (which is just one of about 40 databases with SIMILAR structure, but not identical).

Read on for the damage done, as well as another example of foresight saving the day.


January 2017
« Dec