
Category: Architecture

IoT Versus Event Hub

James Serra clarifies the differences between Azure’s IoT Hub and its Event Hub:

The majority of the time, if the data is coming directly from the devices, either directly or via a field-based gateway, IoT Hub will be the more appropriate choice.  Event Hub will generally be the more appropriate choice if either the data will not be coming to Azure directly from the devices, but rather either cloud-to-cloud through another provider, intra-cloud, or if the data is already landing on-premise and needs to be streamed to the cloud from a small number of endpoints internally.  There are exceptions to both conditions, of course.

Both solutions offer very high throughput data ingestion and can handle tremendous streaming data volumes.  In fact, today, IoT Hub is primarily a set of additional services that wrap an underlying Event Hub.

Read on for more scenarios and limitations in each.  They definitely serve different use cases.
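
To make the distinction concrete, here is a rough Python sketch of the two ingestion paths using the azure-iot-device and azure-eventhub SDKs.  This is my own illustration, not something from the post; the connection strings, device ID, hub name, and payload are placeholders.

    # Rough sketch of the two ingestion paths.  All connection details are placeholders.
    import json

    from azure.iot.device import IoTHubDeviceClient, Message      # pip install azure-iot-device
    from azure.eventhub import EventHubProducerClient, EventData  # pip install azure-eventhub

    reading = json.dumps({"deviceId": "sensor-01", "temperature": 21.4})

    # Device-to-cloud: each device (or field gateway) authenticates individually to IoT Hub.
    device_client = IoTHubDeviceClient.create_from_connection_string(
        "HostName=<your-iot-hub>.azure-devices.net;DeviceId=sensor-01;SharedAccessKey=<key>"
    )
    device_client.send_message(Message(reading))
    device_client.disconnect()

    # Service- or cloud-to-cloud: a small number of trusted senders stream batches into Event Hubs.
    producer = EventHubProducerClient.from_connection_string(
        "Endpoint=sb://<your-namespace>.servicebus.windows.net/;SharedAccessKeyName=<name>;SharedAccessKey=<key>",
        eventhub_name="telemetry",
    )
    with producer:
        batch = producer.create_batch()
        batch.add(EventData(reading))
        producer.send_batch(batch)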

Comments closed

Thinking About Availability Group Outages

Brent Ozar reminds us to think about graceful degradation of applications:

There’s a gray bar across the top that says, “This site is currently in read-only mode; we’ll return with full functionality soon.”

That’s not a hidden feature of Always On Availability Groups. Rather, it’s a hidden feature of really dedicated developers whose application:

This is where a bit of foresight and hard work can really pay off.  Read the whole thing.
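
The article doesn't show code, but here is a hypothetical sketch of one way an application might notice it has landed on a read-only replica and flip itself into that degraded mode; the connection string is a placeholder, and it leans on pyodbc plus DATABASEPROPERTYEX.

    # Hypothetical sketch: ask SQL Server whether the database we connected to is
    # currently updateable, and degrade the application gracefully if it is not.
    import pyodbc

    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=ag-listener;DATABASE=StackApp;Trusted_Connection=yes")

    def database_is_read_only(conn_str: str = CONN_STR) -> bool:
        """True when the current database only allows reads (e.g., a readable secondary)."""
        conn = pyodbc.connect(conn_str, timeout=5)
        try:
            row = conn.cursor().execute(
                "SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability');"
            ).fetchone()
            return row[0] == "READ_ONLY"
        finally:
            conn.close()

    if database_is_read_only():
        # Show the gray banner and disable write features instead of throwing errors.
        print("This site is currently in read-only mode; we'll return with full functionality soon.")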

Comments closed

Thinking Post-DRAM

Joe Chang argues that we may benefit more from a hardware architecture which uses lower-latency, lower-capacity RAM:

There are different types of SRAM. High-performance SRAM has 6 transistors, 6T. Intel may use 8T (Intel Labs at ISSCC 2012) or even 10T for low power? (see Real World Tech NTV). It would seem that SRAM should be about six times less dense than DRAM, depending on the number of transistors in SRAM, and the size of the capacitor in DRAM.

There is a Micron slide in Micro 48 Keynote III that says SRAM does not scale on manufacturing process as well as DRAM. Instead of 6:1, or 0.67Gbit SRAM at the same die size as 4Gbit DRAM, it might be 40:1, implying 100Mbit in equal area? Another source says 100:1 might be appropriate.

Eye-balling the Intel Broadwell 10-core (LCC) die, the L3 cache is 50mm2, listed as 25MB. It includes tags and ECC on both data and tags? There could be 240Mb or more in the 25MB L3? Then 1G could fit in a 250mm2 die, plus area for the signals going off-die.

There is a lot of depth in this blog post.
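
Here is a quick back-of-the-envelope check of those figures; the inputs come straight from the quote and the rest is simple arithmetic.

    # Back-of-the-envelope check of the SRAM density figures quoted above.
    MBIT, GBIT = 1e6, 1e9

    l3_area_mm2  = 50              # eyeballed Broadwell LCC L3 area from the post
    l3_data_bits = 25 * 8 * 2**20  # 25 MB of data bits
    l3_est_bits  = 240 * MBIT      # the post's estimate once tags and ECC are included

    print(f"25 MB of data alone: ~{l3_data_bits / MBIT:.0f} Mbit")  # ~210 Mbit, so ~240 Mbit with tags/ECC is plausible

    density = l3_est_bits / l3_area_mm2                             # bits per mm^2
    print(f"implied density:     ~{density / MBIT:.1f} Mbit/mm^2")  # ~4.8 Mbit/mm^2
    print(f"1 Gbit of SRAM:      ~{GBIT / density:.0f} mm^2")       # ~208 mm^2, near the post's ~250 mm^2 with signaling

    # The 40:1 claim: a 4 Gbit DRAM die's worth of area holds about 100 Mbit of SRAM.
    print(f"4 Gbit / 40:         {4 * GBIT / 40 / MBIT:.0f} Mbit")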

Comments closed

The Importance Of Auditing

Louis Davidson has a parable about database design and systems auditing:

This brings me to my data question. If an order is processed in a store, but the expected data is not created, did that order ever occur?

Very often, the staff of a business are very focused on pleasing the customer, making sure they get their product, but due to software limitations, may end up not capturing information about every sale in a satisfactory manner. Most of the blame I have seen lies in software that doesn’t meet the requirements of a customer, making desired details tedious to capture when the process is outside the norm. Very often the excuse programmers give is that too much of the work to build a system would need to be done for the atypical cases, but requirements are requirements, and it is generally essential that every action that occurs in a business is captured as data.

Read on for more.  My conjoined twin case is, how much information do we have about why users give up?  For example, if you have a three-part form, how many users get through part one, part two, and part three?  There’s some natural level of attrition, but if you see an abnormally low follow-through rate, that might indicate a bug or major issue.  Auditing is hard work, as you have to hit both sides of the problem at the same time.
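
On that follow-through question, here is a hypothetical sketch of the measurement I have in mind; the events and user IDs are made up.

    # Hypothetical funnel check: how many users complete each part of a three-part form?
    from collections import Counter

    # (user_id, part_completed) events, e.g. pulled from an audit or telemetry table.
    events = [
        ("u1", 1), ("u1", 2), ("u1", 3),
        ("u2", 1), ("u2", 2),
        ("u3", 1),
        ("u4", 1), ("u4", 2), ("u4", 3),
    ]

    users_per_part = Counter(part for _, part in set(events))  # de-duplicate repeat submissions
    started = users_per_part[1]

    for part in sorted(users_per_part):
        reached = users_per_part[part]
        print(f"part {part}: {reached} users ({reached / started:.0%} of those who started)")
    # An abnormally steep drop between two parts is the cue to go hunting for a bug.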

Comments closed

Where Azure Analysis Services Fits

Melissa Coates explains where Azure Analysis Services fits in common BI architectures:

(2) Data Sources

  • From a single source such as a data warehouse. This is the most traditional path for BI development, and still has a very valid place in many BI/analytics deployments. This scenario puts the work of data integration on the ETL process into the data warehouse, which is the most appropriate place.

  • Directly from various systems.  This can be done, but works well only in specific cases – it definitely won’t work well if there are a lot of highly normalized tables, or if there’s not a straightforward way to relate the disparate data together. Trying to go directly to the source systems & skip an intermediary data warehouse puts the “integration” burden on the data source view in Analysis Services, so plan for plenty of time testing if you’re going to try this route (i.e., it can be much harder, not easier). Note that this option only makes sense if the data is stored in Analysis Services because it needs to be related together somehow (i.e., DirectQuery mode, discussed next in #3, with > 1 data source won’t work if a user tries to combine data sources because the data is not inherently related).

If you’re thinking about Azure Analysis Services, this post is a good one.

Comments closed

Understanding HTAP

James Serra explains what Hybrid Transactional and Analytical Processing means:

HTAP is used to describe the capability of a single database that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) for the purpose of real-time operational intelligence processing.  The term was created by Gartner in 2014.

In the SQL Server world you can think of it as: In-memory analytics (columnstore) + in-memory OLTP = real-time operational analytics.  Microsoft supports this in SQL Server 2016 (see SQL Server 2016 real-time operational analytics).

I’m not completely sold on HTAP yet, particularly once you get to high-scale OLTP systems doing hundreds of thousands of transactions per second.  That said, there’s always more and more pressure to get data available for analytics faster and faster.
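
To make the "columnstore + in-memory OLTP" formula concrete, here is a minimal sketch of the SQL Server 2016 pattern, issued from Python via pyodbc.  The table, columns, and connection string are invented for illustration, and it assumes a database that already has a memory-optimized filegroup.

    # Minimal sketch of "real-time operational analytics": an in-memory OLTP table
    # that also carries a clustered columnstore index for analytical scans.
    import pyodbc

    DDL = """
    CREATE TABLE dbo.OrderLines (
        OrderLineID bigint IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,
        OrderID     bigint NOT NULL,
        Quantity    int    NOT NULL,
        UnitPrice   money  NOT NULL,
        INDEX ccsi CLUSTERED COLUMNSTORE   -- analytics over the same hot rows
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
    """

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Sales;Trusted_Connection=yes",
        autocommit=True,
    )
    conn.cursor().execute(DDL)
    conn.close()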

Comments closed

SQL As A Declarative Language

Lukas Eder discusses one benefit to a declarative language like SQL:

It’s simple. Both the set-builder notation, and the SQL language (and in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed to other, more complex expressions, without necessarily executing them.

Remember the imperative approach? We tell the machine exactly what to do:

  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection

What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse that to list all positive integers? Or, all positive integers less than 100? Etc.
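
To see the composability argument outside of SQL, here is a small sketch of my own in Python rather than the languages the post uses: the lazy generator is a reusable definition of "even integers" that composes with further restrictions, while the loop has already committed to its bounds and its intermediate collection.

    from itertools import count, islice

    # Imperative: the bounds and the intermediate collection are baked in up front.
    evens_list = []
    for i in range(-100, 101):
        if i % 2 == 0:
            evens_list.append(i)

    # Declarative-ish: a lazy, reusable definition of "even integers" ...
    evens = (i for i in count(0) if i % 2 == 0)

    # ... which composes with further restrictions without being rewritten.
    positive_evens_under_100 = (i for i in evens if 0 < i < 100)
    print(list(islice(positive_evens_under_100, 10)))   # [2, 4, 6, ..., 20]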

It may be my innate contrarian curmudgeonliness, but I am moving more and more toward the idea that the easiest way to deal with data is a combination of SQL and functional programming languages, leaving OO out of the picture.

Comments closed

SQLite With PowerShell

Phil Factor combines SQLite, PowerShell, and SQL Server:

Although I dearly love using SQL Server, I wouldn’t use it in every circumstance; there are times, for example, when it just isn’t necessary to use a server-based RDBMS for a data-driven application. The open-source SQLite is arguably the most popular and well-tried-and-tested database ever. It is probably in your phone, and used by your browser. Your iTunes will use it. Most single-user applications that need to handle data will use SQLite because it is so reliable and easy to install.

It is specifically designed as a zero-configuration, embedded, relational database with full ACID compliance, and a good simple dialect of SQL92. The SQLite library accesses its storage files directly, using a single library, written in C, which contains the entire database system. Creating a SQLite database instance is as easy as opening a simple cross-platform file that contains the entire database instance. It requires no administration.
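
That zero-configuration claim is easy to demonstrate from any language with a SQLite binding; here is a minimal sketch using Python's built-in sqlite3 module (the article itself drives SQLite from PowerShell).

    # Minimal illustration of SQLite's zero-configuration, single-file nature.
    import sqlite3

    conn = sqlite3.connect("inventory.db")   # creating or opening the file is the whole "install"
    conn.execute("CREATE TABLE IF NOT EXISTS widgets (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO widgets (name, qty) VALUES (?, ?)",
                     [("sprocket", 12), ("flange", 3)])
    conn.commit()

    for row in conn.execute("SELECT name, qty FROM widgets ORDER BY name"):
        print(row)
    conn.close()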

There’s a lot going on in this interesting article; I recommend giving it a read.

Comments closed

Optimizing Large Documents For Space

Raul Gonzalez cuts a 2 TB table’s size in half:

So at work, I’d say space matters, and in order to optimize our storage requirements it’s very important to know about SQL Server internals, especially the Storage Engine, which happens to be one of my favorite topics of study.

In my quest to release some space I got to this database, just one table which is 165M of XML documents stored as NVARCHAR(MAX).

It was interesting walking through the process.  Some part of me wonders if it’s a bit complex for the next maintainer to handle, but saving a terabyte of disk space is worth a few extra pages of documentation…
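
The excerpt doesn't say which technique did the trick, but the headline result is believable on arithmetic alone: NVARCHAR(MAX) stores UTF-16 (two bytes per character), and XML compresses extremely well.  A pure-Python illustration of those ratios on a made-up document follows; it is not a reconstruction of the article's method.

    # Illustration only: how much of the size comes from UTF-16 storage and how much
    # gzip can claw back for a repetitive XML document.  Not the article's actual fix.
    import gzip

    xml = "<order id='42'>" + "<line sku='ABC-123' qty='7' price='19.99'/>" * 200 + "</order>"

    utf16  = len(xml.encode("utf-16-le"))             # bytes as NVARCHAR(MAX) stores them (UTF-16)
    utf8   = len(xml.encode("utf-8"))                 # single-byte storage for ASCII-heavy XML
    zipped = len(gzip.compress(xml.encode("utf-8")))  # what gzip (e.g., SQL Server's COMPRESS) can reach

    print(f"UTF-16: {utf16:,} bytes")
    print(f"UTF-8:  {utf8:,} bytes ({utf8 / utf16:.0%} of UTF-16)")
    print(f"gzip:   {zipped:,} bytes ({zipped / utf16:.1%} of UTF-16)")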

Comments closed

OLAP On Hadoop

Tim Spann discusses OLAP options on the Hadoop stack:

Apache Kylin

For an introduction to this interesting Hadoop project, check out this article.  Apache Kylin, originally from eBay, is a Distributed Analytics Engine that provides SQL and OLAP access to Hadoop datasets utilizing Hive and HBase.  It can be called through SparkSQL as well, making for a very useful project.  This project lets you work with Power BI, Tableau, and Excel, with more tool support coming soon.  You can do MOLAP cubes and support many users with fast queries over billions of rows.  Apache Kylin provides JDBC and ODBC drivers.
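
Besides the JDBC and ODBC drivers, Kylin also exposes a REST query endpoint; here is a rough Python sketch of calling it.  The host, credentials, project, table, and exact payload fields are assumptions based on Kylin's sample-cube documentation, so treat them as illustrative.

    # Rough sketch: issue a SQL query to an Apache Kylin cube over its REST API.
    # Host, credentials, project, and table names are placeholders.
    import requests

    resp = requests.post(
        "http://kylin-host:7070/kylin/api/query",
        auth=("ADMIN", "KYLIN"),   # Kylin's default demo credentials
        json={
            "sql": "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
            "project": "learn_kylin",
            "limit": 100,
        },
    )
    resp.raise_for_status()
    for row in resp.json()["results"]:
        print(row)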

There are a few interesting options here.

Comments closed