Press "Enter" to skip to content

Category: Architecture

Online Database Modelling Services

Robert Sheldon has a look at some web-based database modelling services:

Here I look at five tools that provide online modeling services, ranging from commercial products to free, open-source solutions. They include Vertabelo, GenMyModel, dbDiffo, WWW SQL Designer, and DbDesigner.net. Each one takes a different approach to delivering its services and completing basic tasks, such as adding tables, columns, or relationships. You’ll find that some tools are more feature-rich and user-friendly than others, but each one has its own advantages and charms.

That said, none of the tools provide the level of functionality you get with an advanced on-premises solution such as PowerDesigner or ER/Studio, but not everyone needs such an extensive set of features—or the price tag that goes with them. In fact, all five online tools come either completely free or have a free version available, making it possible to try all of them without having to commit one way or the other. Not surprisingly, the free versions associated with the commercial products have limitations on their use.

The hat I’d throw into the ring is draw.io.  It’s a Visio look-alike, so it’s more useful for high-level strategic diagrams than a true model.


Availability Groups And Failover Cluster Instances

Derik Hammer digs into a more complex architecture for HA+DR:

The Availability Group will handle the data synchronization between the data centers. Asynchronous mode is recommended due to potential network latency hindering primary site performance. With the combination of these two features, you meet HA with the FCI’s automatic failover locally and DR with manual failover of the Availability Group between sites.

Derik’s conclusion is important:  this isn’t the type of thing a brand new accidental DBA should try to build; there are a lot of moving parts here.  But if you want a robust solution and have the support people available to handle the relative complexity, this is a good option.
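
As a rough sketch of what the Availability Group half of that architecture might look like (the server names, endpoint URLs, and database name here are made up), the important bits are asynchronous commit to the DR site and manual failover modes, since a replica hosted on an FCI cannot use automatic AG failover:

CREATE AVAILABILITY GROUP [AG_Sales]
FOR DATABASE [SalesDB]
REPLICA ON
    N'FCIPRIMARY' WITH (
        ENDPOINT_URL = N'TCP://fci-primary.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,  -- async between sites, per the recommendation above
        FAILOVER_MODE = MANUAL                    -- FCI-hosted replicas require manual AG failover
    ),
    N'DRSECONDARY' WITH (
        ENDPOINT_URL = N'TCP://dr-secondary.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL
    );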


The Value Of Sparse Columns

Erin Stellato discusses sparse columns:

In conclusion, we see a significant reduction in disk space and IO when sparse columns are used, and they perform slightly better than non-sparse columns in our simple data modification tests (note that retrieval performance should also be considered; perhaps the subject of another post).

Sparse columns have a lot of potential value, but in my experience, they fall short in one huge way:  you cannot compress tables with sparse columns.  Given that both sparse columns and data compression are things which benefit from scale, it’s important to make the right choice upfront.
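
As a quick illustration of that limitation (table and column names here are mine, not Erin's), marking a column SPARSE is easy, but trying to turn on row or page compression for the same table will fail:

CREATE TABLE dbo.ProductAttribute
(
    ProductAttributeID INT IDENTITY(1,1) PRIMARY KEY,
    ProductID INT NOT NULL,
    -- Rarely-populated attribute; NULLs take no space because the column is SPARSE
    WarrantyNotes NVARCHAR(500) SPARSE NULL
);

-- Fails: ROW and PAGE compression cannot be combined with sparse columns
ALTER TABLE dbo.ProductAttribute
    REBUILD WITH (DATA_COMPRESSION = PAGE);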


Setting Up A Scrub Server

Tom Norman talks about setting up a data scrubbing server:

With that said, when I set up a Scrub server, it is in a very secure area where the data access is very, very limited. For example, in my current company, the server is in a separate domain from Production and QA/Dev. Only DBAs are allowed to access this server. If you have multiple DBAs at your location, you may even want to limit which DBAs have access to this server. Our goal is to automate the entire scrubbing process so no one has to access the data, including copying backup files from Production and then out to a shared scrub location for QA/Dev to retrieve.

Scrub servers are a way of stripping personally identifiable or sensitive information from production data so developers can safely use the data in lower, less secure environments.
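
The scrubbing itself typically boils down to a set of UPDATE statements run against the restored copy before it ever leaves the secure zone. A minimal sketch, assuming a hypothetical Customers table with name, e-mail, and phone columns:

-- Run against the restored copy on the scrub server, never against Production
UPDATE dbo.Customers
SET FirstName    = N'Customer',
    LastName     = CONCAT(N'#', CustomerID),
    EmailAddress = CONCAT(N'customer', CustomerID, N'@example.com'),
    PhoneNumber  = NULL;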


Options To Capture Changed Data

Koen Verbeeck looks at various ways of capturing changed data:

  • In some very rare cases, you can actually use change data capture or change tracking on the source system. If you get one of those features implemented, you’re golden. But most of the time you’re not, as a lot of administrators don’t like them because of potential performance impact.

Koen lists several options.  One additional option is to use triggers to capture changes in a queue table.  If you are dealing with SCD-1 changes (in which you do not need a full reckoning of history) or periodic SCD-2 (in which you keep history but are okay with smashing some changes together if they happen within the window between ETL loads), loading IDs of changed records into a queue table is reasonably efficient and gets around trying to make sure everybody updates the modified date.  It has its own drawbacks, though, starting with the fact that it uses triggers…
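
Here is a minimal sketch of that trigger-plus-queue-table approach (the Customer table and column names are hypothetical):  the trigger only records which keys changed, and the ETL process later joins back to the source to pick up current values.

CREATE TABLE dbo.CustomerChangeQueue
(
    CustomerID INT NOT NULL,
    ChangeDateUTC DATETIME2(3) NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

CREATE TRIGGER dbo.tr_Customer_QueueChanges
ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Pull keys from both inserted and deleted so inserts, updates, and deletes all land in the queue
    INSERT INTO dbo.CustomerChangeQueue (CustomerID)
    SELECT CustomerID FROM inserted
    UNION
    SELECT CustomerID FROM deleted;
END;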


Dropping Unique Key Constraints

Kenneth Fisher doesn’t like the way SQL Server implements unique key constraints:

So now that we all know the basics what could possibly have gone wrong? Well I was handed an error.

Msg 3723, Level 16, State 5, Line 21
An explicit DROP INDEX is not allowed on index ‘TblUniqueConstraint.uni_TblUniqueConstraint’. It is being used for UNIQUE KEY constraint enforcement.

Someone had created a process several years ago that dropped and re-created indexes (I’m not going to go into why right now). Well this particular index is used to enforce a unique constraint and so it can’t be dropped. If you want to follow along here is some quick code to duplicate the problem.

The appropriate way to drop a unique key constraint is ALTER TABLE [TableName] DROP CONSTRAINT [ConstraintName].
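
Here’s a quick repro and fix, assuming the object names from the error message above (the column in the re-create step is a placeholder):

-- Fails with Msg 3723: the index backs a unique key constraint
DROP INDEX uni_TblUniqueConstraint ON dbo.TblUniqueConstraint;

-- Works: drop the constraint instead (and re-create it once the maintenance is done)
ALTER TABLE dbo.TblUniqueConstraint
    DROP CONSTRAINT uni_TblUniqueConstraint;

ALTER TABLE dbo.TblUniqueConstraint
    ADD CONSTRAINT uni_TblUniqueConstraint UNIQUE (SomeColumn);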

I disagree with Kenneth that there’s no value in unique key constraints (I’m guessing the implicit comparison is to the CREATE UNIQUE NONCLUSTERED INDEX syntax).  There’s a semantic difference between an index which happens to be unique and a unique key constraint.  They’re implemented very similarly, but the point of the latter is to tell anybody using the data model that this set of attributes must be unique.


Relational Lives On

Tony Davis on NoSQL:

There have been some spectacular examples where the lack of transactional integrity of NoSQL databases led to financial disaster. Even ardent NoSQL enthusiasts did U-turns on the value of ACID-compliance. And therefore, slowly, inexorably, many NoSQL databases begin to acquire the essential characteristics of a relational database. MongoDB now offers joins; N1QL and U-SQL bring good old SQL-style querying to “NoSQL” data. Many of the NoSQL databases are now laboring towards some form of proper transactional support.

I enjoyed Robert Young’s first comment:

the notion that NoSql “databases” are more flexible isn’t even true: chaotic, yes. but flexible means being able to move without breaking, and NoSql, due to the lack of schema, means that all manner of inconsistencies and redundancies are allowed. that’s not flexible, that’s nuts.


Azure Storage Options

James Serra walks us through the list of storage options available on Azure:

Microsoft Azure is a cloud computing platform and infrastructure, created by Microsoft, for building, deploying and managing applications and services through a global network of Microsoft-managed and Microsoft partner-hosted datacenters.  Included in this platform are multiple ways of storing data.  Below I will give a brief overview of each so you can get a feel for the best use case for each, with links provided that go into more detail:

There are several options available, running the gamut from unstructured data (blob storage, file & disk storage) through semi-structured data (Data Lake Store) to structured data (Azure SQL Database), with a few points in between.


JSON Leads To New Wave Of 1NF Failures

Jovan Popovic talks about storing JSON in SQL Server:

Instead of a single JSON object, you can organize your data in this “collection”. If you do not want to explicitly check the structure of each JSON column, you don’t need to add a JSON check constraint on every column (in this example, I have added a CHECK constraint only on the EmailAddresses column).

If you compare this structure to a standard NoSQL collection, you might notice that you will have faster access to the strongly typed data (FirstName and LastName). Therefore, this solution is a good choice for hybrid models where you can identify some information that is repeated across all objects, while other, variable information can be stored as JSON. This way, you can combine flexibility and performance.

Okay, we’ve hit my first major problem with JSON support:  rampant violation of first normal form.  You can create check constraints on JSON code, and that’s pretty snazzy I guess, but I know a better way to store relational data in a relational database system.  JSON support is great when you ask SQL Server to be a holder of text blobs, but this is begging for bad design decisions.
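
For reference, the hybrid pattern Jovan describes looks roughly like this (table and column names here are my own):  strongly typed columns for the attributes every row shares, plus an NVARCHAR(MAX) column constrained to valid JSON for the variable part.

CREATE TABLE dbo.Person
(
    PersonID       INT IDENTITY(1,1) PRIMARY KEY,
    FirstName      NVARCHAR(100) NOT NULL,
    LastName       NVARCHAR(100) NOT NULL,
    -- Variable, semi-structured data stored as JSON text
    EmailAddresses NVARCHAR(MAX) NULL
        CONSTRAINT CK_Person_EmailAddresses_IsJson
            CHECK (EmailAddresses IS NULL OR ISJSON(EmailAddresses) = 1)
);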


The Logical Data Warehouse

Robert Sheldon is looking beyond the Enterprise Data Warehouse:

Organizations looking to take control of this onslaught of information are turning to other solutions to meet their data needs, either in addition to or instead of the traditional EDW. Quite often this means turning to a logical architecture that abstracts the inherent complexities of the big data universe. Such an approach embraces mixed environments through the use of distributed processing, data virtualization, metadata management, and other technologies that help ease the pain of accessing and federating data.

Dubbed the logical data warehouse (LDW), this virtual approach to a BI analytics infrastructure originated with Mark Beyer while he was participating in Gartner’s Big Data, Extreme Information and Information Capabilities Framework research in 2011. According to his blog post “Mark Beyer, Father of the Logical Data Warehouse, Guest Post,” Beyer believes that the way to approach analytical data is to focus on the logic of the information, rather than the mechanics:

This feels like something that first-movers are starting to adopt, but won’t be mainstream for another 6-8 years.  That should give the idea some time to mature as we see the first round of successes and (more importantly) failures.
