Day: May 4, 2016

Unsharing The Database

Randy Shoup talks about scaling up through breaking out a shared database:

For an early- and mid-stage startup, a monolithic database is absolutely the appropriate architecture choice. With a small team and a small company, a single shared database made it simple to get started. Moving fast meant being able to make rapid changes across the entire system. A shared database made it very easy to join data between different tables, and it made transactions across multiple tables possible. These are pretty convenient.

As we have gotten larger, those benefits have become liabilities. It has become a single point of failure, where issues with the shared database can bring down nearly all of our applications. It has become a performance bottleneck, where long-running operations from one application can slow down others. Finally, and most importantly, the shared database has become a coupling point between teams, slowing down our ability to make changes.

I have my misgivings (as you’d expect from a database snob), particularly because I value highly the benefits of normalization and see sharded systems as a step backwards in that regard.  But even with that said, there are absolutely benefits to slicing out orthogonal sections of data; the point of disagreement is in those places in which two teams’ entities and attributes overlap.

Using Flow + Power BI For Reporting

John White combines Microsoft Flow with Power BI to generate near-real-time reports:

I may blog about that solution in the future, but with the Future of SharePoint event rapidly coming up, my BI Focal fellow collaborator, Jason Himmelstein, convinced me that there was something more interesting that we could do with this. How about near-real-time monitoring of Twitter conversations for the event? All of the pieces were in place.

We rolled up our sleeves, and in relatively short order, had a solution. Jason has written about the experience on his SharePoint Longhorn blog, and he has included the videos that we put together, so I can be a little less detailed in this post.

Today I learned about Microsoft Flow.  Looks a lot like Zapier.

Recalculating Days

Brian Mitchell shows how to re-calculate prior days in Azure Data Lake using partitioning:

The question is what is the right time period to use? The answer is it depends on the size of your partitions.  Generally, for managed tables in U-SQL, you want to target about 1 GB per partition.  So, if you are bringing in, say, 800 MB per day, then daily partitions are about right.  If instead you are bringing in 20 GB per day, you should look at hourly partitions of the data.

In this post, I’d like to take a look at two common scenarios that people run into.  The first is a full re-compute of partition data and the second is a partial re-compute of a partition.  The examples I will be using are based on the U-SQL Ambulance Demos on GitHub and will be added to the solution for ease of your consumption.

The ability to reprocess data is vital in any ETL or ELT process.
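
As a rough illustration of the sizing rule quoted above, the following Python sketch picks a partition granularity from a daily ingest volume. The 1 GB target comes from the quote; the function name, the list of candidate granularities, and the "closest to target" selection rule are assumptions made for this example, not part of U-SQL or of the original post.

```python
# Sketch only: choose a partition granularity so that each partition
# lands near a target size (about 1 GB, per the quoted guidance).
# The function name, candidate list, and selection rule are illustrative assumptions.
import math

TARGET_MB = 1024  # roughly 1 GB per partition

# Candidate granularities, expressed as partitions per day.
GRANULARITIES = {
    "hourly": 24,
    "daily": 1,
    "weekly": 1 / 7,
    "monthly": 1 / 30,
}


def suggest_granularity(daily_mb, target_mb=TARGET_MB):
    """Return the granularity whose partition size is closest to the target.

    Closeness is measured on a log scale, so a partition that is half the
    target is treated the same as one that is double the target.
    """
    if daily_mb <= 0:
        raise ValueError("daily_mb must be positive")

    def distance(name):
        partition_mb = daily_mb / GRANULARITIES[name]
        return abs(math.log(partition_mb / target_mb))

    return min(GRANULARITIES, key=distance)
```

With these assumptions, `suggest_granularity(800)` returns `"daily"` and `suggest_granularity(20 * 1024)` returns `"hourly"`, matching the two cases in the quote.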

Simplifying Spark Application Development

Ian Hellstrom has scripts to simplify Apache Spark application rollout:

When creating Apache Spark applications the basic structure is pretty much the same: for sbt you need the same build.sbt, the same imports, and the skeleton application looks the same. All that really changes is the main entry point, that is the fully qualified class. Since that’s easy to automate, I present a couple of shell scripts that help you create the basic building blocks to kick-start Spark application development and allow you to easily upgrade versions in the configuration.

Check these out if you’re interested in Spark.
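
To make the idea concrete, here is a small Python sketch of the kind of templating such scripts perform: it renders a minimal build.sbt from a few parameters. This is not Ian Hellstrom's code (his scripts are shell scripts); the function name and template layout are assumptions for illustration, and any project name or version numbers passed in are placeholders.

```python
# Sketch only: render a minimal build.sbt for a Spark application.
# Not the original author's scripts; the function name and template
# layout are hypothetical, and callers supply their own versions.

BUILD_SBT_TEMPLATE = """\
name := "{name}"

version := "{app_version}"

scalaVersion := "{scala_version}"

libraryDependencies += "org.apache.spark" %% "spark-core" % "{spark_version}" % "provided"
"""


def render_build_sbt(name, app_version, scala_version, spark_version):
    """Fill in the template; upgrading Spark means changing one argument."""
    return BUILD_SBT_TEMPLATE.format(
        name=name,
        app_version=app_version,
        scala_version=scala_version,
        spark_version=spark_version,
    )
```

For example, `render_build_sbt("my-spark-app", "0.1.0", "2.10.6", "1.6.1")` returns the file contents as a string, which a wrapper script could write into a new project directory. Keeping the versions as parameters is what makes a later upgrade a one-line change.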

BI Features In 2016

Jessica Moss compares SQL Server 2014 and 2016 with respect to Business Intelligence features:

Good-bye, Business Intelligence Edition
The biggest surprise to me was the removal of the Business Intelligence edition that was initially introduced in SQL Server 2012.  Truthfully, it never seemed to fit in the environments where I worked, so I guess it makes sense.  Hopefully, fewer licensing options will make it easier for people to understand their licensing and pick the edition that works best for them.

2016 looks to be a great version for BI.

Table Partitioning And Performance

Kendra Little has a video on table partitioning:

Learn why SQL Server’s table partitioning feature doesn’t make your queries faster, and may even make them slower.

In this 20-minute video, I’ll show you my favorite articles, bugs, and whitepapers online to explain where table partitioning shines and why you might want to implement it, even though it won’t solve your query performance problems.

Articles discussed are by Gail Shaw, Remus Rusanu, and the SQL Customer Advisory Team (SQLCAT). Scroll down below the video for direct links to each resource.

Check out the video.

Thoughts On Standard Edition

Joey D’Antoni points out new features available in Standard Edition:

In my circles, there are a number of people who are complaining about the lack of features in Standard Edition. I do agree that Always Encrypted should be in every version, as lack of strong data encryption is a problem that continues to confound IT. Putting Always Encrypted in all editions would be a good start to having wide ISV adoption of the Always Encrypted feature.

However, even without Always Encrypted, Microsoft added a LOT of new features to Standard Edition. Let’s list them (no specific order here):

There’s a pretty good amount of value in upgrading, even if you’re living on Standard Edition.
