
Author: Kevin Feasel

In Defense Of The Layered Architecture

Sylvia Fronczak defends the layered architecture approach to software development:

There are disadvantages to the layered architecture approach. And some aspects of it have been deemed not architecturally pure for domain-driven design.

However, there are disadvantages to most approaches. The key is to understand both the advantages and disadvantages. Pick what architectural patterns work best for your particular problem.

Here are some factors you want to keep in mind.

  1. The layered architecture is very database-centric. As mentioned before, since everything flows down, it’s often the database that’s the last layer. Critics of this architecture point out that your application isn’t about storing data; it’s about solving a business problem. However, when so many applications are simple CRUD apps, maybe the database is more than just a secondary player.

  2. Scalability can be difficult with a layered architecture. This is tied to the fact that many layered applications tend to take on monolithic properties. If you need to scale your app, you have to scale the whole app! However, that doesn’t mean your layered application has to be a monolith. Once it becomes large enough, it’s time to split it out—just like you would with any other architecture.

  3. A layered application is harder to evolve, as changes in requirements will often touch all layers.

  4. A layered architecture is deployed as a whole. That’s even if it’s modular and separated into good components and namespaces. But that might not be a bad thing. Unless you have separate teams working on different parts of the application, deploying all at once isn’t the worst thing you can do.

I completely disagree with critique #1 (as opposed to Sylvia’s defense):  most of the time, the database will outlive that puny application by a matter of decades.  That is, when some team is developing version 8 of the application, they’ll use essentially the same database as in V1.  Creating a design which optimizes for data storage and recall is, in general, much more important than which Javascript library you’re using or how you lay out those files.

Comments closed

The Key Hierarchy And SQL Server Encryption

David Fowler walks us through the various keys used in encrypting data in SQL Server:

I’m sure that we all know that SQL Server includes all sorts of interesting functionality to allow us to encrypt our data, and, as with all encryption techniques, that data is encrypted using keys.

In SQL Server we’ve got a number of different keys: the Service Master Key, Database Master Keys, Symmetric Keys, Asymmetric Keys, and Certificates.  These keys can be used to encrypt data, but they can also be used to encrypt other keys, and this is where the key hierarchy comes in.

Warning:  it’s turtles all the way down.
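
To make the nesting concrete, here’s a minimal sketch of one common chain in that hierarchy: a database master key (protected by the instance’s service master key), a certificate protected by that master key, and a symmetric key protected by the certificate. The names and password are placeholders, not anything from David’s post.

-- Database master key, itself protected by the service master key
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Us3_a_Strong_Passphrase_Here!';

-- Certificate protected by the database master key
CREATE CERTIFICATE DataCert WITH SUBJECT = 'Key hierarchy demo';

-- Symmetric key protected by the certificate; this is what actually encrypts the data
CREATE SYMMETRIC KEY DataKey
    WITH ALGORITHM = AES_256
    ENCRYPTION BY CERTIFICATE DataCert;

OPEN SYMMETRIC KEY DataKey DECRYPTION BY CERTIFICATE DataCert;
SELECT EncryptByKey(Key_GUID('DataKey'), N'sensitive value') AS EncryptedValue;
CLOSE SYMMETRIC KEY DataKey;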

Comments closed

Power BI August Release And SSAS Performance Improvements

Chris Webb points out something new in the Power BI August 2018 release:

While I was playing around with the new release (August 2018) of Power BI Desktop I noticed there was an undocumented change: similar to the OData improvements I blogged about here, there is a new option in the AnalysisServices.Database() and AnalysisServices.Databases() M functions that turns on a newer version of the MDX generation layer used by the Power Query engine. Like the OData improvements it is an option called Implementation=”2.0”, used like this:

AnalysisServices.Databases(
	"localhost", 
	[
		TypedMeasureColumns=true, 
		Implementation="2.0"
	]
)

…and also, as with the OData improvements, you will need to manually edit any existing M queries to take advantage of this.

Read on for Chris’s test and analysis of the resulting MDX output.

Comments closed

Ways To Check For Non-Existence

Brent Ozar shows two methods for finding records missing associated child records:

You’re writing a query, and you wanna check to see if rows exist in a table.

I’m using the free Stack Overflow database, and I wanna find all of the users who have not left a comment. The tables involved are:

  • In dbo.Users, the Id field uniquely identifies a user.
  • In dbo.Comments, there’s a UserId field that links to who left the comment.

A quick way to write it is:
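
(The query itself didn’t survive the excerpt; a NOT EXISTS version along these lines matches the description that follows, with the SELECT list trimmed for brevity.)

SELECT u.Id, u.DisplayName
FROM dbo.Users AS u
WHERE NOT EXISTS
    (
        SELECT *
        FROM dbo.Comments AS c
        WHERE c.UserId = u.Id
    );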

And this works fine. When you read the query, you might think SQL Server would run that SELECT * FROM dbo.Comments query for every single row of the Users table – but it’s way smarter than that, bucko. It scans the Comments index first because it’s much larger, and then joins that to the Users table.

But this isn’t the only way to query these tables, and Brent shows how to tell which method works better.
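
The obvious alternative (whether or not it’s the exact one Brent benchmarks) is the LEFT JOIN / IS NULL pattern, which asks the same question a different way:

SELECT u.Id, u.DisplayName
FROM dbo.Users AS u
LEFT OUTER JOIN dbo.Comments AS c
    ON c.UserId = u.Id
WHERE c.Id IS NULL;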

Comments closed

Is sp_execute_external_script Replacing CLR?

Solomon Rutzky makes me invoke Betteridge’s Law of Headlines:

With the additional (and annoying) configuration step required to get SQLCLR Assemblies to load starting in SQL Server 2017, some people have been wondering what is going on with SQLCLR. Considering that this new restriction is the only real change to SQLCLR since SQL Server 2012 (three versions ago), and that (limited) support for languages such as R (starting in SQL Server 2016) and Python (starting in SQL Server 2017) has been added, it might even look like SQLCLR is being deprecated (i.e. phased out) in favor of these new languages.

Could this be true? There is no official indication, but could it be unofficially / “effectively” deprecated? Well, let’s take a look.

As someone who likes CLR, I want both CLR and Machine Learning Services to co-exist.  This would be true even if ML Services supported F# and the lesser .NET languages.

1 Comment

Naive Bayes In Python

Kislay Keshari explains the Naive Bayes algorithm and shows an implementation in Python:

Naive Bayes in the Industry

Now that you have an idea of what exactly Naive Bayes is and how it works, let’s see where it is used in the industry.

RSS Feeds

Our first industrial use case is news categorization, or, to use a broader term for this algorithm, text classification. News on the web is growing rapidly, and each news site has its own layout and categorization for grouping news. Companies use a web crawler to extract useful text from the HTML pages of news articles to construct a Full Text RSS feed. The contents of each news article are tokenized. In order to achieve better classification results, we remove the less significant words, i.e. stop words, from the document. We then apply the naive Bayes classifier to classify news content based on news code.

It’s a good overview of the topic and a particular implementation in Python.  Naive Bayes is a technique which you want in the bag:  there are a lot of techniques which tend to be better in specific domains, but Naive Bayes is easy to implement and usually provides acceptable performance.

Comments closed

Working With Vectors In R

Dave Mason continues his quest to learn R, focusing on vectors.  First, he looks at vector-based mathematical operations:

Now we can determine the number of customers gained vs number of customers lost (plus/minus) for each month of the quarter by subtracting one vector from another. Each vector has the same number of elements (three), and the result is also a vector of three elements:

> net_customer_gain <- new_customers - customers_lost
> net_customer_gain
Jan Feb Mar 
-15  30   3 

The sum() function can be used to add up all the elements of a vector. Below, we get the total number of new customers and lost customers for the first quarter:

> sum(new_customers)
[1] 270
> sum(customers_lost)
[1] 252

Then he shows off subsetting in vectors:

To extract multiple elements from a vector, pass in an integer class vector to the square brackets. The values of the integer vector correspond to the elements to be extracted. Here we will extract the first, third, and fourth elements of the jersey_numbers vector:

> jersey_numbers[c(1,3,4)]
Pierce  Rondo  Allen 
    34      9     20  

The values of the integer vector can be in any order:

> jersey_numbers[c(4,1,3)]
 Allen Pierce  Rondo 
    20     34      9

Vectors are a critical part of understanding R.

Comments closed

The Problem With Meta-Packages

John Mount has a critique of meta-packages:

Derek Jones recently discussed a possible future for the R ecosystem in “StatsModels: the first nail in R’s coffin”.

This got me thinking on the future of CRAN (which I consider vital to R, and vital in distributing our work) in the era of super-popular meta-packages. Meta-packages are convenient, but they have a profoundly negative impact on the packages they exclude.

I’m not really sold on Jones’s argument, but I do think Mount has a good point.

Comments closed

Using Log Shipping With Delayed Recovery

Kenneth Igiri shows how you can combine log shipping with delayed recovery to prevent data loss due to user mistakes:

Now let’s talk about the key benefit of this delay. In the scenario where a user inadvertently drops a table, we can recover the data quickly from the secondary database as long as the delay period has not elapsed. In this example, we drop the table Sales.Orderlines on BOTH databases and verify that the table no longer exists in BOTH databases.

It’s a fairly expensive technique but potentially still cheaper than restoring from backups.
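
For reference, the delay itself is just one parameter when configuring the log shipping secondary. Here’s a rough sketch with placeholder server and database names and a two-hour window (the real setup involves the full set of log shipping procedures or the wizard):

-- Run on the secondary server; @restore_delay is in minutes
EXEC sp_add_log_shipping_secondary_database
    @secondary_database = N'WideWorldImporters',
    @primary_server = N'PrimaryServerName',
    @primary_database = N'WideWorldImporters',
    @restore_delay = 120,   -- wait two hours before restoring each log backup
    @restore_mode = 1,      -- STANDBY, so the secondary stays readable
    @disconnect_users = 1;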

Comments closed

Grouping And Aggregating: Optimizing The Optimizer

Itzik Ben-Gan shows an example of how you can nudge the SQL Server optimizer to the right answer by rewriting a query:

As you can see, the groups are obtained by scanning the index on the groups table, and the aggregate is obtained by applying a seek in the index on the main table. The higher the density of the grouping set, the more optimal this plan is compared to the default strategy for the grouped query.

Just like we did earlier for the default scan strategy, let’s estimate the number of logical reads and plan cost for the seeks strategy. The estimated number of logical reads is the number of reads for the single execution of the Index Scan operator that retrieves the groups, plus the reads for all of the executions of the Index Seek operator.

The estimated number of logical reads for the Index Scan operator is negligible compared to the seeks; still, it’s CEILING(1e0 * @numgroups / @rowsperpage). Take Query 4 as an example; say the index idx_sid fits about 600 rows per leaf page (actual number depends on actual shipperid values since the datatype is VARCHAR(5)). With 5 groups, all rows fit in a single leaf page. If you had 5,000 groups, they would fit in 9 pages.

Plus some love for the APPLY operator.  Read the whole thing.
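
The general shape of the rewrite Itzik describes is to scan the small groups table and let APPLY drive one index seek per group into the big table. A simplified sketch, using hypothetical Shippers and Orders tables in place of his sample data:

-- Assumes an index on Orders(shipperid, orderdate) so each MAX is a single seek
SELECT S.shipperid, A.maxorderdate
FROM Sales.Shippers AS S
CROSS APPLY
    (
        SELECT MAX(O.orderdate) AS maxorderdate
        FROM Sales.Orders AS O
        WHERE O.shipperid = S.shipperid
    ) AS A;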

Comments closed