Press "Enter" to skip to content

Month: August 2018

Configuring tempdb

Jeff Mlakar looks at some basic guidelines for tempdb and shows how to configure this database:

The basic guidelines are:

  • Each tempdb data file should be the same initial size

  • Autogrowth to tempdb files should be an explicit value in MB instead of a percentage. Choose a reasonable value based on the workload. Ex. 64MB, 128MB, 256MB, etc.

  • The number of tempdb files should be 1 per logical processor core up to 8. At that point the performance should be monitored and if more tempdb files are needed they should be added in sets of 4.

  • Ideally the tempdb files are sized up to the max they will need and never have to autogrow.

  • Use trace flags 1117 and 1118 for versions of SQL Server < 2016. In SQL Server 2016 these trace flags are defaults.

    • Trace flag 1117: when a file in the filegroup meets the autogrow threshold, all files in the filegroup grow together

    • Trace flag 1118: Removes most single page allocations on the server, reducing contention on the SGAM page. TLDR; no more mixed extents – use the whole page.

There are some good pieces of advice here, and Jeff includes a great example of a terrible setup.

Comments closed

Making A Readable Presentation Template

Meagan Longoria has some advice for presentation templates:

The title text is 36pt Segoe UI Light, the subheading text is 24pt Segoe UI, and the speaker info text is 14 pt Segoe UI.

Those font sizes alone make it very hard to read from the back of even the smaller rooms at the conference.

In addition to being too small, the gray text for the speaker info doesn’t have enough contrast from the white background. We want to get a contrast ratio of at least 4.5:1 (but 7:1 would be better). The contrast ratio for these colors is 4.0.

While sans serif fonts are generally thought to be easier to read in presentations, it’s better to use fonts with a stroke width that is not too thin – not necessarily wider characters, but thicker lines that make up each letter. So Segoe UI Light would not be my first choice for a title font, but Segoe UI or Segoe UI Bold might be ok.

Also, the red used on the right half of the slide is VERY bright for an element that is purely decorative, to the point that it might be distracting for some people. And the reason we need to squish our title into two lines of too-small text is because that giant red shape takes up half the page. What is more important: a “pretty” red shape to make our slide look snazzy or being able to clearly read the title of the presentation?

There’s a lot along these lines, and it’s great food for thought.  Meagan includes a set of recommendations at the end, so be sure to catch those.

Comments closed

Binning And Recoding In R

Sebastian Sauer shows a few methods of practical data reshaping in R:

Recoding means changing the levels of a variable, for instance changing “1” to “woman” and “2” to “man”. Binning means aggregating several variable levels to one, for instance aggregating the values From “1.00 meter” to “1.60 meter” to “small_size”.

Both operations are frequently necessary in practical data analysis. In this post, we review some methods to accomplish these two tasks.

Click through for examples of techniques you can use.

Comments closed

Running Apache Kafka On Kubernetes

Rohit Bakhshi walks us through how to install Kafka on a Kubernetes cluster:

Now available on GitHub in developer preview are open-source Helm Chart deployment templates for Confluent Platform components. These templates enable developers to quickly provision Apache Kafka, Apache ZooKeeper, Confluent Schema Registry, Confluent REST Proxy, and Kafka Connect on Kubernetes, using official Confluent Platform Docker images.

Helm is an open-source packaging tool that helps you install applications and services on Kubernetes. Helm uses a packaging format called charts. A chart is a collection of YAML templates that describe a related set of Kubernetes resources.

For stateful components like Kafka and ZooKeeper, the Helm Charts use both StatefulSets to provide an identity to each pod in the form of an ordinal index, and Persistent Volumes that are always mounted for the pod. For stateless components, like REST Proxy, the Helm Charts utilize Deployments instead to provide an identity to each pod. Each component’s charts utilize Services to provide access to each pod.

Read on for more.

Comments closed

Creating A Basic Availability Group

Ginger Keys shows us two methods for creating Basic Availability Groups in SQL Server 2016 and later:

AlwaysOn Basic Availability Groups (BAGs) are available with SQL Server 2016 and 2017 Standard edition.  The functionality is generally the same as database mirroring (which has been deprecated).  This feature replicates transactions to a database on a secondary server, and is useful for disaster recovery should something happen to the primary server.

If you have a database that requires an extra layer of protection or ‘BAG of tricks’, deploying a Basic Availability Group is useful for providing disaster recovery and high availability for the one database.  Also there is major cost savings since it is not necessary to purchase SQL Enterprise edition…this can be done in Standard edition of SQL Server.

BAGs provide a failover environment for only one database, and there can only be two replicas in the group.  Replication can be synchronous or asynchronous, and there is no read access, no backups and no integrity checks on the secondary.   The secondary replica remains inactive unless there is a failover, and Basic AGs can remain on-premises or span from on-prem to Azure.

Read on for the two methods.

Comments closed

In Defense Of The Layered Architecture

Sylvia Fronczak defends the layered architecture approach to software development:

There are disadvantages to the layered architecture approach. And some aspects of it have been deemed not architecturally pure for domain-driven design.

However, there are disadvantages to most approaches. The key is to understand both the advantages and disadvantages. Pick what architectural patterns work best for your particular problem.

Here are some factors you want to keep in mind.

  1. The layered architecture is very database-centric. As mentioned before, since everything flows down, it’s often the database that’s the last layer. Critics of this architecture point out that your application isn’t about storing data; it’s about solving a business problem. However, when so many applications are simple CRUD apps, maybe the database is more than just a secondary player.

  2. Scaleability can be difficult with a layered architecture. This is tied to the fact that many layered applications tend to take on monolithic properties. If you need to scale your app, you have to scale the whole app! However, that doesn’t mean your layered application has to be a monolith. Once it becomes large enough, it’s time to split it out—just like you would with any other architecture.

  3. A layered application is harder to evolve, as changes in requirements will often touch all layers.

  4. A layered architecture is deployed as a whole. That’s even if it’s modular and separated into good components and namespaces. But that might not be a bad thing. Unless you have separate teams working on different parts of the application, deploying all at once isn’t the worst thing you can do.

I completely disagree with critique #1 (as opposed to Sylvia’s defense):  most of the time, the database will outlive that puny application by a matter of decades.  That is, when some team is developing version 8 of the application, they’ll use essentially the same database as in V1.  Creating a design which optimizes for data storage and recall is, in general, much more important than which Javascript library you’re using or how you lay out those files.

Comments closed

The Key Hierarchy And SQL Server Encryption

David Fowler walks us through the various keys used in encrypting data in SQL Server:

I’m sure that we all know that SQL Server includes all sorts of interesting functionality to allow us to encrypt our data and like with all encryption techniques, that data is encrypted using keys.

In SQL Server we’ve got a number of different keys, we’ve got the Service Master Key, Database Master Keys, Symmetric Keys, Asymmetric Keys and Certificates.  These keys can be used to encrypt data but they can also be used to encrypt other keys and this is where the key hierarchy comes in.

Warning:  it’s turtles all the way down.

Comments closed

Power BI August Release And SSAS Performance Improvements

Chris Webb points out something new in the Power BI August 2018 release:

While I was playing around with the new release (August 2018) of Power BI Desktop I noticed there was an undocumented change: similar to the OData improvements I blogged about here, there is a new option in the AnalysisServices.Database() and AnalysisServices.Databases() M functions that turns on a newer version of the MDX generation layer used by the Power Query engine. Like the OData improvements it is an option called Implementation=”2.0”, used like this:

AnalysisServices.Databases(
	"localhost", 
	[
		TypedMeasureColumns=true, 
		Implementation="2.0"
	]
)

…and also, as with the OData improvements, you will need to manually edit any existing M queries to take advantage of this.

Read on for Chris’s test and analysis of the resulting MDX output.

Comments closed

Ways To Check For Non-Existence

Brent Ozar shows two methods for finding records missing associated child records:

You’re writing a query, and you wanna check to see if rows exist in a table.

I’m using the free Stack Overflow database, and I wanna find all of the users who have not left a comment. The tables involved are:

  • In dbo.Users, the Id field uniquely identifies a user.
  • In dbo.Comments, there’s a UserId field that links to who left the comment.

A quick way to write it is:

And this works fine. When you read the query, you might think SQL Server would run that SELECT * FROM dbo.Comments query for every single row of the Users table – but it’s way smarter than that, bucko. It scans the Comments index first because it’s much larger, and then joins that to the Users table.

But this isn’t the only way to query these tables, and Brent shows how to tell which method works better.

Comments closed

Is sp_execute_external_script Replacing CLR?

Solomon Rutzky makes me invoke Betteridge’s Law of Headlines:

With the additional (and annoying) configuration step required to get SQLCLR Assemblies to load starting in SQL Server 2017, some people have been wondering what is going on with SQLCLR. Considering that this new restriction is the only real change to SQLCLR since SQL Server 2012 (three versions ago), then with (limited) support for languages such as R (starting in SQL Server 2016) and Python (starting in SQL Server 2017) being added, it might even look like SQLCLR is being deprecated (i.e. phased-out) in favor of these new languages.

Could this be true? There is no official indication, but could it be unofficially / “effectively” deprecated? Well, let’s take a look.

As someone who likes CLR, I want both CLR and Machine Learning Services to co-exist.  This would be true even if ML Services supported F# and the lesser .NET languages.

1 Comment