Press "Enter" to skip to content

Author: Kevin Feasel

Windows Versus SQL Authentication

Kenneth Fisher looks into why we generally consider Windows authentication more secure than SQL authentication:

A quick search on the internet took me here: Choosing an Authentication Mode. And if you go down to the section Connecting Through Windows Authentication, it points out a few important things; even farther down, the section Disadvantages of SQL Server Authentication has a bit more. Then I found a couple of good forum questions here and here. In summary (and only discussing actual security features):

Click through for the answers.  Also read Cristian Satnic’s comments below, as Cristian is correct about wanting to keep passwords hashed instead of encrypted.  Incidentally, Windows passwords aren’t encrypted, either—they’re hashed.
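If you want to see the distinction at the DDL level, here is a minimal sketch creating one login of each type; the domain, login names, and password are hypothetical:

-- Windows authentication: the domain controller validates the user;
-- SQL Server stores no password at all for this login.
CREATE LOGIN [CONTOSO\AppUser] FROM WINDOWS;

-- SQL authentication: SQL Server stores a salted hash of the password
-- and validates logins itself. CHECK_POLICY and CHECK_EXPIRATION borrow
-- the Windows password policy for SQL logins.
CREATE LOGIN AppLogin
    WITH PASSWORD = 'ChangeMe-N0t-4-Real!',
    CHECK_POLICY = ON,
    CHECK_EXPIRATION = ON;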

Pausing Online Index Rebuilds

Andrew Pruski demos resumable online index rebuilds in SQL Server 2017:

Pretty cool, huh? Full information on this can be found here: https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-index-transact-sql

I think this is very useful but we do need to be careful. The documentation says that pausing an online index rebuild for a long time may affect query performance and disk utilisation. This is due to the newly rebuilt index being created side by side with the original one, so we’ll need to watch out for that.

It’s a good first look at what might be a very interesting solution for companies tight on maintenance window time.
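If you want to try it yourself, here is a minimal sketch against a hypothetical table and index, assuming SQL Server 2017 or later:

-- Start a resumable online rebuild (hypothetical index and table names).
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
    REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

-- From another session, pause the rebuild when the maintenance window closes...
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders PAUSE;

-- ...check on its progress...
SELECT name, state_desc, percent_complete
FROM sys.index_resumable_operations;

-- ...and pick it back up where it left off.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders RESUME;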

Default Column Storage

Paul Randal explains how default column values are stored:

And selecting the initial 10 rows can be demonstrated to return the 3rd column using the initial default set in step 3. (It makes no difference if any rows are added between steps 3 and 4.)

This means that there *must* be two default values stored when a new column is added: one for the set of already-existing rows that don’t have the new column and one for any new rows. Initially these two default values will be the same, but the one for new rows can change (e.g. in steps 4 and 5 above) without breaking the old rows. This works because after the new column is added (step 3 above), it’s impossible to add any more rows that *don’t* have the new column.

And this is exactly how it works. Let’s investigate!

In typical Paul Randal fashion, this is both a look at internals and an interesting explanation.
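If you’d like a quick repro of the behavior, here is a minimal sketch along the lines Paul describes; the table, column, and constraint names are hypothetical:

-- A hypothetical table with two columns and a few pre-existing rows.
CREATE TABLE dbo.DefaultDemo (ID INT NOT NULL, Val VARCHAR(10) NOT NULL);
INSERT INTO dbo.DefaultDemo (ID, Val) VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- Add a third, NOT NULL column with a default; the existing rows can pick up
-- the value 10 from the stored default rather than being rewritten.
ALTER TABLE dbo.DefaultDemo
    ADD Col3 INT NOT NULL CONSTRAINT DF_DefaultDemo_Col3 DEFAULT 10;

-- Swap in a new default, which affects only rows inserted from here on.
ALTER TABLE dbo.DefaultDemo DROP CONSTRAINT DF_DefaultDemo_Col3;
ALTER TABLE dbo.DefaultDemo
    ADD CONSTRAINT DF_DefaultDemo_Col3_v2 DEFAULT 20 FOR Col3;

INSERT INTO dbo.DefaultDemo (ID, Val) VALUES (4, 'd');

-- Rows 1-3 still return 10 for Col3; row 4 returns 20.
SELECT ID, Val, Col3 FROM dbo.DefaultDemo;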

Securing Azure Storage

Christos Matskas has an article on securing Azure blobs and containers:

All communication with Azure Storage via connection strings and BLOB URLs enforces the use of HTTPS, which provides encryption in transit. You can enforce the use of “Always HTTPS” by setting the connection string like this: “DefaultEndpointsProtocol=https;AccountName=myblob1…” or in SAS signatures, as in the example below:

https://myblob1.blob.core.windows.net/?sv=2015-04-05&ss=bfqt&srt=sco&sp=rwdlacup&se=2016-09-22T02:21:41Z&st=2016-09-21T18:21:41Z&spr=https&sig=hxInpKBYAxvwdI9kbBglbrgcl1EJjHqDRTF2lVGeSUU%3D

To protect data at rest, the service provides an option to encrypt the data as it is stored in the account. There’s no additional cost associated with encrypting the data at rest, and it’s a good idea to switch it on as soon as the account is created. There is a one-click setting at the storage account level to enable it, and the encryption is applied to both new and existing storage accounts. The data is encrypted with the AES-256 cipher, and the feature is now generally available in all Azure regions and Azure clouds (public, government, etc.).

There’s some good information here, making it worth the read.

Survival Analysis

Joseph Rickert explains what survival analysis is and shows an example with R:

Looking at the Task View on a small screen is a bit like standing too close to a brick wall – left-right, up-down, bricks all around. It is a fantastic edifice that gives some idea of the significant contributions R developers have made both to the theory and practice of Survival Analysis. As well-organized as it is, however, I imagine that even survival analysis experts need some time to find their way around this task view. (I would be remiss not to mention that we all owe a great deal of gratitude to Arthur Allignol and Aurélien Latouche, the task view maintainers.) Newcomers, people either new to R or new to survival analysis or both, must find it overwhelming. So, it is with newcomers in mind that I offer the following slim trajectory through the task view that relies on just a few packages: survival, KMsurv, OIsurv and ranger.

The survival package, which began life as an S package in the late ’90s, is the cornerstone of the entire R Survival Analysis edifice. Not only is the package itself rich in features, but the object created by the Surv() function, which contains failure time and censoring information, is the basic survival analysis data structure in R.

Survival analysis is an interesting field of study.  In engineering fields, the most common use is calculating mean time to failure, but that’s certainly not the only place you’re liable to see it.
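The core objects are easy to state: the survival function S(t) = P(T > t) is the probability that the time to event T exceeds t, and mean time to failure is simply the area under that curve, MTTF = ∫ S(t) dt taken from 0 to infinity. Most of what these packages do is estimate S(t), or the closely related hazard function, from data in which some observations are censored.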

Outliers In Histograms

Edwin Thoen has an interesting solution to a classic problem with histograms:

Two strategies that make the above into something more interpretable are taking the logarithm of the variable, or omitting the outliers. Neither shows the original distribution, however. Another way to go is to create one bin for all the outlier values. This way we would see the original distribution where the density is the highest, while at the same time getting a feel for the number of outliers. A quick and dirty implementation of this would be

library(dplyr)
library(ggplot2)
# hist_data is a data frame with a numeric column x, defined earlier in Edwin's post;
# everything above 10 gets capped into a single overflow bin at 10.
hist_data %>%
  mutate(x_new = ifelse(x > 10, 10, x)) %>%
  ggplot(aes(x_new)) +
  geom_histogram(binwidth = .1, col = "black", fill = "cornflowerblue")

Edwin then shows a nicer solution, so read the whole thing.

Handling Rogue Queries In Spark

Alicja Luszczak, et al., introduce the Query Watchdog:

The previous query would cause problems on many different systems, regardless of whether you’re using Databricks or another data warehousing tool. Luckily, as a user of Databricks, this customer has a feature available, called the Query Watchdog, that can help solve this problem.

Note: Query Watchdog is available on clusters created with version 2.1-db3 and greater.

A Query Watchdog is a simple process that checks whether or not a given query is creating too many output rows for the number of input rows at a task level. We can set a property to control this, and in this example we will use a ratio of 1000 (which is the default).

It looks like this is an all-or-nothing process, but a very interesting start.
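If you want to experiment with it, the settings can be flipped on per session; here is a minimal sketch in Spark SQL, assuming the property names from the original Databricks post:

-- Turn the watchdog on and flag any query whose tasks produce more than
-- 1,000 output rows per input row.
SET spark.databricks.queryWatchdog.enabled = true;
SET spark.databricks.queryWatchdog.outputRatioThreshold = 1000;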

Deploying Reports With PowerShell

Jana Sattainathan has created a few PowerShell functions to automate SQL Server Reporting Services report deployment:

In this post, I want to publish a few functions that I created around SSRS. They are related to and depend on each other.

  • Get-SSRS – Given the SSRS URI, returns the WSDL endpoint

  • Get-SSRSReport – Returns one or more reports based on inputs

  • Get-SSRSSharedDataSource – Returns one or more shared data sources based on inputs

  • Get-SSRSReportDataSource – Returns the data source information on a report-by-report basis, based on inputs

  • Set-SSRSReportDataSource – Sets the data source of a report to the given data source

  • Install-SSRS – Deploys an SSRS report to a specific folder and optionally sets the data source for the deployed report

Very useful.

Cloudera Accessing Azure Data Lake Store

The Azure Data Lake team has announced that you can now access Azure Data Lake Store using a Cloudera cluster:

The Azure Data Lake (ADL) vision from the beginning has been to transform business data into intelligence by providing analytics on any data at cloud scale. ADL enterprise customers gain insights on their business data using a wide range of tools and platforms. Today’s release of Cloudera Enterprise 5.11 brings another very valuable and widely used Hadoop computation platform to the set of platforms that can leverage ADLS. No matter what big data analytics platform you choose, Azure Data Lake Store provides a single high throughput enterprise-scale hierarchical file system data lake repository for big data.

Anyone with an Azure subscription can now deploy Cloudera clusters with ADLS. To get started, you can use the Cloudera Enterprise Data Hub template or the Cloudera Director template on Azure Marketplace to create a Cloudera cluster. Once the cluster is up, see here for more information on how to set up your Cloudera cluster with ADLS today!

That’s an interesting development.

Code Formatting

Bert Wagner has a few tips on code formatting to make it more readable:

The second example above consistently indents lines, adds new lines, and follows consistent coding patterns. This makes it easy to skim the code quickly.

Books have chapters, headings, and paragraphs defined by formatting that make it easy to find what is needed at a glance — formatting code makes it possible to find things easily too.

The examples Bert uses are all in C#, but the advice applies to most languages.  I think consistency is key, even more so than any particular ideal format.  This reduces friction between developers, at least outside of the “what should our coding standards be?” meetings…
