August 2019 – Page 4

On August 7, 2019, AMD finally unveiled their new 7nm EPYC 7002 Series of server processors, formerly code-named “Rome”, at the AMD EPYC Horizon Event in San Francisco. This is the second-generation EPYC server processor, and it uses the same Zen 2 architecture as the AMD Ryzen 3000 Series desktop processors. These new processors are socket compatible with the previous-generation AMD EPYC 7001 Series processors, so they will work in existing model servers (with a BIOS update). However, you will need a new model server to take advantage of the PCIe 4.0 support in the newer processors.
The AMD EPYC 7002 Series includes 19 public launch SKUs with anywhere from 8 to 64 physical cores, plus SMT, which doubles the number of logical cores per processor. Fourteen of these SKUs work in both one-socket and two-socket servers. There are also five less expensive SKUs (with a “P” suffix) that only work in one-socket servers. This processor family has enough compute horsepower, memory bandwidth and capacity, and I/O bandwidth to support large server workloads on a single-socket server.

It certainly looks competitive. And that’s a great thing for consumers, even those who never make the switch, as it will force Intel to up its game.


SQL Server Database Recovery Models

Published 2019-08-27 by Kevin Feasel

John McCormack goes into the three database recovery models available in SQL Server:

This post is about database recovery models for SQL Server databases. Having the correct recovery model for a database is crucial to your backup and restore strategy for that database. It also determines whether you need to maintain the transaction log yourself or can leave this task to SQL Server. Let’s look at the various recovery models and how they work.

Click through for a description of each, including recommendations of when to choose each.
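To make the trade-offs concrete, here is a small Python sketch that encodes a simplified summary of the three models and builds the `ALTER DATABASE ... SET RECOVERY` statement used to switch between them. The `TRAITS` table and the `set_recovery_sql` helper are hypothetical illustrations, not taken from John's post; only the T-SQL syntax they emit is standard.

```python
from enum import Enum


class RecoveryModel(Enum):
    SIMPLE = "SIMPLE"
    FULL = "FULL"
    BULK_LOGGED = "BULK_LOGGED"


# Simplified summary of each model's trade-offs (illustrative, not exhaustive).
TRAITS = {
    RecoveryModel.SIMPLE: {
        "log_backups_required": False,  # log space is reused after checkpoints
        "point_in_time_restore": False,
    },
    RecoveryModel.FULL: {
        "log_backups_required": True,   # the log keeps growing until backed up
        "point_in_time_restore": True,
    },
    RecoveryModel.BULK_LOGGED: {
        "log_backups_required": True,
        # Not possible within a log backup that contains minimally logged
        # operations, so it is treated as False in this simplified table.
        "point_in_time_restore": False,
    },
}


def set_recovery_sql(database: str, model: RecoveryModel) -> str:
    """Build the T-SQL statement that switches a database's recovery model."""
    # Bracket-quote the name, doubling any closing brackets inside it.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET RECOVERY {model.value};"
```

For example, `set_recovery_sql("Sales", RecoveryModel.FULL)` yields `ALTER DATABASE [Sales] SET RECOVERY FULL;`. Under FULL or BULK_LOGGED, the switch only pays off if regular transaction log backups are also scheduled.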


Configuring an Azure Databricks Cluster

Published 2019-08-26 by Kevin Feasel

Jon Gurgul explains cluster settings in Azure Databricks:

We need compute to run our notebooks, and this is achieved by creating a cluster. A cluster is simply a group of virtual machines that, behind the scenes, form this compute resource. The benefit of Azure Databricks is that you are only charged for compute while the cluster is running.
Let’s now click the Clusters icon and set up a simple cluster. Once you have loaded the page you can use the “Create Cluster” button.

Click through for an explanation of what each of the settings means.
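The same settings the UI exposes can also be expressed as a JSON request body for the Databricks Clusters API (`clusters/create`). The Python sketch below only builds that body; the runtime version string and VM size are assumptions chosen for illustration, and the sanity check on worker counts is our own rather than the API's.

```python
import json


def cluster_spec(name: str, min_workers: int, max_workers: int,
                 idle_minutes: int = 30) -> dict:
    """Build a request body for the Databricks Clusters API (clusters/create)."""
    # Basic sanity check of our own; the service applies its own validation.
    if not 0 < min_workers <= max_workers:
        raise ValueError("need 0 < min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "5.5.x-scala2.11",  # assumed runtime version string
        "node_type_id": "Standard_DS3_v2",   # assumed Azure VM size per node
        # The cluster is a set of VMs: a driver plus an autoscaling pool of workers.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Terminate when idle, since compute is billed only while the cluster runs.
        "autotermination_minutes": idle_minutes,
    }


if __name__ == "__main__":
    print(json.dumps(cluster_spec("demo", 2, 8), indent=2))
```

Setting `autotermination_minutes` is what turns "only chargeable when on" into savings in practice: an idle cluster shuts itself down instead of running overnight.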


The Basics of Apache Airflow

Published 2019-08-26 by Kevin Feasel

Divyansh Jain explains what Apache Airflow is and takes us through a sample solution:

Airflow is a platform to programmatically author, schedule, and monitor workflows or data pipelines. These functions are achieved with Directed Acyclic Graphs (DAGs) of tasks. It is open source and still in the incubator stage. It was started at Airbnb in 2014, and since then it has earned an excellent reputation, with approximately 800 contributors and 13,000 stars on GitHub. The main functions of Apache Airflow are to author, schedule, and monitor workflows.

It’s another interesting product in the Hadoop ecosystem and has additional appeal outside of that space.
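The "acyclic" part of a DAG is what makes scheduling possible: if no task depends on itself, even indirectly, there is always some valid order to run them in. The plain-Python sketch below (not Airflow's API; in Airflow you would declare tasks inside a `DAG` object) derives such an order with Kahn's topological sort and rejects a workflow that contains a cycle. The task names are hypothetical.

```python
from __future__ import annotations

from collections import deque


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return tasks in an order that respects dependencies (Kahn's algorithm).

    `dependencies` maps each task to the set of tasks that must finish first.
    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    # Include tasks that appear only as someone else's upstream dependency.
    tasks = set(dependencies)
    for ups in dependencies.values():
        tasks |= ups
    remaining = {t: set(dependencies.get(t, ())) for t in tasks}
    downstream: dict[str, list[str]] = {t: [] for t in tasks}
    for task, ups in remaining.items():
        for up in ups:
            downstream[up].append(task)

    # Start with tasks that have no upstream dependencies.
    ready = deque(sorted(t for t, ups in remaining.items() if not ups))
    order: list[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Finishing this task may unblock its downstream tasks.
        for child in sorted(downstream[task]):
            remaining[child].discard(task)
            if not remaining[child]:
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("cycle detected: workflow is not a DAG")
    return order


if __name__ == "__main__":
    pipeline = {
        "extract": set(),
        "transform": {"extract"},
        "validate": {"extract"},
        "load": {"transform", "validate"},
    }
    print(run_order(pipeline))
```

Here `transform` and `validate` both wait on `extract`, and `load` waits on both; a real scheduler like Airflow could also run the two middle tasks in parallel.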


Common VM Configuration Issues

Published 2019-08-26 by Kevin Feasel

Erik Darling takes us through some of the most common issues when setting up VMs to run SQL Server:

Everyone’s on VMs these days. That’s cool. Nothing against computers.
But here’s how people screw up SQL Server on VMs big time:
– Fewer than 4 cores (this is a licensing minimum)
– Multiple single-core sockets
– Not fully reserving memory
– Oversubscribing hosts

All four of those are bad and Erik explains why.
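The four problems listed above are mechanical enough to check automatically. The Python sketch below is a hypothetical checker: the `VmConfig` fields are our own simplification of a hypervisor's settings, and treating "total vCPUs or memory above the host's physical resources" as oversubscription is an assumed threshold chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VmConfig:
    name: str
    sockets: int
    cores_per_socket: int
    memory_gb: int
    reserved_memory_gb: int


def vm_problems(vm: VmConfig) -> list[str]:
    """Flag the per-VM issues from the list above."""
    problems = []
    vcpus = vm.sockets * vm.cores_per_socket
    if vcpus < 4:
        problems.append("fewer than 4 cores (SQL Server licensing minimum)")
    if vm.sockets > 1 and vm.cores_per_socket == 1:
        problems.append("multiple single-core sockets")
    if vm.reserved_memory_gb < vm.memory_gb:
        problems.append("memory not fully reserved")
    return problems


def host_oversubscribed(vms: list[VmConfig], host_cores: int,
                        host_memory_gb: int) -> bool:
    """Assumed rule: the VMs together ask for more than the host physically has."""
    total_vcpus = sum(v.sockets * v.cores_per_socket for v in vms)
    total_memory = sum(v.memory_gb for v in vms)
    return total_vcpus > host_cores or total_memory > host_memory_gb
```

For instance, a VM defined as four single-core sockets with only half its 64 GB reserved passes the core-count check but is flagged for the socket layout and the memory reservation.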


Azure SQL DB Serverless

Published 2019-08-26 by Kevin Feasel

John Morehouse takes us through the current public preview of Azure SQL Database Serverless:

Microsoft continues to expand the Azure ecosystem rapidly. One of the most recently delivered improvements is the Azure SQL Database serverless option. The new serverless model, which is currently in public preview, provides a compute tier for a single database that scales dynamically with the workload. This new compute tier is optimal for single databases that have unpredictable usage patterns. Previously, you would provision the compute tier for your Azure SQL Database, which gave you more granular control over scalability. With the serverless tier, scaling is effectively controlled by the service itself, which scales when needed.

This looks like it’d be good for a dev server, where the occasional startup cost is fine and usage requirements can differ significantly by time of day: after the devs go home, you’re not likely to need much compute.
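To see why the dev-server case works out, here is a deliberately simplified Python cost model. It is not Azure's actual billing formula (the real service bills per second and also accounts for memory); it assumes only that billed compute follows demand, clamped between a minimum and maximum vCore setting, and that the database pauses, with no compute charge, after a configurable idle period. All parameter values are hypothetical.

```python
def billed_vcore_minutes(demand, min_vcores, max_vcores, autopause_after):
    """Estimate billed compute for a serverless-style database (simplified).

    demand: vCores the workload wants in each minute (0 means idle).
    Billed compute is clamped to [min_vcores, max_vcores]; after
    `autopause_after` consecutive idle minutes the database pauses and
    compute stops being billed until activity resumes.
    """
    total = 0.0
    idle = 0
    for wanted in demand:
        # Count consecutive idle minutes; any activity resets the counter.
        idle = idle + 1 if wanted == 0 else 0
        if idle > autopause_after:
            continue  # paused: no compute charge for this minute
        total += min(max(wanted, min_vcores), max_vcores)
    return total


if __name__ == "__main__":
    # Hypothetical dev day: 8 busy hours at 2 vCores, then 16 idle hours.
    day = [2] * 480 + [0] * 960
    print(billed_vcore_minutes(day, min_vcores=0.5, max_vcores=4, autopause_after=60))
```

Under these assumptions the 16 idle hours cost only the first idle hour at the minimum (30 vCore-minutes), so the day bills 990 vCore-minutes instead of the 2,880 a fixed 2-vCore allocation would.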


Secure Strings in Powershell

Published 2019-08-26 by Kevin Feasel

Greg Moore shows how to build out secure strings in Powershell:

One of the more common problems I’ve faced with a particular client is setting up data extractions from outside sources such as SFTP servers. Since this client deals with PII data, it’s essential that this is done in as secure a manner as possible. For example, all connections need to be encrypted using protocols such as SFTP instead of FTP.
Securely connecting is particularly easy to accomplish if one uses a third-party module such as Posh-SSH. However, you are still stuck with the problem of logging into the remote SFTP server securely.

Keeping credentials secured is something which Powershell makes pretty easy. I can’t think of a reason why you’d ever need to keep credentials in plaintext here.
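PowerShell's SecureString has no direct equivalent in the Python standard library, but the underlying habit, keeping the secret out of source code and out of logs, carries over. The rough analogue below is a minimal sketch, assuming the password is supplied through an environment variable (the name `SFTP_PASSWORD` is hypothetical); unlike SecureString, it does not encrypt the value in memory, it only prevents accidental printing.

```python
import os


class Secret:
    """Wraps a credential so it is never printed or logged in plaintext."""

    __slots__ = ("_value",)

    def __init__(self, value: str):
        self._value = value

    def reveal(self) -> str:
        # Call only at the point of use, e.g. when handing it to an SFTP client.
        return self._value

    def __repr__(self) -> str:
        return "Secret('********')"

    # str() and f-strings use the same masked text as repr().
    __str__ = __repr__


def load_secret(env_var: str) -> Secret:
    """Read a credential from the environment instead of a plaintext file."""
    value = os.environ.get(env_var)
    if value is None:
        raise KeyError(f"{env_var} is not set")
    return Secret(value)
```

A stray `print(load_secret("SFTP_PASSWORD"))` or a logging call then shows only the masked placeholder, while the connection code calls `reveal()` exactly once.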


Estimated Execution Plans in Azure Data Studio

Published 2019-08-26 by Kevin Feasel

Dave Bland walks us through the “Explain” button in Azure Data Studio:

At first I thought it might be a shortcut to help. Nope, it is actually an easy way to get the estimated execution plan. The key word is estimated. In SQL Server Management Studio, by contrast, we have been able to easily get either the estimated or the actual execution plan.
Just as a reminder, the actual execution plan requires the query to actually run, while the estimated plan uses statistics to generate the plan. Generally, if the actual and estimated plans are captured very close to each other, you are unlikely to see a difference between the two.
Now let’s talk about the Explain button. It returns two things: the XML of the execution plan and a graphical representation of the execution plan.

There are a few differences between Azure Data Studio’s implementation of execution plans and SQL Server Management Studio’s.
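Because Explain hands back the plan as showplan XML, it can also be inspected outside the graphical view. The Python sketch below walks the `RelOp` elements in the showplan namespace and lists each operator with its estimated row count. The sample plan is a heavily trimmed, hypothetical fragment; real plans contain many more elements and attributes.

```python
import xml.etree.ElementTree as ET

NS = "{http://schemas.microsoft.com/sqlserver/2004/07/showplan}"

# Hypothetical, heavily trimmed estimated plan for illustration only.
SAMPLE_PLAN = """
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence><Batch><Statements>
    <StmtSimple StatementText="SELECT ... ORDER BY ...">
      <QueryPlan>
        <RelOp NodeId="0" PhysicalOp="Sort" EstimateRows="100">
          <Sort>
            <RelOp NodeId="1" PhysicalOp="Clustered Index Scan" EstimateRows="100" />
          </Sort>
        </RelOp>
      </QueryPlan>
    </StmtSimple>
  </Statements></Batch></BatchSequence>
</ShowPlanXML>
"""


def estimated_operators(plan_xml: str):
    """List (node id, physical operator, estimated rows) in plan order."""
    root = ET.fromstring(plan_xml.strip())
    return [
        (int(op.get("NodeId")), op.get("PhysicalOp"), float(op.get("EstimateRows")))
        # iter() walks the tree depth-first, so parents precede their inputs.
        for op in root.iter(NS + "RelOp")
    ]


if __name__ == "__main__":
    for node in estimated_operators(SAMPLE_PLAN):
        print(node)
```

An estimated plan carries only estimates like `EstimateRows`; the per-operator actual row counts appear only in an actual plan, which requires running the query.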


The State of DevOps for Data Platform Professionals

Published 2019-08-26 by Kevin Feasel

Kendra Little summarizes the Accelerate: State of DevOps Report 2019 with a focus on what this means for data platform professionals:

While there are a ton of valuable insights in the report, in this post I will focus in on the findings which I believe are most relevant to those of us who work “close to a database.” There are three very interesting aspects of the research which hit close to home:
1. Speed and stability are not tradeoffs
2. Heavy change processes negatively impact speed and stability
3. Communities of practice are a common and successful tool to transform culture

Read on for Kendra’s detailed notes.


DBCC CHECKDB Error on Azure SQL Database

Published 2019-08-26 by Kevin Feasel

Arun Sirpal explains an error message on Azure SQL Database:

Msg 7928, Level 16, State 1, Line 3
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 8921, Level 16, State 3, Line 3
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

Read on to see what this means, as well as what it means for you.
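When DBCC output like this comes back from a scheduled job, it can help to pull out the message numbers and severity levels programmatically before deciding how to alert. The Python sketch below matches the `Msg N, Level N, State N, Line N` header line that precedes each error; the helper name is hypothetical, and it tolerates the lowercase "msg" and Windows line endings.

```python
import re

# Header line that precedes each SQL Server error message.
MSG_HEADER = re.compile(
    r"^Msg (\d+), Level (\d+), State (\d+), Line (\d+)[ \t\r]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_errors(output: str):
    """Return (message number, severity level, state, line) for each error header."""
    return [tuple(int(g) for g in m.groups()) for m in MSG_HEADER.finditer(output)]
```

Run against the output quoted above, this yields messages 7928 and 8921, both at severity level 16.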


