Thinking About Databases At Scale

Chris Adkin has a great post explaining some of the hardware and query principles behind scale issues:

All execution plans iterators that require memory grants have two fundamental code paths, one path for when the memory grant is blown and memory spills out into tempdb and one for when the memory grant is correct or under-estimated. Perhaps the database engine team may at some point include a third option, which is for when the grant can be accommodated inside the CPU cache.

As an example, if you run a log record generation intensive workload on the same CPU socket as the log writer, usually socket 0, this will run in a shorter time compared to running the exact same workload in a different socket

This is the type of post where I catch just enough of it to know that I need to dig deeper and learn more.

Tracking Database Restoration-Related Errors

Adrian Buckman has a script which tracks error messages related to database restorations:

In one of my previous posts I went over a scenario where an Auto restore job was logging Restore errors to a table and the error that was being inserted was ‘3013 – RESTORE LOG is terminating abnormally’ and this was due to SQL Server only providing the Last most error produced which is stored within ERROR_NUMBER() and ERROR_MESSAGE() at point of error.

I found this error less than useful so I set out to try and log something more meaningful , which I ended up doing for the specific error (4305) which was being encountered at the time, but I wanted to make this better and less specific to the 4305 error.

This is a very interesting post and a good example of using built-in error handling functionality to help automate your processes.

BimlExpress 2017

Cathrine Wilhelmsen alerts us to a new version of BimlExpress:

.net intellisense

Wait, what? .NET intellisense? Does that mean C# and VB intellisense? You bet it does!

Now, let’s start with a small disclaimer. The release notes say: “Added preview mode of .NET intellisense for early adopters”. That means that this feature is not available out of the box for everyone, it has to be enabled per product key by Varigence. So how do you get it? It’s very simple: E-mail Varigence. Help them out by providing feedback and suggestions for improvements. If you want to go crazy, you may even mention BimlExpress in social media, blog posts or if you’re presenting somewhere. But that’s not a requirement. Just e-mail Varigence and ask. They’re nice guys 🙂

Cathrine also has a webinar coming up tomorrow on the topic.

Using Sqoop To Move Data To Hadoop

The folks at Redglue have a few hints on using Sqoop to move data from a relational database to Hadoop:

  • “Data gets updated” problem

Data gets updated many times and loading data with Sqoop is not a single event as data that you are importing can be updated (INSERTed, DELETed or UPDATed). What is important here, is that, HDFS is an “append-only filesystem” (exceptions made to HBase and Hive with ACID, but they are mostly tricks) and the options are pretty simple: replace the dataset, add data to dataset (partition for example) or merge datasets between old and new data.

If the data that you are loading is a small dataset, don’t think twice, replace and overwrite it.

If the data that you are loading is a big data set, a “incremental” load is recommended. This can be a little tricky as Sqoop needs to know what modification were done since the last incremental or full import.

I’m not a huge fan of Sqoop and prefer to use my own ingest mechanisms, but it’s an easy way to get started.

Basics Of Database Properties

Grant Fritchey explains some of the more important database properties:

Recovery Model

The Recovery Model option is something that you will manipulate constantly as you create databases. A full overview of what the different recovery models are and why you would choose each one will be covered in detail in a blog post later when we talk about database backups. Just to introduce the concept, if you set Recovery to Full, you will need to set up backups for your log. If you set Recovery to Simple then the logs will clean up on their own. There’s a lot more to the topic than just that, but that’s the simple part.

Read on for more.

SQL Server 2017 RC1 Now Available

The SQL Server team announces RC1 of SQL Server 2017:

In SQL Server 2017 RC1, there were several feature enhancements of note:

  • SQL Server on Linux Active Directory integration – With RC1, SQL Server on Linux supports Active Directory Authentication, which enables domain-joined clients on either Windows or Linux to authenticate to SQL Server using their domain credentials and the Kerberos protocol. Check out the getting started instructions.

  • Transport Layer Security (TLS) to encrypt data – SQL Server on Linux can use TLS to encrypt data that is transmitted across a network between a client application and an instance of SQL Server. SQL Server on Linux supports the following TLS protocols: TLS 1.2, 1.1, and 1.0. Check out the getting started instructions.

  • Machine Learning Services enhancements – In RC1, we add more model management capabilities for R Services on Windows Server, including External Library Management. The new release also supports Native Scoring.

  • SQL Server Analysis Services (SSAS)  In addition to the enhancements to SSAS from previous CTPs of SQL Server 2017, RC1 adds additional Dynamic Management Views, enabling dependency analysis and reporting. See the Analysis Services blog for more information.

  • SQL Server Integration Services (SSIS) on Linux  The preview of SQL Server Integration Services on Linux now adds support for any Unicode ODBC driver, if it follows ODBC specifications. (ANSI ODBC driver is not supported.)

  • SQL Server Integration Services (SSIS) on Windows Server  RC1 adds support for SSIS scale out in highly available environments. Customers can now enable Always On for SSIS, setting up Windows Server failover clustering for the scale out master.

Linux AD support is big.

Executing Powershell In SSIS

Daniel Calbimonte shows how to execute Powershell via C# in SQL Server Integration Services:

Get started

The article will include the following topics:

  1. Get the list of services using PowerShell in C#.

  2. How to send SSIS Parameters to PowerShell using the script task.

  3. How to use the PowerShell.addscript function.

  4. How to invoke a PowerShell script in C#

I think this is a fairly limited scenario—if you’re going to have to write C# code anyhow, you can do this same work in C#.  I suppose that it would be most useful in cases where you have to call common Powershell cmdlets rather than writing your own .NET code.

Hadoop Name Node Capacity Planning

Mamta Chawla has some rules of thumb for sizing your Hadoop name node:

Both name node servers should have highly reliable storage for their namespace storage and edit-log journaling. That’s why — contrary to the recommended JBOD for data nodes — RAID is recommended for name nodes.

Master servers should have at least four redundant storage volumes — some local and some networked — but each can be relatively small (typically 1TB).

It is easy to determine the memory needed for both name node and secondary name node. The memory needed by name node to manage the HDFS cluster metadata in memory and the memory needed for the OS must be added together. Typically, the memory needed by the secondary name node should be identical to the name node.

Click through for some specific recommendations.

ADO.Net In Powershell Core

Max Trinidad shows how to use Powershell Core on Windows or Linux to run T-SQL queries:

There’s a catch!

Nothing is perfect yet! Using the .NET Core version of System.Data, there’s a known issue with the datarow class. It seems it’s building the data results as string list of values without the column information.

But, there’s always a way to make thing work adding some extra code to work around this issue and reconstruct the data the way we want.

Max points out a couple of issues that exist today, but they’re getting resolved.

Copying Databases With dbatools

Mike Robbins has a video showing how to copy databases from several SQL Server instances using dbatools:

The video starts out by checking the default instance of SQL Server on a server named SQL17 to see if any user databases exist. Then the names of five different SQL Servers are piped to ForEach-Object. Within the ForEach-Object loop, $_ is a variable for the current object. It’s translated to each individual server name as it iterates through the list of SQL Servers, copying the user databases to SQL17. Only one user database exists on each of the source SQL Servers. The databases are backed up to the specified network share and restored to the destination server. The network share and any sub-folders that are specified must already exist. The account that SQL Server runs as on each of the servers must also have access to the network share. The names of the SQL Servers used in the demo correspond to the version of SQL Server they’re running. The SQL05 server is running Windows Server 2008 (non-R2) and does not have any version of PowerShell installed which means the Copy-SqlDatabase function is extremely versatile.

Click through to watch the video and see how quickly you can get going.

Categories

July 2017
MTWTFSS
« Jun  
 12
3456789
10111213141516
17181920212223
24252627282930
31