DBAs in the Cloud

Brent Ozar argues that production DBAs will still be important even at cloud-only companies:

One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.

At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.

I completely agree with this post. The exact tools DBAs use will change, but the role will still be around decades from now. And that’s at the companies which move quickly.

A Sizing Problem with Heaps

Slava Murygin shows an example where adding a clustered index to an existing table can reduce its size:

Tables were very narrow with just a few columns and my expectations for data growth were very modest. However, after just a little while I was very surprised when my database showed huge unexpected growth and size of the data became multiple times higher than I’ve expected.

After very little research I’ve found and fixed the problem. In this post I’ll describe how I’ve done it.

Read on to learn how Slava figured this out and how a clustered index fixed the problem.

Managing Database Changes with Scripts

Ed Elliott continues a series on automated database deployments with a quick change process:

In my blog post here https://the.agilesql.club/blogs/ed-elliott/2019-06-10/steps-to-automated-database-deployments I described the steps you need to go through so you can build up your confidence that you are capable of deploying databases using automation. I mean, afterall, knowing that it is possible to automate your deployments and having confidence that they will succeed are two very different things.

Even with the best tooling in the world, automated database deployments are still a struggle and there is one key thing that you can do, no matter what tools you choose and that is to make the deployments re-runnable. (Insert discussion here on the word idempotent and how it means re-runnable but sounds far cooler and intellectual). If you make your deployments re-runnable then you can, by their very definiton, re-run them.

Click through for two options. I definitely prefer option number 1 as well.

Comparing On-Prem To Managed Instance Performance

Jovan Popovic has an article explaining how you can compare your current on-premises SQL Server’s performance to an Azure SQL Managed Instance’s performance:

In this post you will see some recommended tools and best practices that you should apply while doing performance comparison. The recommended performance comparison process has three stages:

1. Compare the environment settings on SQL Server and Managed Instance. 
2. Create performance baseline on source SQL Server
3. Compare performance on Managed Instance with the baseline

In the following sections will be described the best practices and the recommended approaches 

This is a good bit more involved than installing some product, clicking a few buttons, and comparing numbers.

Quantum: Serverless Database

Kevin Feasel

2019-06-11

Cloud

Ashish Thusoo announces a new serverless SQL engine:

Today, we are launching Quantum, a high-performance serverless SQL engine, available on Qubole Data Platform, that simplifies SQL access by offering a true serverless deployment option to enable data analysts to query petabyte-scale volumes of data using ANSI-SQL.

Quantum allows teams to realize value from their data much more quickly, and because of its serverless nature, users pay only for queries they run. Data analysts can query object stores on AWS, Azure, Google Cloud, and Oracle Cloud in seconds to achieve faster time to market with far less IT management overhead.

Ankit Dixit, et al, provide a technical overview as well:

Existing serverless SQL service offerings do not provide users with the ability to use a  metastore of their choice. With Quantum, data teams can use their own custom metastore and start using Quantum without recreating schemas or table metadata.

Most existing Qubole customers already use a custom metastore in the cloud. So there’s virtually no ramp up time to reap the benefits of Quantum.

The technical overview is a bit too much marketing for my tastes, but this is a move worth watching.

Explaining Hadoop Struggles

Kevin Feasel

2019-06-11

Hadoop

Alex Woodie has an editorial taking us through why the major Hadoop distributors (Cloudera and MapR) are struggling so much lately:

The future is not clear for either Cloudera and MapR. While there are similarities in the two companies’ positions, there are big differences too.

Cloudera does not have a permanent CEO at the moment, and it still hasn’t shipped the new converged Hadoop distribution, dubbed Cloudera Data Platform (CDP), that will replace the old Cloudera and Hortonworks distributions. During its first quarter ended April 30, Cloudera said customers are holding off investing in the old Hadoop products since they know the new CDP is due by the end of the year. That fact led Cloudera to dramatically lower its revenue expectations for the year, which upset stockholders, who pushed Cloudera’s stock (NYSE: CLDR) down 40% the following day.

The way I’m phrasing it is that the Hadoop ecosystem is strong (with the successes of companies like Databricks and Confluent), but core Hadoop companies are struggling.

Combining Machine Learning with DevOps

Rolf Tesmer explains that machine learning and DevOps aren’t oil and water (or maybe they are and we just need to stir harder):

In talking with various development teams, customers and DevOps engineers, a lot of the potential problems of meshing ML development into an enterprise DevOps process can be boiled down to a few different areas this aims to address…

ML stack might be different from rest of the application stack
– Testing accuracy of ML model
– ML code is not always version controlled
– Hard to reproduce models (ie explainability)
– Need to re-write featurizing + scoring code into different languages
– Hard to track breaking changes
– Difficult to monitor models & determine when to retrain

So DevOps helps with this, right? Right?

Well er, some of them yes, but not all.

DevOps is not a panacea but it can solve certain types of problems well.

Making Dynamic SQL Safe

Erik Darling explains patiently that if you use sp_executesql wrong, you don’t get the benefits of using it right:

The gripes I hear about fully fixing dynamic SQL are:

– The syntax is hard to remember (setting up and calling parameters)
– It might lead to parameter sniffing issues

I can sympathize with both. Trading one problem for another problem generally isn’t something people get excited about.

But there are good reasons fully to fix it, so read on.

Replacing Bidirectional Filters with Visual Filters

Alberto Ferrari shows how you can replace bidirectional filters with visual filters in Power BI and improve visual quality:

The noticeable thing about the behavior of the slicer is that the two matrices are showing only the brands and colors purchased by Amanda. Yet, the Color slicer is still showing all the colors, even though we know Amanda only purchased three colors: Grey, Silver and White.

The reason is that the matrices, like most Power BI visuals, hide rows if the measure they are showing produces a blank. Because Amanda did not buy any pink product, the value of Sales Amount for Pink results in a blank, therefore the matrix removes the pink color from its result. Prior to the May 2019 release of Power BI, slicers did not display this behavior because slicers did not have a measure to evaluate – they would only show a list of values from a column; Moreover, visual-level filters were not allowed in slicer visuals whereas they were available in other visuals such as charts, tables, and matrices.

Read on to see how to do this.

Kubernetes + AGs + SQL Server 2019 CTP 3

Allan Hirt has a fix for an annoying bug in the latest CTP of SQL Server 2019:

I haven’t written much about them yet (key emphasis there …) but AGs now being supported for containers in SQL Server 2019 is a big deal. Recently, SQL Server 2019 CTP 3.0 was released, but there’s a slight problem: if you try to deploy an AG with Kubernetes, you may see the following errors when trying to deploy the pods with the YAML that contains their definition. The services (i.e. instances of SQL Server) get created, but the pods do not.

Read on for the root cause and the solution.

Categories

June 2019
MTWTFSS
« May Jul »
 12
3456789
10111213141516
17181920212223
24252627282930