Day: July 21, 2020

The Cloudera Operational Database Experience

Liliana Kadar, et al., cover scalability options for DBAs working with Cloudera:

Cloudera’s Operational Database (OpDB) supports a scale-up (SMP) environment. The caching layer is able to consume all memory in a large SMP environment. Memory has to be large enough to cover RegionServers, DataNodes and operating system, and to have enough extra space to allow the block cache to assist with reads. When HBase is running with other components, CPU contention and memory contention can be a problem that is easy to address with proper YARN tuning. 

As a result of the scale-up architecture, multiple services and engines can be run on a single node. For smaller nodes, multiple services and engines have to be spread out amongst a larger set of nodes. 

In addition, Krishna Maheshwari, et al., announce a technical preview of their Cloudera Operational Database experience:

The Cloudera Operational Database (COD) experience is a managed dbPaaS solution which abstracts the underlying cluster instance as a Database. It can auto-scale based on the workload utilization of the cluster and will be adding the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically) later this year. It offers multi-modal client access with NoSQL key-value using Apache HBase APIs and relational SQL with JDBC (via Apache Phoenix).  The latter makes COD accessible to developers who are used to building applications that use MySQL, Postgres, etc.

It’s interesting to see Cloudera move in this direction.

Result Window Too Large in Elasticsearch

Samir Behara explains a common Elasticsearch error:

I have configured Error Logs for my Elasticsearch cluster, and I see a frequent error below in the logs —

org.elasticsearch.ElasticsearchException$1: Result window is too large, from + size must be less than or equal to: [10000] but was [15020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

Click through to understand what the issue is and how you can resolve it.
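
As a rough illustration of the limit in that message, here is a minimal Python sketch. It checks whether a `from` + `size` pair exceeds the window, and builds a request body for `search_after`, an Elasticsearch paging option that is swapped in here instead of the scroll API the error message points to. The field names `timestamp` and `id` and the helper names are hypothetical, and nothing here is taken from Samir's post.

```python
from typing import Any, Dict, List, Optional

# Default value of index.max_result_window, as quoted in the error message.
MAX_RESULT_WINDOW = 10_000


def exceeds_result_window(from_: int, size: int, limit: int = MAX_RESULT_WINDOW) -> bool:
    """Return True when from + size is larger than the allowed result window."""
    return from_ + size > limit


def search_after_body(
    size: int,
    sort_field: str,
    tiebreaker_field: str,
    last_sort_values: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build a search body that pages with search_after instead of from.

    sort_field and tiebreaker_field are hypothetical field names; the
    tiebreaker keeps the sort order unambiguous between pages.
    """
    body: Dict[str, Any] = {
        "size": size,
        "sort": [{sort_field: "asc"}, {tiebreaker_field: "asc"}],
    }
    if last_sort_values is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = last_sort_values
    return body


# The request in the log asked for a window ending at 15020, e.g. from=15000, size=20.
too_deep = exceeds_result_window(15000, 20)
first_page = search_after_body(20, "timestamp", "id")
next_page = search_after_body(20, "timestamp", "id", [1595289600000, "doc-42"])
```

The bodies are plain dictionaries, so they can be passed to whichever client you use. Raising `index.max_result_window` is the other route the message mentions, at the cost of more memory per deep request.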

Auto-Shutdown an Azure VM and Notify You on Slack

Daniel Hutmacher has a fun assignment:

Virtual machines cost money when they’re powered on. Most servers obviously need to be on 24 hours a day. Others, like development machines, only have to be on when you’re using them. And if you forget to turn them off, they’ll empty out your Azure credits (or your credit card) before you know it.

Today, I’ll show you how to set an Auto-shutdown time to turn a VM off if you forget, as well as have Azure notify you on Slack 30 minutes ahead of time, so you have the option to postpone or cancel the shutdown.

There are a few steps to the process, but everything is straightforward.
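
To make the timing concrete, here is a small Python sketch of the two pieces involved: working out when a warning should go out ahead of a shutdown, and shaping a message for Slack. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The VM name and function names are hypothetical, and this is not the setup Daniel walks through in his post.

```python
import json
from datetime import datetime, timedelta


def notification_time(shutdown_at: datetime, lead_minutes: int = 30) -> datetime:
    """Return the moment a warning should be sent before the shutdown."""
    return shutdown_at - timedelta(minutes=lead_minutes)


def slack_payload(vm_name: str, shutdown_at: datetime) -> str:
    """Build the JSON body for a Slack incoming webhook (not sent here)."""
    message = (
        f"VM {vm_name} is scheduled to shut down at {shutdown_at:%H:%M}. "
        "Postpone or cancel if you are still using it."
    )
    return json.dumps({"text": message})


# Hypothetical example: a dev machine set to power off at 19:00.
shutdown = datetime(2020, 7, 21, 19, 0)
warn_at = notification_time(shutdown)
payload = slack_payload("dev-vm-01", shutdown)
```

Posting `payload` to the webhook URL with any HTTP client would deliver the message; the sketch stops short of that so it runs without network access.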

Missing Indexes Don’t Tell the Whole Story

Erik Darling explains some of the shortcomings of the missing indexes DMV:

The problem with relying on any data point is that when it’s not there, it can look like there’s nothing to see.

Missing indexes requests are one of those data points. Even though there are many reasons why they might not be there, sometimes it’s not terribly clear why one might not surface.

That can be annoying if you’re trying to do a general round of tuning on a server, because you can miss some easy opportunities to make improvements.

Read on for a few examples of where the results can betray you.

Secrets Management in PowerShell Demos

Rob Sewell is happy to stop using Import-Clixml:

I love notebooks, and to show some people who had asked about storing secrets, I have created some. So, because I am efficient (lazy), I have embedded them here for you to see. You can find them in my Jupyter Notebook repository.

Rob has a follow-up on the topic:

Following on from my last post about the Secret Management module, I was asked another question.

> Can I use this to run applications as my admin account?

A user with a beard

Well, Rob has a notebook for that.

Azure SQL Database Business Continuity Options

James Serra covers business continuity scenarios with Azure SQL Database:

I have written a number of blogs on the topic of business continuity in SQL Database before (HA/DR for Azure SQL Database, Azure SQL Database high availability, Azure SQL Database disaster recovery), but with a number of new features I felt it was time for a new blog on the subject, focusing on disaster recovery and not high availability.

Business continuity in Azure SQL Database and SQL Managed Instance refers to the mechanisms, policies, and procedures that enable your business to continue operating in the face of disruption, particularly to its computing infrastructure. In most cases, SQL Database and SQL Managed Instance will handle the disruptive events that might happen in the cloud environment and keep your applications and business processes running.

James takes us through options available for Azure SQL Database as well as managed instances.

The Tuple Mover in SQL Server 2019

Taryn Pratt gives us closure on an issue from a few months back:

I suggest reading my other post first, it’ll only take a few minutes. I’ll wait…

However, if you really don’t want to read it, here’s a quick recap on the initial issue.

In early February 2020, a lot of data was deleted from some clustered columnstore indexes in our PRIZM database. Some of the tables were rebuilt, but 11 tables weren’t since we don’t have maintenance windows, and that would involve downtime. The rebuilds would happen once we upgraded to SQL Server 2019, to take advantage of the ability to rebuild those columnstore indexes online.

Taryn now has the full story and I recommend giving it a read.

The Basics of Gremlin

Raul Gonzalez introduces us to Gremlin:

Graph databases in Cosmos DB benefit from the same features as the SQL API: they are globally distributed, scale throughput and storage independently, and provide guaranteed latency, automatic indexing and more. So when relational databases choke on certain queries, NoSQL databases come into play.

Gremlin is the query language used by Apache TinkerPop and it is implemented in Azure Cosmos DB. This language enables us to traverse graphs and answer complex queries that would otherwise be very expensive to run in traditional relational database engines.

Read on for a detailed example.
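
For a feel of what a traversal does, here is a toy Python sketch that mirrors a two-hop Gremlin query over an in-memory graph. The graph, the vertex names, and the helper functions are all hypothetical and are not from Raul's post; the Gremlin query in the comment is the shape being imitated, not something this code sends to Cosmos DB.

```python
from typing import Dict, List

# Hypothetical toy graph: each vertex maps to the vertices reached
# by an outgoing "knows" edge.
KNOWS: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def out(graph: Dict[str, List[str]], vertices: List[str]) -> List[str]:
    """Follow outgoing edges one hop from every vertex, keeping duplicates."""
    return [neighbour for v in vertices for neighbour in graph.get(v, [])]


def friends_of_friends(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Plain-Python counterpart of the Gremlin traversal:

    g.V().has('name', 'alice').out('knows').out('knows').dedup().values('name')

    Results are sorted here only to make the output deterministic.
    """
    two_hops = out(graph, out(graph, [start]))
    return sorted(set(two_hops))


result = friends_of_friends(KNOWS, "alice")
```

Each `out` call is one hop, which is why chained steps read naturally in Gremlin, whereas the relational equivalent needs one self-join per hop.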
