Press "Enter" to skip to content

Author: Kevin Feasel

HBase PaaS On AWS

Qubole now offers HBase on AWS as a Platform as a Service product:

  • Scheduled Backups and Restore to any cluster: HBase data is backed up to S3 regularly, and you can use Qubole’s cluster management UI to easily configure schedules for your backups. All the data or specific tables can be restored to any other HBase cluster. This feature is not only useful for disaster recovery but also to bring up test or dev clusters as well as clusters for one-off analysis.

  • Cluster Management: HBase clusters in QDS are elastic! Our users regularly increase the capacity of their clusters. This is a boon for capacity planners, as they can be more reactive to workload patterns, saving time and money.

  • Zero impact node rotations: By carefully orchestrating data compaction and HDFS block transfers prior to reassigning region servers, QDS minimizes the impact of adding/deleting/replacing nodes in an HBase cluster.

  • High-Availability Configurations: Putting together a high availability HBase cluster in the cloud can be daunting. QDS provides prebuilt templates for deploying HA HBase clusters through a single click.

The value proposition for this is very similar to Azure SQL Database:  eliminate some of the administrative tasks by accepting their defaults.  Doing this does remove some of the complexity of HBase.


Learning By Doing

Matt Cushing gives us some notes on learning SSIS:

Send Mail tasks won’t work unless you have things configured properly.  I was trying to find things on Google and all I kept coming across was how to configure the task or how to install SSIS and configure it to run, not how to configure the server to send it properly.  Thankfully John took pity on me and helped me realize that using an Execute SQL Task and sp_send_dbmail works more easily and cleanly – SQL Server Central

I’ve used Send Mail a few times, but have always had somebody else around to configure Exchange or whatever other mail server we were using at the time.
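
If you go the sp_send_dbmail route, here is a minimal sketch of the call an Execute SQL Task would run, assuming Database Mail is already configured (the profile name and recipient below are placeholders):

    -- Placeholders: swap in your own Database Mail profile and recipients.
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = N'DefaultMailProfile',
        @recipients   = N'dba-team@example.com',
        @subject      = N'SSIS package finished',
        @body         = N'The nightly load completed successfully.';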


Power BI Groups

Reza Rad looks at using Power BI groups and integrating with Office 365:

When you share content with an individual in the organization, if that person leaves the company or is replaced by someone else from another team, you have to remove sharing from the previous user account and assign it to the new user account. The best practice is to share content with groups, whose members can then easily be managed by an administrator. Power BI groups are fully synchronized with Office 365 groups; once you have used a group in Power BI, adding or removing members from it is simply an admin’s task.

I like this group-based approach a lot, as it makes dashboard security a lot easier.


Database Scoped Configurations

Niko Neugebauer looks at a new feature in SQL Server 2016:

In the first version/iteration of the Database Scoped Configurations, there are 5 functionalities supported at the per-database level:
– Procedure Cache cleaning
– MAXDOP control
– Query Optimizer Cardinality Estimation Model activation control
– Parameter Sniffing control
– Query Optimizer Hotfixes enabling

Niko also shows an example of how the different MAXDOP settings interact.
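
For reference, a quick sketch of the syntax covering those five settings (the values are only examples, and each statement applies to the database you run it in):

    -- Illustrative values; each setting is scoped to the current database.
    ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
    ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;
    ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
    ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
    -- Secondary replicas can take their own values, e.g.:
    ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = 2;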


How Do You Respond?

Erik Darling has a new interview question:

A new developer has been troubleshooting a sometimes-slow stored procedure, and wants you to review their progress so far. Tell me what could go wrong here.

This one’s a bit tougher than some of the early interview questions, and I think you can see that based on the responses in the comments.


Dueling Log Backup Jobs

Robert Davis ran into HADR_WORK_QUEUE waits recently:

Our 3rd party monitoring solution collects blocking information, but not for system threads. There was no additional information available for this blocking incident, but I could see that the system thread was a background process with the command “UNKNOWN TOKEN” and was sitting in a wait type of “HADR_WORK_QUEUE”. It was clearly the worker thread for the AG of a specific database.

A little later, we had blocking again involving that same thread, but this time, the AG worker thread was blocking the log backup thread. Seemed logical that if the worker thread could block the log backup, then the log backup could have also blocked the worker thread, but still it did not make sense to me.

This is one of those cases in which the answer makes perfect sense after the fact, but can be maddening until then.
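
If you want to spot this kind of blocking yourself, here is a minimal sketch of a DMV query that does include system sessions (something many monitoring tools, as noted above, skip):

    -- Current waiters and their blockers, background/system tasks included.
    SELECT
        wt.session_id,
        wt.blocking_session_id,
        wt.wait_type,
        wt.wait_duration_ms,
        s.is_user_process,
        r.command
    FROM sys.dm_os_waiting_tasks AS wt
    LEFT JOIN sys.dm_exec_sessions AS s
        ON s.session_id = wt.session_id
    LEFT JOIN sys.dm_exec_requests AS r
        ON r.session_id = wt.session_id
    WHERE wt.blocking_session_id IS NOT NULL;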


Microsoft & FreeBSD

Serdar Yegulalp points out that a new Azure VM image for FreeBSD has Microsoft as the publisher:

The other question people are likely to ask is why, kernel contributions notwithstanding, is Microsoft listed as the publisher of the distro? The short answer: support.

According to Microsoft’s blog post, the FreeBSD Foundation is a community of mutually supportive users, “not a solution provider or an ISV with a support organization.” The kinds of customers who run FreeBSD on Azure want to have service-level agreements of some kind, and the FreeBSD Foundation isn’t in that line of work.

The upshot: If you have problems with FreeBSD on Azure, you can pick up the phone and get Microsoft to help out — but only if you’re running its version of FreeBSD.

To be honest, I don’t see this as a big deal.  I’m glad the image is there, but this hardly seems like a landmark change in anything to me.


Adding An Index To A Spark RDD

Arijit Tarafdar gives us a good method for adding an index column to a Spark data frame based on a non-unique value:

The basic idea is to create a lookup table of distinct categories indexed by unique integer identifiers. The approach to avoid is to collect the unique categories to the driver, loop through them to add the corresponding index to each to create the lookup table (as a Map or equivalent) and then broadcast the lookup table to all executors. The amount of data that can be collected at the driver is controlled by the spark.driver.maxResultSize configuration, which by default is set at 1 GB for Spark 1.6.1. Both collect and broadcast will eventually run into the physical memory limits of the driver and the executors respectively at some point beyond a certain number of distinct categories, resulting in a non-scalable solution.

The solution is pretty interesting:  build out a new RDD of unique results, and then join that set back.  If you’re using SQL (including Spark SQL), I would use the DENSE_RANK() window function.
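
As a quick illustration of that DENSE_RANK() approach (the table and column names are made up), every row sharing a category gets the same integer index without collecting anything to the driver:

    -- Works in Spark SQL as well as T-SQL; "events" and "category" are hypothetical.
    SELECT
        e.*,
        DENSE_RANK() OVER (ORDER BY e.category) - 1 AS category_index
    FROM events AS e;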


Early Thoughts On SQL Server 2016

Koen Verbeeck has some initial thoughts on using 2016 in a POC:

  • AutoAdjustBufferSize property of the SSIS data flow. Done with manually setting the Buffer Size and Buffer Max Rows. Just set this property to true and the data flow takes care of its own performance.

  • Custom logging levels in the SSIS Catalog. Now I can finally define a logging level that only logs errors and warnings AND set it as the server-wide default level.

  • The DROP TABLE IF EXISTS syntax. The shorter the code, the better 🙂

I was initially a bit concerned with AutoAdjustBufferSize because I figured I could do a better job of selecting buffer size.  Maybe on the margin I might be able to, but I think I’m going to give it a try.
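
And for reference, the DROP TABLE IF EXISTS syntax from the third bullet replaces the old existence-check pattern (the table name here is just an example):

    -- Old pattern:
    -- IF OBJECT_ID('dbo.StagingOrders', 'U') IS NOT NULL DROP TABLE dbo.StagingOrders;
    -- New in SQL Server 2016:
    DROP TABLE IF EXISTS dbo.StagingOrders;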


SQL Server 2014 Express Docker Image

Perry Skountrianos introduces a new Docker image:

We are excited to announce the public availability of the SQL Server 2014 Express Docker image for Windows Server Core-based containers! The public repo is hosted on Docker Hub and contains the latest Docker image as well as pointers to the Dockerfile and the start PowerShell script (hosted on GitHub). We hope you will find this image useful and leverage it for your container-based applications!

Containerization is a huge part of the modern administrative world, and it’s good to see Microsoft (belatedly) jumping onto the bandwagon.
