Month: September 2020

Choosing Between Hive LLAP and Impala

David Dichmann walks us through the differences between Impala and Hive LLAP:

Written in C++, which is very CPU efficient, with a very fast query planner and metadata caching, Impala is optimized for low latency queries.  Because of this, Impala is an ideal engine for use with a data mart, since people working with data marts are mostly running read-only queries and not large scale writes.  

Impala also has a very efficient run-time execution framework, using code generation, process-to-process communication, massive parallelism, and metadata caching. Because of this, Impala is also great when working with ad-hoc queries, like when exploring by iteratively digging into data.  You’ll want to change your query over and over again, at a moment’s notice, and have very fast response times so you’re not waiting forever for each iteration.  

I was curious what would end up happening with Hive and Impala once old Cloudera (Impala) and Hortonworks (Hive) merged together. Looks like the answer, at least for now, is that they’re both useful in different circumstances. But I do wonder how long that lasts—it’s not impossible to sell using two separate data platform products for different steps in a warehouse implementation, but I could see architects and CIOs wanting to make things simpler and narrow down to one unless there was a particularly smooth bridge between the two.

SQL Server R and Python Language Extensions Now Open Source

The SQL Server team has an announcement:

Previously, we announced a Java extension. Today, we are sharing that we are open sourcing the R and Python language extensions for SQL Server for both Windows and Linux on GitHub.

These extensions are the latest examples using an evolved programming language extensibility architecture which allows integration with a new type of language extension. This new architecture gives customers the freedom to bring their own runtime and execute programs using that runtime in SQL Server, while leveraging the existing security and governance that the SQL Server programming language extensibility architecture provides.

Very interesting.

Diving into the Azure Resource Mover

Dennes Torres shows off what the Azure Resource Mover can do:

If you include the need to copy a resource or set of resources, instead of only moving, the list expands a lot.

Azure already offers the resources to do this: ARM templates, automated deployments, Data Sync, Recovery Services Vault, VM replication and so on. The problem is that sometimes, to move a set of objects together, you may need to use many of these services and understand how to use them.

The solution is a new free service, still in preview, called Azure Resource Mover. This service reduces the complexity of moving resources, minimizing the number of decisions needed on how the resources will be moved. More than that, the last step, deleting the source of the move, is optional, as you will see in detail later. You can use this feature, not only to move resources, but also to copy and distribute them across many regions. During the move process, only one side (source or destination) will be active, but once you finish the move, if you decide not to delete the source, you have in fact a new deployment of the solution.

This is a fairly detailed tutorial, so check it out.

Problems with Power BI’s Publish to Web

Adam Saxton explains when you might not want to use the Publish to Web option in Power BI:

Some don’t realize that Power BI Publish to Web is not secure. Adam shows you that this is the case. It’s a bit scary and there are other options to have secure embedding.

For demos and other resources which are supposed to be accessible to everybody, Publish to Web works great. But if you’re deploying company dashboards, not so much.

Stellar Repair: A Review

Grant Fritchey reviews a product which attempts to repair corrupted SQL Server databases:

Let’s start with the most important piece of information you need: it works.

The software itself is really simple to use and just does what you need, repairs your corrupted SQL Server instance. On that alone, I can recommend the tool.

However, there are a few gotchas I ran into along the way. Mostly, little stuff. It’s things a little polish in the UI and some clean up around language could help out. Don’t get me wrong, I’m happy with this software. It worked. It’s just how it works that we should talk about.

Click through for Grant’s full review.

Moving a Virtual Machine with the Azure Resource Mover

Kathi Kellenberger tries out the Azure Resource Mover:

Another task I may want to perform is to move a VM to another region. I found this set of steps that involves using Azure Recovery Services Vault that seems a bit complex. Fortunately, I recently heard about a new, much easier way to move VMs and other resources called Azure Resource Mover (in preview). It was announced today.

Read on to see how this works. I like the idea a lot, especially for those times when you accidentally create resources in different regions and only realize it when it’s time to tie everything together.

Azure Synapse Analytics Sample Datasets and Scripts

James Serra shows us where to find samples for Azure Synapse Analytics:

Datasets: A bunch of datasets that when added will show up under Data -> Linked -> Azure Blob Storage.  You can then choose an action (via “…” next to any of the containers in the dataset) and choose New SQL script -> Select TOP 100 rows to examine the data as well as choose “New notebook” to load the data into a Spark dataframe.  Any dataset you add is a linked service to files in a blob storage container using SAS authentication.  You can also create an external table in a SQL on-demand pool or SQL provisioned pool to each dataset via an action (via “…” next to “External tables” under the database, then New SQL script -> New external table) and then query it or insert the data into a SQL provisioned database

Click through to learn more, as well as a few other things you can do with Synapse Analytics.

Automatic Soft NUMA in SQL Server

Ameena Lalani walks us through NUMA and automatic soft NUMA in SQL Server:

Modern processors have multiple cores per socket. Each socket is represented, usually, as a single NUMA node. The SQL Server database engine partitions various internal structures and partitions service threads per NUMA node. With processors containing 10 or more cores per socket, using software NUMA to split hardware NUMA nodes generally increases scalability and performance. Prior to SQL Server 2014 (12.x) SP2, software-based NUMA (soft-NUMA) required you to edit the registry to add a node configuration affinity mask, and was configured at the host level, rather than per instance. Starting with SQL Server 2014 (12.x) SP2 and SQL Server 2016 (13.x), soft-NUMA is configured automatically at the database-instance level when the SQL Server Database Engine service starts. Please read this documentation and this documentation for more understanding.

Read on for more info.
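
To make the idea of splitting a hardware NUMA node concrete, here is a minimal Python sketch. It is not SQL Server's actual algorithm: the function name `split_into_soft_nodes` and the default `target_size` of 8 are assumptions chosen for illustration. It only shows what it means to carve one socket's logical CPUs into several smaller, near-equal groups.

```python
import math


def split_into_soft_nodes(logical_cpus, target_size=8):
    """Partition one hardware NUMA node's CPU ids into near-equal groups.

    Illustrative only: target_size is a hypothetical knob, not a SQL Server
    setting, and the real engine applies its own rules at startup.
    """
    cpus = list(logical_cpus)
    if not cpus:
        return []
    # Use the fewest groups such that no group exceeds target_size.
    node_count = math.ceil(len(cpus) / target_size)
    base, extra = divmod(len(cpus), node_count)
    nodes, start = [], 0
    for i in range(node_count):
        # The first `extra` groups take one additional CPU each.
        size = base + (1 if i < extra else 0)
        nodes.append(cpus[start:start + size])
        start += size
    return nodes


if __name__ == "__main__":
    # A 12-core socket becomes two groups of six under these assumptions.
    for node_id, cpus in enumerate(split_into_soft_nodes(range(12))):
        print(node_id, cpus)
```

Under these assumptions, a socket with 8 or fewer logical CPUs stays as a single group, while larger sockets are divided evenly so that no group ends up much smaller than the others.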

Operational Database Security in Cloudera Data Platform

Liliana Kadar, et al., walk us through some of the database security and auditing features in Cloudera Data Platform:

Database object-level security is available through the centralized authorization framework of Apache Ranger. 

Both fine-grained access control of database objects and access to metadata is provided. Protected database objects include: database, table, column, view and User Defined Functions (UDFs). 

Fine-grained access control for special administrative operations that can be performed on OpDBMS is also supported. 

Click through for the full story.

Key Metrics for Kafka Monitoring

Preetdeep Kumar shares three metrics which are important for monitoring Kafka clusters:

There are hundreds of metrics documented as part of Kafka monitoring, of which CPU, memory, disk, and network metrics are always useful in monitoring any system. In this article, I share 3 metrics that I found to be very useful from a development point of view and that saved us some time while triaging a few corner cases reported by customers.

Click through for those measures.
