There are three major concepts for us to understand about Azure Databricks: Clusters, Code and Data. We will dig into each of these in due time. For this post, we’re going to talk about Clusters. Clusters are where the work is done. Clusters themselves do not store any code or data. Instead, they operate the physical resources that are used to perform the computations. So, it’s possible (and even advised) to develop code against small development clusters, then leverage the same code against larger production-grade clusters for deployment. Let’s start by creating a small cluster.
Read on for an example.
A while back, Jonathan Kehayias blogged about a way to speed up UDFs that might see NULL input.
Which is great, if your functions see NULL inputs.
But what if… What if they don’t?
And what if they’re in your WHERE clause?
And what if they’re in your WHERE clause multiple times?
But fear not—Erik’s got you covered.
In SQL Server 2016, transaction log writing was enhanced to support multiple transaction log writers. If the instance had more than one non-DAC node in [sys].[dm_os_nodes], there would be one transaction log writer per node, to a maximum of 4.
In SQL Server 2019, it seems the maximum number of transaction log writers has been increased. The system below with 4 vNUMA nodes (and autosoftNUMA disabled) has eight transaction log writer sessions, each on their own hidden online scheduler, all on parent_node_id = 3/memory_node_id = 3 on processor group 1.
Click through for the proof.
Power BI is constantly evolving – there’s a new version of Power BI Desktop every month, and the Power BI service is updated every week. Many of the new capabilities in Power BI represent gradual refinements, but some are significant enough to make you rethink how your organization uses Power BI.
The new app navigation capabilities introduced last month to Power BI probably fall into the former category. But even though they’re a refinement of what the Power BI service has always had, they can still make your apps significantly better. Specifically, these new capabilities can be used to add documentation and training materials directly to the app experience, while keeping that content in its current location.
Click through for an explanation.
That means the entire concept of the arrow is made up by the rendering application – like SQL Server Management Studio, Azure Data Studio, SentryOne Plan Explorer, and all the third party plan-rendering tools. They get to decide arrow sizes – there’s no standard.
SSMS’s arrow size algorithm changed back in SQL Server Management Studio 17, but most folks never took notice. These days, it’s not based on rows read, columns read, total data size, or anything else about the data moving from one operator to the next.
There’s an answer, but it’s not particularly intuitive. I think SentryOne Plan Explorer has the upper hand on this one.
We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks. Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:
– Simplifies manageability by using the same Glue Catalog across multiple Databricks workspaces.
– Simplifies integrated security by using IAM Role Passthrough for metadata in Glue.
– Provides easier access to metadata across the Amazon stack and access to data catalogued in Glue.
There are some interesting changes in here.
In this article, you will learn how to publish Kubernetes cluster events data to Amazon Elasticsearch Service (Amazon ES) using the Fluentd logging agent. The data will then be viewed using Kibana, an open-source visualization tool for Elasticsearch. Amazon ES includes built-in Kibana integration.
We will walk you through the following process:
– Creating a Kubernetes Cluster
– Creating an Amazon ES cluster
– Deploying the Fluentd logging agent on the Kubernetes cluster
– Visualizing Kubernetes data in Kibana
Click through for the full article.
I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think is interesting enough to share. It might be some SQL function that I’d really like to be in SQL Server, it might be something else.
Today I have a small blog post about a neat little function I discovered last week – with thanks to my German colleague, who wants to remain anonymous. The function is called ILIKE and it is syntactic sugar for the combination of UPPER and LIKE.
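To make the "UPPER plus LIKE" description concrete, here is a minimal Python sketch of the idea. It is not Snowflake's implementation: the helper names `like` and `ilike` are hypothetical, only the `%` and `_` wildcards are handled, and escape characters and collation rules are ignored.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Case-sensitive SQL-style LIKE: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Treat every other character literally.
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive match, written as UPPER(value) LIKE UPPER(pattern)."""
    return like(value.upper(), pattern.upper())
```

With these helpers, `like("Dog", "dog%")` is false because the case differs, while `ilike("Dog", "dog%")` is true.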
I’m personally not a fan of case-sensitive collations for data; it’s hard for me to understand the meaningful differences between “dog,” “Dog,” and “DOG.”
The user wants to unpivot the data by rotating the three header rows (Scenario Type, Month, and Year) from columns to rows. The issue is that the headers span three rows. If you just select these columns and unpivot, you’ll end up with a mess. And Power Query operates on one row at a time, so you can’t reference previous rows, such as to concatenate Scenario, Month, and Year. We can do the concatenation in Excel so we have one row with column headers, such as Actuals-Jan-2018, Actuals-Feb-2018, and so on, which we can easily unpivot in Power Query. But what if we can’t or don’t want to modify the Excel file, such as to avoid repeating the same steps every time a new file comes in?
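The concatenate-then-unpivot idea can be sketched outside Power Query. The Python below is only an illustration of the logic, not the approach in the linked sample file. The function name, the sample figures, and the layout are assumptions: one leading identifier column, and every header cell filled in on each of the three header rows.

```python
from typing import List, Sequence, Tuple


def unpivot_with_stacked_headers(
    header_rows: Sequence[Sequence[str]],
    data_rows: Sequence[Sequence[object]],
    id_columns: int = 1,
    separator: str = "-",
) -> List[Tuple[object, ...]]:
    """Merge stacked header rows into one label per column, then unpivot."""
    width = len(header_rows[0])
    # Build one label per value column, e.g. "Actuals-Jan-2018".
    labels = [
        separator.join(str(row[col]) for row in header_rows)
        for col in range(id_columns, width)
    ]
    result = []
    for row in data_rows:
        keys = tuple(row[:id_columns])
        # Emit one output row per value column: (keys..., label, value).
        for label, value in zip(labels, row[id_columns:]):
            result.append(keys + (label, value))
    return result


# Made-up sample data for illustration only.
headers = [
    ["Account", "Actuals", "Actuals", "Budget"],
    ["", "Jan", "Feb", "Jan"],
    ["", "2018", "2018", "2018"],
]
data = [["Sales", 100, 120, 110], ["Costs", 60, 65, 70]]
rows = unpivot_with_stacked_headers(headers, data)
```

Each input row yields one output row per value column, so the first entry in `rows` is `("Sales", "Actuals-Jan-2018", 100)`.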
Click through for a sample file which shows how you can do this.
Next, we will create a resource group by executing the following command:
az group create --name nameOfMyresourceGroup --location eastus2
Once you execute the above command, you can go into the Azure portal and refresh your resource group pane and see the newly created resource group.
Once that is set up, it’s time to create the actual Kubernetes cluster.
Click through for the full set of instructions.