Press "Enter" to skip to content

Author: Kevin Feasel

Troubleshooting High Threadpool Waits and Deadlocked Schedulers

Eitan Blumin takes us through a troubleshooting scenario:

In short, high THREADPOOL waits can happen when SQL Server doesn’t have enough “worker threads” to handle new tasks, which could cause SQL Server to hang and refuse connections. When a task is waiting for a worker thread to become available, that wait type is called THREADPOOL wait.

A background process, called “Scheduler Monitor”, will identify when the same worker threads are “stuck” in the same state for 60 seconds or more, in which case it will resolve the issue as a Deadlocked Scheduler, causing dropped connections, rollbacks, and even failovers.

When a Deadlocked Scheduler event happens, SQL Server will automatically generate a memory dump file (SQLDump#####.mdmp), and log the incident in the SQL Server Error Log.
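
As a rough way to see whether tasks are already queuing for worker threads, a query along these lines against the waiting-tasks DMV can help. This is a minimal sketch, not from the linked post; on a server that is badly thread-starved you may need the Dedicated Admin Connection to run it at all.

```sql
-- Count tasks currently waiting on THREADPOOL (i.e., waiting for a worker thread).
-- If the server is unresponsive, connect via the DAC (ADMIN:servername) to run this.
SELECT wait_type, COUNT(*) AS waiting_tasks
FROM sys.dm_os_waiting_tasks
WHERE wait_type = N'THREADPOOL'
GROUP BY wait_type;
```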

Read on to understand what causes this as well as why we always fumble our keys under the car as the scary monster approaches.

Adding Row Numbers to ADF Data Flows

Rayis Imayev shows two methods of generating unique, ascending row numbers in Azure Data Factory data flows:

Adding a row number to your dataset could be a trivial task. Both ANSI and Spark SQL have the row_number() window function that can enrich your data with a unique number for your whole or partitioned data recordset.

Recently I had a case of creating a data flow in Azure Data Factory (ADF) where there was a need to add a row number.

Read on for a couple of attempts which didn’t work, followed by two that do, including an assist from Joseph Edwards.
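
For reference, the row_number() function the quote mentions looks like this in plain SQL (the table and column names here are invented for illustration); the linked post covers how to get the equivalent behavior inside an ADF data flow, where no such function is directly available.

```sql
-- Assign a unique, ascending number to each row, ordered by OrderDate;
-- add PARTITION BY to restart the numbering within each group.
SELECT
    OrderID,
    OrderDate,
    ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNum
FROM dbo.Orders;
```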

Stream Processing with ksqlDB

Michael Drogalis takes us through how stream processing works with ksqlDB:

ksqlDB, the event streaming database, is becoming one of the most popular ways to work with Apache Kafka®. Every day, we answer many questions about the project, but here’s a question with an answer that we are always trying to improve: How does ksqlDB work?

The mechanics behind stream processing can be challenging to grasp. The concepts are abstract, and many of them involve motion—two things that are hard for the mind’s eye to visualize. Let’s pop open the hood of ksqlDB to explore its essential concepts, how each works, and how it all relates to Kafka.

Click through for a demo with animations.
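
To make the idea concrete, here is a minimal ksqlDB sketch (the topic, stream, and column names are hypothetical): a stream is declared over a Kafka topic, and a persistent query materializes a continuously updated table from it.

```sql
-- Declare a stream over an existing Kafka topic.
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON');

-- Persistent query: maintain a running count of views per user as new events arrive.
CREATE TABLE views_per_user AS
  SELECT user_id, COUNT(*) AS view_count
  FROM pageviews
  GROUP BY user_id
  EMIT CHANGES;
```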

Reasons Data Science Projects Fail

Ryohei Fujimaki summarizes some of the reasons why data science projects can fail:

According to Gartner analyst Nick Heudecker, over 85% of data science projects fail. A report from Dimensional Research indicated that only 4% of companies have succeeded in deploying ML models to a production environment.

Even more critical, the economic downturn caused by the COVID-19 pandemic has placed increased pressure on data science and BI teams to deliver more with less. In this down market, organizations are reassessing which AI/ML models they should develop, how to optimize resources and how to best use valuable budget dollars for maximum impact. In this type of environment, AI/ML project failure is simply not acceptable.

That 85% sounds suspiciously like the percentage of failed business intelligence and data warehouse projects, as well as the percentage of failed big data projects. It’s close enough that it makes me want to come up with some overarching idea that projects based on the consolidation of multiple independent data systems across several business units are liable to fail about 5/6 of the time.

Delta Lake DML Internals

Tathagata Das, et al, take us through how Delta Lake handles update, delete, and merge operations:

`DELETE` works just like `UPDATE` under the hood. Delta Lake makes two scans of the data: the first scan is to identify any data files that contain rows matching the predicate condition. The second scan reads the matching data files into memory, at which point Delta Lake deletes the rows in question before writing out the newly clean data to disk.

After Delta Lake completes a `DELETE` operation successfully, the old data files are not deleted — they’re still retained on disk, but recorded as “tombstoned” (no longer part of the active table) in the Delta Lake transaction log. Remember, those old files aren’t deleted immediately because you might still need them to time travel back to an earlier version of the table. If you want to delete files older than a certain time period, you can use the `VACUUM` command.

Click through for a video as well as a blog post with the details.
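
As a sketch of the behavior described above, in Delta Lake’s SQL dialect (the table name and predicate are invented for illustration):

```sql
-- Rewrites the affected files; the old files are tombstoned in the transaction
-- log rather than physically removed, so time travel still works.
DELETE FROM events WHERE event_date < '2020-01-01';

-- Physically remove tombstoned files older than the retention threshold
-- (the default is 7 days / 168 hours), giving up time travel to those versions.
VACUUM events RETAIN 168 HOURS;
```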

View Native Analysis Services Queries in Power Query

Chris Webb gives us an update on Power Query:

If you’re familiar with the topic of query folding in Power Query, you’ll know that the View Native Query right-click option in the Applied Steps pane of the Power Query Editor can be used to show the native query that is run against the data source. You may also know that there are some data sources where query folding does take place but where View Native Query remains greyed out.

Read on to see which sources are now available and to see an example of this in action.

ADF and Self-Hosted Integration Runtime Config Errors

Teo Lachev points out a common issue with using the Azure Data Factory self-hosted integration runtime:

You’ve set up the Azure Data Factory self-hosted integration runtime to access on-prem data sources. You create a linked service, click Test Connection, and then get greeted with an error saying the security context can’t be passed. On the on-prem VM, you use the Integration Runtime Configuration Manager and get a similar error, or something to the effect that JSON can’t be parsed. You spend a few hours trying everything that comes to mind, such as checking firewalls and connectivity from SSMS, but nothing helps.

Read on for the solution.

Time Series Forecasting in R

Selcuk Disci contrasts a couple of methods for time series forecasting:

It is always hard to find a proper model to forecast time series data. One of the reasons is that models that use time-series data are often exposed to serial correlation. In this article, we will compare k-nearest neighbors (KNN) regression, which is a supervised machine learning method, with a more classical stochastic process, autoregressive integrated moving average (ARIMA).

We will use the monthly prices of refined gold futures (XAUTRY) for one gram in Turkish Lira, traded on BIST (Istanbul Stock Exchange), for forecasting. We created the data frame starting from 2013. You can download the relevant Excel file from here.

Click through for the demonstration. H/T R-Bloggers.

Delayed Durability in SQL Server

Esat Erkec walks us through Delayed Durability in SQL Server:

In this article, we will learn the Delayed Durability feature that helps to improve transaction log file write throughput in SQL Server.

OLTP (Online Transaction Processing) databases should process a huge number of transactions within the shortest time and concurrently. Therefore, the transaction completion time becomes more important for the performance of the OLTP databases. Particularly for SQL Server, the transaction log (T-log) file configuration will play a key role in the performance of the transaction completion times because the write throughput to the log file directly affects the application response times.

This is a feature which might be useful in specific scenarios, but I’m always concerned about that risk of data loss.
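
For reference, delayed durability can be allowed or forced at the database level and then opted into per commit; a minimal sketch (the database, table, and column names are placeholders):

```sql
-- Permit delayed durability for the database without forcing it on every transaction.
ALTER DATABASE [SalesDB] SET DELAYED_DURABILITY = ALLOWED;

-- Opt in for a single transaction: the commit returns before the log records
-- are hardened to disk, trading a small window of potential data loss for speed.
BEGIN TRANSACTION;
    UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 42;
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
```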
