Press "Enter" to skip to content


Dynamic M Parameters and Multi-Select

Chris Webb shows off a method for handling multi-select using dynamic M parameters:

Even though the documentation for dynamic M parameters does mention how to handle multi-select in the M code for your Power Query queries, I thought it would be useful to provide a detailed example of how to do this and explain what happens behind the scenes when you use multi-select.

Click through for that explanation and example.


Querying Data Lake Files in Power BI through Synapse Analytics

Wolfgang Strasser shows us how to integrate Azure Synapse Analytics and Power BI:

Sometimes, however, wouldn't it be nice to access the data lake in DirectQuery mode – to get the most up-to-date information for every report view? I would say: yes … but how can you achieve this? The options natively provided by ADLS Gen2 and Power BI are not sufficient to solve this requirement. But there are options to achieve this and, in this post, I would like to show you the possibilities using Azure Synapse Analytics to build a query layer on top of an ADLS Gen2 storage account.

Click through for a step-by-step walkthrough.
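
For a flavor of what that query layer looks like, a Synapse serverless SQL pool can read files in the lake directly with OPENROWSET. This is a minimal sketch only, with a made-up storage account and path:

-- Query Parquet files in ADLS Gen2 from a Synapse serverless SQL pool.
-- The storage URL below is a placeholder for your own lake.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;

Put a view over a query like this, point Power BI at the serverless SQL endpoint in DirectQuery mode, and every report view hits the lake live – roughly the layer Wolfgang builds out.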


From JSON to SQL Server

Phil Factor has some helper functions for us when working with JSON data:

If you know the structure and contents of a JSON document, then it is possible to turn this into one or more relational tables, but even then I dare you to claim that it is easy to tap in a good OpenJSON SELECT statement to do it. If you don’t know what’s in that JSON file, then you’re faced with sweating over a text editor trying to work it all out. You long to just get the contents into a relational table and take it on from there. Even then, you’ve got several struggles before that table appears in the result pane. You must get the path to the tabular data correct, you have to work out the SQL data types, and you need to list the full panoply of keys. Let’s face it: it is a chore. Hopefully, all that is in the past with these helper functions.

Click through for those functions.
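
To see the chore Phil is describing, here is what the hand-written version looks like for even a trivial document (the JSON below is made up for illustration): the path, every data type, and every key must be spelled out yourself.

DECLARE @json nvarchar(max) = N'{
  "orders": [
    { "id": 1, "customer": "Alice", "total": 19.99 },
    { "id": 2, "customer": "Bob",   "total":  5.00 }
  ]
}';

-- The path ('$.orders'), the data types, and the key list
-- all have to be worked out and typed in by hand.
SELECT o.id, o.customer, o.total
FROM OPENJSON(@json, '$.orders')
WITH (
    id       int           '$.id',
    customer nvarchar(100) '$.customer',
    total    decimal(10,2) '$.total'
) AS o;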


Automating Hadoop Workflows with Spark and Oozie

Prashanth Jayaram walks us through automating a sample data transfer with tools like Sqoop, Spark, and Oozie:

In the process of building a data product, one would end up applying many resource-intensive analytical operations on a medium-to-large data set in an efficient way. Apache Spark is the best bet in this scenario to perform faster job execution by caching data in memory and enabling parallelism in a distributed data environment.

Components involved in Spark implementation:

1. Initialize a Spark session using a Scala program
2. Ingest data from the data lake through Hive queries
3. Apply business logic using Scala constructs or Hive queries
4. Load data into HDFS or Hive targets
5. Execute Spark programs through spark-submit

Read on for a sample flow.


DBCC CHECKDB on Large Databases

Aaron Bertrand shares some thoughts on CHECKDB:

We have a lot of data. Some of that data is stored in large databases (dozens of terabytes each). In some shops, this is an excuse to not run integrity checks. We are not one of those shops.

But we don’t run full CHECKDB operations in production; we have a set of servers dedicated to testing our restores and running checks. We follow a lot of the guidance in these articles:

CHECKDB From Every Angle: Consistency Checking Options for a VLDB

Minimizing the impact of DBCC CHECKDB: DOs and DON’Ts

Minimize performance impact of SQL Server DBCC CHECKDB

Read the whole thing, even if you aren’t dealing with 30+ TB databases.
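
To give a feel for the split those articles recommend, here is a minimal sketch (database name and option choices are illustrative, not Aaron’s exact setup): a cheap physical check stays near production, while the full logical check runs against a restored copy elsewhere.

-- On or near production: physical-only checks skip the expensive
-- logical consistency checks, so they finish much faster.
DBCC CHECKDB (N'BigDatabase') WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- On a dedicated checking server: restore the latest backup first
-- (which also proves the backup is restorable), then run the full check.
DBCC CHECKDB (N'BigDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;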


Creating Users in Azure SQL Database

Kenneth Fisher takes us through a nuance in adding users to Azure SQL Database:

Awesome! I did say I preferred code, didn’t I? I am noticing a slight problem, though. I don’t actually have a login yet. So I look in Object Explorer and there is no instance-level security tab. On top of that, when I try to create a login with code, I get the following error:

Msg 5001, Level 16, State 2, Line 1
User must be in the master database.

Read on for the whole process.
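
As a rough sketch of the sequence Kenneth works through (names and password are placeholders): logins live in master, users live in your database, and because Azure SQL Database does not support USE, each step needs its own connection.

-- Connected to the master database:
CREATE LOGIN TestLogin WITH PASSWORD = 'ChangeMe_Str0ng!';

-- Then, in a separate connection to the user database:
CREATE USER TestUser FOR LOGIN TestLogin;

-- Alternatively, skip logins entirely with a contained database user:
CREATE USER ContainedUser WITH PASSWORD = 'ChangeMe_Str0ng!';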


In-Memory OLTP and HammerDB Setup Error

Erik Darling tracks down an error:

This is a short post about a sort of quirky error message I got while trying to run the TPC-C load test against SQL Server’s in-memory tables.

Error in Virtual User1: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The integer value XXXXXXXXXXXX is out of range.

Click through for the solution. Also, read down to the bottom of the post for a huge discount on Erik’s recorded training content. Erik is extremely knowledgeable and has a great way of explaining things, so take advantage of that knowledge.
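
Without giving away Erik’s diagnosis, the general failure mode behind that message is a value too wide for a four-byte int; a minimal illustration with an arbitrary literal:

-- int tops out at 2,147,483,647, so this overflows:
SELECT CAST(5000000000 AS int);
-- Msg 8115: Arithmetic overflow error converting expression to data type int.

-- bigint has room to spare:
SELECT CAST(5000000000 AS bigint);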


GIS Capabilities in R

Lionel Hertzog shows off spatial capabilities in R:

All of these operations follow the same logic: st_operation(A, B) checks, for each combination of the geometries in A and B, whether A operation B is true or false. For instance, st_within(A, B) checks whether the geometries in A are within B; this is similar to st_contains(B, A), the difference between the two being the shape of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a list of length m where each element contains the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By using sparse=FALSE the functions return matrices instead: st_within(A, B, sparse=FALSE) returns an n x m matrix, while st_within(B, A, sparse=FALSE) returns an m x n matrix. Note that running st_operation(A, A) checks the operation between all geometries of the object, returning an n x n matrix.

Click through for part 1 of the series.
