Press "Enter" to skip to content

Day: February 21, 2020

Creating Sources and Sinks with Blink

Seth Wiesman has a tutorial showing how to create sources and sinks using Apache Flink’s SQL interface, Blink:

A lot of work focused on improving runtime performance and progressively extending its coverage of the SQL standard. Flink now supports the full TPC-DS query set for batch queries, reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads. Its streaming SQL supports an almost equal set of features – those that are well defined on a streaming runtime – including complex joins and MATCH_RECOGNIZE.

As important as this work is, the community also strives to make these features generally accessible to the broadest audience possible. That is why the Flink community is excited in 1.10 to offer production-ready DDL syntax (e.g., CREATE TABLEDROP TABLE) and a refactored catalog interface.

Click through for a demonstration. One of the nicest things about the ANSI SQL standard is that it was intended to be a one-language solution, where the language used for administration is the same as the language used for regular queries. That cuts down on the number of languages you need to know to get your job done.

Comments closed

MR3: Hive on Kubernetes

Alex Woodie reports on a DataMonad production:

MR3 is a software product developed by a team led by Sungwoo Park. The software, which is not open source, is sold by a Delaware-based software company called DataMonad. After prototyping a Java-based execution engine called MR2 in the 2013 timeframe, development of Scala-based MR3 began in 2015. The first release of MR3 was delivered in early 2018, and version 1.0 was released yesterday.

According to DataMonad, MR3 is an execution engine for big data processing, and Hive is the first and main application that’s been configured to run on it (Tez is also supported). The company says MR3 offers comparable performance to the latest release of Hive, dubbed LLAP, but without the technical complexity.

The closed-sourcedness is a bit of a downer, but I like having more competition in the space.

Comments closed

PolyBase and Excel

I have a post on setting up PolyBase to work with Microsoft Excel:

If you tried to use Microsoft’s Excel driver prior to 2019 CU2, you’d get the following error:

Msg 105082, Level 16, State 1, Line LineNumber
105082;Generic ODBC error: [Microsoft][ODBC Excel Driver]Optional feature not implemented

To this point, I recommended in PolyBase Revealed that you use a different driver, like CData’s, which did work. CData’s driver still works (I assume…PolyBase ODBC support is a fluid situation, it seems), but now I can officially say that PolyBase supports the Microsoft Access Database Engine Redistributable driver for Microsoft Excel. Let’s go to the tape.

Click through for the instructions.

Comments closed

Creating a Power BI Report Book

Teo Lachev shows us how to build a Power BI report book:

Scenario: Management has requested an easy way to view a subset of strategic reports located in different Power BI workspaces. You can ask the users to mark reports and dashboards as favorites so they can access pertinent content in the Favorites menu, but you’re looking for an easier configuration, such as to create a book of reports with a built-in navigation that organizes reports in groups (like a table of contents), such as the screenshot below demonstrates.

Click through for the instructions as well as a discussion on why you wouldn’t necessarily want to build a Power BI app for this.

Comments closed

Executing Azure Data Factory Pipelines with Azure Functions

Paul Andrew wants to execute an Azure Data Factory pipeline via an Azure Function call:

For the function itself, hopefully this is fairly intuitive once you’ve created your DataFactoryManagementClient and authenticated.

The only thing to be careful of is not using the CreateOrUpdateWithHttpMessagesAsync method by mistake. Make sure its Create Run. Sounds really obvious, but when you get code drunk names blur together and the very different method overloads will have you confused for hours!…. According to a friend 🙂

Read the whole thing.

Comments closed

Sorting When Columns are Ordered the Same Way

Erik Darling thinks outside the column:

Sorts aren’t just for the column(s) in your order by — if you SELECT *, you need to all the columns in the * by all the columns in the order by.

I know I basically repeated myself. That’s for emphasis. It’s something professional writers do.

Dig it.

The principle Erik is talking about is having a monotonic relationship (that is, as one column increases, the other always increases; and as one column decreases, the other always decreases). If you can guarantee that, and if one of those happens to be indexed already, you can get a nice performance boost.

But wait, I was told there would be no math.

Comments closed

Creating a Delimited List in .NET

Bill Fellows wants you to use a built-in library method:

There are various ways to concatenate values together. A common approach I see is that people will add a delimiter and then the value and loop until they finish. Then they take away the first delimiter. Generally, that’s easier coding, prepending a delimiter, than to append the delimiter and not do it for the final element of a list. Or add it and then remove the final delimiter from the resulting string.

Gentle reader, there is a better way. And has been for quite some time but if you weren’t looking for it, you might not know it exists. 

You do get bonus points if you knew this existed.

Comments closed