Press "Enter" to skip to content

March 1, 2019

Creating a UDF in KSQL

Saeed Barghi shows us how to create user-defined functions in KSQL, the SQL engine built on Kafka Streams:

The UDF I want to implement here is DATEADD. If you’re familiar with SQL, you have definitely used it: it takes in a datetime, adds or subtracts a given number of units to a given part of it, and spits out a new datetime.

To implement a User Defined Function (UDF or UDAF), you need to code your function in Java and then import the jar file into your KSQL server. You can read about the full process here; I’ll point out a couple of things that I believe you should pay attention to:
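To see what the finished product looks like from the KSQL side, here is a minimal sketch of calling such a UDF once the jar has been deployed and the server restarted. The stream name ORDERS, its ORDERTIME column, and the argument order of DATEADD are illustrative assumptions, not details from the original post:

```sql
-- Confirm KSQL picked up the UDF from the deployed jar
-- and inspect the signatures it registered.
DESCRIBE FUNCTION DATEADD;

-- Assumed signature: DATEADD(<date part>, <amount>, <epoch-millis timestamp>).
-- Adds five days to each event's ORDERTIME and returns the new datetime.
SELECT ORDERID,
       DATEADD('DAY', 5, ORDERTIME) AS SHIP_BY
FROM ORDERS;
```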

I think I prefer Spark’s method for UDFs in Spark SQL: create the function in Scala and register it.


Kafka Connect Neo4j Sink

Michael Hunger and Andrea Santurbano announce a new sink for Kafka Connect which handles writing to Neo4j:

We’ve been using the work we did for the Kafka sink – Neo4j extension and have made it available via remote connections over our binary Bolt protocol. So you can stream your events from Apache Kafka® directly into Neo4j to create, update, and enrich your graph structures. Then it is really up to you what you want to do with the event data.

The events can come from frontend systems, API notifications, other databases or streaming systems like Apache Spark™ and Apache Flink®.

For details on the format and internals, please see our previous article or the documentation for the Neo4j sink.

Read on for details and demos.


Remote Code Execution with CallBimlScriptContent

Bill Fellows describes the ability to execute code in Biml from a remote source with the CallBimlScriptContent function:

CallBimlScriptContent was introduced with the migration from Mist to BimlStudio. Why is this cool? You do not have to use files sitting on your computer as the source for your Biml. As long as you can reconstitute the Biml contents into a string, you can store your scripts wherever you’d like. If you want them in a database, that’s great. Store them in the cloud? Knock yourself out.

As a consultant, the latter is rather compelling. Maybe I’m only licensing my clients to use accelerators during our engagement. If I leave files on the file system after I roll off, or they image my computer and accidentally collect them, I am David fighting Goliath. CallBimlScriptContent is a means to protect myself and my IP. Let’s look at a trivial example. I set a C# string with an empty Package tag (hooray for doubling up my double quotes). Within my Packages collection, I invoke CallBimlScriptContent passing in my Biml content.

Bill’s use case was one I hadn’t thought about, but it does make sense.


Using WorkloadTools to Analyze a Workload

Gianluca Sartori shares an example of configuring and using the WorkloadTools SQL workload analyzer:

Now that the analysis database contains the performance data, you can use WorkloadViewer to visualize it and draw your conclusions.

WorkloadViewer is a GUI tool that reads performance data from the analysis database and gives a graphical representation using charts and grids. It accepts a number of command-line arguments that allow you to automate its behavior, but it can also be opened without specifying any arguments: in this case, WorkloadViewer will present a form to fill in the missing information.

WorkloadViewer can be used to visualize information about a single benchmark (analysis mode) or two benchmarks (comparison mode). In this case, you just need to work with a single benchmark, so it is enough to enter the connection info on the left, including the schema name where the tables are. When using Windows Authentication, you can leave UserName and Password blank.

Gianluca has a full demo from the beginning of data capture to analysis.


Benefits of Inline Indexes

Phil Factor covers a topic fairly close to my heart:

Usually, the added features of the CREATE TABLE syntax in new releases of SQL Server are esoteric, and unless you are dealing with memory-optimized tables or other niche features, they aren’t of great interest. However, the inline INDEX, for both table-level and column-level indexes, crept in quietly with SQL Server 2014 (12.x). This was interesting because the SQL Server team back-fitted it to all tables rather than just the in-memory OLTP tables for which it was, at the time, found necessary. The new syntax allows you to create certain index types inline with the table definition, either at the column level, concerning just that column, or at the table level, with indexes containing several columns.

Why interesting? This affects multi-statement table functions, user-defined table types, table-valued parameters, and table variables. It was considered a game-changer for table variables because, for a start, it allowed non-unique indexes and explicit clustered indexes to be declared on their columns for the first time, since you can now create indexes on table variables as part of the table definition. Of more significance were the table-level indexes that allowed you to specify multi-column indexes. Previous releases had allowed multi-column primary or unique constraints, but not explicitly named indexes. You still cannot declare an index after the table is created, which is a shame, as there are good reasons for being able to do so after a table is stocked with data. Any sort of large import of data into a table that is over-indexed or prematurely indexed is doomed to crawl rather than to run. I’ll show this later on in this article.
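As a quick illustration of the syntax (the table and column names here are mine, not Phil’s), inline indexes can be declared at either level, and the same form works in a table variable:

```sql
-- Table-level and column-level inline indexes (SQL Server 2014 and later).
CREATE TABLE dbo.Orders
(
    OrderID    int  NOT NULL PRIMARY KEY,
    CustomerID int  NOT NULL INDEX IX_Orders_CustomerID,   -- column-level inline index
    OrderDate  date NOT NULL,
    INDEX IX_Orders_Customer_Date (CustomerID, OrderDate)  -- table-level, multi-column
);

-- The same syntax on a table variable, which previously allowed
-- only PRIMARY KEY and UNIQUE constraints:
DECLARE @Orders TABLE
(
    OrderID    int  NOT NULL PRIMARY KEY,
    CustomerID int  NOT NULL,
    OrderDate  date NOT NULL,
    INDEX IX_Customer_Date (CustomerID, OrderDate)
);
```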

Click through for an analysis of inline indexes themselves as well as how they fit on table variables—something I tend not to do much.


Capturing Queries With XEvent Profiler

Erin Stellato explains how to use the XEvent Profiler in SSMS 17.3 and later:

It’s worth pointing out that neither the Standard nor the TSQL session writes out to a file. In fact, there’s no target for either event session (if you didn’t know that you can create an event session without a target, now you know). If you want to save this data for further analysis, you need to do one of the following:

1. Stop the data feed and save the output to a file via the Extended Events menu (Export to | XEL File…)
2. Stop the data feed and save the output to a table in a database via the Extended Events menu (Export to | Table…)
3. Alter the event session and add the event_file as a target (see the T-SQL sketch after this list).
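For the third option, here is a minimal sketch. QuickSessionStandard is the session name XEvent Profiler creates for the Standard session; the file path and size are assumptions for illustration:

```sql
-- Persist the XEvent Profiler Standard session to disk by adding
-- an event_file target. The path and rollover size are illustrative.
ALTER EVENT SESSION QuickSessionStandard ON SERVER
ADD TARGET package0.event_file
(
    SET filename      = N'C:\Temp\QuickSessionStandard.xel',
        max_file_size = 50  -- megabytes per rollover file
);
```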

Read the whole thing.


Which Power BI Visuals Send Data Externally

Meagan Longoria does a bit of investigation into data privacy and Power BI visuals:

One thing that makes understanding data privacy in custom visuals easier is the designation of a certified custom visual. One of the requirements for certification is “Does not access external services or resources, including but not limited to, no HTTP/S or WebSocket requests go out of Power BI to any services.”

You can find the list of currently certified custom visuals on this page. Custom visuals are also identified in the marketplace by a blue star with a check mark.

Read on for some good investigative analysis.
