Press "Enter" to skip to content

Month: April 2019

Confluent Platform 5.2 Released

Mau Barra announces Confluent Platform 5.2:

Confluent Platform 5.2 represents a significant milestone in our efforts across three key dimensions:
1. It allows you to use the entire Confluent Platform free forever in single-broker Kafka clusters, so you are freer than ever to start building new event streaming applications right away. We are also bringing in librdkafka 1.0 to move our C/C++, Python, Go and .NET clients closer to parity with the Java client.
2. It adds critical enhancements to Confluent Control Center that will help you meet your event streaming SLAs in distributed Apache Kafka environments at greater scale.
3. With our latest version of Confluent Replicator, you can now seamlessly stream events across on-prem and public cloud deployments.

The top item is quite interesting: a free developer license and not just a 30-day trial.


DW Databases in PolyBase

I look at some databases people tend to ignore:

Today is a fairly short post covering a trio of databases you might not even know you have: DWConfiguration, DWDiagnostics, and DWQueue. The PolyBase installer drops all three of these on your instance. Let’s go in ascending order of the number of useful tables.

There are very few useful (to us) tables when using on-prem SQL Server as opposed to APS, but there are a few of note.
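If you want to double-check that they're there, a quick look at sys.databases does the trick (a minimal sketch; the three names are the ones the PolyBase installer creates):

-- Quick check: do the PolyBase support databases exist on this instance?
SELECT name, create_date, state_desc
FROM sys.databases
WHERE name IN (N'DWConfiguration', N'DWDiagnostics', N'DWQueue')
ORDER BY name;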


Managing Kubernetes Resources

Vincent-Philippe Lauzon takes us through some thoughts on Kubernetes resource allocation:

In this article we will look at how to inform Kubernetes about pods’ resources and how we can optimize for different scenarios.

A scenario that typically comes up is when a cluster has a bunch of pods where a lot of them are dormant, i.e. they don't consume CPU or memory. Do we have to carve out space for them that they won't use most of the time? The answer is no. As usual, it's safer to provision capacity for a workload than to rely on the optimistic heuristic that not all workloads will require resources at the same time. So, we can configure Kubernetes optimistically or pessimistically.

Read the whole thing.


Using the Cosmos DB Change Feed

Hasan Savran (who just became a Microsoft MVP, so congrats to him) takes us through the Cosmos DB Change Feed:

Azure Cosmos DB Change Feed exposes Cosmos DB logs to the outside of Cosmos DB. Cosmos DB notifies you immediately when there is any change in your database. It supports all inserts and updates; delete support will be available soon. You can always use soft deletes to catch delete events if you need to.

By knowing what has changed in your database, you can trigger all kinds of events and make your application work much smarter. SQL Server has similar functionality but, like log shipping and many other features, it is usually blocked by DBAs or company policies. In Cosmos DB, you don't need to do anything to enable the Change Feed feature! It's already enabled; all you need to do is configure it. The easiest way to catch change feed events is with Azure Functions.

When I hear someone describe the change feed, I immediately imagine it as a Kafka topic.


Power BI: Comparing Web.Contents and File.Contents Performance

Chris Webb compares the performance of File.Contents and Web.Contents:

In my last post I mentioned the Power Query engine’s persistent cache, which in some scenarios caches the data read from a data source when a query is refreshed. Another important nugget of information that Ehren von Lehe of the Power Query dev team mentioned in a post on the Power Query MSDN forum recently is the fact that if you use File.Contents to get data from a file then the persistent cache is not used, but if you use Web.Contents to get data from the same file then the persistent cache is used. I guess the thinking here is that there is no point creating an on-disk cache containing the contents of a file that is already on disk.

Chris takes us through a couple of unexpected twists, so check it out.


Getting the Last Actual Execution Plan for a Query

Pedro Lopes takes us through a new dynamic management function in SQL Server 2019 CTP 2.4:

In CTP 2.4, not all actual execution plans will be available; you can see more details on that here.

For an upcoming CTP version, all queries will be available with the equivalent of the actual execution plan. At least those where the plan was cached in the first place, or those where the plan has not been evicted from cache.

That caveat aside, I’m happy with this.
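As a rough sketch of what usage looks like: the new DMF is sys.dm_exec_query_plan_stats, and you can CROSS APPLY it against the plan cache. In CTP 2.4 it needs lightweight profiling enabled first (trace flag 2451, if I recall correctly), so check Pedro's post for the exact setup. The text filter below is just a placeholder to narrow the results.

-- Sketch: last actual execution plan for cached queries via the new DMF.
SELECT
    st.text        AS query_text,
    cp.usecounts,
    qps.query_plan AS last_actual_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS qps
WHERE st.text LIKE N'%SomeQueryFragment%';  -- placeholder filter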


Finding Dependency Clusters

Michael J. Swart performs cluster analysis with tables:

That ball of mush in the middle is hard to look at, but the smaller disconnected bits aren’t! Just like Ben, I want to work on those smaller pieces too! And just like the lonely tables we looked at last week, these small isolated components are also good candidates for extracting from SQL Server.

The script looks at joins in execution plans, which is a rather clever way of doing this when you don’t have a comprehensive set of foreign key constraints.
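Michael's script is the one to read, but if you want a much cruder approximation of the same idea, you can at least pull which tables show up together in the same cached plan and treat co-occurrence as a (noisy) stand-in for a join. A sketch follows; note that shredding the entire plan cache as XML can get expensive on a busy instance.

-- Sketch: tables that appear together in the same cached plan. This is a rough
-- proxy for join relationships; the linked post inspects actual join operators,
-- which is more precise.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
tables_per_plan AS
(
    SELECT
        cp.plan_handle,
        obj.value('(@Schema)[1]', 'sysname') AS schema_name,
        obj.value('(@Table)[1]',  'sysname') AS table_name
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY qp.query_plan.nodes('//Object') AS o(obj)
)
SELECT DISTINCT
    a.schema_name + N'.' + a.table_name AS table_a,
    b.schema_name + N'.' + b.table_name AS table_b
FROM tables_per_plan AS a
JOIN tables_per_plan AS b
    ON a.plan_handle = b.plan_handle
   AND a.schema_name + N'.' + a.table_name < b.schema_name + N'.' + b.table_name;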


Deleting In Batches

Andy Mallon explains why deleting in batches is a good idea:

Yesterday, I wrote about the importance of planning your data purges. Today, let’s look at the simplest requirement for purging data: Keep data for X days after it’s created. There are tons of cases where this is the rule. Log data, telemetry data, even transactional data is often retained based on X days since it is created.

Given how common this scenario is, let's talk in a bit more detail about how I like to go about deleting it. Deleting data is easy, right? DELETE t FROM dbo.Transactions AS t WHERE CreateDate <= DATEADD(dd,-90,GETDATE());

This part of the series covers the why; the next part will cover the how.
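As a sketch of the "how" (reusing the dbo.Transactions table and CreateDate predicate from the quoted snippet), the batched version usually looks something like this: delete a capped number of rows, check @@ROWCOUNT, and repeat until nothing is left. Each batch is its own short transaction, which keeps log growth and blocking under control.

-- Sketch: purge in small batches instead of one giant DELETE.
DECLARE @BatchSize int = 5000;
DECLARE @RowsDeleted int = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.Transactions
    WHERE CreateDate <= DATEADD(DAY, -90, GETDATE());

    SET @RowsDeleted = @@ROWCOUNT;  -- 0 once the last partial batch is gone
END;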


Python Natural Language Processing Tools

Sandeep Aspari takes us through some of the tooling available in Python around Natural Language Processing:

TextBlob
TextBlob is a Python library that builds on top of NLTK. It provides a simple API over its methods, wraps a large number of NLTK functions, and also includes functionality from the pattern library. If you are just getting started, it can be an excellent tool to learn with, and it can be used in production applications that don't require heavy performance. TextBlob objects behave much like Python strings, so you can quickly transform and experiment with them the same way you would in plain Python. Finally, TextBlob shows up everywhere and is best suited for smaller projects.

There are several tools from which you can choose. Sandeep also covers some Node- and Java-based tools.


Developing Big Data Cluster Spark Jobs with IntelliJ

Jenny Jiang shows how we can use IntelliJ IDEA to develop Spark jobs against SQL Server Big Data Clusters:

We’re delighted to release the Azure Toolkit for IntelliJ support for SQL Server Big Data Cluster Spark job development and submission. For first-time Spark developers, it can often be hard to get started and build their first application, with long and tedious development cycles in the integrated development environment (IDE). This toolkit empowers new users to get started with Spark in just a few minutes. Experienced Spark developers also find it faster and easier to iterate their development cycle.

The toolkit extends IntelliJ support for the Spark job life cycle starting from creation, authoring, and debugging, through submission of jobs to SQL Server Big Data Clusters. It enables you to enjoy a native Scala and Java Spark application development experience and quickly start a project using built-in templates and sample code. The integration with SQL Server Big Data Cluster empowers you to quickly submit a job to the big data cluster as well as monitor its progress. The Spark console allows you to check schemas, preview data, and validate your code logic in a shell-like environment while you can develop Spark batch jobs within the same toolkit.

It looks pretty good from my vantage point.
