Press "Enter" to skip to content

Author: Kevin Feasel

Kafka 1.0 Released

Neha Narkhede announces Apache Kafka 1.0:

And Kafka 1.0.0 is no mere bump of the version number. The Apache Kafka Project Management Committee and the broader Kafka community has packed a number of valuable enhancements into the release. Let me summarize a few of them:

  • Since its introduction in version 0.10, the Streams API has become hugely popular among Kafka users, including the likes of PinterestRabobankZalando, and The New York Times. In 1.0, the API continues to evolve at a healthy pace. To begin with, the builder API has been improved (KIP-120). A new API has been added to expose the state of active tasks at runtime (KIP-130). The new cogroup API makes it much easier to deal with partitioned aggregates with fewer StateStores and fewer moving parts in your code (KIP-150). Debuggability gets easier with enhancements to the print() and writeAsText() methods (KIP-160). And if that’s not enough, check out KIP-138 and KIP-161 too. For more on streams, check out the Apache Kafka Streams documentation, including some helpful new tutorial videos.

  • Operating Kafka at scale requires that the system remain observable, and to make that easier, we’ve made a number of improvements to metrics. These are too many to summarize without becoming tedious, but Connect metrics have been significantly improved (KIP-196), a litany of new health check metrics are now exposed (KIP-188), and we now have a global topic and partition count (KIP-168). (That last one sounds so simple, but you’ve wanted it in the past, haven’t you?) Check out KIP-164 and KIP-187 for even more.

  • We now support Java 9, leading, significantly faster TLS and CRC32C implementations. Over-the-wire encryption will be faster now, which will keep Kafka fast and compute costs low when encryption is enabled.

And there are more where that came from.  Congratulations to the Kafka team for hitting this big milestone.

Comments closed

Collation Compatibility And Linked Servers

Greg Low points out an important property which can help linked server performance:

The on-premises versions of SQL Server have the ability to connect one server to another via a mechanism called Linked Servers.

Azure-based SQL Server databases can communicate with each other by a mechanism called External Tables. I’ll write more about External Tables soon.

With Linked Servers though, I often hear people describing performance problems and yet there’s a configuration setting that commonly causes this. In Object Explorer below, you can see I have a Linked Server called PARTNER.

Read on for more.

Comments closed

Multiple Data Sets And SQL Server R Services

Robert Sheldon has a workaround for SQL Server R Services’s limitation of a single input data set:

Despite the ease with which you can run an R script, the sp_execute_external_script stored procedure has an important limitation. You can specify only one T-SQL query when calling the procedure. Of course, you can create a query that joins multiple tables, but this approach might not always work in your circumstances or might not be appropriate for the analytics you’re trying to perform. Fortunately, you can retrieve additional data directly within the R script.

In this article, we look at how to import data from a SQL Server table and from a .csv file. We also cover how to save data to a .csv file as well as insert that data into a SQL Server table. Being able to incorporate additional data sets or save data in different formats provides us with a great deal of flexibility when working with R Services and allows us to take even greater advantage of the many elements available to the R language for data analytics.

Another option is using the rodbc package to connect back to SQL Server to retrieve more data.

Comments closed

Using Tokens In SQL Agent Jobs

Raul Gonzalez builds a SQL Agent job and uses the JOBID token to help him log step output effectively:

It’s usually a good idea to write the output of your SQL Agent jobs to a file, so you can investigate should any issue occur.

But when you define the output file, you need to choose between Appending the output to the same file over and over, or to overwrite it, but that defeats the purpose IMHO.

On the other hand, if you forget to roll over the files, they can grow quite large and then finding any error can become a nightmare.

So some time ago, I wrote a stored procedure that rolls the files for me and place them sorted so it’s easy to find any particular date.

This is a clever solution, but read through to the bottom for a warning.

Comments closed

Finding Queries In Need Of Indexing

Jeff Schwartz continues his series on index tuning:

Table 1 shows examples of queries that potentially need tuning based upon the number of executions, total reads, total duration, total CPU time, and average reads per execution. This kind of report immediately focuses attention on the queries that might benefit the most from either index or query tuning. The five queries highlighted in Table 1 underscore these criteria. The ones highlighted in yellow were the worst offenders because their executions collectively performed the most reads with the worst one totaling 3.5 BILLION reads. The ones highlighted in light green and orange accounted for the most CPU time as well as the longest total duration. The one highlighted in slate ran the most times, and the ones highlighted in gray performed the most reads per execution. This information is vital when determining where query and index tuning should be focused.

Jeff walks through some of his data collection and analysis process in this post, making it worth a read.

Comments closed

Security Issue In Oracle Identity Manager

Oracle has a security advisory with a CVSS base score of 10.0 (which is pretty awful):

This Security Alert addresses CVE-2017-10151, a vulnerability affecting Oracle Identity Manager. This vulnerability has a CVSS v3 base score of 10.0, and can result in complete compromise of Oracle Identity Manager via an unauthenticated network attack. The Patch Availability Document referenced below provides a full workaround for this vulnerability, and will be updated when patches in addition to the workaround are available.

Due to the severity of this vulnerability, Oracle strongly recommends that customers apply the updates provided by this Security Alert without delay.

Catalin Cimpanu explains:

The affected product is Oracle Identity Manager (OIM), a user management solution that allows enterprises to control what parts of their network employees can access. OIM is part of Oracle’s highly popular Fusion Middleware offering and is one of its most used components.

Oracle describes the issue — tracked under the CVE-2017-10151 identifier — as a “default account” vulnerability, an umbrella term that’s usually used to describe accounts with no password or hardcoded credentials (a.k.a. backdoor accounts).

“This vulnerability is remotely exploitable without authentication, i.e., may be exploited over a network without requiring user credentials,” Oracle said in a security alert.

Oracle has patched this.  If you have it installed, please update ASAP.

Comments closed

Azure SQL DB Automatic Tuning FAQ

Arun Sirpal has a self-Q&A session regarding Azure SQL Database’s automatic tuning options:

What are the options?

  1. CREATE INDEX that identifies the indexes that may improve performance of your workload, creates the indexes, and verifies that they improve performance of the queries.

  2. DROP INDEX that identifies redundant and duplicate indexes, and indexes that were not used in the long period of time.

  3. PLAN REGRESSION CORRECTION that identifies SQL queries that are using execution plan that are slower than previous good plan, and uses the last known good plan instead of the regressed plan.

Very useful information.

Comments closed

Hortonworks Data Platform 2.6.3 Released

Roni Fontaine, et al, announce Hortonworks Data Platform 2.6.3:

First, in this release, we made huge strides with regards to Data Science.  This includes:

  • Apache Spark 2.2 is GA. This enhances the stability of Spark for structured streaming.

  • Apache Zeppelin 0.7.3 is GA. This provides support for notebooks stored in HDFS, and improves the scalability and multi-user support.

  • Apache Livy 0.4.0 is GA. This version supports multiple programming languages in the same Spark context.

This is a point release, so most of it is around adding updated versions of different products.

Comments closed

Thinking About Slowly Degrading Page Performance

Ritesh Maheshwari talks about how LinkedIn deals with performance regressions:

Looking at the chart above, where the dotted red line is a reference point to show where we started the year, notice how site speed improvements tend to be significant and noticeable, as they are optimization-driven. Degradations, however, can generally be of any “amount,” as they happen for various reasons. LinkedIn’s page-serving pipeline has many moving parts. We deploy code multiple times per day, operate a micro-service architecture with hundreds of services, and infrastructure upgrades are frequent. A slowdown in any of these components can cause degradations.

While large degradations can be caught using A/B testingcanary analysis, or anomaly detection, small ones tend to leak to production. Thus, performance of a page has a tendency to always degrade over time.

This led to having the centralized Performance Team focus on identifying these leaks, called “site speed regressions,” and to craft tools and processes to fix them.

It’s an interesting principle.  I could see this principle work for tracking database performance degradation as well.

Comments closed

Promises And Closures In R

Damian Rodziewicz looks at the new promises package in R:

Citing Joe Cheng, our aim is to:

  1. Execute long-running code asynchronously on separate thread.
  2. Be able to do something with the result (if success) or error (if failure), when the task completes, back on the main R thread.

A promise object represents the eventual result of an async task. A promise is an R6 object that knows:

  1. Whether the task is running, succeeded, or failed

  2. The result (if succeeded) or error (if failed)

This looks pretty exciting.  H/T R-Bloggers

Also, Sebastian Warnholz has a post on promises and closures in case you’re not familiar with the concepts:

Every argument you pass to a function is a promise until the moment R evaluates it. Consider a function g with arguments x and y. Let’s leave out one argument in the function call:

g <- function(x, y) x
g(1)

## [1] 1

R will be forgiving (lazy) until the argument y is actually needed. Until then y exists in the environment of the function call as a ‘name without a value’. Only when R needs to evaluate y a value is searched for. This means that we can pass some non-existent objects as arguments to the function g and R won’t care until the argument is needed in the functions body.

Read the whole thing.  Once again, H/T R-Bloggers

Comments closed