Scaling Kafka Streams

Kevin Feasel

2016-07-15

Hadoop

Michael Noll discusses elastic scaling of Kafka Streams:

Third, how many instances can or should you run for your application?  Is there an upper limit for the number of instances and, similarly, for the parallelism of your application?  In a nutshell, the parallelism of a Kafka Streams application — similar to the parallelism of Kafka — is primarily determined by the number of partitions of the input topic(s) from which your application is reading. For example, if your application reads from a single topic that has 10 partitions, then you can run up to 10 instances of your applications (note that you can run further instances but these will be idle).  In summary, the number of topic partitions is the upper limit for the parallelism of your Kafka Streams application and thus for the number of running instances of your application.  Note: A scaling/parallelism caveat here is that the balance of the processing work between application instances depends on how well data messages are balanced between partitions.

Check it out.  Kafka Streams is a potential alternative to Spark Streaming and Storm for real-time (for some definition of “real-time”) distributed computing.

Projections And Confidence Intervals

Mirio De Rosa explains confidence intervals and sampling issues, using polls as an example:

Weighting is used to make sure samples reproduce the underlying characteristics of the population they are drawn from. For instance, in the UK 51.6% of voters are women, of these 22.5% are above 65 years of age, 23.1% have a higher education and so on. The people selected to make part of a sample may be recruited to ensure they match these proportions.

Within the context of weighting there are two major sampling procedures: Quota and stratified sampling. The relevant difference between them is the latter uses some sort of randomization device while with the former the interviewer decides whether or not to interview a person with certain characteristics. YouGov presumably applied Quota Sampling for the Brexit survey[4], and the gender, age and education weights they applied are shown in the following image.

Read the whole thing.

Single Slices And Chart Slicers

Matt Allington has a post up on improving your Power BI user experience:

The example on the right adds more value over the one on the left.  The example on the right uses a column chart instead of a slicer.  The benefit of this is you can communicate more information to the user than you can with the static slicer.  In this case I am displaying the total value of each of the scenarios which means it is easy to see the relative difference between them.  And this all happens while still providing slicer capabilities thanks to the powerful cross filter feature in Power BI.  This is what I mean by adding value by thinking outside your old paradigms.

Definitely read the comments on this one; they are full of great questions and suggestions.

Auditing Power Plan Settings

Drew Furgiuele writes a bit of Powershell to control power plan settings and expands upon this one-liner:

It’s a classic one-liner, but if you’re not used to reading it I’ll break it down for you. First, we useGet-ChildItem  to return a list of registered servers in our central management server (named PRO-CM-SQL in my example). This returns a series of objects that lists all the registered names on the central management server in each directory, so we need to filter out the directory names with a Where-Object  (objects that don’t have a “mode” value of “d” for directory). Once we have our list, we just select the names ( Select-Object ). Then we pipe the list of names over to a ForEach-Object  and execute the script each time. Finally, we tack on a Export-CSV  cmdlet to output the results to an easy to read file we can open in Excel (or notepad, or whatever).

Our script also doesn’t control output, so you leave that up to the user. They can put it on the screen or pipe it to a file. And that’s an important style point: never put your users on rails. You may like output one way, but someone else may not. Just think of the next person.Because some day you might be that next person.

This is a good post if you need to figure out how to find your servers’ current power settings, but a great post if you want to think about how to write helpful Powershell scripts.

Power BI Costs

Adam Saxton looks at pricing in Power BI:

You can get a quick look at the differences between free and Pro by looking at the Power BI Pricing page. You can also look at the official documentation for what is Pro content.

So, that’s it! Right?

No so fast! There are other factors that may come into play, or you may be wondering about. You may think they are part of Power BI, but they may be separate.

What I like about this post is that Adam goes into detail on some of the other potential costs involved aside from product licensing.

Missing Query Plans

Jack Li explains when you might not get a result back for query_plan in sys.dm_exec_cached_plans:

When you first execute a batch or stored procedure which may contain multiple statements.   Not every statement will be compiled when you first use the procedure.   If there are some code paths that result in the statement not being executed, SQL Server may choose not to compile that statement in a small set of scenarios. In other words, some statements will only be compiled upon first execution.  So far, I have found the following two scenarios that a statement will not be compiled (deferred compilation) if code path result in that statement being skipped.

  1. Statement involving temp table.   Data could have been changed when the statement is actually executed.  So it doesn’t make sense to compile right off the beginning.

  2. Statement that has option (recompile) attached to it   We need to recompile that statement every time anyway.  If we don’t use, it why compile in advance?

Read the whole thing.

New Features In SSAS 2016 Multidimensional

Chris Webb takes a look at SQL Server 2016, specifically around Multidimensional Analysis Services:

In fact the drillthrough/multi-select improvements (which I blogged about here) already shipped as part of SSAS 2014 and are reliant on improvements in Excel 2016 as much as in SSAS; similarly the Excel 2016 query optimisations are not reliant on any changes in SSAS 2016 and will benefit users of all versions of SSAS.

So what has actually changed with SSAS 2016 Multidimensional? I don’t know all the details on every change, but here’s what I know right now:

It sounds like the answer is “not much.”  Tabular has been getting more love in Analysis Services.

Copying Tables

Stephen West shares a few methods for copying tables between databases:

The last method is done by generating scripts and it will help user in copying not only the table schema or data, but also allows user to copy views, functions, constraints, triggers, etc.

These are three built-in methods which require no additional tooling.

More On DBCC CLONEDATABASE

Melissa Connors digs into the DBCC CLONEDATABASE command:

Finally, keep in mind that because the clone is a read-only, empty database, you should be able to test repeatedly without updating statistics and skewing your results. Since I wanted to see this for myself, I executed a set of updates and selects against the SQLSentryData and SQLSentryDataClone databases. As a result of the lack of data and read-only database status, there were no actual updates in SQLSentryDataClone. Consequently, the statistics were updated in the SQLSentryData database, but remained the same in the SQLSentryDataClone database:

Read the whole thing.

New In-Memory OLTP Whitepaper

Kalen Delaney has a final version of her In-Memory OLTP whitepaper:

Click on “SQL Server 2016 In-memory OLTP technical white paper” and you can open or save.

OR

You can use this direct link: http://download.microsoft.com/download/8/3/6/8360731A-A27C-4684-BC88-FC7B5849A133/SQL_Server_2016_In_Memory_OLTP_White_Paper.pdf

It’s 90 pages long, so you might reserve it for beach reading.

Categories

July 2018
MTWTFSS
« Jun  
 1
2345678
9101112131415
16171819202122
23242526272829
3031