Press "Enter" to skip to content


Building A Secure Microservice Which Uses Kafka Streams

George Vetticaden has a reference architecture with sample code for a secured microservice running atop Kafka Streams:

One of the key benefits of using Kafka Streams over other streaming engines is that the stream processing apps / microservices don’t need a cluster. Rather, each microservice can be run as a standalone app (e.g., a JVM container). You can then spin up multiple instances of each to scale up the microservice. Kafka will treat this as a single consumer group with multiple instances. Kafka Streams takes care of consumer partition reassignment for scalability.
You can see how to start these three microservices here.

If you’re trying to wrap your head around streaming apps, give this a try.  George has all of the code in his GitHub repo.

Comments closed

When Azure SQL DB Tells You You’re Hitting Your DTU Limits

Brent Ozar shows us how to use sp_BlitzFirst to determine if you’re hitting DTU limits:


You’ve got an Azure SQL DB, and your queries are going slow. You’re wondering, “Am I hitting the performance limits? Is Microsoft throttling my queries?”
There’s an easy way to check: run sp_BlitzFirst. sp_BlitzFirst is our free performance health check stored procedure that analyzes a lot of common performance issues and then gives you a prioritized list of reasons why your server might be slow right now.

Click through for an example.  The same signal can mean “it’s time to move to a higher level in your tier,” “it’s time to move up to the next tier,” or “it’s time to tune that cursor in a cursor in a cursor which performs several scans of a very large table for each operation.”
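
If you want to try it yourself before clicking through, the call is a single EXEC against wherever you installed the First Responder Kit procedures; the longer sample shown second uses the optional @Seconds parameter and is just one reasonable choice:

-- Quick check with defaults: samples current activity and returns a
-- prioritized list of likely reasons the server is slow right now.
EXEC dbo.sp_BlitzFirst;

-- A longer sample window can make intermittent DTU throttling easier to spot.
EXEC dbo.sp_BlitzFirst @Seconds = 30;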

Comments closed

An Overview Of Today’s Data And Event Ecosystem

John Hammink walks us through a laundry list of tools in the modern data platform:

InfluxDB

The rapid instrumentation of the physical world due to IoT and data-collecting applications has led to an explosion of time-stamped data. Time series databases serve this evolving niche, and among them, InfluxDB is emerging as a major player. InfluxDB, like others, can handle complex logic or business rules atop massive — and fast-growing — data sets, and InfluxDB adds the advantage of a range of ingestion methods, as well as the ability to append tags to different data points. Aiven also provides a managed version, Aiven InfluxDB.

Even with this list, there are still plenty more tools out there, including some of my favorites. H/T DZone

Comments closed

Separating Totals In DAX With IsInScope

Kasper de Jonge shows how we can use the IsInScope function in DAX to separate out parts of hierarchies like totals:

Finally, I drag the Values column and the Hierarchy into a matrix (I also turned on the +/- icons so we can expand and collapse, which is another November feature). The first thing we see is that the matrix looks weird, with blanks and all.

The reason we are seeing this is that this is how the data is set up: we have stored totals and grand totals at the lowest level in the hierarchy. Instead, we want to show them at the level where they are correct. We also don’t want to show the aggregation created by the SUM.

Click through for the full example.

Comments closed

How Join Hints Affect Adaptive Joins

Grant Fritchey looks at the combination of adaptive joins and query hints which specify join type:

I’ve highlighted the interesting bit. “Actual Number of Locally Aggregated Rows” is part of aggregation push down, explained by the amazing Niko Neugebauer here and here. Basically, the aggregation is occurring with the data access. So while there is a Hash Match operator for the aggregation, the active part of the aggregation was actually performed within the columnstore. That’s why the Actual Number of Rows coming out of the columnstore index itself is 0, but the number of rows coming out of the Hash Match Aggregate is 441.

So… why not another aggregate push down when we used the hint? Because the hint says we MUST use a hash join. At that point the optimizer has no choice about where, when, and how it does data processing. It must, first, ensure that a hash join is used, so it does. First thing out of the gate, hash join. Then a hash aggregate. This difference in behavior results in a 24% decrease in performance. The only interesting thing is that the reads remained consistent. This means that it was just the processing of the join that added overhead.
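
To make the setup concrete, here is a sketch of the general pattern under discussion, with hypothetical tables and columns rather than Grant's actual demo query; the second statement forces the physical join type with a query-level hint (a join-level INNER HASH JOIN hint forces it as well):

-- Without a hint, the optimizer is free to consider a batch mode adaptive join
-- between the columnstore fact table and the dimension.
SELECT f.ProductKey,
       COUNT_BIG(*) AS SalesRows
FROM dbo.FactSales AS f            -- assumed: clustered columnstore index
INNER JOIN dbo.DimProduct AS d
    ON d.ProductKey = f.ProductKey
WHERE d.Color = N'Red'
GROUP BY f.ProductKey;

-- OPTION (HASH JOIN) forces every join in the query to be a hash join, so the
-- adaptive join is off the table; in Grant's example the aggregate push down
-- disappeared as well, costing roughly 24% in performance.
SELECT f.ProductKey,
       COUNT_BIG(*) AS SalesRows
FROM dbo.FactSales AS f
INNER JOIN dbo.DimProduct AS d
    ON d.ProductKey = f.ProductKey
WHERE d.Color = N'Red'
GROUP BY f.ProductKey
OPTION (HASH JOIN);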

Read the whole thing.

Comments closed

Recovery_Pending State After Moving SQL Server Files

Jon Shaulis gives us a couple of reasons why our databases could be stuck in Recovery_Pending state after moving file locations:

The scripts I ran to edit the rest of the databases looked similar to the below:

Once I finished altering all of my database files to their new locations, I stopped the SQL Server Service in Services. I copied and pasted all MDF and LDF files to their corresponding new destinations and then started the SQL Server Service once more.

That’s when I ran into the interesting issue of “Recovery in a Pending state”. Some digging and sleuthing brought me back to my scripts.
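
Jon's post shows the actual scripts; the statements in question are presumably of the ALTER DATABASE … MODIFY FILE variety, sketched here with hypothetical database, logical file, and path names:

-- Point the catalog at the new file locations while the database is still
-- online; the change takes effect the next time the database comes online.
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Data, FILENAME = N'E:\Data\MyDatabase.mdf');

ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Log, FILENAME = N'F:\Logs\MyDatabase_log.ldf');

-- The new paths are only validated when the database comes back online, so a
-- typo in the FILENAME or a file that was not actually copied to the new
-- location will leave the database stuck in RECOVERY_PENDING.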

Read on for those causes.

Comments closed

Using APPLY To Aggregate Unpivoted Data

Dan Clemens gives us yet another practical use of the APPLY operator:

I had a situation last week where I needed to find the MIN() and MAX() values of some data. Normally this would be pretty straightforward, but the problem was I needed to identify the MIN() and MAX() values amongst multiple columns. So today I bring you a SQL tip using APPLY.

Looking at sys.procedures we can see I have a create_date and a modify_date column for each record. What if we needed to return the MIN() or MAX() value from those 2 (or more) columns?

SELECT [Name]
      ,create_date
      ,modify_date
FROM sys.procedures AS p
WHERE p.[name] = 'ChangePassword';

In this two-column example, it’s not too difficult.  As you add more and more columns, the solution remains the same, though the urge to ask why all of these dates are unpivoted might increase…
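
One common way to finish the job, shown here as a general sketch rather than necessarily Dan's exact solution, is to CROSS APPLY a VALUES constructor that stacks the columns into rows and then aggregate over it:

SELECT p.[name]
      ,x.most_recent
FROM sys.procedures AS p
CROSS APPLY
(
    -- Stack the date columns into rows, then take the MAX across them.
    SELECT MAX(v.dt) AS most_recent
    FROM (VALUES (p.create_date), (p.modify_date)) AS v(dt)
) AS x
WHERE p.[name] = 'ChangePassword';

-- Adding more date columns only means adding more rows to the VALUES list.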

Comments closed

Faking Arrays In T-SQL With Custom Types

Jovan Popovic shows how to use custom types as pseudo-arrays in SQL Server:

One of the missing language features in T-SQL is array support. In some cases, you can use custom types to work with arrays that are passed as parameters to your stored procedures.

Custom types in T-SQL enable you to create an alias for some table, .NET, or built-in type. Once you create a custom type, you can use it for local variables and parameters of functions.
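
As a minimal sketch of the idea (the type, table, and procedure names here are hypothetical), the "array" is a user-defined table type passed as a read-only table-valued parameter:

-- The table type stands in for an integer array.
CREATE TYPE dbo.IntList AS TABLE (Value INT NOT NULL);
GO

-- A procedure accepting the "array" as a read-only table-valued parameter.
CREATE OR ALTER PROCEDURE dbo.GetOrdersById
    @OrderIds dbo.IntList READONLY
AS
BEGIN
    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o               -- hypothetical table
    INNER JOIN @OrderIds AS i
        ON o.OrderId = i.Value;
END;
GO

-- The caller fills the type like a table variable and passes it in.
DECLARE @Ids dbo.IntList;
INSERT INTO @Ids (Value) VALUES (1), (2), (3);
EXEC dbo.GetOrdersById @OrderIds = @Ids;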

I go back and forth on whether I’d like full array support in T-SQL.  On the plus side, it simplifies interactions with external tools; on the other hand, it can promote bad habits like violating first normal form.

Comments closed

Estimating Columnstore Compression

Niko Neugebauer has a way of estimating disk size after creating a columnstore index on a table:

For anyone working with SQL Server since version 2005 (with Service Pack 2, to be precise) there is a very common task when thinking about or researching system optimisation – the usage of compression – and before advancing with this step, the question that one usually receives is: “can you estimate how much improvement we shall get?”
For this purpose, since SQL Server 2008 we have had a very useful stored procedure called sp_estimate_data_compression_savings, which is capable of providing us with an estimation of how much storage we can save by enabling or moving to a more effective compression method (as in the NONE | ROW | PAGE types).
Columnstore Indexes appeared in SQL Server 2012 (that is, well over 6 years ago), and even though from time to time I would ask and suggest that the stored procedure sp_estimate_data_compression_savings start supporting Columnstore Indexes, until now there has been no such support.

Until now, that is – I am introducing a combination of my scripts in a comprehensive and reasonably capable stored procedure called “cstore_sp_estimate_columnstore_compression_savings”, which is part of my free and open-source Columnstore Indexes Script Library, freely available on GitHub.

That’s a useful addition, especially when you’re trying to sell management on using clustered columnstore indexes.
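
For comparison, the built-in estimator Niko mentions takes a schema, object, index, partition, and target compression type, and it only covers the rowstore options; a call against a hypothetical table looks like this, and the columnstore gap it leaves is what his procedure fills:

-- Estimate PAGE compression savings for every index and partition of a table.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FactSales',   -- hypothetical table
    @index_id         = NULL,           -- all indexes
    @partition_number = NULL,           -- all partitions
    @data_compression = N'PAGE';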

Comments closed

The Evolution Of Hadoop

Holden Ackerman has an interesting analysis of Qubole customers’ adoption of Hadoop 2:

In Qubole’s 2018 Data Activation Report, we did a deep-dive analysis of how companies are adopting and using different big data engines. As part of this research, we found some fascinating details about Hadoop that we will detail in the rest of this blog.

A common misconception in the market is that Hadoop is dying. However, when you hear people refer to this, they often mean “MapReduce” as a standalone resource manager and “HDFS” as the primary storage component that is dying. Beyond this, Hadoop as a framework is a core base for the entire big data ecosystem (Apache Airflow, Apache Oozie, Apache HBase, Apache Spark, Apache Storm, Apache Flink, Apache Pig, Apache Hive, Apache NiFi, Apache Kafka, Apache Sqoop…the list goes on).

I clipped this portion rather than the direct analysis because I think it’s an important point: the Hadoop ecosystem is thriving as the focus shifts from what was important a decade ago (batch processing of large amounts of data on servers with direct-attached storage) to what is important today (a combination of batch and streaming processing of large amounts of data on virtualized, often cloud-based servers with network-attached flash storage).

Comments closed