Spark session is a unified entry point of a spark application from Spark 2.0. It provides a way to interact with various spark’s functionality with a lesser number of constructs. Instead of having a spark context, hive context, SQL context, now all of it is encapsulated in a Spark session.
Read on to learn more about SparkSession and how you can use it.
In this blog, we’ll move one step forward to get an understanding of the Dual streaming model to see what abstractions does KSQL use to process the data.
All the data that we are working on with KSQL is produced to Kafka topics by some client. This client can be any Application, Kafka connectors etc., which produces continuous never-ending data to the topics.
KSQL does not directly interact with these topics, it rather introduces a couple of abstractions in between to process the data, which are known as Streams and Tables.
Read on to learn what these are and why it’s useful to think in these terms.
The ISO/IEC 9075:2016 standard (aka SQL:2016) introduces support for Row Pattern Recognition (RPR) in SQL. Similar to using regular expressions to identify patterns in a string, RPR allows you to use regular expressions to identify patterns in a sequence of rows.
To me, it’s the next step in the evolution of window functions. If you think that window functions are profound and useful, RPR is really going to bake your noodle.
RPR has limitless practical applications, including identifying patterns in stock market activity, handling time series, fraud detection, material handling, shipping applications, DNA sequencing, gaps and islands, top N per group, and many others.
I’ve voted it up and recommend you do so too. This is a great way to think of streams of data sitting in a database. If you have business use cases where this could help, adding those as comments would be great too.
First things first: what is a semi-additive calculation? Any calculation can be either additive, non-additive or semi-additive. An additive measure uses SUM to aggregate over any attribute. The sales amount is a perfect example of an additive measure. Indeed, the sales amount for all customers is the sum of the individual sales for each customer; at the same time, the amount over a year is the sum of the amounts for each month.
A non-additive measure does not use SUM over any dimension. Distinct count is the simplest example: the distinct count of products sold over a month is not the sum of the distinct counts of individual days. The same happens with any other dimension: a distinct count of products sold in a country is not the sum of the distinct counts of the products sold in each city in the country.
Semi-additive calculations are the hardest ones: a semi-additive measure uses SUM to aggregate over some dimensions and a different aggregation over other dimensions – a typical example being time.
Semi-additive measures are probably the trickiest of the three, as you can easily work with additive measures and you know you won’t be able to do much with non-additive measures.
Andy Leonard takes two shots at adding the SSIS Catalog to a Docker container. First up is the version which doesn’t work:
I have been working on getting an SSIS Catalog running in a container for a couple years.
I share this post not to discourage you.
I share it to let you know one way I failed.
I thought I had succeeded when the PowerShell in this post worked. The PowerShell works, by the way – just not in a container configured thus.
This is but one failure.
I failed more than once, I promise.
I can hear some of you thinking, “How do we accomplish this, Andy?”
I’m glad you asked.
The answer is “We modify our container.”
Disclaimer: I’m about 100% certain there’s another way to do this and about 99% sure there’s a better way. I’m going to show you what I did. Cool?
Read on to see how Andy did it.
Query execution plans, otherwise known as actual execution plans or just Showplan, provide a map of all the required operations to get the query output, which includes runtime performance statistics.
Query Store (QS) is a valuable tool for troubleshooting workload trends via specific queries – this is because QS aggregates performance data on queries. However, sometimes we need to get the specific, singleton query execution plan to analyze and troubleshoot – this is where xEvents come in.
Since SQL Server 2012 we’ve had the query_post_execution_showplan xEvent for this. As the name suggests, it gets you the actual query plan – because it is *after* execution – when we have the runtime statistics available. However this xEvent is based on the standard query execution statistics profile infrastructure(quite a mouthful) – or standard profiling for short (read more about it here). This has a very high overhead (75%+ with a TPC-C like workload) which is why its use needs to be seriously considered, and most likely not used unless in last resort.
In more recent releases we have other alternatives for these requirements to get the singleton actual execution plans, based on the lightweight query execution statistics profile infrastructure – or lightweight profiling for short (read more about it here). These xEvents are listed below, where we’ll see examples on how to use them.
Pedro does a good job of taking us through the available events and what we get from them.
One of the Power BI improvements in the March 2019 Desktop release was reduced bubble size for the Map visual. I previously wrote about the benefit of the reduction in point/bubble size. I was unaware until recently that this change made it into more than the Map visual.
The ability to reduce the point size also appears in the Format options for the Power BI Scatter chart. Previously, you could change the size option from 0 to 100 under the Shapes area. As with the Map, the Scatter now allows you to reduce the size as low as -30. I did not see this mentioned in the March Desktop blog post. I must have missed it if it was part of a previous month’s release. In any case, if you were not aware that you could set the point size from -30 to 100with the Scatter chart, now you do.
For most scenarios, I think the dot size is probably a little too big. -30 is generally too small, but I’m happy that they offer us options to get it right.
The creation of this view has chewed up a bunch of storage. It has jumped right up to the number two spot on the biggest objects list within this database. You can see that differences by comparing the highlighted rows to the previous image. The vPerson view is highlighted in red in this second image to help point it out quickly.
Surely this must be a contrived example and people don’t really do this in the real world, right? The answer to that is simply: NO! It DOES happen. I see situations like this all too often. Far too often, large text fields are added to an indexed view to make retrieval faster. I have mimicked that by adding in two XML columns from the Person.Person table. This is definitely overkill because a simple join back to the table based on the BusinessEntityID would get me those two columns. All I have effectively done is duplicated data being stored and I have achieved that at the low low cost of increased storage of 25% for this small database. If you are curious, the column count between the Person.Person table and this new view is 13 columns each.
Jason takes us through a couple more gotchas and provides some important advice you should follow if you think indexed views might be a fit for you.
In this DMV, obviously there are a number of columns. This post will focus mostly on the object_type column. This column has a number of potential values, including Event, Target and Action. When setting up a SQL Server Extended Event session these are a few of the key items that need to be filled out.
Let’s just take a few moments to identify what these are.
Click through to see where you can find targets, actions, and event descriptions.