Flink’s State Processor API

Seth Wiesman and Fabian Hueske show off Apache Flink’s State Processor API:

The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with Input and OutputFormats to read and write savepoint or checkpoint data. Due to the interoperability of DataSet and Table API, you can even use relational Table API or SQL queries to analyze and process state data.

For example, you can take a savepoint of a running stream processing application and analyze it with a DataSet batch program to verify that the application behaves correctly. Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application. It’s also possible to fix inconsistent state entries now. Finally, the State Processor API opens up many ways to evolve a stateful application that were previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started. For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on

Read on to learn more about how this works.

Creating Big Data Clusters with Azure Data Studio

Niels Berglund takes us through the creation of a Big Data Cluster by using Azure Data Studio to generate a notebook:

I wrote a blog post back in November 2018, about how to install and deploy SQL Server 2019 Big Data Cluster on Azure Kubernetes Service. Back then SQL Server 2019 Big Data Cluster was in private preview, (CTP 2.1 I believe), and you had to sign up, to get access to the “bits”. Well, you did not really get any “bits”; what you did get was access to Python deployment scripts.

Now, September 2019, the BDC is in public preview (you do not have to sign up), and it has reached Release Candidate (RC) status, RC 1. The install method has changed, or rather, in addition to installing via deployment scripts, you can now also install using Azure Data Studio deployment notebooks, and that is what this blog post is about.

Having gone through this myself, there’s quite a bit of reading involved in the setup, but they make the process pretty smooth. This also shows off one of the key benefits of notebooks: documentation and code together.

Key Concepts of Convolutional Neural Networks

Srinija Sirobhushanam takes us through some of the key concepts around convolutional neural networks:

How are convolution layer operations useful?
CNN helps us look for specific localized image features like the edges in the image that we can use later in the network Initial layers to detect simple patterns, such as horizontal and vertical edges in an image; and deeper layers detect complex patterns.

This is a rather high-level description of the topic, but it serves to give an understanding of what you need to know before building and using these networks.

Using dbatools Instead of RDP

Garry Bargsley shows how we can use dbatools to do some of the things which we might naturally do with Remote Desktop:

So you hear this spread across the Twitterverse and Blogosphere. You should not RDP your SQL Servers to do administrative work. My nature has always been to troubleshoot issues from the server in an RDP session.

When I received a disk space alert on a development system I was about to RDP and do my thing. But I said wait, let me approach this from a different perspective…

To the rescue is dbatools as always seems to be the case these days.

Remoting is much less resource-intensive and it lets you scale out to dozens, hundreds, or thousands of servers without any more effort on your part. It’s rare that you get constant scaling, so take advantage of it where you can.

File Sizes in dbatools

Chrissy LeMaire gives us several ways to format file sizes with dbatools:

Within dbatools, you may notice file sizes are pretty and human-readable.

That was some C# based magic created by Microsoft PFE and creator of PSFrameworkFred Weinmann. In the background, SQL Server often gives us different types of numbers to represent file sizes. Sometimes it’s bytes, sometimes it’s megabytes. We wanted to standardize the sizing in dbatools, and thus the dbasize type was born.

Human-readable file sizes are great but they make it difficult to compare when piping sets of data to Format-Table. Knowing how to override this when necessary gets you the best of both worlds.

Reading SQL Server Logs from Kubernetes

Anthony Nocentino shows us how we can use kubectl logs to read the SQL Server Error Log:

We can use follow flag and that will continuously write the error log to your console, similar to using tail with the -f option. If you remove the follow flag it will write the current log to your console. This can be useful in debugging failed startups or in the case below, monitoring the status of a database restore. When finished you can use CTRL+C to break out and return back to your prompt.

Read on for a brief demonstration.

T-SQL Tuesday 118 Roundup

Kevin Chant played Santa Claus this month:

I hope they had as much fun contributing their posts as I had reading them afterwards. For those who missed the invitation you can read about it here.

For some reason I thought it’d be a good idea to do it on my birthday month when I first given choices about which month to host. However, now I cannot remember why.

Read on for 23 separate blog posts covering quite a few peoples’ desires for new features, bugfixes, or changes in functionality.

Comparing Power BI Files

Imke Feldmann shows off a new Power BI file comparison tool:

What’s not covered?
Nothing. The comparison includes everything from the pbit-files: So beneath your M and DAX code, you’ll see all about your visual definitions (incl. filters set !), row level security and much, much more. Actually, I found some information a bit noisy (like many date fields, telling you when which changes happened). So I filtered them out in Excel. I’d recommend to check it out and play a bit with it to find the most suitable settings for you.

This looks quite useful.

Modeling Semi-Additive Measures

Paul Poco shows a couple techniques for modeling semi-additive measures in Analysis Services and Power BI:

As mentioned earlier, the most commonly encountered approach is Option 2, the snapshot fact table. The main drawback of this approach is that the fact table’s size will grow extremely fast. For example, if you want to calculate the headcount in a company with 10,000 employees on average, and you want 5 years of historical data, you will add 10,000 rows per day to your fact table – that gives you (10,000 * 365 * 5 =) 18,250,000 rows after 5 years.  

If you used the first approach, Option 1, the fact table would be (10,000 * 5 =) 50,000 rows after 5 years, assuming your employees change position or quit the company once a year, on average. 

The snapshot fact table (Option 2) is (18,250,000 / 50,000 =) 365 times bigger. On the bright side, as the data is very repetitive, you might get a very good compression ratio on these tables.  

Check it out. Semi-additive measures are not as common as additive measures, but you’re liable to have a couple of them in your data model.

Icon Maps in R

Laura Ellis shows how you can build maps full of little icons:

That was ok, but we should try to make the images more aesthetically pleasing using the magick package. We make each image transparent with the image_transparent() function. We can also make the resulting image a specific color with image_colorize().

I then saved the images using the image_write() function. I manually re-uploaded them to GH.

This was a great example of where laying icons on a map works.

Categories

September 2019
MTWTFSS
« Aug  
 1
2345678
9101112131415
16171819202122
23242526272829
30