Day: January 7, 2022

Apache Flink ML 2.0.0

Dong Lin and Yun Gao make an announcement:

The Apache Flink community is excited to announce the release of Flink ML 2.0.0! Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms that are easy to use and performant, with (near) real-time latency.

This release involves a major refactor of the earlier Flink ML library and introduces major features that extend the Flink ML API and the iteration runtime, such as supporting stages with multi-input multi-output, graph-based stage composition, and a new stream-batch unified iteration library. Moreover, we added five algorithm implementations in this release, which is the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML with state-of-the-art performance.

Congratulations to everybody who contributed to the project; it’s a big milestone.

Anomaly Detection in Two Ways

Muhammad Asad Iqbal Khan shows how you can use isolation forests and kernel density estimation for outlier detection:

Just like random forests, isolation forests are built using decision trees. They are implemented in an unsupervised fashion, as there are no pre-defined labels. Isolation forests were designed with the idea that anomalies are “few and distinct” data points in a dataset.

Recall that decision trees are built using information criteria such as the Gini index or entropy. The obviously different groups are separated at the root of the tree, and the subtler distinctions are identified deeper in the branches. Based on randomly picked characteristics, an isolation forest processes randomly subsampled data in a tree structure. Samples that reach further into the tree and require more cuts to separate are very unlikely to be anomalies. Likewise, samples that end up on the shorter branches of the tree are more likely to be anomalies, since the tree found it simpler to distinguish them from the other data.
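
The path-length idea in the excerpt can be illustrated with a small standard-library sketch. This is not the code from the linked article (which may well use a library implementation); the function names `path_length` and `mean_path_length`, the tree count, the sample size, and the toy data are all assumptions made for illustration. Adding the query point to each subsample is also a simplification of the usual build-then-score procedure.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=12):
    """Count the random cuts needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Pick a random feature and a random cut between its min and max.
    feature = rng.randrange(len(point))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:
        return depth  # nothing left to split on this feature (simplification)
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the cut as `point`.
    if point[feature] < split:
        same_side = [row for row in data if row[feature] < split]
    else:
        same_side = [row for row in data if row[feature] >= split]
    return path_length(point, same_side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, sample_size=64, seed=0):
    """Average isolation depth of `point` over many random subsamples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(point, sample + [point], rng)
    return total / n_trees


# Toy data (hypothetical): one tight cluster plus a far-away point.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(256)]
outlier = (8.0, 8.0)
inlier = (0.0, 0.0)

outlier_depth = mean_path_length(outlier, cluster)
inlier_depth = mean_path_length(inlier, cluster)
print("outlier:", outlier_depth, "inlier:", inlier_depth)
```

The far-away point should need fewer cuts on average than the point in the middle of the cluster, which is the "shorter branches" signal the excerpt describes.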

Click through for descriptions and the code.

Combining Azure DevOps and Databricks

Anna Wykes continues a series on DevOps for Databricks:

An Environment Variable is a variable stored outside of the Python script; in our instance it will be stored on the DevOps Agent running the DevOps Pipelines. Consequently, it is accessible to other scripts/programs running on the DevOps Agent. We will not cover DevOps Agents specifically in this blog; the simplest description is that they are the compute that runs your pipeline, normally a VM (Virtual Machine) or Docker container.
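
As a rough illustration of reading such a variable from a Python script, here is a minimal sketch. The variable name `EXAMPLE_WORKSPACE_URL`, its value, and the helper `get_setting` are hypothetical and do not come from the linked series; on a real agent the pipeline would set the variable before the script runs.

```python
import os


def get_setting(name, default=None):
    """Read a setting from the process environment, failing loudly if absent."""
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"environment variable {name!r} is not set")
    return value


# Stand-in for the agent: set the (hypothetical) variable only if it is missing.
os.environ.setdefault("EXAMPLE_WORKSPACE_URL", "https://example.invalid")

workspace_url = get_setting("EXAMPLE_WORKSPACE_URL")
print(workspace_url)
```

Keeping the lookup in one helper means a missing variable fails at start-up with a clear message rather than somewhere deep in the script.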

Read the whole thing.

Mastermind in R

Tomaz Kastrun continues a series of useful games:

Playing a simple guessing game with R. It’s called Mastermind! This game was originally created for two people, but the R version will be single-player, for when an R developer or R data scientist needs a break.

The gameplay is simple and so are the rules. The board contains 10 rows (or more) with possibilities of four colours and code pegs (white or black). The R engine stores a secret colour combination and the user selects a random combination.
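
The linked post implements the game in R. Purely to illustrate how the black and white pegs are usually scored, here is a Python sketch under the standard Mastermind rules: a black peg for each correct colour in the correct position, a white peg for each correct colour in the wrong position. The function name `score_guess`, the four-peg code length, and the colour names are assumptions, not the author's code.

```python
from collections import Counter


def score_guess(secret, guess):
    """Return (black, white) pegs for `guess` against `secret`."""
    # Black pegs: same colour in the same position.
    black = sum(s == g for s, g in zip(secret, guess))
    # Colours shared by both codes regardless of position (handles repeats).
    common = sum((Counter(secret) & Counter(guess)).values())
    # White pegs: shared colours that are not already counted as black.
    return black, common - black


secret = ["red", "blue", "green", "yellow"]
guess = ["red", "green", "blue", "black"]
print(score_guess(secret, guess))
```

The multiset intersection keeps repeated colours from being counted more often than they appear in the secret.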

Click through to see it in action.

Permission Requirements for Temp Tables

Jeff Iannucci looks at permissions:

Managing permissions is a constant issue for Database Administrators, but rarely do DBAs consider permissions for tempdb. Everybody’s looking for something, but how often do you get requests for “access to read and write in the tempdb database”? Like…never?

OK, but what if you were asked about the subject of this post in a job interview? Even if you’ve worked with SQL Server for ages, would you know how to answer this? Moreover, would you know why the answer should give you some concern?

Read on for the answers.

Lessons Learned Troubleshooting High CPU in Azure SQL DB

Kendra Little has an after-action report:

I’ve just had the pleasure of publishing my first new article in the Microsoft Docs, Diagnose and troubleshoot high CPU on Azure SQL Database.

This article isn’t really “mine” – anyone in the community can create a Pull Request to suggest changes, or others at Microsoft may take it in a different direction. But I got to handle the outlining, drafting, and incorporation of suggested changes for the initial publication.

It was a ton of fun, and I learned a lot about Azure SQL Database in the process.

Click through for what Kendra learned specific to Azure SQL Database, and also read the article itself.

Flexible File Components with SSIS

Bill Fellows hides SSIS DNA in a can of Barbasol shave cream:

The Azure Feature Pack for SSIS is something I had not worked with before today. I have a client that wants to use the Flexible File Task/Flexible File Source/Flexible File Destination but they were having issues. The Flexible File tools allow you to work with Azure Blob storage. We were dealing with ADLS Gen2 but the feature pack can work with classic blob storage as well. In my hubris, I said no problem, I know SSIS. Dear reader, I did not know as much as I thought I did…

Click through for a whopper of a story. But be sure to read to the very end, as you don’t want to stop at using TLS 1.0.

Discovering Pester Tags

Jeffrey Hicks has a two-parter on discovering Pester tags. Part one is Jeffrey’s take:

As I resolved at the end of last year, I am doing more with Pester in 2022. I’m getting a bit more comfortable with Pester 5 and as my tests grow in complexity I am embracing the use of tags. You can add tags to different Pester test elements. Then when you invoke a Pester test, you can filter and only run specific tests by their tag. As I was working, I realized it would be helpful to be able to identify all of the tags in a test script. After a bit of work, I came up with a PowerShell function.

Part two is a reader’s take:

Yesterday I shared some PowerShell code I wrote to discover tags in a Pester test. It works nicely and I have no reason to complain. But as usual, there is never simply one way to do something in PowerShell. I got a suggestion from @FrodeFlaten on Twitter on an approach using the new configuration object in Pester 5.2. I’ll readily admit that I am still getting up to speed on the latest version of Pester. That’s one of my goals for this year, so this was a great chance to learn something new.

Click through to see how both approaches work.
