Press "Enter" to skip to content

Day: October 3, 2024

Working with the Apache Flink Table API

Martijn Visser takes us through the Flink Table API:

Apache Flink® offers a variety of APIs that provide users with significant flexibility in processing data streams. Among these, the Table API stands out as one of the most popular options. Its user-friendly design allows developers to express complex data processing logic in a clear and declarative manner, making it particularly appealing for those who want to efficiently manipulate data without getting bogged down in intricate implementation details.

At this year’s Current, we introduced support for the Flink Table API in Confluent Cloud for Apache Flink® to enable customers to use Java and Python for their stream processing workloads. The Flink Table API is also supported in Confluent Platform for Apache Flink®, which launched in limited availability and supports all Flink APIs out of the box.

This introduction highlights the Table API's capabilities, shows how it integrates with other Flink APIs, and provides practical examples to help you get started. Whether you are working with real-time data streams or static datasets, the Table API simplifies your workflow while maintaining high performance and flexibility. If you want to go deeper into the details of how the Table API works, we encourage you to check out our Table API developer course.

Read on to learn how the Table API works in comparison to Flink's other interfaces.
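
To give a sense of the declarative style, here is a minimal PyFlink sketch (my own illustration, not from the article) that filters and aggregates a small in-memory table; it assumes a local environment with the apache-flink package installed.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

# Create a streaming TableEnvironment (batch mode works the same way)
env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# A small in-memory table for illustration
orders = env.from_elements(
    [(1, "alice", 12.50), (2, "bob", 7.25), (3, "alice", 3.10)],
    ["order_id", "customer", "amount"],
)

# Declarative logic: filter and aggregate without hand-writing operators
totals = (
    orders
    .filter(col("amount") > 5.0)
    .group_by(col("customer"))
    .select(col("customer"), col("amount").sum.alias("total_amount"))
)

totals.execute().print()
```

The same pipeline can be expressed nearly identically in Java, which is part of the Table API's appeal for teams spanning both languages.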


An Overview of LightGBM

Vinod Chugani continues a series on tree-based classification techniques:

LightGBM is a highly efficient gradient boosting framework. It has gained traction for its speed and performance, particularly with large and complex datasets. Developed by Microsoft, this powerful algorithm is known for its unique ability to handle large volumes of data with significant ease compared to traditional methods.

In this post, we will experiment with the LightGBM framework on the Ames Housing dataset. In particular, we will shed some light on its versatile boosting strategies: Gradient Boosting Decision Tree (GBDT) and Gradient-based One-Side Sampling (GOSS). These strategies offer distinct advantages. Through this post, we will compare their performance and characteristics.

Read on to learn more about LightGBM as an algorithm, as well as how to use it.
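
As a rough illustration of that comparison (my sketch, not the article's code), the snippet below trains LightGBM with both strategies on a synthetic regression dataset standing in for Ames Housing. Note that recent LightGBM releases move GOSS behind the data_sample_strategy parameter, so boosting_type="goss" may raise a deprecation warning.

```python
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Ames Housing data used in the post
X, y = make_regression(n_samples=2000, n_features=20, noise=0.3, random_state=42)

models = {
    # Traditional gradient boosting over the full dataset
    "GBDT": LGBMRegressor(boosting_type="gbdt", n_estimators=200, random_state=42),
    # GOSS keeps rows with large gradients and samples the rest,
    # trading a little accuracy for training speed
    "GOSS": LGBMRegressor(boosting_type="goss", n_estimators=200, random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```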


Plotting the ROC Curve in Microsoft Fabric

Tomaz Kastrun gets plotting:

The ROC (Receiver Operating Characteristic) curve is a graph that shows how a classifier performs by plotting the true positive rate against the false positive rate. It is used to evaluate the performance of binary classification models by illustrating the trade-off between the true positive rate (TPR) and the false positive rate (FPR) at various threshold settings.

Read on to see how you can generate one in a Microsoft Fabric notebook. Tomaz also plots a density function for additional fun.
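
For reference, here is a minimal sketch of the idea (not Tomaz's notebook code) using scikit-learn and matplotlib on a toy binary classification problem; in a Fabric notebook you would swap in your own model's predicted probabilities.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Toy binary classification problem standing in for the notebook's data
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# TPR vs. FPR at every threshold, plus the area under the curve
fpr, tpr, _ = roc_curve(y_test, scores)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, scores):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random classifier")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```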


Recovering Power BI Reports You Cannot Download

Kurt Buhler grabs a report:

Below are some reasons why you might not be able to download your Power BI report or model from a workspace:

  • The report was created in the service:
    • Someone created the report manually (using the User Interface) and connected it to a model in another workspace.
    • Someone created the report programmatically (for instance, using the REST APIs).
    • Power BI created the report automatically (for instance, it copied the report to a workspace that belongs to a later stage in a deployment pipeline).
  • You used the REST APIs to re-bind a report (changed which model it connects to as a data source).
  • The model has incremental refresh enabled.
  • The model uses automatic aggregations.
  • The model was modified via an XMLA endpoint.
  • Other scenarios described in the limitations in the Microsoft documentation.

When you encounter this scenario, you see something like the following image, which shows the Download this file option greyed out in the File menu of the Power BI report.

Read on to see how you can nonetheless recover these published reports using the semantic-link-labs library.
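
As a rough sketch of what that can look like in a Fabric notebook (the report and workspace names are placeholders, and get_report_definition is my assumption for the relevant helper, so verify the exact function names and return format against the semantic-link-labs documentation):

```python
# Run inside a Microsoft Fabric notebook with semantic-link-labs installed
# %pip install semantic-link-labs
from sempy_labs import report as rep

# Placeholder names; substitute your own report and workspace
report_name = "Sales Report"
workspace_name = "Analytics Workspace"

# Pull the published report's definition back out of the service
# (get_report_definition is assumed here; check the library docs)
definition = rep.get_report_definition(report=report_name, workspace=workspace_name)
display(definition)
```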


Managing SQL Agent Job History

Joe Gavin prunes some history:

The SQL Server Agent is a very powerful job scheduling and alerting tool that’s tightly integrated with SQL Server. It’s quite possible you’re only using it for basic maintenance tasks like database backups, index maintenance, DBCC checks, etc., and the default retention is fine. However, you may also be using it for much more, e.g., executing multi-step jobs that run multiple SSIS packages, and you need to retain job history for longer than the default.

Read on to learn more, including how one really useful-sounding checkbox doesn’t quite work the way you’d expect.
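
The retention settings themselves can be adjusted through stored procedures in msdb. As a hedged illustration (connection details are placeholders, and this is not Joe's script), this Python snippet uses pyodbc to raise the instance-wide history caps above their defaults and purge rows older than a cutoff:

```python
from datetime import datetime, timedelta

import pyodbc

# Placeholder connection string; point it at your own instance
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=msdb;Trusted_Connection=yes;TrustServerCertificate=yes",
    autocommit=True,
)
cursor = conn.cursor()

# Raise the instance-wide history caps above the defaults (1000 rows total, 100 per job)
cursor.execute(
    "EXEC msdb.dbo.sp_set_sqlagent_properties "
    "@jobhistory_max_rows = 10000, @jobhistory_max_rows_per_job = 1000;"
)

# Purge history older than 90 days for all jobs
cutoff = datetime.now() - timedelta(days=90)
cursor.execute("EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = ?;", cutoff)
```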


Mounting Azure Data Factory in Fabric Data Factory

Andy Leonard takes up a factory job:

Thanks to the hard work of the Microsoft Fabric Data Factory Team, it’s now possible to mount an Azure Data Factory in Fabric Data Factory. This post describes one way to mount an existing Azure Data Factory in Fabric Data Factory. In this post, we will:

  • Mount an existing Azure Data Factory in Fabric Data Factory
  • Open the Azure Data Factory in Fabric Data Factory
  • Test-execute two ADF pipelines
  • Modify and publish an ADF pipeline

Read on to see how it all works. One of the odd things about Microsoft Fabric—and its predecessor, Azure Synapse Analytics—is the penchant for similar-but-not-quite-the-same services. Yes, we have Data Factory…but it’s not quite the same. Yes, we have Azure Data Explorer (and KQL)…but it’s not quite the same. I get that there are reasons for this (such as not having a resource group with a dozen separate services hanging around), but I’m sure it’s a bit frustrating working on several separate code bases and trying to keep them all approximately in sync.
