Press "Enter" to skip to content

Day: December 30, 2020

Bayesian Modeling of Holiday Behavior

Daniel Marthaler and Brian Coffey have an interesting post:

As the year unfolds, our demand fluctuates. Two big drivers of that fluctuation are seasonality and holidays. With the holiday season upon us, it’s a great time to describe how both seasonality and holiday effects can be estimated, and how you can use this formulation in a predictive time series model.

In this post, we describe the difference between seasonality and holiday effects, posit a general Bayesian Holiday Model, and show how that model performs on some Google Trends data.

Read the whole thing.

Comments closed

Deleting Messages and Topics in Kafka

The Hadoop in Real World team has a pair of related posts. The first is on how to remove messages in a Kafka topic:

The easiest way to purge or delete messages in a Kafka topic is by setting the retention.ms to a low value. retention.ms configuration controls how long messages should be kept in a topic. Once the age of the message in a topic hits the retention time the message will be removed from the topic.

Note the below steps delete or purge messages in your topic. Use precaution when executing the below.

Because Kafka is an immutable log rather than “final” storage, the ideal scenario has you never deleting data. But sometimes you just run low on disk space. You can also set the max retention size as another option. But note that these aren’t going to let you delete a single message—that’s not a good thing to do with a log; rather, you offset or cancel out the message and submit a new one.

The second post covers deletion of a Kafka topic:

In this post we will see how to delete a Kafka topic and get the details of the topic before deleting it.

Comments closed

Chaining with DirectQuery for Power BI Datasets

Wolfgang Strasser explains the notion of chaining when working with Power BI datasets:

In my last blog post I introduced the new concept of DirectQuery for Power BI datasets. This feature allows you to extend and modify a (remote) published Power BI dataset with the help of a local model.

The local model does not contain a copy of the remote dataset but a reference to it. You, as Power BI developer, are able to extend the referenced model with new data sources (like the Excel file I used in my previous example) and/or extend the model with new measures, columns and so on. For a new data model, relationships between the two data islands can be created.

Read on for examples of how this can be useful and what the current limitations look like.

Comments closed

Using the Open Source R or Python Runtime with Machine Learning Services

Niels Berglund walks us through using the open source extensibility framework to install R or Python:

When Java became a supported language in SQL Server 2019, Microsoft mentioned that communication between ExternalHost and the language extension should be based on an API, regardless of the external language. The API is the Extensibility Framework API for SQL Server. Having an API ensures simplicity and ease of use for the extension developer.

From the paragraph above, one can assume that Microsoft would like to see 3rd party development of language extensions. That assumption turned out to be accurate as, mentioned above, Microsoft open-sourced the Java language extension, together with the include files for the extension API, in September 2020! This means that anyone interested can now create a language extension for their own favorite language!

However, open sourcing the Java extension was not the only thing Microsoft did. They also created and open-sourced language extensions for R and Python!

Click through for more detail and a walkthrough on installation of Python.

Comments closed

External Table Not Accessible because Content of Directory Cannot be Listed

Liliam Leme troubleshoots an error when working with a serverless SQL pool in Azure Synapse Analytics:

Following this lab: Lab: Serverless Synapse – From Spark to SQL On Demand – Microsoft Tech Community

You may experience this message: 

Failed to execute the query because content of directory cannot be listed) 

This is due to an extra step required to enable the AAD to pass through the firewall on the storage.

Click through for the solution.

Comments closed

Adding a Database Project to GitHub

Elizabeth Noble shows how you can get your brand new Azure Data Studio project into GitHub:

Once you have the database project created, you’ll want to get your database project added to source control so that you (and others) can modify and manage your database code. This next step is the beginning of allowing you and others to work on the same databases and minimize the risk of overwriting someone else’s work or deploying the wrong code to Production.

Tools like GitHub Desktop and SourceTree have definitely made things easier, especially for the happy path scenarios.

Comments closed