
Month: October 2021

Trying a Read-Only API

Mark Litwintschik reviews ROAPI:

ROAPI is an API Server that exposes CSV, JSON and Parquet files without the need to write any code. The project was started by Qingping Hou around this time last year. Qingping had spent the better part of four years working at LinkedIn prior to joining Scribd as a Senior Engineer. He is also a committer to both the Apache Airflow and Arrow projects.

ROAPI is made up of 4K lines of Rust. This line count is low due to heavy use of third-party libraries. These include Apache Arrow for, among other things, Parquet support; Arrow’s DataFusion project, which provides SQL and query execution support; Actix, which provides the HTTP interface; and Rusoto, the AWS SDK for Rust.

Click through to see how to set it up and how to use it.
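As a rough illustration of what "without the need to write any code" means on the client side, here is a minimal sketch of posting a SQL query to a running ROAPI instance from Python. The host and port, the `/api/sql` path, and the `uk_cities` table name are assumptions made for illustration; check them against the ROAPI documentation for your version.

```python
import json
import urllib.request

# Assumed address of a locally running ROAPI instance (hypothetical).
ROAPI_URL = "http://localhost:8080"


def build_sql_request(query: str, base_url: str = ROAPI_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a SQL query."""
    return urllib.request.Request(
        url=f"{base_url}/api/sql",  # assumed endpoint path
        data=query.encode("utf-8"),
        method="POST",
    )


def run_sql(query: str, base_url: str = ROAPI_URL):
    """Send the query and decode the JSON response; needs a running server."""
    with urllib.request.urlopen(build_sql_request(query, base_url)) as resp:
        return json.load(resp)


# uk_cities is a hypothetical table registered with the server at startup.
request = build_sql_request("SELECT city FROM uk_cities LIMIT 2")
```

Building the request separately from sending it lets you inspect what would go over the wire before a server is available.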


Power BI Storage Modes and Aggregations

Phil Seamark dives into storage modes in Power BI:

How to choose the correct storage mode for Power BI Tables.

This article aims to explain the different storage modes available when designing an aggregation strategy for a Power BI report: what each storage mode is and when you would use it. Picking the correct storage mode for each table in your model can significantly affect overall performance.

Click through for the tl;dr version, but stay for the whole thing.


Optimizing for Mediocre

Erik Darling points out an issue with some approaches to preventing parameter sniffing problems in queries:

Despite the many metric tons of blog posts warning people about this stuff, I still see many local variables and optimize for unknown hints. As a solution to parameter sniffing, it’s probably the best choice 1/1000th of the time. I still end up having to fix the other 999/1000 times, though.

In this post, I want to show you how using either optimize for unknown or local variables makes my job — and the job of anyone trying to fix this stuff — harder than it should be.

Click through for two methods, both of which end up being the wrong answer.


Value Comparisons with Nullable Columns

Chad Baldwin wants to check if rows exist before inserting:

I haven’t posted in a while, so I thought I would throw a quick one together to hopefully restart the habit of writing and posting on a regular basis.

One of my first blog posts covered how to only update rows that changed. In that post, I described a popular method that uses EXISTS and EXCEPT to find rows that had changed while also implicitly handling NULL values.

Click through for two types of technique, one for non-nullable data and one which can include NULL.
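The NULL-handling idea in the quote can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not Chad's code: the `source` and `target` tables and their rows are made up, and SQLite stands in for SQL Server. A plain `<>` comparison evaluates to NULL when either side is NULL, so it misses those rows, while EXISTS with EXCEPT treats two NULLs as equal and a NULL versus a value as different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE source (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO target VALUES (1, 'a'), (2, NULL), (3, 'c'), (4, NULL), (5, 'x');
    INSERT INTO source VALUES (1, 'a'), (2, 'b'), (3, NULL), (4, NULL), (5, 'y');
""")

# Naive comparison: any row where either side is NULL is silently skipped.
naive_sql = """
    SELECT s.id
    FROM source AS s
    JOIN target AS t ON t.id = s.id
    WHERE s.val <> t.val
    ORDER BY s.id
"""

# EXISTS/EXCEPT: the subquery returns a row only when the two values differ,
# with NULL treated as equal to NULL for the purposes of EXCEPT.
null_safe_sql = """
    SELECT s.id
    FROM source AS s
    JOIN target AS t ON t.id = s.id
    WHERE EXISTS (SELECT s.val EXCEPT SELECT t.val)
    ORDER BY s.id
"""

naive_changed = [row[0] for row in conn.execute(naive_sql)]
null_safe_changed = [row[0] for row in conn.execute(null_safe_sql)]
```

Rows 2 and 3 change between a value and NULL, which is exactly the case the naive comparison cannot see.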


TensorFlow Fundamentals

Tanishka Garg starts a series on TensorFlow:

TensorFlow is an open-source end-to-end machine learning library. It is for preprocessing data, modeling data, and serving models (getting them into the hands of others).

It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.

Read on for basic setup instructions and a primer on tensors.


Dataclasses in Python

Evan Seabrook takes us through a Python library:

If you’re really lucky, there will be a docstring for this function that outlines the structure of the parameter user, saving you from having to dig through the function and identify the possible keys that exist in parameter user.

The problem here is twofold:

1. Dictionaries in Python are mutable and can have arbitrary schemas.

a. This in itself isn’t a problem and can be a good thing, depending on your needs. Its usage, however, is really only enabled by the quality of the second point, which is:

2. You must rely on the documentation to know the structure, and the documentation must stay updated as the structure evolves.

Read on to see how Python's dataclasses module can create a wrapper around dictionary objects.
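To make the contrast concrete, here is a minimal sketch of replacing a loosely documented `user` dictionary with a dataclass. The `User` fields and the `greet` function are hypothetical, chosen only for illustration; they are not taken from Evan's article.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen=True makes instances immutable
class User:
    name: str
    email: str
    age: Optional[int] = None  # optional field with a default


def greet(user: User) -> str:
    """The signature now documents the structure; no docstring digging needed."""
    return f"Hello, {user.name} <{user.email}>"


# An existing dictionary can be unpacked into the dataclass.
raw = {"name": "Ada", "email": "ada@example.com"}
user = User(**raw)
greeting = greet(user)

# A misspelled or unexpected key fails at construction time
# instead of surfacing later as a KeyError deep inside a function.
try:
    User(**{"name": "Ada", "emial": "ada@example.com"})
    typo_rejected = False
except TypeError:
    typo_rejected = True
```

The class definition doubles as the schema, so the structure and its documentation cannot drift apart the way a docstring and a dictionary can.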


Time Series Insights in Azure

Aveek Das explains the notion of Azure Time Series Insights:

In this article, we are going to learn in detail about Azure Time Series Insights. Microsoft Azure is one of the leading cloud providers. With many companies adopting or migrating to the cloud, it has become common to convert existing technologies into cloud-based services and consume them. This not only helps companies reduce their costs but also lets them focus on business problems rather than on infrastructure.

Azure Time Series Insights is a cloud service for working with data that changes constantly over time, such as data from sensors, machines, satellites, or airlines. Any data that is generated at high volume and needs to be analysed can be handled through Azure Time Series Insights. In this article, we will focus on a high-level introduction to this service along with some use cases in detail.

Read on for the article.


Foreign Keys and Delete Operations

Kenneth Fisher takes us through a case of deleting rows:

Deleting rows from a table is a pretty simple task, right? Not always. Foreign keys, while providing a ton of benefits, do make deletes a bit more complicated.

Click through for an example of this, as well as a quick discussion of cascading deletes, which sound really useful until you make a big mistake. The other problem with cascading deletes is, even if you do intend to delete everything noted, the process is a lot slower than what you can do in batches, and you’re liable to increase the size of your transaction log file to boot.
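The behavior described here can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions rather than Kenneth's example: the `parent` and `child` tables are hypothetical, and SQLite stands in for SQL Server. Deleting a referenced parent row fails, so the child rows are removed first, here in small batches with a commit after each one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent (id)
    );
    INSERT INTO parent VALUES (1);
    INSERT INTO child (parent_id) VALUES (1), (1), (1), (1), (1);
""")

# Deleting the parent while child rows still reference it is rejected.
try:
    conn.execute("DELETE FROM parent WHERE id = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
    conn.rollback()

# Delete the child rows in batches, committing after each batch
# so that no single transaction grows large.
BATCH_SIZE = 2
batches = 0
while True:
    cur = conn.execute(
        "DELETE FROM child WHERE id IN "
        "(SELECT id FROM child WHERE parent_id = ? LIMIT ?)",
        (1, BATCH_SIZE),
    )
    conn.commit()
    if cur.rowcount == 0:
        break
    batches += 1

# With no referencing rows left, the parent delete succeeds.
conn.execute("DELETE FROM parent WHERE id = 1")
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
```

A cascading delete would do all of this in one statement and one transaction, which is convenient but is also why it can be slower and harder on the log for large tables.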
