2023-02-03 – Curated SQL

Dealing with Imbalanced Class Data for Image Classification

Published 2023-02-03 by Kevin Feasel

Alexander Billington needs more beta carotene:

Image classification is a standard computer vision task: training a model to assign a label to a given image, such as a model that classifies images of different root vegetables. A big problem with classification is bias, where the model favours a particular image class over the others. A common cause is dataset imbalance, which is often hard to spot because a model trained on an imbalanced dataset can still have good accuracy. For example, if the test dataset has 1000 images, 950 potatoes and 50 carrots, and the model predicts all 1000 images to be potatoes, it would still have 95% accuracy. This is also an example of why more metrics than accuracy should be considered… but let’s leave that discussion for another day.

Click through for several techniques you can use to balance out classes, with a focus on image classification. Undersampling is almost always a no-go for me, but I am much fonder of the other techniques.
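
The potato and carrot arithmetic in the excerpt can be checked in a few lines. The following is a minimal sketch in plain Python, not code from the linked article: the helper names (`accuracy`, `per_class_recall`, `balanced_accuracy`) are made up for illustration, and balanced accuracy (the mean of per-class recall) is just one possible "metric beyond accuracy" that the excerpt alludes to.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its images that were labelled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# The test set from the excerpt: 950 potatoes and 50 carrots.
y_true = ["potato"] * 950 + ["carrot"] * 50
# A degenerate model that predicts "potato" for every image.
y_pred = ["potato"] * 1000

print(accuracy(y_true, y_pred))           # 0.95
print(per_class_recall(y_true, y_pred))   # {'potato': 1.0, 'carrot': 0.0}
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Accuracy alone looks healthy at 95%, while the per-class view shows that no carrot is ever recognised.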

Multi-Class Classification in PyTorch

Published 2023-02-03 by Kevin Feasel

Adrian Tam does some iris categorizing:

Now you need to have a model that can take the input and predict the output, ideally in the form of one-hot vectors. There is no science behind the design of a perfect neural network model. But you know one thing: it has to take in a vector of 4 features and output a vector of 3 values. The 4 features correspond to what you have in the dataset. The 3-value output is because we know the one-hot vector has 3 elements. Anything can be in between, and those are known as the “hidden layers” since they are neither input nor output.

Click through for the full tutorial.
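
To make the shapes in the excerpt concrete, here is a rough sketch of a forward pass that takes 4 features in and gives 3 values out. It is written in plain Python rather than PyTorch so it runs without extra packages, and it is not the tutorial's model: the hidden layer size of 8, the ReLU activation, the random untrained weights, and the sample measurements are all assumptions chosen for illustration.

```python
import math
import random


def linear(inputs, weights, biases):
    """Fully connected layer: one output value per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise max(0, v), a common hidden-layer activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def one_hot(index, size=3):
    """One-hot vector with a 1.0 at the given class index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


rng = random.Random(0)
HIDDEN = 8  # assumed width; the excerpt says anything can sit in between
w1, b1 = init_layer(4, HIDDEN, rng)   # input layer: 4 features
w2, b2 = init_layer(HIDDEN, 3, rng)   # output layer: 3 classes


def predict(features):
    """4 features -> hidden layer -> 3 class probabilities."""
    hidden = relu(linear(features, w1, b1))
    return softmax(linear(hidden, w2, b2))


sample = [5.1, 3.5, 1.4, 0.2]  # four illustrative measurements
probs = predict(sample)
print(probs)                               # 3 probabilities
print(one_hot(probs.index(max(probs))))    # most likely class as a one-hot vector
```

Because the weights are untrained, the predicted class is meaningless; the point is only that the input and output sizes are fixed by the data while the hidden part is a free design choice.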

Flink 1.16.1 Release

Published 2023-02-03 by Kevin Feasel

Martijn Visser announces Apache Flink version 1.16.1:

The Apache Flink Community is pleased to announce the first bug fix release of the Flink 1.16 series.

This release includes 84 bug fixes, vulnerability fixes, and minor improvements for Flink 1.16. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.

We highly recommend all users upgrade to Flink 1.16.1.

Read on for the release notes, including links to all of the closed tickets.

Azure Load Testing Now GA

Published 2023-02-03 by Kevin Feasel

Darryl Taft provides an overview of a now generally available service:

Moreover, Azure Load Testing collects detailed resource metrics to help you identify performance bottlenecks across your Azure application components. You can automate regression testing by running load tests as part of your CI/CD workflow.

Azure Load Testing also creates monitoring data using the Azure Monitor service, including application insights and container insights, to capture details from the Azure services.

It’s available in 11 regions, including the best region of all (East US) and the second-best region of all (East US 2).

Migrating from Elasticsearch to Azure Data Explorer

Published 2023-02-03 by Kevin Feasel

Bhaskar Kakaraparthy does a logging switcheroo:

This article is an extension to an existing article on migrating data from Elasticsearch to Azure Data Explorer (ADX) using a Logstash pipeline, presented as a step-by-step guide. In this article, we will explore the process involved in migrating data from one source (ELK) to another (ADX) and discuss some of the best practices and tools available to make the process as smooth as possible.

Using Logstash for data migration from Elasticsearch to Azure Data Explorer (ADX) was a smooth and efficient process. With the help of the ADX output plugin and Logstash, I was able to migrate approximately 30 TB of data in a timely manner. The configuration was straightforward, and the data transfer with the ADX output plugin was quick and reliable. Overall, the experience of using the ADX output plugin with Logstash for data migration was positive, and I would definitely use it again for similar projects in the future.

Read on to see how.

Database Constraints in Postgres

Published 2023-02-03 by Kevin Feasel

Grant Fritchey does some data modeling:

PostgreSQL supports constraints much like any other database management system. When you need to ensure certain behaviors of the data, you can put these constraints to work. I’ve already used several of these in creating my sample database (available publicly on GitHub, in the CreateDatabase.sql file). I’ll explain those as we go. The constraints supported by PostgreSQL are:

Read on for the list, which includes one constraint which doesn’t have a direct analog in SQL Server.

Power BI Workspace Roles

Published 2023-02-03 by Kevin Feasel

Reza Rad shares some recommendations with us:

Power BI workspaces are not like the old days when we had Edit access and View access only. You have more options for roles in a workspace, and in my courses, I have found that many people have chosen the incorrect role without knowing what the role does. In this article, I’ll explain all the roles in the workspace and the best way to set them up to have a secure workspace.

Click through for the article, as well as an accompanying video. Or a video and an accompanying article, if that’s how you roll.
