
Day: March 17, 2022

Dynamic DAGs with Apache Airflow

Bhavya Garg explains how we can create dynamic directed acyclic graphs in Apache Airflow:

Airflow dynamic DAGs can save you a ton of time. As you know, Apache Airflow is written in Python, and DAGs are created via Python scripts. That makes it very flexible and powerful (even complex sometimes). By leveraging Python, you can create DAGs dynamically based on variables, connections, a typical pattern, etc. This very nice way of generating DAGs comes at the price of higher complexity and subtle tricky things that you must know.

Read on for an example.
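
To give a rough idea of the pattern, here is a minimal sketch that generates one DAG per entry in a list. It is not taken from the linked post: the SOURCES list, the dag_id_for and extract helpers, and the schedule are all hypothetical, and it assumes Airflow 2.4 or later (for the schedule argument).

```python
import importlib.util
from datetime import datetime

# Hypothetical source systems; in practice these might come from an
# Airflow Variable, a list of connections, or a config file.
SOURCES = ["orders", "customers", "invoices"]


def dag_id_for(source: str) -> str:
    """Build a stable, unique DAG id for one source."""
    return f"load_{source}"


def extract(source: str) -> str:
    """Placeholder task body; a real task would pull data here."""
    return f"extracted {source}"


# Only build DAG objects where Airflow is installed, so the helpers
# above stay importable (and testable) without it.
if importlib.util.find_spec("airflow") is not None:
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    for source in SOURCES:
        dag = DAG(
            dag_id=dag_id_for(source),
            start_date=datetime(2022, 3, 1),
            schedule="@daily",  # assumption: Airflow 2.4+
            catchup=False,
        )
        PythonOperator(
            task_id="extract",
            python_callable=extract,
            op_kwargs={"source": source},
            dag=dag,
        )
        # Airflow picks up DAG objects it finds in module-level names,
        # so each generated DAG is registered under its own id.
        globals()[dag.dag_id] = dag
```

One of the "subtle tricky things" with this style is that every generated DAG needs a unique, stable id, which is why the id is derived from the source name rather than from a loop counter.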


A Conceptual Discussion of Active Learning

Kevin Jacobs teaches us to learn:

Active Learning is a method in which data is annotated in a smart way. With data annotation, you would normally get to see a randomly selected item which you need to label. This, however, can lead to a lot of repetition of similar items which you have to label. This is a waste of time. A better way would be to use Active Learning. For Active Learning, a batch of random items is selected first. Then, a lightweight classifier is used for evaluating the previously annotated data.

Basically, run your prediction mechanism, find the things about which the mechanism is least certain, and figure those out. Doing this reduces ambiguity and quickly leads to a better model.
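
The "least certain" step can be sketched in a few lines. This is a minimal illustration of least-confidence sampling, one common way of measuring uncertainty, and not necessarily the approach in the linked post; the function name and the probability values are made up for the example.

```python
def least_confident(probabilities, batch_size):
    """Return indices of the items whose top class probability is lowest."""
    # Confidence of each prediction = probability of its most likely class.
    confidence = [max(p) for p in probabilities]
    # Rank item indices from least to most confident.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical classifier output for four unlabeled items (two classes).
unlabeled_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]

to_label = least_confident(unlabeled_probs, batch_size=2)
print(to_label)  # [3, 1]
```

The items at indices 3 and 1 are the ones the classifier is closest to a coin flip on, so those would go to the annotator next.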


Using the Q&A Visual in Power BI

Gauri Mahajan tries out the Q&A visual:

As the options for data hosting, data processing and data management keep growing, the options for data consumption have been growing at the same pace. Traditionally, applications and reports were the most common and most frequent means of consuming data. As the means of data consumption matured with time, chatbots, analytics engines, machine learning and artificial intelligence tools and many others joined them. Traditionally, some of the common mechanisms for exploring data have been database query languages, reports prepared by report designers and self-service data exploration by power users. With the evolution of capabilities like machine learning, artificial intelligence, natural language processing and others, some of the popular and modern methods of data exploration include natural language-based data analysis, voice-enabled data exploration using smart devices, computer vision-based data analysis, etc. While many of these methods are highly sophisticated and need user training before a user can employ them, natural language-based data exploration is one of the most popular data exploration methods. This method is offered out of the box by many reporting tools, including Tableau and Power BI.

The Q&A visual is a really cool concept which works a surprising amount of the time. The problem is that when it doesn’t work, it feels like pushing a string: no matter what you do, it just doesn’t quite do what you need it to.


Dealing with Shift Times

Kenneth Fisher knows what time it is:

One of the more interesting jobs I’ve had over the years was for a company that created emergency room software. It was pretty cool software and I learned a lot, both about writing queries in SQL Server and about how a software company can be run. One of the more interesting things in the various reports we created was the concept of shift calculations. In other words, what happened during a given shift.

I’ve had to do something similar (though it was for nurse scheduling rather than emergency rooms). Things get really tricky when you start dealing with 12-hour and 16-hour shifts, tracking overtime, and the like.
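
To show where the trickiness starts, here is a minimal sketch of assigning an event to a shift when one shift crosses midnight. It is written in Python rather than T-SQL and is not Kenneth's approach; the three 8-hour shifts and their boundaries are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical shift definitions: (name, start, end). The night shift
# wraps past midnight, which is the awkward case.
SHIFTS = [
    ("day", time(7, 0), time(15, 0)),
    ("evening", time(15, 0), time(23, 0)),
    ("night", time(23, 0), time(7, 0)),
]


def shift_for(ts: datetime) -> tuple[str, date]:
    """Return the shift name and the date on which that shift started."""
    t = ts.time()
    for name, start, end in SHIFTS:
        if start < end:
            # Shift starts and ends on the same calendar day.
            if start <= t < end:
                return name, ts.date()
        else:
            # Shift wraps midnight: the late part belongs to today,
            # the early-morning part to the shift that began yesterday.
            if t >= start:
                return name, ts.date()
            if t < end:
                return name, ts.date() - timedelta(days=1)
    raise ValueError(f"no shift covers {ts}")


print(shift_for(datetime(2022, 3, 17, 2, 30)))
# ('night', datetime.date(2022, 3, 16))
```

An event at 2:30 AM on March 17 belongs to the night shift that started on March 16, which is exactly the sort of detail that a simple GROUP BY on the calendar date gets wrong.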


From Cosmos DB to the Serverless SQL Pool

Jovan Popovic shows off Synapse Link:

The serverless SQL pools enable you to implement near-real-time analytics solutions on top of your Cosmos DB data. Serverless SQL pools with the Synapse Link provide a cost-effective solution for analyzing NoSQL data stored in Cosmos DB without affecting or consuming the resource units of your Cosmos DB transactional store. You can run heavy analytics on the serverless SQL pools that will not affect the workload or price of the main Cosmos DB transactional store. The serverless SQL pools also let you use the T-SQL query language for analytics, so you can connect reporting & analytics tools (such as Power BI, Analytics Services) from the large ecosystem that works with SQL Server or Azure SQL database.

When you are integrating the serverless SQL pools in your solution, you need to apply some best practices. There are general best practices for the serverless SQL pools in the Synapse Analytics workspace, but some of these settings are not applicable to the Cosmos DB scenario. You will probably use only a subset of the best practices that you can find here. In this post, you will find only the best practices that you should apply in the Cosmos DB solution and some additional hints that could help you to optimize your solution.

Click through to see how the process works and a few recommendations.


Views in MySQL

Robert Sheldon continues a series on getting started with MySQL:

Like other database management systems, MySQL lets you create views that enable users and applications to retrieve data without providing them direct access to the underlying tables. You can think of a view as a predefined query that MySQL runs when the view is invoked. MySQL stores the view definition as a database object, similar to a table object.

Read on for plenty of detail around views. Even if you know how views work in another RDBMS, there are nuances to each of them you’ll want to understand.
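
As a quick illustration of a view as a stored query, here is a minimal sketch. To keep it self-contained it runs against an in-memory SQLite database through Python rather than MySQL; the basic CREATE VIEW ... AS SELECT form is the same in both, but the table, the view name and the data are made up, and MySQL-specific options covered in the article are not shown.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department TEXT,
        salary REAL
    );
    INSERT INTO employees VALUES
        (1, 'Ana', 'Sales', 50000),
        (2, 'Raj', 'IT', 65000),
        (3, 'Lee', 'IT', 70000);

    -- The view exposes only non-sensitive columns for one department.
    CREATE VIEW it_staff AS
        SELECT id, name FROM employees WHERE department = 'IT';
    """
)

# Querying the view runs the stored SELECT against the base table.
rows = conn.execute("SELECT id, name FROM it_staff ORDER BY id").fetchall()
print(rows)  # [(2, 'Raj'), (3, 'Lee')]
```

Callers of it_staff never see the salary column, which is the "without direct access to the underlying tables" part of the quote.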


PSProjectStatus

Jeffery Hicks wants to check Git status:

I write a lot of PowerShell modules. And probably like you, I am working on more than one project at a time. I was finding it difficult to keep track of what I was working on and what I might be neglecting. So I turned to PowerShell and created a tool that I use to keep on top of my projects. The PowerShell module is called PSProjectStatus and you can install it from the PowerShell Gallery. You can find the project on GitHub, but I thought I’d provide an introduction here.

Read on to see how it works.
