Day: March 3, 2023

Visualizing PyTorch Models

Adrian Tam describes a model:

PyTorch is a deep learning library. You can build very sophisticated deep learning models with PyTorch. However, there are times you want to have a graphical representation of your model architecture. In this post, you will learn:

  • How to save your PyTorch model in an exchange format
  • How to use Netron to create a graphical representation

Click through for the article, which is mostly about training the PyTorch model. Visualizing it turns out to be pretty easy with the right tool.

Comments closed

Understanding the Fold Function

Prakhar takes us through the fold function in functional programming, using Scala as the language of choice:

“fold” is a common operation in programming languages including Scala, where we essentially use it to “reduce” (note that “reduce” is also an operation in programming languages and has a special meaning in Scala as well). In this blog, we will learn how to use the fold function, understand different types of fold operations (including foldLeft and foldRight), and try to understand how it all works. Although the fold operation can be applied to Option, Future, Try, etc., here we will understand it through List.

Fold is extremely useful for things like “I want to calculate a sum but it’s got to be a conditional sum” or when you have more complex mathematical operations to combine elements together. It can take a while to get comfortable with the syntax, but once you do, it opens mental doors.
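The "conditional sum" idea translates to any language with a fold. As a sketch, here is a Python analogue of Scala's `foldLeft` using `functools.reduce`, with an accumulator, an initial value, and a predicate baked into the combining function:

```python
from functools import reduce

# Conditional sum: fold left over the list, but only add even numbers.
# reduce(f, xs, init) is the Python analogue of xs.foldLeft(init)(f) in Scala.
nums = [1, 2, 3, 4, 5, 6]
even_sum = reduce(lambda acc, x: acc + x if x % 2 == 0 else acc, nums, 0)
print(even_sum)  # 2 + 4 + 6 = 12
```

The same shape handles more elaborate combinations — any operation that walks a collection while threading state through can be phrased as a fold.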

Comments closed

Working with Kafka from Python

Dave Shook has a new course for us:

If you’re a Python developer, our free Apache Kafka for Python Developers course will show you how to harness the power of Kafka in your applications. You will learn how to build Kafka producer and consumer applications, how to work with event schemas and take advantage of Confluent Schema Registry, and more. Follow along in each module as Dave Klein, Senior Developer Advocate at Confluent, covers all of these topics in detail. Hands-on exercises occur throughout the course to solidify concepts as they are presented. At its end, you will have the knowledge you need to begin developing Python applications that stream data to and from Kafka clusters.

Read on to learn more about it and give it a try.

Comments closed

Data Mesh Q&A

Jean-Georges Perrin answers some questions:

How about data virtualization? If you have different Data Hubs with different data models, how do you integrate them?

As illustrated in the next figure, you can use data virtualization pointing to various physical data stores. Your onboarding pipeline can be “virtual” or at least leveraging virtualized data stores. You will gain in data freshness by reducing latency but you may be limited in the number of data transformations you want to perform towards your interoperable model.

Read on for the full set of questions and answers.

Comments closed

Trying out DuckDB

Mark Litwintschik gives DuckDB a go:

DuckDB is primarily the work of Mark Raasveldt and Hannes Mühleisen. It’s made up of a million lines of C++ and runs as a stand-alone binary. Development is very active with the commit count on its GitHub repo doubling nearly every year since it began in 2018. DuckDB uses PostgreSQL’s SQL parser, Google’s RE2 regular expression engine and SQLite’s shell.

Click through to see how you can install it on Ubuntu, perform some basic configuration, and work with the tool.

Comments closed

Tips on Navigating Postgres Documentation

Laetitia Avrot dishes dirt on Postgres documentation:

I could have created a very easy post with quick tips on psql, like how to disable this horrible pager the “ancient” Postgres contributors insist on keeping on by default (BTW, it’s \pset pager off, you’re welcome, you’ll thank me later), but as I wrote an entire website on that exact topic, I thought I needed to find something else.

So here is my topic: how to use the Postgres documentation! Yes, that documentation content is great, but no, that documentation is not easy to navigate at first.

Click through for tips on the best ways to navigate through this documentation, as well as important pages and topics based on your use case and role.

Comments closed

Unit Testing Spark Notebooks in Synapse

Arun Sethia grabs the oscilloscope:

In this blog post, we will cover how to test and create unit test cases for Spark jobs developed using Synapse Notebook. This is an extension of my previous blog, Synapse – Choosing Between Spark Notebook vs Spark Job Definition, where we discussed selecting between Spark Notebook and Spark Job Definition. Unit testing is an automated approach that developers use to test individual self-contained code units. By verifying code behavior early, it helps to streamline coding practices for larger systems.

Arun covers three major use cases: when your code is in an external library, when it is in a separate notebook, and when it is in the same notebook.
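Whichever of the three layouts you choose, the pattern is the same: factor logic into plain functions, then assert on them. As a minimal sketch (the `clean_city` function is hypothetical, not from Arun's post), here is that shape with Python's built-in `unittest`, which applies equally to code pulled out of a Synapse notebook:

```python
# Minimal unit-testing pattern: a self-contained function plus a test case.
# clean_city is a stand-in for logic you would factor out of a notebook.
import unittest

def clean_city(name: str) -> str:
    """Normalize a city name: trim whitespace and title-case it."""
    return name.strip().title()

class CleanCityTests(unittest.TestCase):
    def test_strips_and_titles(self):
        self.assertEqual(clean_city("  new york "), "New York")

    def test_already_clean(self):
        self.assertEqual(clean_city("Boston"), "Boston")

# Run the suite programmatically (works inside a notebook cell too).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanCityTests)
result = unittest.TextTestRunner().run(suite)
```

Running the suite programmatically, rather than via `unittest.main()`, is what lets this work inside a notebook cell, where there is no command line to hand control to.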

Comments closed

Creating a Disaster Recovery Plan for Synapse

Freddie Santos talks HA/DR with Synapse:

Many of our customers have been asking about creating a disaster recovery plan for their Synapse Workspace. In a new blog series, we will cover the basics of disaster recovery and business continuity, discussing available options and custom solutions.

In this first post, we’ll review important concepts and questions to answer before building a disaster recovery plan, including the differences between High Availability and Disaster Recovery.

The focus in this post is on the dedicated SQL pool and Azure Data Lake Storage Gen2 (because people still think about Gen1?), though that covers the majority of what you'd need to think about, as Spark pools and the serverless SQL pool really just work off the data lake. There are also Data Explorer pools, which have their own storage and HA/DR capabilities.

Comments closed