Press "Enter" to skip to content

Month: September 2024

Generating Embeddings in Oracle from a Function or Trigger

Brendan Tierney continues a series on generative AI in Oracle:

In my previous post, I gave examples of using Cohere to create vector embeddings using SQL and of using a Trigger to populate a Vector column. This post extends those concepts, and in this post, we will use OpenAI.

Warning: At the time of writing this post there is a bug in Oracle 23.5 and 23.6 that limits the OpenAI key to a maximum of 130 characters. The newer project-based API keys can generate keys which are greater than 130 characters. You might get lucky with getting a key of appropriate length or you might have to generate several. An alternative to to create a Legacy (or User Key). But there is no guarantee how long these will be available.

Assuming you have an OpenAI API key of 130 characters or less you can follow the remaining steps. This is now a know bug for the Oracle Database (23.5, 23.6) and it should be fixed in the not-too-distant future. Hopefully!

Read on to learn more.

Comments closed

Connecting to Power BI as a Guest User

Koen Verbeeck can only enter a tenant with explicit permission:

Sometimes your Microsoft Entra ID account (formerly known as Azure Active Directory) is added as a guest user in another tenant. This happens quite a lot when you’re a consultant and your client can’t create a new user in their own tenant, so they add the account of your own company as a guest instead. If you’re not a consultant, it can also happen after a merger or acquisition and you’re suddenly stuck with multiple tenants.

Yeah, this is a real annoyance with Microsoft Fabric / Power BI. Koen links to a 5-year-old feature request that I recommend upvoting.

Comments closed

Updating the Default Lakehouse of a Notebook

Sandeep Pawar makes a change:

I have written about default lakehouse of a Fabric notebook before here and here. However, unless you used the notebook API, there was no easy/quick way of removing all/selective lakehouses or updating the default lakehouse of a notebook. But thanks to tip from Yi Lin from Notebooks product team, notebookutils.notebook.updateDefinition has two extra parameters, defaultLakehouse and defaultLakehouseWorkspace which can be used to update the default lakehouse of a notebook. You can also use it to update environment attached to a notebook. Below are some scenarios how it can be used.

Click through for those scenarios.

Comments closed

Hierarchical Data Types in Postgres

Florent Jardin builds a hierarchy:

The SQL standard defines a set of rules so that database systems can be interchangeable, but there are small singularities in the wild. In this regard, the hierarchyid data type provided by SQL Server is a striking example. If you are switching to PostgreSQL, two solutions are available to you.

A first and simpler solution consists in linking each node to its parent using a new parentid column and applying a foreign key constraint. Another, more complete approach consists in using the ltree extension. This article deals with the latter case.

Read on to learn more.

Comments closed

Page Navigation in Power BI

Nikil Prabhakar turns the page:

Navigating between pages in Power BI reports is a critical aspect of creating a smooth and interactive user experience. Over time, Power BI has introduced several ways to implement page navigation, each with its own benefits and challenges. In this article, we’ll explore three different methods of page navigation and their use cases:

  1. Button Navigation
  2. Page Navigator
  3. Slicer and Button Combination

Read on for those three mechanisms. There’s also a video taking you through the process.

Comments closed

Testing Kafka Messages with RecordCaptor

Anton Belyaev shows off an open-source utility:

Let’s take a Telegram bot that forwards requests to the OpenAI API and returns the result to the user as an example. If the request to OpenAI violates the system’s security rules, the client will be notified. Additionally, a message will be sent to Kafka for the behavioral control system so that the manager can contact the user, explain that their request was too sensitive even for our bot, and ask them to review their preferences.

The interaction contracts with services are described in a simplified manner to emphasize the core logic. Below is a sequence diagram demonstrating the application’s architecture. I understand that the design may raise questions from a system architecture perspective, but please approach it with understanding — the main goal here is to demonstrate the approach to writing tests.

Read on to see how it all works, as well as links to Anton’s GitHub repo for testing in Kafka.

Comments closed

Using Managed Identities in Azure Logic Apps

Koen Verbeeck doesn’t want to change a password yet again:

A stored procedure is executed on an Azure SQL Database. The connection to this database was configured using SQL Server Authentication. The goal of this article is to show you how you can connect using managed identities instead, which was left as an exercise to the reader in the previous article.

I recommend you to go through this article first if you don’t have a solid understanding of Logic Apps, or if you want to follow along as an exercise. It’s not necessarily a prerequisite to understand the concepts of this article and if you’re just interested in learning how managed identities work for Logic Apps, then keep on reading.

Click through to learn more about managed identities in Azure and how they can be so useful.

Comments closed

Domain Lineage in Microsoft Fabric

Sandeep Pawar creates 1000 words of value:

In Fabric, you can use the Domains to create a data mesh architecture. It allows you to organize the data and items by specific business domains within the organization and make the overall data architecture decentralized. You can create domains within domains and assign workspaces to each domain. As it grows, you may find it challenging to understand how the domains & workspaces have been organized. Below code will help you trace the domains, subdomains and the workspaces assigned to them.

Click through to see how you can use the graphviz library in Python to generate a simple domain chart.

Comments closed

When to Think about Scalability

Steve Jones hits upon a dilemma:

There may not be a large workload in production either, at least not at first.

So, what do you worry about first: your code being used or performing well? That’s a similar question to this one: Worry about Scalability or Popularity First? While most of us don’t work for a startup and our organizations have some sort of financial stability, does popularity matter?

I don’t always have a solid answer for this. The closest I have is to try to make my baseline:

  • Easy for maintainers (including myself) to read
  • Reasonably efficient
  • Capable of some level of scale but not necessarily the most scalable

If you want practical terms, I create a somewhat-educated guess on approximately how many rows there will be in a steady state after launch and then multiply that by a factor of 2-3 when generating test data. If you can dodge 2-3x expectations, you can dodge a ball.

And if you suddenly balloon to 10x, you grouse and grumble and spend a couple of sprints digging out from the mess of success.

Comments closed

Compression Tuple Filtering in TimescaleDB

Sven Klemm talks compression:

However, it also created a problem. While we had originally intended mutating compressed chunks to be a rare event, people were now pushing its limits with frequent inserts, updates, and deletes. Seeing our customers go all in on this feature confirmed that we were on the right track, but we had to double down on performance.

Today, we’re proud to announce significant improvements as of TimescaleDB 2.16.0, delivering up to 500x faster updates and deletes and 10x faster upserts on compressed data. These optimizations make compressed data behave even more like uncompressed data—without sacrificing performance or flexibility.

Read on to learn a bit more about compression in Postgres and TimescaleDB, as well as how compression tuple filtering works.

Comments closed