Press "Enter" to skip to content

Month: March 2025

Local Text Summarization via DistilBart

Muhammad Asad Iqbal Khan summarizes a document:

Text summarization represents a sophisticated evolution of text generation, requiring a deep understanding of content and context. With encoder-decoder transformer models like DistilBart, you can now create summaries that capture the essence of longer text while maintaining coherence and relevance.

In this tutorial, you’ll discover how to implement text summarization using DistilBart. You’ll learn through practical, executable examples, and by the end of this guide, you’ll understand both the theoretical foundations and hands-on implementation details. After completing this tutorial, you will know:

Click through for the article.

Comments closed

Serving Databricks Models via API Management Endpoints

Drew Furgiuele makes available a model:

When it comes to generative AI projects I’d argue that the hardest and most tedious part has moved into a new area: hosting and serving your models. Whether you’re working with CPU intensive models, or models that require GPU horsepower, sourcing the hardware, building out deployment pipelines, configuring monitoring, and then securing everything is real, serious work that requires everyone to lean in to get it right.

And then, there’s the real question of how you’re going to use those models: will you be setting up automation and doing batch processing using your models and infrastructure? Or do you want to get really serious and offer up real-time inference? If the latter, you can add one more thing to solve for: managing your front-end APIs that you will have to build to support that use case.

Click through to see how you can use an API management tool (like Azure API Management) to assist in these things.

Comments closed

Publishing a Fabric SQL Database

Koen Verbeeck deploys a database:

When a SQL Database is in Microsoft Fabric, you can develop it locally in a database project. As part of the development process, you want to deploy this project to the online Fabric SQL Database. The database project also contains pre- and/or post-deployment scripts that need to be executed as part of the deployment process. How can this goal be achieved?

Click through for the answer.

Comments closed

Load Testing Azure SQL Databases

Reitse Eskens sets the stage:

Some time ago, I wrote a number of blogposts comparing the different Azure SQL options to give you some idea about performance, differences between tiers and differences between the Stock Keeping Units (SKU’s). This was done by creating data in the database itself and review the metrics. This works fine and gave a good overview of the different tiers and SKU’s. For reference, you can find those blogs here.

For the new series, I’ve thought of a new process that aligns more with my regular line of work, data warehousing. This means ingesting a lot of data and modelling it.

Click through for the summary of method and initial notes.

Comments closed

Automating Management of Extended Statistics in PostgreSQL

Andrei Lepikhov builds an extension:

The extended statistics tool allows you to tell Postgres that additional statistics should be collected for a particular set of table columns. Why is this necessary? – I will try to quickly explain using the example of an open power plant database. For example, the fuel type (primary_fuel) used by a power plant is implicitly associated with the country’s name.

Click through to learn more about what extended statistics are and the nature of the extension.

Comments closed

Handling SQL Agent Dates and Durations

Andy Mallon disparages some Microsoft intern’s summer of 1996 project:

SQL Agent’s schema is older than me. It handles dates, times, and durations like it’s 1980 by using integers instead of date/time data types. My buddy Aaron Bertrand talks more about Dating Responsibly so that you can have a good datetime with your own database.

I was writing a query to pull recent job failures from SQL Agent’s msdb job history, and knew that I didn’t want to deal with the wonky date/time formats. Specifically, I was querying msdb.dbo.sysjobhistory to find the Start Time, End Time, and Duration of job runs that failed. If you aren’t familiar with that table, you can look at it over in the docs.

Andy does point out the built-in function but then explains why a separate function is superior. Andy also happens to furnish that function, so check it out.

Comments closed

Comparing Pandas to Other Libraries for Data Processing

Vidyasagar Machupalli performs a comparison:

As discussed in my previous article about data architectures emphasizing emerging trends, data processing is one of the key components in the modern data architecture. This article discusses various alternatives to Pandas library for better performance in your data architecture. 

Data processing and data analysis are crucial tasks in the field of data science and data engineering. As datasets grow larger and more complex, traditional tools like pandas can struggle with performance and scalability. This has led to the development of several alternative libraries, each designed to address specific challenges in data manipulation and analysis.

This is by no means a comprehensive test, but it does show off quite a few libraries that perform similar actions to Pandas.

Comments closed

Microsoft Fabric Shortcuts and Lakehouse Maintenance

Dennes Torres has a public service announcement:

I wrote about lakehouse maintenance before, about multiple lakehouse maintenancespublished videos about this subject and provided sample code about it.

However, there is one problem: All the maintenance execution should be avoided over shortcuts.

The tables require maintenance in their original place. According to our solution advances, we start using shortcuts, lots of them. Our maintenance code should always skip shortcuts and make the maintenance only on the tables.

Click through to see how you can differentiate shortcuts from actual tables and write code to avoid shortcuts.

Comments closed

Custom Visual Dialog Boxes Broken in Power BI Desktop February 2025

Marco Russo has some bad news for us:

As one of the founders of OKVIZ—a company dedicated to producing custom visuals—I have been following the recent developments in Power BI Desktop with particular concern. This issue, however, extends beyond our company and affects many other organizations that rely on custom visuals to enhance their business intelligence solutions. For this reason, I use my blog on SQLBI to reach a larger audience.

Click through for the problem. Marco has an update that Microsoft pledges to have the problem fixed with the March release of Power BI Desktop.

Comments closed