Author: Kevin Feasel

Importing PDF Contents into SQL Server

Sebastiao Pereira loads a PDF:

PDF forms are widely used for data collection, document processing, and digital workflows due to their versatility and consistency across different platforms and devices. They are essential in various industries, including healthcare, education, finance, government, and business. How do you retrieve data from PDF forms and insert it into a SQL Server database table?

Read on for an answer using Visual Basic, which is a name I haven’t heard in quite some time.

Features in Azure AI Foundry

Tomaz Kastrun continues a series:

Azure AI Foundry is an all-purpose tool that provides all of the essential ingredients that data scientists need in order to create, develop, and deploy generative AI applications. The platform supports and provides the following services and abilities:

Click through for those features and how you can access the Azure AI Foundry.

Interpolating Missing Values in R

Steven Sanderson fills in the blanks:

Interpolation is a method of estimating missing values based on the surrounding known values. It’s particularly useful when dealing with time series data or any dataset where the missing values are not randomly distributed.

There are various interpolation methods, but we’ll focus on linear interpolation in this article. Linear interpolation assumes a straight line between two known points and estimates the missing values along that line.

Read on to see how you can perform linear interpolation in R.
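
As a quick illustration of the idea in the excerpt, here is a minimal sketch in Python (not the R code from the linked article). The function name `interpolate_linear` and the sample series are hypothetical, and the sketch assumes evenly spaced observations, so the position in the list stands in for time. Gaps with no known value on one side are left alone, since there is no line to draw.

```python
from typing import Optional


def interpolate_linear(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill None gaps that sit between two known values along a straight line."""
    result = list(values)
    # Positions of the known (non-missing) observations.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk each pair of neighboring known points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span  # fraction of the way from left to right
            result[i] = values[left] + weight * (values[right] - values[left])
    return result


# Hypothetical series: two interior gaps and one trailing gap.
series = [1.0, None, None, 4.0, None, 8.0, None]
filled = interpolate_linear(series)
print(filled)
```

The interior gaps land on the line between their neighbors (roughly 2 and 3 between 1 and 4, and 6 between 4 and 8), while the trailing gap stays missing because it has no right-hand neighbor.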

Blank Dates and DAX

Marco Russo and Alberto Ferrari are blanking on us:

Handling missing dates in a semantic model can be challenging, especially when working with DAX time intelligence functions. Dates might be missing for various reasons: incomplete data entry, system errors, special placeholder values like 0000, or dates set far in the future. We will see that using a blank is the best way to manage missing dates, even though you should pay attention to DAX conditional expressions operating on those dates. We will also consider how to hide these blanks in a Power BI report if their presence is not desired in charts and slicers.

Read on to learn more.

Optimizing AWS Costs

Albert McQuiston speaks my language (that is, saving money):

Every organization looks to save on its cloud expenses to align with business objectives. With the following tips, you can optimize your Amazon Web Services (AWS) cloud expenditure and review the key aspects where you can save more effectively.

Read on for some high-level tips. It doesn’t cover things like spot instances, but does a pretty decent job of laying out the problem and showing some of the cost and budgeting tools available to figure out where your company’s money is going.

External Data Sharing in OneLake

Jens Vestergaard shares some info about sharing some info:

At #MSIgnite, Microsoft announced a new feature in Fabric that allows people from one organization to share data with people from another organization. You might ask yourself why this is even news, and rightly so. Up until last week, professionals have had to use (S)FTP clients like FileZilla, or tools such as Azure Storage Explorer, WeTransfer, or similar products in order to share data. Some of these tools are in fact hard to use and/or understand for a great number of business users – they are familiar with Windows and the Office suite and not much more. This is all to be expected, as business users in general should focus on business stuff rather than IT stuff.

Read on to see how this has changed, and an update to what I consider one of the coolest products to come out of Microsoft Fabric.

GiST Indexes and Range Queries in PostgreSQL

Lee Asher can’t be limited to a single point:

Our Part I query used the following WHERE clause:

WHERE tsrange(o.start_time, o.end_time) && tsrange(p.enter, p.leave)

The “tsrange()” functions return timestamp ranges. But overlap queries aren’t limited to timestamps; they can be constructed from integers and floating-point values too. Imagine an arbitrage database that tracks the minimum and maximum price paid for a commodity.

Read on for examples of other types of ranges, preventing range intersection, and more.
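
To make the overlap test concrete, here is a rough Python sketch of what an overlap check does for numeric ranges. It is not PostgreSQL's implementation: the `Range` class and the price figures are hypothetical, and the sketch assumes half-open `[lower, upper)` bounds, which is the default for PostgreSQL's range constructor functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A half-open interval [lower, upper) over any comparable numeric type."""
    lower: float
    upper: float

    def __post_init__(self) -> None:
        if self.lower > self.upper:
            raise ValueError("range lower bound must not exceed upper bound")

    @property
    def is_empty(self) -> bool:
        # With half-open bounds, equal endpoints contain no points at all.
        return self.lower == self.upper

    def overlaps(self, other: "Range") -> bool:
        """Rough analogue of the && operator: do the ranges share any point?"""
        if self.is_empty or other.is_empty:
            return False
        # Each range must start before the other one ends.
        return self.lower < other.upper and other.lower < self.upper


# Hypothetical minimum and maximum prices paid for a commodity.
paid = Range(10.50, 12.75)
quoted = Range(12.00, 14.00)
later = Range(12.75, 15.00)

print(paid.overlaps(quoted))  # True: both cover 12.00 up to 12.75
print(paid.overlaps(later))   # False: 12.75 is excluded from the first range
```

The second comparison is the case worth remembering: with half-open bounds, two ranges that merely touch at an endpoint do not overlap.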

An Explanation of Boosting, Bagging, and Stacking Ensembles

Ivan Palomares Carrascosa disambiguates three terms:

Unity makes strength. This well-known motto perfectly captures the essence of ensemble methods: one of the most powerful machine learning (ML) approaches (with permission from deep neural networks) to effectively address complex problems predicated on complex data by combining multiple models to address one predictive task. This article describes three common ways to build ensemble models: boosting, bagging, and stacking. Let’s get started!

My explanation, which makes sense for people who grew up during the 1980s: bagging is Voltron, boosting is Rocky, and stacking is three raccoons in a trench coat.
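
For the Voltron case, here is a minimal bagging sketch in Python using only the standard library. It is an illustration under stated assumptions, not code from the linked article: the one-split regression stump (`fit_stump`), the helper `bagged_predict`, and the six data points are all hypothetical. The bagging part is the last function: train each model on a bootstrap sample (drawn with replacement), then average the predictions.

```python
import random
from statistics import mean


def fit_stump(points):
    """Fit a one-split regression stump: predict the mean y on each side of a threshold."""
    xs = sorted({x for x, _ in points})
    best = None
    # Try a threshold halfway between each pair of neighboring distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # The sample held a single distinct x, so fall back to a constant prediction.
        constant = mean(y for _, y in points)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_predict(points, x, n_models=25, seed=0):
    """Bagging: fit one stump per bootstrap sample, then average their predictions."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    models = [fit_stump(rng.choices(points, k=len(points))) for _ in range(n_models)]
    return mean(model(x) for model in models)


# Hypothetical data: y sits near 1 for small x and near 3 for large x.
data = [(1, 1.0), (2, 1.2), (3, 0.9), (4, 3.1), (5, 2.9), (6, 3.0)]
low = bagged_predict(data, 2)
high = bagged_predict(data, 5)
print(low, high)
```

Boosting would instead train the models in sequence, each one focusing on the errors of the previous ones, and stacking would feed the models' predictions into a second model rather than averaging them.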

A Review of the Azure AI Foundry

Tomaz Kastrun starts a new series:

Microsoft Azure offers multiple services that enable developers to build amazing AI-powered solutions. Azure AI Foundry brings these services together in a single unified experience for AI development on the Azure cloud platform.

Until now, developers needed to work with multiple tools and web portals in a single project. With Azure AI Foundry, these tasks are now simplified, and it offers the same environment for better collaboration.

Read on to see more about the Azure AI Foundry.

Benchmarking Power BI Local Data Import Speed

Eugene Meidinger has all the data he needs on his desktop:

The chart above shows the number of seconds it took to load X million rows of data from a given data source, according to a profiler trace and Phil Seamark’s Refresh visualizer. Parquet is a clear winner by far, with MS Access surprisingly coming in second. Sadly the 2 GB file limit stops Access from becoming the big data format of the future.

Part of the reason I wanted to do these tests is that people on Reddit often complain that their refresh is slow and their CPU is maxed out. This is almost always a sign that they are importing oodles and oodles of CSV files. I recommended trying Parquet instead of CSV, but it’s nice to have concrete proof that it’s a better file source.

Read on for the chart. Also, don’t tell his accountants about the gaming laptop. It’s 100% for work purposes, just like my desktop PC. Only work, nothing else, IRS. The high-end GPU is for AI work. And the big screen is for doing big business.
