Press "Enter" to skip to content

Month: March 2022

Selective Document Copy in Elasticsearch

The Hadoop in Real World team show how to migrate specific documents when building a new index:

As shown in the other post, we still use a reindex by specifying the source and destination but this time we also specify a query in source along with the term which indicates that the documents with state = ‘ny’ will be filtered from the source.

So only documents with state ny will be copied to the new index account_v3 with this reindex operation.

Click through for an example of how this works.

Comments closed

Tempdb Improvements over Time

Melody Zacharias takes us through the history of tempdb improvements:

In my previous blog and this one, there is a lot of talk of trace flags and you can see the life cycle of them.  They are implemented and then sometimes become part of the product over time. For a long time, the trace flag 1118 was a common performance improvement trick known only by industry experts. Over the years as a consultant, I have often been asked by clients if they should use trace flags, and generally speaking, as long as they are documented by Microsoft they are safe to use.  I would certainly not recommend using undocumented trace flags.  They are not supported and therefore not recommended.  So always be sure to check the trace flag list before setting trace flags.  So of course, as I say that, I have to offer another one.  

Read on for more details and also advice on getting the most out of tempdb.

Comments closed

Merging and Appending Queries in Power Query

Joe Billingham shows off two similar-sounding functions:

I had an issue recently where a third-party application had been updated and both the new and legacy versions were being used side-by-side. Logging data from both versions was being written to two separate Azure SQL databases.

The customer needed a Power BI report showing both old and new logging data sets as a single source. If both databases were SQL Server databases, I could have written a view with a cross-database join, imported that into Power BI and thought no more about it. However, the two sources being Azure SQL Databases, with no easy way to join the tables, caused an issue.

This is where the Merge and Append functions in Power Query come in.

Read on for the solution.

Comments closed

Delta Lake Operability in Azure Synapse Analytics

James Serra lets us know when and where we can use Delta Lake within Azure Synapse Analytics:

Many companies are seeing the value in collecting data to help them make better business decisions. When building a solution in Azure to collect the data, nearly everyone is using a data lake. A majority of those are also using delta lake, which is basically a software layer over a data lake that gives additional features. I have yet to see anyone using competing technologies to delta lake in Azure, such as Apache Hudi or Apache Iceberg (see A Thorough Comparison of Delta Lake, Iceberg and Hudi and Open Source Data Lake Table Formats: Evaluating Current Interest and Rate of Adoption).

Read on for more information.

Comments closed

Saving Data Reqeusts

Kenneth Fisher shares a good idea:

The other day a friend of mine mentioned that they were questioned on one of the scripts they ran recently and they were in fact able to pull that script back up and show them. I was reminded of something I’ve thought about writing several times so it seemed like a good excuse.

An old manager of mine started his team saving all of our work in folders on a shared network space. A few months back I started doing this again, just in sub folder of my documents folder.

Yep. Every data request I have gets saved in a folder. Because those are usually in Excel, I have the spreadsheet for output and the original query as a separate tab within the report

Comments closed

Workload Analysis with XESmartTarget

Gianluca Sartori continues a series on XESmartTarget. First up, let’s analyze a workload:

The idea comes from a blog post by Brent Ozar about “How to Find Out Whose Queries are Using The Most CPU“. Brent uses the Resource Governor to detect who’s using the CPU. That’s an interesting approach, but you can do the same more efficiently with XESmartTarget.

Analyzing a workload means capturing all the queries on a server, categorize them by application name, database name and login name, and creating samples at regular intervals, in order to describe the behavior of the workload over time, let’s say every one minute.

From there, we’ll capture the queries running on our server:

In the previous recipe we used XESmartTarget to analyze a workload and we could characterize it by application/login/database. Now let’s take one more step: let’s capture all the queries and their plans and add query_hash and query_plan_hash to the analysis table. We will also capture plans and query text on separate tables.

It is fun getting to see all of the versatility in XESmartTarget.

Comments closed

Capturing a Call Stack without a Dump

Bob Dorr shows off Arvind’s SQL Call Stack Resolver:

Some outputs, such as the XEvent call stack action output the raw stack frame information and require a rebase to loaded module information in order to symbolize.   The security feature for random address virtualization loads images at different addresses each time the image is loaded.   This requires the module base address and the raw address in order to calculate the relative virtual address for symbolization.

Click through for more information.

Comments closed

Making Visual Elements Glow in Excel

Vincent Granville does some marketing:

I explain here how to do it in Excel. But the principle is general. You can produce this type of visualization with other tools. There are many features in Excel that allow you to generate marketing-style pictures. I never really used them, as the result can be cheesy. If overused, you end up with material that looks like advertising from a Casino, or like the “old world wide web”, where blinking fonts and documents with neon colors were popular. But I recently decided to give it a try again, using extreme moderation. I believe my experiment was successful. I will leave it to the reader to have a final say about it.

It turns out a lot less gaudy than I originally imagined.

Comments closed

Building a Recommender in Spark

Avinash Sooriyarachchi makes a recommendation:

There has been an exponential increase in the volume and variety of data at our disposal to build recommenders and notable advances in compute and algorithms to utilize in the process. Particularly, the means to store, process and learn from image data has dramatically increased in the past several years. This allows retailers to go beyond simple collaborative filtering algorithms and utilize more complex methods, such as image classification and deep convolutional neural networks, that can take into account the visual similarity of items as an input for making recommendations. This is especially important given online shopping is a largely visual experience and many consumer goods are judged on aesthetics.

In this article, we’ll change the script and show the end-to-end process for training and deploying an image-based similarity model that can serve as the foundation for a recommender system. Furthermore, we’ll show how the underlying distributed compute available in Databricks can help scale the training process and how foundational components of the Lakehouse, Delta Lake and MLflow, can make this process simple and reproducible.

Click through for the process.

Comments closed

Understanding Plans: Seeks and Scans

Erik Darling made me rhyme. First up, data retrieval via seek:

People. People complain. People complain about SQL Server. That’s probably why I get paid to deal with it, but whatever.

One complaint I get to hear week in and week out is that SQL Server isn’t using someone’s index, or that there are too many index scans and they’re slow.

That might actually be a composite of like twelve complaints, but let’s not start counting.

Erik gives us three reasons why we might not see a seek. But wait, there’s more!

I’m not sure why scans got such a bad rap. Perhaps it’s a leftover from the Bad Old Days© when people worried about logical fragmentation and page splits.

What I mean to say is: scans are often treated with a level of revulsion and focus that distracts people from larger issues, and is often due to some malpractice on their part.

This is true–scans aren’t inherently bad and Erik gives us a better mental model to work with.

Comments closed