Press "Enter" to skip to content

Category: Uncategorized

Cleaning Up Large System Databases

Josephine Bush doesn’t need enormous system databases:

Always set this on your SQL Servers so you don’t have this problem in the first place. This is in the SQL Server Agent settings. I remember having some agent jobs that used to serve this function that ran on a schedule, which may have been required in older versions of SQL Server.

Josephine focuses on SQL Agent history and database backup history, both of which are good ones. If you have an older version of SQL Server or are using the package deployment model, there may be an explosion of information in msdb regarding SSIS that you’d want to manage. Also, check if any of the databases are in Full recovery mode; if so, ensure that the backup script you’re using for transaction log backups actually backs up system databases.

Comments closed

Using Static Cursors

Hugo Kornelis digs into one type of cursor:

I have used the GLOBAL scope option, so that it is possible to step through the code one statement at a time. This option does not affect the execution plans used. The FORWARD_ONLY and READ_ONLY options are probably the most commonly used options with cursors, especially with static cursors. We will briefly look at the effect of other options for read direction and concurrency at the end of this post.

Click through to learn more about how cursors show up in execution plans and how you can tell, based on the execution plan of a static cursor, why it’s either really fast (relative to other cursor types) or really slow.

Comments closed

Plotting Time Series Growth Rates

Steven Sanderson builds a chart:

The ts_growth_rate_vec() function is part of the healthyR.ts library, designed to work with numeric vectors or time series data. It calculates the growth rate or log-differenced growth rate of the provided data, offering valuable insights into the underlying trends and patterns.

Read on to see how this function works, as well as several examples of plotting growth rates of airline data which exhibits both strong cycles and an overall trend.

Comments closed

Indexing in SQL Server Graph

Hugo Kornelis offers a mea culpa:

In my post, I pointed out that SQL Server automatically creates a unique nonclustered index on the (internal and hidden) column graph_id, that you can delete, but can’t modify. I added that users cannot specify indexes on that column, nor on the other internal and hidden columns that exist in edge tables (from_obj_id, from_id, to_obj_id, and to_id). These statements are factually correct, but misleading.

I then concluded from those facts that once you delete the automatically generated index, you can not re-create it. This is actually incorrect.

Read on for the correct answer, as well as more information on indexing practices and advice on the right kinds of indexes to create on graph tables.

Comments closed

Storing the Basis of Calculations, Not Results

Vladimir Khorikov hits on a topic of particular interest:

On the front-end, the duration is represented as 1h 47m. But how should we store it in our database? What data type should we use?

We could keep the duration as a 1h 47m string, just as we render it on the screen, but what if we later decide to change the string format? We might need to display it as 1:47 or 107 minutes instead. Do we then parse all the existing strings and convert them into a new format?

I like the thrust of this article. Two things I think people forget about in data modeling are ledgers and events, instead trying to store the results of these. Ledgers and events can be slower—though there are things we can do to pre-calculate more user-friendly results and speed up the process—but they provide you auditability, flexibility in how you want to calculate and display the information, and the ability to correct errors over time.

Comments closed

Bit Manipulation in SQL Server 2022

Itzik Ben-Gan twiddles some bits:

The need to manipulate data at the bit level with bitwise operations isn’t common in T-SQL, but you might stumble into such a need in some specialized scenarios. Some implementations store a set of flags (yes/no, on/off, true/false) in a single integer or binary-typed column, where each bit represents a different flag. One example is using a bitwise representation of a set of user/role permissions. Another example is using a bitwise representation of a set of settings turned on or off in a given environment. Even SQL Server stores some flag-based data using bitwise representation.

Here’s the deal. I don’t mind that this new syntax exists, particularly because—as Itzik points out—there are areas built into SQL Server which use integers to store bit flags. In application code, however, this gets a sharp “No!” from me in any code review. If you need to decompose values in your table as a matter of course, your table is not in first normal form. Having a table not be in 1NF isn’t the end of the world but at that point, I think the onus is on the developer to defend the violation at that point.

Comments closed

Views: Indexed or Otherwise

Erik Darling explains an important difference:

When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.

SQL Server doesn’t cache results, it only caches raw data. If you want the results of a view to be saved, you need to index it.

And naturally, those indexed views are different from materialized views in Oracle/PostgreSQL but that’s a topic for another day.

Comments closed

Creating an Azure Redis Cache

Arun Sirpal continues a series on Azure Redis:

Remember – basic should never be used for production. Also, if you need dedicated service then you will not want C0 because this is based on shared infrastructure. Redis can get expensive but could be cost – effective especially if you design to use a multi app approach per cache.

I select P1 – Premium with 6GB cache just to talk a couple things through.

As a note, 6GB of cache is a lot in most environments. That’s because your average cached element size in Redis should be measured in single-digit or double-digit bytes, not kilobytes. You’re typically caching individual values, not entire documents, so if you average 64 bytes per cached key-value combo, you can get somewhere around 90 million values in cache at a time. The database call savings add up quickly, considering a really simplistic estimation: if the average number of queries before expiration for a cached item is 3, a single “cycle” of caching saves you about 270 million database calls. That can allow you to downscale your relational databases considerably, saving a lot of money in the process. There’s a lot of hand-waving I’m doing in the math and a lot of complexity I’m wiping away, but both of those tend on average to make the cache more effective, not less.

Comments closed

Algorithmic Trading with ML.NET

Zadhid Powell has an example of working with ML.NET:

Machine learning is one of those areas of programming which is very capable of invitations and creativity. But, are you limited to any particular language like Python or R to develop either AI or ML projects? Who says that?

Nowadays, many developers have started learning to program with C#. But, if you’re one of them, you’d probably have heard that C# is not the best choice to start programming or it’s just useful for windows applications which is a wrong belief.

I mean, I’m still going to the mats for F# here but yeah, anybody who says C# is a bad starting language for programmers or that you can only build Windows apps with it lacks sufficient information on the language to make a sound judgment.

Comments closed