Press "Enter" to skip to content

Category: Uncategorized

Plotting Time Series Growth Rates

Steven Sanderson builds a chart:

The ts_growth_rate_vec() function is part of the healthyR.ts library, designed to work with numeric vectors or time series data. It calculates the growth rate or log-differenced growth rate of the provided data, offering valuable insights into the underlying trends and patterns.

Read on to see how this function works, as well as several examples of plotting growth rates of airline data, which exhibits both strong cycles and an overall trend.
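For intuition, the simple growth rate is (x_t / x_{t-1} - 1) * 100 and the log-differenced version is log(x_t) - log(x_{t-1}). Here is a minimal Python sketch of those two calculations; it illustrates the math rather than the healthyR.ts implementation, and the function names are mine.

    import numpy as np

    def growth_rate(x):
        # Period-over-period growth rate, in percent.
        x = np.asarray(x, dtype=float)
        return (x[1:] / x[:-1] - 1.0) * 100.0

    def log_diff_growth_rate(x):
        # Log-differenced growth rate: log(x_t) - log(x_{t-1}).
        return np.diff(np.log(np.asarray(x, dtype=float)))

    # Illustrative monthly airline passenger counts.
    passengers = [112, 118, 132, 129, 121]
    print(growth_rate(passengers))           # [ 5.36 11.86 -2.27 -6.20]
    print(log_diff_growth_rate(passengers))  # [ 0.0522 0.1121 -0.0230 -0.0640]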


Indexing in SQL Server Graph

Hugo Kornelis offers a mea culpa:

In my post, I pointed out that SQL Server automatically creates a unique nonclustered index on the (internal and hidden) column graph_id, that you can delete, but can’t modify. I added that users cannot specify indexes on that column, nor on the other internal and hidden columns that exist in edge tables (from_obj_id, from_id, to_obj_id, and to_id). These statements are factually correct, but misleading.

I then concluded from those facts that once you delete the automatically generated index, you can not re-create it. This is actually incorrect.

Read on for the correct answer, as well as more information on indexing practices and advice on the right kinds of indexes to create on graph tables.


Storing the Basis of Calculations, Not Results

Vladimir Khorikov hits on a topic of particular interest:

On the front-end, the duration is represented as 1h 47m. But how should we store it in our database? What data type should we use?

We could keep the duration as a 1h 47m string, just as we render it on the screen, but what if we later decide to change the string format? We might need to display it as 1:47 or 107 minutes instead. Do we then parse all the existing strings and convert them into a new format?

I like the thrust of this article. Two things I think people forget about in data modeling are ledgers and events; instead, people try to store the results of those ledgers and events. Ledgers and events can be slower (though there are things we can do to pre-calculate more user-friendly results and speed up the process), but they provide auditability, flexibility in how you calculate and display the information, and the ability to correct errors over time.
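To make the idea concrete, here is a minimal Python sketch (my own names, not the article's code): store the basis of the calculation, total minutes as an integer, and derive every display format from it on demand.

    def format_duration(total_minutes, style="h m"):
        # The stored integer is the source of truth; every string
        # representation is derived from it at display time.
        hours, minutes = divmod(total_minutes, 60)
        if style == "h m":
            return f"{hours}h {minutes}m"
        if style == "colon":
            return f"{hours}:{minutes:02d}"
        if style == "minutes":
            return f"{total_minutes} minutes"
        raise ValueError(f"unknown style: {style}")

    # The database stores 107; changing the display format later
    # requires no migration of existing rows.
    print(format_duration(107))             # 1h 47m
    print(format_duration(107, "colon"))    # 1:47
    print(format_duration(107, "minutes"))  # 107 minutes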


Bit Manipulation in SQL Server 2022

Itzik Ben-Gan twiddles some bits:

The need to manipulate data at the bit level with bitwise operations isn’t common in T-SQL, but you might stumble into such a need in some specialized scenarios. Some implementations store a set of flags (yes/no, on/off, true/false) in a single integer or binary-typed column, where each bit represents a different flag. One example is using a bitwise representation of a set of user/role permissions. Another example is using a bitwise representation of a set of settings turned on or off in a given environment. Even SQL Server stores some flag-based data using bitwise representation.

Here’s the deal. I don’t mind that this new syntax exists, particularly because, as Itzik points out, there are areas built into SQL Server which use integers to store bit flags. In application code, however, this gets a sharp “No!” from me in any code review. If you need to decompose values in your table as a matter of course, your table is not in first normal form. Having a table that isn’t in 1NF is not the end of the world, but at that point, I think the onus is on the developer to defend the violation.
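For anyone who hasn't seen the pattern, here is a small Python sketch of what packing flags into one integer looks like; the permission names are invented for illustration, and this shows the concept rather than the new T-SQL syntax Itzik covers.

    from enum import IntFlag

    class Permission(IntFlag):
        READ    = 1  # bit 0
        WRITE   = 2  # bit 1
        EXECUTE = 4  # bit 2
        ADMIN   = 8  # bit 3

    # One integer column holds several yes/no facts at once.
    stored = Permission.READ | Permission.WRITE  # stored as 3

    # Reading any single fact means decomposing the value...
    print(Permission.WRITE in stored)   # True
    print(Permission.ADMIN in stored)   # False

    # ...and changing one fact means more bit twiddling.
    stored |= Permission.EXECUTE        # turn a flag on  (now 7)
    stored &= ~Permission.WRITE         # turn a flag off (now 5)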


Views: Indexed or Otherwise

Erik Darling explains an important difference:

When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.

SQL Server doesn’t cache results; it only caches raw data. If you want the results of a view to be saved, you need to index it.

And naturally, those indexed views are different from materialized views in Oracle and PostgreSQL, but that’s a topic for another day.


Creating an Azure Redis Cache

Arun Sirpal continues a series on Azure Redis:

Remember: Basic should never be used for production. Also, if you need a dedicated service, then you will not want C0, because it is based on shared infrastructure. Redis can get expensive, but it could be cost-effective, especially if you design to use a multi-app approach per cache.

I select P1 (Premium) with a 6GB cache, just to talk a couple of things through.

As a note, 6GB of cache is a lot in most environments. That’s because your average cached element size in Redis should be measured in single-digit or double-digit bytes, not kilobytes. You’re typically caching individual values, not entire documents, so if you average 64 bytes per cached key-value combo, you can get somewhere around 90 million values in cache at a time. The database call savings add up quickly, considering a really simplistic estimation: if the average number of queries before expiration for a cached item is 3, a single “cycle” of caching saves you about 270 million database calls. That can allow you to downscale your relational databases considerably, saving a lot of money in the process. There’s a lot of hand-waving I’m doing in the math and a lot of complexity I’m wiping away, but both of those tend on average to make the cache more effective, not less.
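Here's that back-of-the-envelope math as a quick Python sketch; every input is an assumption, including a rough 10% allowance for per-key overhead, so treat it as an illustration rather than a sizing tool.

    # All inputs are illustrative assumptions, not measurements.
    cache_bytes = 6 * 1024**3   # 6GB cache
    avg_entry_bytes = 64        # average size of a key-value combo
    overhead = 0.10             # rough allowance for per-key overhead
    queries_per_item = 3        # average hits before an item expires

    capacity = int(cache_bytes / avg_entry_bytes * (1 - overhead))
    saved_calls = capacity * queries_per_item

    print(f"{capacity:,} values in cache")         # ~90 million
    print(f"{saved_calls:,} calls saved per cache cycle")  # ~270 million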


Algorithmic Trading with ML.NET

Zadhid Powell has an example of working with ML.NET:

Machine learning is one of those areas of programming which is very capable of innovation and creativity. But are you limited to a particular language like Python or R to develop AI or ML projects? Who says that?

Nowadays, many developers have started learning to program with C#. But if you’re one of them, you’ve probably heard that C# is not the best choice to start programming, or that it’s just useful for Windows applications, which is a wrong belief.

I mean, I’m still going to the mat for F# here, but yeah, anybody who says C# is a bad starting language for programmers, or that you can only build Windows apps with it, lacks sufficient information about the language to make a sound judgment.


Auditing Logins and Finding Unused Tables with XESmartTarget

Gianluca Sartori continues a series on XESmartTarget. First up is a process to audit successful logins:

Sometimes you are not interested in the individual events captured by a session, but you want to extract some information from a series of events by grouping and aggregating them. This is the case with events that happen very often, like logon events, which can help you validate your story.

Imagine that you inherited a big, busy and chaotic SQL Server instance, with lots of databases and logins. One of the things that you probably want to do is track which logins are active and which ones are not and can be safely disabled.

Once you have that in place, how about unused objects?

The previous recipe showed you how to capture data from Extended Events sessions, summarize it in memory, and then save it to a table in SQL Server, merging it with any existing rows. This comes in extremely handy when the events are not useful individually, but the story is told by the aggregation of all of them.

Another possible problem that can be solved with the same technique is finding unused objects in the database. It looks like a trivial problem, but it’s not.

Read on to see how you can solve both of these issues.


Redis Streams for Apache Kafka Users

Paul Brebner gives us an overview of Redis Streams:

The Redis Streams data type is newer than the Redis Pub/Sub data type, and is designed to support “disconnected” distributed streaming applications. The data type itself is essentially an append-only data structure, stored in memory—basically preserved messages!

This differs from Redis Pub/Sub channels, which are focused on delivering messages to currently connected subscribers only. Pub/Sub uses a push-based delivery mechanism, and if there are no currently connected subscribers, then messages are simply discarded. And channels don’t remember messages, so disconnected subscribers can’t catch up on missed messages, replay messages, or read different ranges of messages. But channels are fast!

I’m still partial toward Kafka, but Redis Streams are pretty nice.
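As a rough illustration of the difference, here is a minimal sketch using the redis-py client; it assumes a Redis server on localhost, and the channel and stream names are made up.

    import redis  # the redis-py client

    r = redis.Redis(decode_responses=True)

    # Pub/Sub: push-based delivery with no memory. If no subscriber
    # is connected right now, this message is simply discarded.
    r.publish("alerts", "disk full")

    # Streams: an append-only log. Entries persist in memory, so a
    # consumer can attach later and read from any point in history.
    r.xadd("events", {"type": "login", "user": "alice"})
    r.xadd("events", {"type": "logout", "user": "alice"})

    # Read everything from the beginning ("0"), even after having
    # been disconnected when the entries were added.
    for stream, entries in r.xread({"events": "0"}):
        for entry_id, fields in entries:
            print(stream, entry_id, fields)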
