Press "Enter" to skip to content

September 10, 2020

Tumbling and Sliding Windows in Flink

Kundan Kumarr takes us through two different window types in Apache Flink:

In the previous blog, we talked about Flink’s window operator, the heart of processing infinite streams. Generally in Flink, after specifying whether the stream is keyed or non-keyed, the next step is to define a window assigner. The window assigner defines how elements are assigned to windows. Flink provides some useful predefined window assigners: Tumbling windows, Sliding windows, Session windows, Count windows, and Global windows. We can use any of them as per our use case, or we can even create custom window assigners in Flink.

In this blog, we will learn about the first two window assigners, i.e., tumbling and sliding windows. These two window assigners assign elements to windows based on time, which can be either processing time or event time.

Click through for a description of each.
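
To make the distinction concrete, here’s a rough sketch in Flink SQL (the clicks table, event_time column, and interval sizes are hypothetical). A tumbling window assigns each element to exactly one fixed-size, non-overlapping window; a sliding (HOP) window can assign the same element to several overlapping windows:

```sql
-- Hypothetical sketch: tumbling vs. sliding windows in Flink SQL.
-- Tumbling: fixed five-minute windows; each event lands in exactly one.
SELECT
    user_id,
    TUMBLE_START(event_time, INTERVAL '5' MINUTE) AS window_start,
    COUNT(*) AS events
FROM clicks
GROUP BY user_id, TUMBLE(event_time, INTERVAL '5' MINUTE);

-- Sliding (HOP): five-minute windows advancing every minute,
-- so one event can fall into up to five overlapping windows.
SELECT
    user_id,
    HOP_START(event_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE) AS window_start,
    COUNT(*) AS events
FROM clicks
GROUP BY user_id, HOP(event_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE);
```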

Forcing a Shrink of TempDB

Monica Rathbun has a script for people who just want to watch the world burn:

Occasionally, we must resize or realign our tempdb log file (.ldf) or data files (.mdf or .ndf) due to a growth event that forces the file size out of whack. To resize, we have three options: restart the SQL Server service, add additional files, or shrink the current file. We have most likely all been faced with runaway log files, and in an emergency situation restarting the SQL Server service may not be an option, but we still need to get our log file size down before we run out of disk space, for example. The process of shrinking down that file can get tricky, so I created this flow chart to help you out if you ever get into this situation.

Now it’s very important to note that many of these commands will clear your cache and will greatly impact your server’s performance as it warms the cache back up. In addition, you should not shrink your database data or log files unless absolutely necessary, as doing so can result in a corrupt tempdb.

Click through for the process, including a well-made flow chart with a large “Know what you’re doing before doing this” warning.
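
If you do land in that emergency, the commands involved look something like this sketch (file names are the tempdb defaults, sizes are examples, and the warning above applies in full):

```sql
-- Hedged sketch of an emergency tempdb shrink. Run only as a last resort:
-- clearing these caches will hurt performance until they warm back up.
USE tempdb;
GO
DBCC FREEPROCCACHE;            -- flush the plan cache
DBCC DROPCLEANBUFFERS;         -- flush clean pages from the buffer pool
DBCC FREESYSTEMCACHE ('ALL');  -- flush system caches
GO
-- Target sizes are in MB (1024 = 1 GB); file names are the defaults.
DBCC SHRINKFILE (templog, 1024);
DBCC SHRINKFILE (tempdev, 1024);
```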

How SQL Server Stores XML Data

Randolph West continues a series on internal storage of data types:

The SQL Server XML data type supports both schema-defined and schema-less XML documents (typed and untyped), and partial documents. Partial documents are still considered well-defined XML, but don’t have surrounding <?xml>...</xml> tags for example. Either way, the storage engine validates the data when inserting or updating into an XML column.

The XML data type lets us store up to 2GB per column per row, but it’s prudent here to note that just because we can, it doesn’t mean we should.

Click through for the details.
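
A quick way to see that the engine stores xml in an internal binary format, rather than the raw text you inserted, is to compare byte counts (a small, self-contained illustration):

```sql
-- Illustration: the xml type is stored in an internal binary format,
-- so its DATALENGTH differs from the length of the equivalent nvarchar text.
DECLARE @AsText nvarchar(max) = N'<order id="1"><item qty="2">widget</item></order>';
DECLARE @AsXml  xml = @AsText;

SELECT
    DATALENGTH(@AsText) AS TextBytes,  -- nvarchar: 2 bytes per character
    DATALENGTH(@AsXml)  AS XmlBytes;   -- size of the internal binary form
```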

GetAllTheErrorLogs: Combining Multiple Log Sources

Aaron Bertrand has a new project:

There’s a whole lot of grunt work in there that shouldn’t have to be done by a person. I don’t think you could automate the whole thing, because it is hard to predict exactly which events will be interesting and which won’t, but I think 90% is achievable.

A colleague mentioned that they want to build something that would help, but even when that happens, it would end up in proprietary code that only helps us. I saw Drew Furgiuele’s post on Building a Better Get-SQLErrorLog, and that gave me some ideas for what I would build. After reaching out to Drew, I created a GitHub repository with a working name of GetAllTheErrorLogs. Its elevator pitch is a simple sentence:

PowerShell to assemble a timeline combining salient events from Windows Event Log, Failover Cluster log, and SQL Server errorlog.

Click through for the details as well as Aaron’s current progress.
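
For the SQL Server errorlog lane of that timeline, the likely building block is xp_readerrorlog; a hedged T-SQL sketch (the date range is an example, and Aaron’s PowerShell handles the other sources and the merging):

```sql
-- Hedged sketch: read errorlog entries for a time window into a table,
-- the raw material for one lane of the combined timeline.
DECLARE @Timeline table
(
    LogDate     datetime,
    ProcessInfo nvarchar(50),
    LogText     nvarchar(max)
);

-- Args: log number (0 = current), log type (1 = errorlog),
-- two search strings (NULL = no filter), start time, end time.
INSERT INTO @Timeline (LogDate, ProcessInfo, LogText)
EXEC master.dbo.xp_readerrorlog
    0, 1, NULL, NULL, '2020-09-10 00:00', '2020-09-10 23:59';

SELECT LogDate, ProcessInfo, LogText
FROM @Timeline
ORDER BY LogDate;
```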

Recursive Common Table Expressions

Itzik Ben-Gan continues a series on table expressions:

Recursive CTEs have many use cases. A classic category of uses has to do with handling of graph structures, like our employee hierarchy. Tasks involving graphs often require traversing arbitrary length paths. For example, given the employee ID of some manager, return the input employee, as well as all subordinates of the input employee in all levels; i.e., direct and indirect subordinates. If you had a known small limit on the number of levels that you might need to follow (the degree of the graph), you could use a query with a join per level of subordinates. However, if the degree of the graph is potentially high, at some point it becomes impractical to write a query with many joins. Another option is to use imperative iterative code, but such solutions tend to be lengthy. This is where recursive CTEs really shine—they enable you to use declarative, concise and intuitive solutions.

I’ll start with the basics of recursive CTEs before I get to the more interesting stuff where I cover searching and cycling capabilities.

Read the whole thing. There are some good uses of recursive common table expressions (I used them to figure out nested foreign key relationships, for example), so it’s worth your time.
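
The canonical shape of the solution Itzik describes looks like this (table and column names are illustrative, in the style of his sample database): an anchor member seeds the result with the input employee, and the recursive member keeps joining back to the CTE until no new rows appear:

```sql
-- Illustrative recursive CTE: the input employee plus all direct
-- and indirect subordinates. Names are in the style of Itzik's samples.
DECLARE @EmployeeId int = 3;

WITH Subordinates AS
(
    -- Anchor member: the input employee
    SELECT empid, mgrid, empname
    FROM HR.Employees
    WHERE empid = @EmployeeId

    UNION ALL

    -- Recursive member: employees managed by someone already in the set
    SELECT E.empid, E.mgrid, E.empname
    FROM HR.Employees AS E
    INNER JOIN Subordinates AS S
        ON E.mgrid = S.empid
)
SELECT empid, mgrid, empname
FROM Subordinates
OPTION (MAXRECURSION 0);  -- default limit is 100 levels; 0 removes it
```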

Automating Database Restorations

Hugo Kornelis walks us through automated restoration of database backups:

Now I’ve been to quite a few conferences. And I’ve heard a lot of DBAs talk about best practices. One of the constants in those talks is: automate your restores. So I felt confident that a quick internet search would surely be enough to find me an existing script for restoring a database. Sure, I’d need to modify it to restore to a test database, but that should be minimal effort.

To my surprise, I was unable to find a script. Is scripting this so easy that every DBA can do it with their eyes closed, and nobody feels a need to share it? Is it so hard that they all decided they’re sitting on gold and will only share it for big money? Or were my internet search skills simply severely lacking?

Anyway, bottom line is that I had to do it myself. And I’ll share the result in this post.

Click through for the script and a detailed explanation of how it works.
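
At the core of any such script is a RESTORE statement driven by msdb’s backup history; a heavily simplified sketch (database names and file paths are examples, and Hugo’s script handles many more edge cases):

```sql
-- Simplified sketch: restore the latest full backup of a database
-- to a test copy. Names and paths are hypothetical.
DECLARE @BackupFile nvarchar(260);

SELECT TOP (1) @BackupFile = bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
INNER JOIN msdb.dbo.backupmediafamily AS bmf
    ON bs.media_set_id = bmf.media_set_id
WHERE bs.database_name = N'SalesDb'
  AND bs.type = 'D'  -- 'D' = full database backup
ORDER BY bs.backup_finish_date DESC;

RESTORE DATABASE SalesDb_Test
FROM DISK = @BackupFile
WITH MOVE N'SalesDb'     TO N'D:\Data\SalesDb_Test.mdf',
     MOVE N'SalesDb_log' TO N'L:\Log\SalesDb_Test.ldf',
     REPLACE, STATS = 10;
```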

Automation Tips for ETL

Richard Swinbank shares some tips around automating ETL processes:

I spent a good part of my career in the UK National Health Service (NHS). We often talk about data engineering problems in terms of handling large volumes of data, but the challenge in the NHS was frequently complexity – even a small acute hospital might have a dozen or more separate systems, running on various different DBMS backends, with many tables of interest for management or central reporting.

You might need to extract data from hundreds of (mostly small) tables to populate your data warehouse – my challenge was to make this as quick and easy to implement as possible.

Click through for the tips.
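
One common answer to the many-small-tables problem is metadata-driven extraction: describe the sources in a control table and generate the extract statements from it, rather than hand-coding hundreds of packages. A hypothetical sketch:

```sql
-- Hypothetical sketch of metadata-driven extraction: a control table
-- lists source tables, and dynamic SQL generates one extract per row.
CREATE TABLE etl.SourceTables
(
    SourceTableId int IDENTITY PRIMARY KEY,
    SourceName    sysname NOT NULL,  -- e.g. N'dbo.Patients'
    TargetName    sysname NOT NULL   -- e.g. N'staging.Patients'
);

DECLARE @Sql nvarchar(max) = N'';

-- Production code would validate and quote these names properly.
SELECT @Sql += N'TRUNCATE TABLE ' + TargetName + N'; '
             + N'INSERT INTO ' + TargetName
             + N' SELECT * FROM ' + SourceName + N';'
FROM etl.SourceTables;

EXEC sys.sp_executesql @Sql;
```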
