Press "Enter" to skip to content

Day: April 15, 2020

Serialization in Apache Flink

Nico Kruber walks us through the viable set of serializers in Apache Flink:

Flink handles data types and serialization with its own type descriptors, generic type extraction, and type serialization framework. We recommend reading through the documentation first in order to be able to follow the arguments we present below. In essence, Flink tries to infer information about your job’s data types for wire and state serialization, and to be able to use grouping, joining, and aggregation operations by referring to individual field names, e.g. stream.keyBy(“ruleId”) or dataSet.join(another).where("name").equalTo("personName"). It also allows optimizations in the serialization format as well as reducing unnecessary de/serializations (mainly in certain Batch operations as well as in the SQL/Table APIs).

Click through for notes on each serializer and a graph which shows how the choice of a serializer can make a huge difference.

Time Series Forecasting Best Practices

David Smith talks about a new GitHub repo:

The repository includes detailed examples of various time series modeling techniques, as Jupyter Notebooks for Python, and R Markdown documents for R. It also includes Python notebooks to fit time series models in the Azure Machine Learning service, and then operationalize the forecasts as a web service.

The R examples demonstrate several techniques for forecasting time series, specifically data on refrigerated orange juice sales from 83 stores (sourced from the bayesm package). The forecasting techniques vary (mean forecasting with interpolation, ARIMA, exponential smoothing, and additive models), but all make extensive use of the tidyverts suite of packages, which provides “tidy time series forecasting for R”. The forecasting methods themselves are explained in detail in the book (readable online) Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos (Monash University).

This looks really cool.

Finding the xp_cmdshell Running User

Steve Stedman shows which user account actually runs commands when you use xp_cmdshell:

When you execute xp_cmdshell, SQL Server runs a command at the operating system level, similar to the old DOS prompt or CMD shell. There are many security concerns and misconceptions about xp_cmdshell, as documented in an earlier post.

The purpose of this post is not to debate whether xp_cmdshell is safe or not, but instead to show which user the commands sent to xp_cmdshell are being run as.

Read on for a sample script.
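
If you want to see the effect for yourself before reading Steve's script (this isn't his script, just a minimal sketch for a throwaway sandbox instance), you can enable xp_cmdshell and ask Windows directly which account is doing the work:

-- Enable xp_cmdshell; leave it off anywhere that isn't a sandbox.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- whoami prints the Windows account the shelled-out command runs under:
-- the SQL Server service account for sysadmin callers, or the proxy account
-- (##xp_cmdshell_proxy_account##) for non-sysadmin callers who have been granted access.
EXEC xp_cmdshell 'whoami';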

Find and Replace Database Objects in SSMS

Daniel Hutmacher has a fun trick for us:

Here’s a quick tip that touches on one of the powerful SSMS tricks in my “Management Studio Level-Up” presentation. Say you have a potentially large number of database objects (procedures, functions, views, what have you), and you need to make a search-and-replace kind of change to all of those objects.

You could, of course, put the database in source control and use a proper IDE to replace everything, then check your code back into source control and commit it to the database. That’s obviously the grown-up solution. Thanks for reading this post.

But let’s say for the sake of argument that you haven’t put your database in version control. What’s the lazy option here?

Even if you do have everything in source control, the first part is still quite useful just in case there’s schema drift.
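
As a rough sketch of that first part (not Daniel's SSMS trick, just a plain catalog query with a hypothetical search string), you can list every object whose definition contains the text you're about to change:

-- Find procedures, views, functions, and triggers whose definition contains a search string.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
       OBJECT_NAME(m.object_id) AS object_name,
       o.type_desc
FROM sys.sql_modules AS m
JOIN sys.objects AS o
    ON o.object_id = m.object_id
WHERE m.definition LIKE '%OldColumnName%';   -- hypothetical string you plan to replace

From there you can script out the matching objects, make the change, and diff the result against what's in source control to catch any drift.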

Using Trello as a Power BI Data Source

Gilbert Quevauvilliers shows how we can use Trello as an input for Power BI:

Where I am consulting, they use Trello boards, which enable them to keep track of what tasks are being done, what's getting done, and how things are progressing.

An interesting question came up asking how much work has been done. And I thought this could be done via Trello and looking at the number of tasks in the boards that have gone from To Do to Completed.

Below are the steps of how I completed this.

It’s a mashable world.

The Basics of tSQLt

Jess Pomfret walks us through the basics of tSQLt:

Getting started with tSQLt is really easy: you download a zip file, unzip the contents, and then run the tSQLt.class.sql script against your development database.

There are a couple of requirements, including that CLR integration must be enabled and your database must be set to trustworthy. These could open up some security considerations, but for my development environment it's no issue.

This is where I’d say putting the database in a container would be extremely helpful, as then you can destroy it afterward. I’m not sure if that’d work, as SQL Server on Linux doesn’t support unsafe or external access assemblies (and I’m not sure what tSQLt requires there).
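
For reference, the prerequisites Jess mentions boil down to something like this (a minimal sketch for a disposable development database; the database name is a placeholder):

-- Enable CLR integration at the instance level.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Mark the development database as trustworthy so the tSQLt assembly can be installed.
ALTER DATABASE MyDevDB SET TRUSTWORTHY ON;   -- MyDevDB is a placeholder name

-- Then run tSQLt.class.sql from the downloaded zip against that database.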

Installing SSMS on Servers Running SQL Server?

Andy Mallon says yes, install SQL Server Management Studio on those servers running SQL Server instances:

“But wait, Andy. That’s not a best practice!” you say?

The pseudo best practice of “don’t install SSMS” is a misguided one–advice that I even fell into repeating in the past. However, that’s actually a proposed solution to a best practice, rather than being itself a best practice.

I agree with Andy wholeheartedly on this.

The Importance of Unit Testing Database Code

Chris Johnson shares some thoughts on unit testing database code:

This is a topic that is quite close to my heart. I don’t come from a computing background before I started working with SQL Server, so I was quite ignorant when it came to a lot of best practices that other developers who have worked with other languages are aware of. Because of this, I had no idea about unit testing until I attended a talk at a SQL Saturday all about tSQLt. If anyone isn’t aware (as I wasn’t), tSQLt is a free, open-source unit testing framework for use in SQL Server databases. It is the basis of Redgate’s SQL Test software, and is the most used framework for writing unit tests in SQL Server.

Since then I’ve worked to try and get employers to adopt this as part of a standard development life cycle, with mixed success at best. My current employer is quite keen, but there are two major problems. First, we have a huge amount of legacy code that obviously has no unit tests in place; and second, the way people code is not conducive to unit testing.

Click through for additional thoughts on writing good tests and an example of modularizing code to make it more testable. I’m still in the camp of “test what you can, but you can’t test everything” with databases. There’s just too much state dependency.
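
If you haven't seen tSQLt in action, a test is just a stored procedure living in a test class schema. Here's a minimal, hypothetical example (the test class, the function under test, and the expected value are all made up for illustration):

-- Create a test class: a schema that tSQLt treats as a container for tests.
EXEC tSQLt.NewTestClass 'testFinance';
GO

-- Any procedure in that schema whose name starts with 'test' is picked up as a test.
CREATE PROCEDURE testFinance.[test AddVat adds 20 percent]
AS
BEGIN
    DECLARE @actual decimal(10,2) = dbo.AddVat(100.00);   -- dbo.AddVat is a hypothetical function under test
    EXEC tSQLt.AssertEquals @Expected = 120.00, @Actual = @actual;
END;
GO

-- Run every test in the class.
EXEC tSQLt.Run 'testFinance';

Modularizing logic into small functions and procedures like dbo.AddVat is exactly what makes tests like this cheap to write, which is Chris's point about coding in a way that's conducive to unit testing.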
