Apache Airflow Now A Top-Level Project

Fokko Driesprong announces that Apache Airflow is now a top-level Apache project:

Today is a great day for Apache Airflow as it graduates from incubating status to a Top-Level Apache project. This is the next step of maturity for Airflow. For those unfamiliar, Airflow is an orchestration tool to schedule and orchestrate your data workflows. From ETL to training of models, or any other arbitrary tasks. Unlike other orchestrators, everything is written in Python, which makes it easy to use for both engineers and scientists. Having everything in code means that it is easy to version and maintain.

Airflow has been getting some hype lately, especially in the AWS space.

Databricks Library Utilities For Notebooks

Srinath Shankar and Todd Greenstein announce a new feature in Databricks Runtime 5.1:

We can see that there are no libraries installed and scoped specifically to this notebook.  Now I’m going to install a later version of SciPy, restart the python interpreter, and then run that same helper function we ran previously to list any libraries installed and scoped specifically to this notebook session. When using the list() function PyPI libraries scoped to this notebook session are displayed as  <library_name>-<version_number>-<repo>, and (empty) indicates that the corresponding part has no specification. This also works with wheel and egg install artifacts, but for the sake of this example we’ll just be installing the single package directly.

This does seem easier than dropping to a shell and installing with Pip, especially if you need different versions of libraries.

Capturing SQLCMD Errors

Jack Vamvas shows us how we can capture errors from SQLCMD:

I’m executing  code using SQLCMD from a batch file . The code points to a sql file and there is also an output file. 

SQLCMD -E -S MYSERVER\INST1 -i “setup_job_entry.sql” -o “setup_job_entry.log”

But I noticed that if the actual SQLCMD returns an error , for example , if I’m connecting to  an server which doesn’t exist this error message will appear in the output file – but there will  not be an ERROR number , which would allow me to trap and return an appropriate message 

There is a way and Jack shows us how.

Creating Cosmos DB Indexes

Hasan Savran explains indexing in Cosmos DB:

In SQL Server you need to pick which columns you like to index, In CosmosDB you need to pick which columns not to index. It’s kind of same thing at the end. You might ask “If everything is indexed and working fine, why do you want me to poke the well running system?” When we compare SQL Server indexes to CosmosDB Indexes, one thing works exactly same. That is the index file size. CosmosDB holds the indexes in a separate file like SQL Server and if we want to index everything, index file size is going to get large. Since we need to pay for the file space in CosmosDB, you might need to pay extra for indexes that you might never use. Also, your updates, inserts and deletes might cost you more Request Units since CosmosDB needs to maintain all the indexes in the background.

There’s just enough difference to make you pay the price if you assume Cosmos DB works just like SQL Server.

Migrating Lots Of Databases To SQL Server 2016

Andy Levy has a problem. Well, about 8000 of them. In part 1, he describes the plan:

How do you move eight thousand databases in a reasonable amount of time? I spent about an hour and a half one morning hashing ideas out w/ folks in the dbatools Slack channel, plus several conversations in the office and with our hosting provider.

Then, in part 2, he describes the execution:

We missed the estimated time for our go/no-go decision by five minutes. With the number of moving parts, databases in play, unexpected delays, and amount of testing we had to do, that’s pretty good! My colleague and I had some additional work we needed to take care of after the team declared the migration a success. Agent jobs needed to be enabled, overnight job startups monitored, things like that. We called it a day after about 14 hours in the office.

It was a nice success story, so check it out.

What Happens When Your Secondary DAG Fails Over

Tianyu Wen explains what happens when there is a failover incident in the secondary Availability Group of a Distributed Availability Group:

If the primary replica on the secondary AG (also known as the “forwarder”) is lost and causes automatic failover to happen or manual failover is performed in the secondary AG in a DAG, there will be no data loss if the following conditions are met:
– The primary replica on the primary AG runs with no synchronization issue when the failover happens;
– The secondary AG on the DAG has a functioning secondary replica before the failover happens;
– The primary replica on the primary AG can communicate properly with the secondary replica on the secondary AG over their database mirroring endpoints.

There’s a lot of detail here, so if you are supporting Distributed Availability Groups, check it out.

Getting Reporting Services Installation Details With Powershell

Josh Smith wants to find every installation of SQL Server Reporting Services on a machine:

This is one of those posts so I never have to google this again (one hopes). Here is the PS code to pull back a set of details about every SSRS instance installed on a server including the SSRS instance name, & the first URL port it is running on, the service name and the name of the report server database etc.

Click through for the Powershell script.

VARCHAR Size And Memory Grant Estimates

Arthur Daniels shows us a good reason for using better data sizes than just VARCHAR(MAX) everywhere:

That’s a lot of desired memory, 1,493,120 KB aka 1.4 GB, but there was only 
25 MB used in the sort. So why was SQL Server so far off the right estimate? It’s the data types we picked.

That’s a lot of memory for a fairly simple query returning 300,000 rows, each containing a string and an int.


January 2019
« Dec