
Month: April 2018

Accessing BigQuery Data From Python And R

Eleni Markou shows how to connect to Google’s BigQuery service using Python and then R:

Some time ago we discussed how you can access data that are stored in Amazon Redshift and PostgreSQL with Python and R. Let’s say you did find an easy way to store a pile of data in your BigQuery data warehouse and keep them in sync. Now you want to start messing with it using statistical techniques, maybe build a model of your customers’ behavior, or try to predict your churn rate.

To do that, you will need to extract your data from BigQuery and use a framework or language that is best suited for data analysis and the most popular so far are Python and R. In this small tutorial we will see how we can extract data that is stored in Google BigQuery to load it with Python or R, and then use the numerous analytic libraries and algorithms that exist for these two languages.

Read on to see how easy it is for either language.


New dbachecks Checks

Rob Sewell announces updates in the dbachecks PowerShell package:

Today we updated the HADR tests to add the capability to test multiple availability groups and fix a couple of bugs

Once you have installed dbachecks you will need to set some configuration so that you can perform the tests. You can see all of the configuration items and their values using

Read on for more about these updates.


The Value Of Live Query Stats In SSMS

Rob Farley exclaims his appreciation of Live Query Stats in SQL Server Management Studio:

I wrote about Live Query Statistics within SSMS a while back – and even presented at conferences about how useful it is for understanding how queries run…

…but what I love is that at customers where I have long-running queries to deal with, I can keep an eye on the queries as they execute. I can see how the Actuals are forming, and quickly notice whether the iterations in a Nested Loop are going to be unreasonable, or whether I’m happy enough with things. I don’t always want to spend time tuning a once-off query, but if I run it with LQS turned on, I can easily notice if it’s going to be ages before I see any rows back, see any blocking operators that are going to frustrate me, and so on.

I don’t use it often, but when I do, I typically learn something interesting about the query I’m running.


Database Backups And GDPR

Grant Fritchey digs into one of the more contentious areas of GDPR:

Nothing within Article 17 talks about backups, offsite storage, readable secondaries, log shipping, or any of that stuff. In fact, there’s nothing technical there at all. No help to tell you what to do about this question.

Now, each article has expansions that further detail the information within the article called recitals. In the case of the right to be forgotten, there are two, Recital 55 and Recital 66. Recital 55 has nothing for us, at all. Recital 66 does talk about the fact that, because we’re dealing in an online world, the best available technical means should be used to deal with the fact that a person’s data may be in more than one location and we’ll need to clean that up.

And that’s it.

In fact, you can search the GDPR and not find the word, backup.

Read on for Grant’s thoughts, including what he argues is a defensible position (though we won’t know for sure until the bureaucracy runs its course).


Things To Think About Before Detaching And Attaching A Database

Jana Sattainathan has a few considerations before running a detach-attach operation:

On the detached database files, the file permissions change when detaching. i.e., The AD account (your AD account) performing the detach operation becomes the owner and the only person with permissions to the file. It does not inherit the permissions from the folder. Oddly, if you just detach and reattach, it would attach fine even though SQL Server Service account does not have any explicit permissions on the files. However, others cannot attach a file that you detached even if they are on the Administrators group until they are explicitly granted permissions on the files themselves. This is well explained in this MSSQLTips article. Quoting the article, if you want to retain file permissions on detach, set the trace flag 1802.

“SQL Server 2005 introduced trace flag 1802 which retains the database files permission after the detach operation. The trace flag is tested and still applicable with SQL Server 2016.”

Click through for several tips of similar ilk.


Dynamically Showing Or Hiding Columns In SSRS With Parameters

Sander Stad shows how to show or hide columns at runtime in SQL Server Reporting Services reports using parameters:

Regularly I have reports that have an extensive amount of columns. Because of the amount of columns, reports tend to become inefficient and have too much information we don’t always need. The users may want to select certain columns to make the report easier to read.

Hiding and showing columns in SSRS reports using parameters is a solution to make reports more dynamic and easier to use.

At the time of writing of this article, SQL Server Reporting Services did not yet have the possibility to use checkbox parameters. Instead we will be using a multi-value text parameter to show or hide our columns.

Click through to see how to do this.


Learning About Spatial Data In R

Steph Locke has a compendium of resources for people wishing to learn more about working with spatial data in R:

I recently met up with someone who does geospatial stuff but uses the more traditional GIS software to do it. I showed him a few things in R but not being a person who does a lot of geospatial analysis I thought I’d ask the lovely #rspatial crowd what they’d recommend. Here are the compiled recommendations. Happy learning spatial R!

Feel free to comment or tweet your recommendations to get them added to this list.

There’s a lot of reading, watching, and doing there, so thanks to Steph for putting it together.


Transforming Data: ELT Or ETL?

Artyom Keydunov argues that Extract-Load-Transform is a better model than Extract-Transform-Load:

ETL arose to solve a problem of providing businesses with clean and ready-to-analyze data. We remove dirty and irrelevant data and transform, enrich, and reshape the rest. The example of this could be sessionization: the process of creating sessions out of raw pageviews and users’ events.

ETL is complicated, especially the transformation part. It requires at least several months for a small-sized (less than 500 employees) company to get up and running. Once you have the initial transform jobs implemented, never-ending changes and updates will begin because data always evolves with business.

The other problem of ETL is that during the transformation, we reshape data into some specific form. This form usually lacks some data’s resolution and does not include data that is useless for that time or for that particular task. Often, “useless” data becomes “useful.” For example, if business users request daily data instead of weekly, then you will have to fix your transformation process, reshape data, and reload it. That would take a few weeks more.
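To make the sessionization example from the excerpt concrete, here is a minimal Python sketch of the idea: raw pageviews are grouped per user, and a new session starts whenever the gap since that user's previous pageview exceeds a threshold. The 30-minute threshold, the `(user_id, timestamp)` input shape, and the names `sessionize` and `raw_pageviews` are assumptions made for illustration; they do not come from the article.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Assumed inactivity threshold; 30 minutes is a common convention, not a value from the article.
SESSION_GAP = timedelta(minutes=30)


def sessionize(pageviews, gap=SESSION_GAP):
    """Group (user_id, timestamp) pageviews into per-user sessions.

    A new session starts whenever a user's next pageview arrives more
    than `gap` after that user's previous pageview.
    Returns a list of (user_id, [timestamps]) tuples.
    """
    sessions = []
    # Sort by user, then by time, so each user's events are contiguous and ordered.
    ordered = sorted(pageviews, key=itemgetter(0, 1))
    for user_id, views in groupby(ordered, key=itemgetter(0)):
        current = []
        for _, ts in views:
            # Close the running session if the user was inactive for too long.
            if current and ts - current[-1] > gap:
                sessions.append((user_id, current))
                current = []
            current.append(ts)
        sessions.append((user_id, current))
    return sessions


# Hypothetical raw events, as they might sit in a warehouse after extract and load.
raw_pageviews = [
    ("alice", datetime(2018, 4, 30, 9, 0)),
    ("alice", datetime(2018, 4, 30, 9, 10)),
    ("alice", datetime(2018, 4, 30, 11, 0)),
    ("bob", datetime(2018, 4, 30, 9, 5)),
]
user_sessions = sessionize(raw_pageviews)
```

Because the raw pageviews are kept, the same function can be re-run with a different `gap` later, which is the kind of flexibility the excerpt says is lost when only the transformed output is stored.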

Read on for more, including his argument for why ELT is better.


Monitoring Performance Of Natively Compiled Stored Procedures

Jos de Bruijn announces a feature coming to the next version of SQL Server:

We just added new database-scoped configuration options that will help with monitoring performance of natively compiled stored procedures. The new options XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS are available now in Azure SQL Database, and will be available in the next major release of SQL Server. These options will improve your monitoring and troubleshooting experience for databases leveraging In-Memory OLTP with natively compiled stored procedures.

After enabling these options, you can monitor the performance of natively compiled stored procedures using Query Store, as well as the DMVs sys.dm_exec_query_stats and sys.dm_exec_procedure_stats. Note that there is a performance impact to enabling execution statistics collection, thus we recommend to disable stats collection when not needed.

That last sentence is important: there’s an observer effect which slows down execution of natively compiled stored procedures, and considering that you’re implementing them specifically for the speed, that’s fairly unwelcome.
