Press "Enter" to skip to content

Day: January 10, 2018

Time Series Forecasting With DeepAR

Tim Januschowski, et al., introduce DeepAR on AWS:

Today we are launching Amazon SageMaker DeepAR as the latest built-in algorithm for Amazon SageMaker. DeepAR is a supervised learning algorithm for time series forecasting that uses recurrent neural networks (RNN) to produce both point and probabilistic forecasts. We’re excited to give developers access to this scalable, highly accurate forecasting algorithm that drives mission-critical decisions within Amazon. Just as with other Amazon SageMaker built-in algorithms, the DeepAR algorithm can be used without the need to set up and maintain infrastructure for training and inference.

Click through for a product demonstration.
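For a sense of what training with the built-in algorithm looks like, here is a minimal sketch using the SageMaker Python SDK; the role ARN, bucket, instance type, and hyperparameter values are illustrative placeholders rather than anything from the announcement:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

# Placeholders: substitute your own IAM role and S3 bucket.
role = "arn:aws:iam::123456789012:role/SageMakerRole"
bucket = "s3://my-bucket/deepar-demo"

session = sagemaker.Session()

# Resolve the built-in DeepAR container image for the current region.
image = image_uris.retrieve("forecasting-deepar", session.boto_region_name)

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    output_path=f"{bucket}/output",
    sagemaker_session=session,
)

# Core DeepAR hyperparameters: series frequency, forecast horizon,
# and how much history each training example sees.
estimator.set_hyperparameters(
    time_freq="H",
    prediction_length=24,
    context_length=24,
    epochs=20,
)

# Training data is JSON Lines, one series per line, with "start" and "target".
estimator.fit({"train": f"{bucket}/train/"})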


Geocoding With OpenStreetMap

Dmitry Kisler shows how to geocode addresses in R using the OpenStreetMap API:

It is quite likely that you will get address information when scraping data from the web, but not geo-coordinates, which may be required for further analysis like clustering. Thus, geocoding is often needed to get a location’s coordinates by its address.

There are several options, one of the most popular being the Google geocoding API. This option can be easily implemented in R with the geocode function from the ggmap library. It has a limit of 2,500 requests a day when used free of charge; see details here.

To increase the number of free-of-charge geocoding requests, the OpenStreetMap (OSM) Nominatim API can be used. OSM allows up to 1 request per second (see the usage policy), which works out to roughly 35 times more API calls per day than the Google geocoding API.

Click through for the script.
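If you would like a taste of the approach before clicking through, here is a minimal sketch (not Dmitry's script) of calling the Nominatim search endpoint from R, with the endpoint and parameters taken from the public Nominatim documentation:

library(httr)
library(jsonlite)

nominatim_geocode <- function(address) {
  resp <- GET(
    "https://nominatim.openstreetmap.org/search",
    query = list(q = address, format = "json", limit = 1),
    user_agent("r-geocoding-example")  # the usage policy asks for an identifying User-Agent
  )
  result <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  if (length(result) == 0) return(c(lat = NA, lon = NA))
  c(lat = as.numeric(result$lat[1]), lon = as.numeric(result$lon[1]))
}

nominatim_geocode("10 Downing Street, London")
Sys.sleep(1)  # stay under the 1-request-per-second limit when looping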


Compare-Object And Null Values

Mike Robbins notes that the Compare-Object cmdlet in PowerShell does not like to handle null values:

I thought I’d run into a bug with the Compare-Object cmdlet in PowerShell version 5.1 earlier today.

Compare-Object : Cannot bind argument to parameter 'ReferenceObject' because it is null.
At line:1 char:33
+ Compare-Object -ReferenceObject $DriveLetters -DifferenceObject $Driv …
+ ~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (:) [Compare-Object], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.CompareObjectCommand

Read on for the description of the problem as well as the solution.
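A hedged sketch of the kind of guard involved (not necessarily Mike's exact fix; the variable names and sample values are illustrative):

# In practice these would come from querying two servers.
$serverA = 'C', 'D', 'E'
$serverB = $null   # simulate a query that returned nothing

# Compare-Object rejects a null argument for either parameter, so test both sides first.
if ($serverA -and $serverB) {
    Compare-Object -ReferenceObject $serverA -DifferenceObject $serverB
}
else {
    Write-Warning 'One of the result sets is null or empty; skipping the comparison.'
}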


Deleting From Large Tables

Andy Galbraith tells a tale of a failed query and a lot of log file growth:

The App01 database was in SIMPLE recovery (when I find this in the wild, I always question it – isn’t point-in-time recovery important? – but that is another discussion).  The relevance here is that in SIMPLE recovery, LOG backups are irrelevant (and actually *can’t* be run) – once a transaction or batch completes, the LDF/LOG file space is marked available for re-use, meaning that *usually* the LDF file doesn’t grow very large.

A database in SIMPLE recovery growing large LDF/LOG files almost always means a long-running unit of work or an accidentally open transaction (a BEGIN TRAN with no subsequent COMMIT TRAN). Looking at the errors in the SQL Error Log from the previous night, the 9002 log-file-full errors stopped at 1:45:51am server time, which means the offending unit of work ended then, one way or another (crash or success).

Sure enough, when I filtered the XEvents event file to things with a duration > 100 microseconds and then scanned down to 1:45:00, I quickly saw the row shown above. Note this doesn’t mean the unit of work was excessively large in CPU/RAM/IO/etc. (and in fact the query errored out due to lack of LOG space), but the excessive duration made all of the tiny units of work over the previous 105 minutes have to persist in the transaction LDF/LOG file until this unit of work completed, preventing all of the LOG from that time from being marked for re-use until this statement ended.

It turns out that deleting millions of records in a single transaction is an expensive operation.
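The standard mitigation is to delete in batches, so that each batch commits and its log space becomes reusable. A generic sketch (the table, column, and retention period are illustrative, not from Andy's post):

DECLARE @BatchSize int = 10000,
        @RowsAffected int = 1;

WHILE @RowsAffected > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.EventLog
    WHERE CreateDate < DATEADD(MONTH, -6, GETDATE());

    SET @RowsAffected = @@ROWCOUNT;

    CHECKPOINT;  -- in SIMPLE recovery, lets the committed log space be reused
END;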


Introducing PowerApps

Jason Thomas has a great introduction to PowerApps:

You will be able to pass context-aware data to a PowerApps app, which updates in real time as you make changes to your report. Now your users can derive business insights and take actions right within their Power BI reports and dashboards. No need to switch tabs to open separate apps, copy and paste data from one window to another, or worry about fat-fingering the wrong customer ID or invoice amount.

If you think about it, this is a game changer: you finally have a BI tool that allows you to collaborate and take actions right within the report. How many times have you looked at a report, found an insight, and wished that you could send an email to the account manager, only to forget later? Well, now you don’t have to worry about that, as I am going to show you an example of how we can collaborate by adding comments within the report (not just comments, but context-aware comments, based on what you are selecting) as well as show how to send emails (to the appropriate people based on your selection).

This is an interesting concept and Jason has a detailed overview of it.


Customizing SQL Operations Studio

Samir Behara shows how to differentiate environments with custom tab colors:

The initial January insiders build focuses on bug fixes and minor feature improvements. One thing which caught my attention was the 'SQL Editor Tab Color' feature, used to differentiate between query tabs inside the IDE.

Using custom colors to differentiate between environments is one of my favorite features inside SQL Server Management Studio. The color is displayed in the SSMS status bar, at the bottom, so when you connect to a particular environment, it shows the assigned color. This presents a visual indication of the environment in which you are running your scripts. A lot of third-party tools from Redgate and ApexSQL also have their own versions of setting different colors when connecting to different environments.

Using environment-specific color schemes can be a life-saver.
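For reference, the tab colors in SQL Operations Studio hang off connection groups in settings.json. A hedged sketch of what the relevant entries can look like (the exact keys can vary by build, and the IDs and colors below are made up):

{
    "sql.tabColorMode": "border",
    "datasource.connectionGroups": [
        {
            "id": "0b2af2b9-d7fc-4e27-9a0e-6a5ecdca1d1e",
            "name": "Production",
            "color": "#CA5010"
        },
        {
            "id": "6f3b56f1-4b0e-4f8e-9a3a-2f1f2fae1110",
            "name": "Development",
            "color": "#107C10"
        }
    ]
}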


The Premise Of Cloud Data Warehousing

Derik Hammer explains how cloud data warehouses differ from their on-prem cousins:

Given the data processing needs of a data warehouse, they tend to be implemented on massively parallel processing (MPP) systems. The MPP architecture relies upon a shared-nothing concept for distributing data across various slices. Compute nodes are layered on top of the storage and process queries for data residing in their local slices. The control node is responsible for taking a query and dividing it up into smaller queries to be run in parallel on the compute nodes.
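To make the scatter-gather flow concrete, here is a toy Python sketch of the pattern Derik describes; it is a conceptual illustration only, not any vendor's implementation:

from concurrent.futures import ThreadPoolExecutor

# Each "slice" holds only its own portion of the table (shared-nothing).
slices = [
    [("2018-01-08", 120), ("2018-01-09", 80)],
    [("2018-01-08", 45)],
    [("2018-01-09", 200), ("2018-01-10", 10)],
]

def compute_node(local_rows, date):
    # A compute node aggregates only the data residing on its local slice.
    return sum(amount for d, amount in local_rows if d == date)

def control_node(date):
    # The control node scatters the smaller queries and gathers partial results.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: compute_node(s, date), slices)
    return sum(partials)

print(control_node("2018-01-08"))  # 120 + 45 = 165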

Read the whole thing.


Monitoring Using SQL Operations Studio

Marlon Ribunal demonstrates how to build a widget in SQL Operations Studio:

This is where SQL Operations Studio (SOS) comes into the picture. SOS is a lightweight, open-source, cross-platform client. Aside from its IDE functionality, SOS has a lot more to offer DBAs, developers, and DevOps. One of these offerings is that you can use it as a dashboard to monitor your databases. SSMS comes with standard reports as well as custom reports using Report Definition Language, or RDL (essentially an SSRS-like report residing in SSMS). SQL Operations Studio raises the bar in this regard: you can create multiple widgets to display on your dashboard.

Click through for a demonstration and screen shots.
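For a sense of the shape of these widgets, a custom insight pairs a query with a visualization in settings.json. A hedged sketch of the sort of entry involved (the exact keys can vary by build, and the widget name and query file path are placeholders):

{
    "dashboard.database.widgets": [
        {
            "name": "Table Row Counts",
            "gridItemConfig": { "sizex": 2, "sizey": 1 },
            "widget": {
                "insights-widget": {
                    "type": { "count": { "dataDirection": "vertical" } },
                    "queryFile": "C:\\sos\\rowcounts.sql"
                }
            }
        }
    ]
}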


Migrating A Memory-Optimized Database

Michael Bourgon notes that there’s an extra step when migrating a database with a memory-optimized filegroup from one server to another:

So, I was trying to get an in-memory database moved from one server to another (long story, involving IMOLTP melting down on me and resulting in a 2 terabyte log file).

I thought it’d be simple: copy the files over, along with the Filestream folders, and attach.  Nope!  Various errors, including “5(Access is denied.)”.

Click through for the steps involved.
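The short version: the memory-optimized filegroup is a FILESTREAM container, so it has to come along during the attach, and the SQL Server service account needs rights on those folders (a common cause of the "5(Access is denied.)" error). A generic sketch of the attach statement (paths and names are illustrative, and this is the standard syntax rather than Michael's exact script):

CREATE DATABASE ImOltpDb ON
    (NAME = ImOltpDb_Data,  FILENAME = N'D:\Data\ImOltpDb.mdf'),
    (NAME = ImOltpDb_Log,   FILENAME = N'E:\Log\ImOltpDb_log.ldf'),
    -- the memory-optimized container is a folder, not a file
    (NAME = ImOltpDb_InMem, FILENAME = N'D:\Data\ImOltpDb_InMem')
FOR ATTACH;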
