Uploading Files In Shiny

Kevin Feasel

2017-05-01

R

Steph Locke explains how to use lapply() and fread() to validate and combine multiple uploaded files in R:

In shiny, you can use the fileInput with the parameter multiple = TRUE to enable you to upload multiple files at once. But how do you process those multiple files in shiny and consolidate into a single dataset?

The bit we need from shiny is the input$param$fileinputpath value.

Read the whole thing.

SQL Data Warehouse Distribution Keys

Simon Whiteley explains the different distribution key options available in Azure SQL Data Warehouse and SQL Server APS:

Each record that is inserted goes onto the next available distribution. This guarantees that you will have a smooth, even distribution of data, but it means you have no way of telling which data is on which distribution. This isn’t always a problem!

If I wanted to perform a count of records, grouped by a particular field, I can perform this on a round-robin table. Each distribution will run the query in parallel and return it’s grouped results. The results can be simply added together as a second part of the query, and adding together 60 smaller datasets shouldn’t be a large overhead. For this kind of single-table aggregation, round-robin distribution is perfectly adequate!

However, the issues arise when we have multiple tables in our query. In order to join two tables. Let’s take a very simple join between a fact table and a dimension. I’ve shown 6 distributions for simplicity, but this would be happening across all 60.

Figuring out which distribution key to use can make a huge difference in performance.

Adaptive Joins

Erik Darling found a query which triggers an adaptive join in SQL Server 2017 CTP 2:

This probably won’t seem like a big deal soon

But I just got the optimizer to pick an Adaptive Join! It took a few tries to figure out what would cause some guesswork to happen, but here it is.

Erik promises more details are forthcoming as he works out the XML details.

Full-Text Indexing Issues

Cody Konior looks at a couple errors in full-text indexing:

The first error message holds the key. It’s not enough to just enable the service, you need to run that command also (which will also start the service for you as well as doing something internally) or restart SQL Server after the service is enabled.

Here’s where we go to the next level. You now know that full-text indexing can have a problem on startup that isn’t visible externally but surely this could never happen to you, right? But I guarantee you that if you have 500 servers many of them currently have this issue. When a developer or application support person encounters it that’s going to take half an hour or an hour of investigation, then logging incidents, then passing it on to you, for another half hour or hour of investigation, and then more time, and more time, and then doing a root cause analysis, and then sharing it with your team, and then…

Wouldn’t it be so much easier to just write a validation test for this now so that if this ever occurs you can just proactively go fix it? Well you can!

It’s a good read.

Multiple Flat Files SSIS Connector

John Morehouse shows that there are more SSIS connection managers than meets the eye:

While recently examining an existing SSIS package  before implementing some needed changes I noticed the connection manager for multiple flat files.

Normally, when processing multiple files, I use a sequence container and loop through them.  It’s usually a cumbersome process to setup and you must configure variables.  I was not aware the multiple flat file type of connection manager existed. No wonder because it’s notwithin the normal list.  This handy thing could only be found when I looked under “New Connection”.

Click through for more details.

Q&A In Power BI

Adam Saxton has a video talking about Natural Language Search in Power BI:

In this video, I look at one of my favorite Power BI features that I’ve found a lot of people aren’t familiar with. In a few of my presentations, by a show of hands, the majority of folks (meaning most of the room) didn’t know what this was. I WAS SHOCKED!!! Can you guess what it is?

When this works, it’s really cool.

Perfmon Counters For Monitoring AGs

Tracy Boggiano has a set of Perfmon counters she uses to monitor Availability Groups:

Monitoring Availability Groups can be tricky.  The DMVs don’t update the log_send_rate and redo_rate all the time especially if nothing is happening so if you try to use those for calculations when monitoring you could false results worse yet pages in the middle of the night.  In order to calculate the log_send_rate and redo_rate you need to capture the perfmon counters ‘Log Bytes Flushed/sec’, ‘Redone Bytes/sec’, and ‘Log Bytes Received/sec’ into temporary tables WAITFOR 1 second then capture them again.  Below is query that captures this along side what you see in the DMV for when Microsoft might fix the issue.

Click through for a script as well as a Powershell cmdlet wrapper for running against a set of hosts in your Central Management Server.

Re-Provisioning Volumes In Azure

Robert Bishop shows how to create data disks in Azure and attach them to VMs:

Recently a client found this article on “Best Practices for SQL Server in Azure Virtual Machines” and wanted to re-provision his volumes to adhere to them.

No my first thoughts was wait, I’m a DBA, not a System Admin that’s not my role! But thinking more about it I realized the client views this as a SQL Server issue and I am the SQL Server Consultant and that it is my job to remedy this problem.

Not being 100% confident in Azure, I spun up a VM SQL Server and attempted to add some volumes.  To my surprise, this was way too easy.

Click through for the steps.

Categories

May 2017
MTWTFSS
« Apr  
1234567
891011121314
15161718192021
22232425262728
293031