Press "Enter" to skip to content

Curated SQL Posts

Twitter Campaign/Brand Management In Power BI

Mindy Curnutt looks at a Power BI solution template for managing Twitter campaigns:

Now you can start poking around and seeing what’s in the Dashboard. Since I opted to not put any handles in for analysis of FROM and TO, the first two tabs in the workbook (Outbound Tweets and Inbound Tweets) will not have any information, this is normal.

But then we get to tab #3 – Author Hashtag Graph.  The gray dots are hashtags and the green dots are accounts that have tweeted. You can see that I made a tweet that had 2 hashtags – #osmf2017 and #mvpbuzz. And boy was @TexasMusicDude busy tweeting up a storm – and using lots of other hashtags in conjunction with his tweets. Other hashtags that were popular appear to be #CampGround, #ShinyRibs, #TexasMusic, #DreamFolk and #Strings. Along the bottom you can see the day/timeline and the quantity of tweets at what time of day. If you click on any of the nodes, the information about what time the tweet(s) took place is highlighted in the timeline. It’s very interactive.

It does require an Azure subscription, but it looks very useful as a model for an advanced set of dashboards as well as a campaign management tool.

Comments closed

FlowFile Continuation In NiFi

Tim Spann describes one of the more powerful features of Apache NiFi:

Sometimes, you need to backup your current running flow, let that flow run at a later date, or make a backup of what is in-process now. You want this in a permanent storage and want to reconstitute it later like orange juice and add it back into the flow or restart it.

This could be due to failures, for integration testing, for testing new versions of components, as a checkpoint, or for many other purposes. You don’t always want to reprocess the original source or files (they may be gone).

Read on for an explanation of how FlowFile streams can do this.

Comments closed

Dynamic Data Masking

Andrea Allred has been checking out Dynamic Data Masking in SQL Server 2016:

This is a great time to talk about the different masking functions and what they do.  The four types in 2016 are Default, Email, Random and Custom String.

Default – For numeric and binary it will show a “0” For a date it will show 01/01/1900 and for strings it will show xxxx’s (more or less depending on the size of the field).

Email – It will expose the first letter of the email address and the suffix at the end of the email (.com, .net, .edu etc.) For example Batgirl@DC.com  would now be bxxx@xxxx.com.

Random – Number randomly generated between a set range. Kind of like the game, “Pick a number between 1 and 10” but for SQL.

Custom String – Lets you get creative with how much you show or cover and what you use to cover (not stuck with just xxxx’s).

It’s not really a security feature, but it could be useful for protecting sensitive data from snoopers glancing over the shoulder.

Comments closed

Missing JRE, Or Maybe C++

Meagan Longoria went through a frustrating scenario:

On a recent project I used Azure Data Factory (ADF) to retrieve data from an on premises SQL Server 2014 instance and land them in Azure Data Lake Store (ADLS) as ORC files. This required the use of the Data Management Gateway (DMG). Setup was quick and easy in our development environment. We installed the DMG for development on a separate server in the client’s network, where we also installed SQL Server Management Studio (SSMS) for query development and data validation. We set up resource groups in Azure for development and production, and made sure the settings for development and production were the same.  Then we set up a separate server for the production DMG.

Deployment and execution went well in the dev environment. Testing was completed, so we deployed to prod. Deployment went fine, but the pipelines failed execution and returned the following error on the output data sets.

Weird solution, but I’m going to guess that it makes perfect sense if you are able to look at the code.

Comments closed

Availability Group Tips

Derik Hammer has some tips to help you learn about Availability Groups:

3. Use MultiSubnetFailover=true

The Availability Group Listener is technically an optional component of an Availability Group. However, in my opinion it is necessary. By default, your listener will register all IP addresses as DNS A records and it will have multiple IP addresses when your cluster crosses subnets, most commonly when you have disaster recovery between data centers. Using the MultiSubnetFailover=true parameter in your client connection strings will attempt to connect to all IP addresses and completes the connection on the first thread to succeed. The listener ensures that only one IP address is online at a time, therefore you always connect to correct node.

This feature effectively bypasses the limitations of your DNS cache. Traditionally, you would cache the IP address for a DNS record. When you needed the client to connect to a different IP address using the same virtual network name, you would have to wait for the time to live setting to expire. This would delay your recovery time. With the MultiSubnetFailover setting, you can still cache your IP addresses but without the delay that they could induce.

There’s some good reading here.

Comments closed

SQL Data Warehouse Distribution Keys

Simon Whiteley explains the different distribution key options available in Azure SQL Data Warehouse and SQL Server APS:

Each record that is inserted goes onto the next available distribution. This guarantees that you will have a smooth, even distribution of data, but it means you have no way of telling which data is on which distribution. This isn’t always a problem!

If I wanted to perform a count of records, grouped by a particular field, I can perform this on a round-robin table. Each distribution will run the query in parallel and return it’s grouped results. The results can be simply added together as a second part of the query, and adding together 60 smaller datasets shouldn’t be a large overhead. For this kind of single-table aggregation, round-robin distribution is perfectly adequate!

However, the issues arise when we have multiple tables in our query. In order to join two tables. Let’s take a very simple join between a fact table and a dimension. I’ve shown 6 distributions for simplicity, but this would be happening across all 60.

Figuring out which distribution key to use can make a huge difference in performance.

Comments closed

Full-Text Indexing Issues

Cody Konior looks at a couple errors in full-text indexing:

The first error message holds the key. It’s not enough to just enable the service, you need to run that command also (which will also start the service for you as well as doing something internally) or restart SQL Server after the service is enabled.

Here’s where we go to the next level. You now know that full-text indexing can have a problem on startup that isn’t visible externally but surely this could never happen to you, right? But I guarantee you that if you have 500 servers many of them currently have this issue. When a developer or application support person encounters it that’s going to take half an hour or an hour of investigation, then logging incidents, then passing it on to you, for another half hour or hour of investigation, and then more time, and more time, and then doing a root cause analysis, and then sharing it with your team, and then…

Wouldn’t it be so much easier to just write a validation test for this now so that if this ever occurs you can just proactively go fix it? Well you can!

It’s a good read.

Comments closed