This month’s T-SQL Tuesday was all about Big Data. See what the community has to say about Big Data with this collection of articles ranging from deep technical walk-throughs to musings about Big Data’s impact on our industry and the data professional.
Click through to see the participants.
But now we run into a problem: there are certain ports which need to be open for PolyBase to work. This includes port 50010 on each of the data nodes against which we want to run MapReduce jobs. This goes back to the issue we saw with spinning up data nodes in Docker: the ports are not available. If you’ve put your HDInsight cluster into an Azure VNet and are willing to monkey around with ports, you might be able to open everything necessary to get this working, but that’s a lot more than I’d want to mess with, as somebody who hasn’t taken the time to learn much about cloud networking.
As I mention in the post, I’d much rather build my own Hadoop cluster; I don’t think you save much maintenance time in the long run going with HDInsight.
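If you do go down the VNet route, a quick first step is checking whether the required port is even reachable from the SQL Server side. A minimal sketch (the data node hostnames are hypothetical):

```powershell
# Quick reachability check; datanode1 and datanode2 are hypothetical
# hostnames. PolyBase pushdown needs to reach port 50010 on each data node.
'datanode1', 'datanode2' | ForEach-Object {
    Test-NetConnection -ComputerName $_ -Port 50010 |
        Select-Object ComputerName, TcpTestSucceeded
}
```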
Querying Cosmos DB is more powerful and versatile. The CreateDocumentQuery method is used to create an IQueryable<T> object, a member of System.Linq, which can output the query results. The ToList() method will output a List<T> object from the System.Collections.Generic namespace.
Derik also shows how to import the data into Power BI and visualize it. It’s a nice article if you’ve never played with Cosmos DB before.
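Derik’s examples use the .NET SDK, but purely as a hedged sketch of the same idea from PowerShell, this is roughly how you’d issue the same kind of SQL query against the Cosmos DB REST endpoint (the account, key, database, collection, and query below are all hypothetical):

```powershell
# A hedged sketch only; the post itself uses the .NET SDK's
# CreateDocumentQuery. Account, key, database, collection, and query
# are all hypothetical.
$account      = 'myaccount'
$masterKey    = '<base64 master key>'
$resourceLink = 'dbs/MyDatabase/colls/MyCollection'

# Master-key auth: HMAC-SHA256 over the lowercased verb, resource type,
# resource link, and RFC1123 date.
$date     = [DateTime]::UtcNow.ToString('r').ToLowerInvariant()
$hmac     = New-Object System.Security.Cryptography.HMACSHA256
$hmac.Key = [Convert]::FromBase64String($masterKey)
$payload  = "post`ndocs`n$resourceLink`n$date`n`n"
$sig      = [Convert]::ToBase64String($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($payload)))
$auth     = [Uri]::EscapeDataString("type=master&ver=1.0&sig=$sig")

# A parameterized query, POSTed as application/query+json.
$body = @{
    query      = 'SELECT * FROM c WHERE c.category = @category'
    parameters = @(@{ name = '@category'; value = 'books' })
} | ConvertTo-Json -Depth 4

$headers = @{
    'authorization'           = $auth
    'x-ms-date'               = $date
    'x-ms-version'            = '2017-02-22'
    'x-ms-documentdb-isquery' = 'True'
}

$result = Invoke-RestMethod -Method Post `
    -Uri "https://$account.documents.azure.com/$resourceLink/docs" `
    -Headers $headers -Body $body -ContentType 'application/query+json'

$result.Documents   # the matching documents
```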
When CLR came in, people said it was a T-SQL killer. I remember a colleague of mine telling me that he didn’t need to learn T-SQL, because CLR meant he would be able to do it all in .NET. Over time, we’ve learned that CLR is by no means a T-SQL killer, but it is excellent for a number of things – CLR stored procedures and functions have been great for string splitting and regular expressions, for example – and we’ve now learned where it fits.
I don’t hear people talking about NoSQL like they once did; it’s been folded somehow into Big Data, but even that seems to have lost a little of its lustre from a year or two ago, when it felt like it was ‘all the rage’. And yet we still have data which is “Big”. I don’t mean large, necessarily – just data that satisfies at least one of the three Vs: volume, velocity, variety.
Rob brings an interesting perspective to the topic, particularly as one of the early Parallel Data Warehouse bloggers.
At the time, in Rehfeld R&D, we experimented with making Effektor a metadata repository for a Hadoop-based data warehouse: instead of generating tables and ETL processes in the different data warehouse layers, the synchronization engine in the product would generate the Hive objects on top of the data in Hadoop. We never got further than an overall spec and a prototype, but the experiment gave us some insight into the technologies around Hadoop.
Around that time, Philips released the Hue lightbulbs, and our COO bought us two packs to play with. The idea was to create a physical BI dashboard, where lightbulbs would display KPIs, changing color according to each KPI’s value and threshold. I still think that was a brilliant idea, and I would love to see more use of consumer electronics in enterprise BI.
His basic maturity model makes sense; as much as I want to disagree with it, I can’t. Good read.
But anyway, on to the TSQL2sDay posts
— Rob Sewell (@sqldbawithbeard) September 12, 2017
What a response. You wonderful people. I salute you with a Rimmer salute.
Read on for an amazing 34 entries.
- Defines a list of two servers (PowerShell lists are as easy as "","")
- Pipes them to the Test-DbaLastBackup function (sketched below)
- Gathers information about the last full backups for all of the databases on each instance
- Restores the backups to the Destination with a new name. If no Destination is specified, the originating server will be used.
- The database is restored as “dbatools-testrestore-$databaseName” by default, which you can change with the -Prefix parameter.
- The internal file names are also renamed to prevent conflicts with the original database.
- A DBCC CHECKTABLE is then performed.
- And the test database is finally dropped.
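Put together, the whole test looks something like this minimal sketch (sql01 and sql02 are hypothetical instance names, and you’ll need the dbatools module installed):

```powershell
# A minimal sketch of the pattern described above; sql01 and sql02 are
# hypothetical instance names, and dbatools must be installed.
Import-Module dbatools

# Pipe a list of instances to Test-DbaLastBackup; each database's last
# full backup is restored under the dbatools-testrestore- prefix,
# checked with DBCC CHECKTABLE, and then dropped.
"sql01", "sql02" | Test-DbaLastBackup | Out-GridView
```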
The entire log shipping functionality is now split across five functions. Four of them are used internally and are not exposed as public functions, because you can easily break things if they’re not used correctly.
The main function, Invoke-DbaLogShipping, is available in the dbatools module for anyone to use.
If you open the log shipping GUI you have lots of choices, but most of them are already supplied with defaults by the GUI itself, and you can decide whether to use them or not.
The whole idea behind the functionality was that it would let you quickly set up log shipping using a lot of defaults, just as you can in the GUI, while a more experienced user can change any setting to their preference.
Read on for an example of how to use this. It looks pretty snazzy.
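As a rough sketch of what a call might look like (parameter names drawn from the dbatools help for Invoke-DbaLogShipping; the instance names, database, and paths are hypothetical):

```powershell
# A rough sketch based on the dbatools help; instance names, database,
# and paths are hypothetical.
Import-Module dbatools

$params = @{
    SourceSqlInstance      = 'sql01'
    DestinationSqlInstance = 'sql02'
    Database               = 'DB1'
    BackupNetworkPath      = '\\sql01\logshipping'
    BackupLocalPath        = 'D:\Data\logshipping'
    GenerateFullBackup     = $true
    CompressBackup         = $true
    Force                  = $true
}

# Anything not specified falls back to defaults, much as the GUI would supply.
Invoke-DbaLogShipping @params
```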
Now let’s say that we are working in an environment that does not allow us to change the server configurations, meaning that OPENROWSET() is closed for us.
Being completely honest, my spreadsheet is only 8 rows, so I would just manually insert the data. But what happens if it’s a few thousand rows? Manually inserting that is not a viable option.
So, with most of my T-SQL ideas exhausted, let’s look at PowerShell!
The other all-important question, of course, is how to get data from somewhere else into Excel.
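Neither of these is necessarily the exact approach in the posts, but as a hedged sketch of both directions using the ImportExcel and dbatools modules (instance, database, table, and file names are all hypothetical):

```powershell
# A rough sketch, not the posts' exact code: ImportExcel handles the
# spreadsheet side and dbatools the SQL Server side. Instance, database,
# table, and file names are hypothetical.
Import-Module ImportExcel
Import-Module dbatools

# Spreadsheet -> SQL Server
Import-Excel -Path 'C:\Temp\Budget.xlsx' |
    Write-DbaDataTable -SqlInstance sql01 -Database Finance -Table dbo.Budget -AutoCreateTable

# SQL Server -> spreadsheet
Invoke-DbaQuery -SqlInstance sql01 -Database Finance -Query 'SELECT * FROM dbo.Budget' |
    Export-Excel -Path 'C:\Temp\BudgetExport.xlsx'
```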
My first thought was to create a SQL Server Agent job with the following steps:
- check the availability of the shared destination folder
- delete/clear the destination folder’s contents
- shrink all transaction log files
- back up all databases from a given list
- restore each backup file from the folder
- check all orphaned users
- delete/clear the destination folder’s contents
A year or two ago, I built this with a lot of ordinary T-SQL Agent jobs; now I’m doing it with dbatools, which makes it very easy and fast (and in one step).
It’s only a few lines of PowerShell code, which you can see by clicking through.
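For a flavor of the approach, here is a minimal sketch of the backup-and-restore core with dbatools. This is not the author’s exact script; the instance names, share, and database list are hypothetical:

```powershell
# A rough sketch of the dbatools version of those steps, not the author's
# exact script; instances, share, and database names are hypothetical.
Import-Module dbatools

$databases = 'DB1', 'DB2'

# Back up the listed databases on the source instance to the shared folder,
# then pipe the backup history straight into a restore on the destination.
Backup-DbaDatabase -SqlInstance sql01 -Database $databases -Path '\\fileserver\sqlbackup' |
    Restore-DbaDatabase -SqlInstance sql02 -TrustDbBackupHistory -WithReplace
```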