First, I will not be discussing the personal gateway in this post. If you have chosen to use the personal gateway, you have limited functionality and should consider using the on-premises data gateway for corporate use.
The on-premises data gateway (referred to as gateway throughout this post) “acts as a bridge, providing quick and secure data transfer between on-premises data and the Power BI, Microsoft Flow, Logic Apps, and PowerApps services.” (ref) Much of what is discussed here will apply to all of the services referenced above, but our primary concern is related to Power BI. Please refer to references at the end of this post for details about data sources supported within the gateway.
Click through for more information.
Chrissy LeMaire has a series of thoughts on this month’s T-SQL Tuesday, and it was worth separating out from the rest of today’s batch:
Y’all know what I’m gonna say here! I love automation and PowerShell. I know for a fact that PowerShell and T-SQL together are the future of SQL Server administration. As someone who often presents about dbatools, the popular SQL PowerShell community project, I’ve seen the excitement and relief that PowerShell automation brings to SQL Server Database Pros.
There’s a lot of well-deserved plugging of dbatools. Hint, hint.
But at this time ADF doesn’t support that. You can copy a file with a copy activity, but you cannot actually move (i.e., copy and delete).
Luckily, we had a workaround for our situation. If you tell ADF to copy data to a file that already exists in the specified location in the data lake, it will overwrite the existing file. We made sure the file name is always the same for each table in the staging area so there is always only one file per table.
Read on for the full details on this workaround. Also, vote on this feedback item if you want the ability to move files instead of just copying them.
The format of the file has a huge implication for the storage and parallelisation. Splittable formats – files which are row oriented, such as CSV – are parallelizable as data does not span extents. Non-splittable formats, however, – files what are not row oriented and data is often delivered in blocks, such as XML or JSON – cannot be parallelized as data spans extents and can only be processed by a single vertex.
In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the data lake store, but the metadata is contained in the data lake analytics. For many, working with the structured data in the data lake is very similar to working with SQL databases.
This is the type of thing that you can easily forget about, but it makes a huge difference down the line.
2 Create a new Hive table from scratch or alter Table
Create a new table by, clicking on the ‘+’ icon, which opens the create table wizard. Enter table name, column name and choose a data type from the dropdown. You can pick folloiwng advanced hive settings directly from the UI
Transactional : Turn on transaction support in Hive, by checking this flag. Note that the table must be bucketed and stored using an ACID compliant format (such as ORC).
Location : Hive stores the table data for managed tables in the Hive warehouse directory in HDFS which is configured in hive-site.xml with property hive.metastore.warehouse.dir. The default location is /apps/hive/warehouse. The location can be changed using the Location text field.
File Format : The default file format for CREATE TABLE statement is ORC. choose a format from the file format dropdown.
Row Format : Select a row format such as Field terminator, Lines terminator, and Stored File type.
Table can be altered to add new columns or change the column name or column datatype.
Tables can also be renames and altred
Read on for more improvements, including a graphical plan viewer and improved autocomplete.
I want to show you how I went from having multiple single SQL databases in Azure to a database elastic pool within a new dedicated SQL Server. Once setup I create and use elastic jobs. This post is long but I am sure you will find it useful.
Create a new “logical” SQL Server.
Create a new elastic pool within this logical SQL Server.
Move the data from the old single SQL databases to the above elastic pool (couple of ways to do this but I used built-in backups).
Confirm application connection.
Decommission single SQL databases.
Create / setup an elastic job.
Check the controller database.
Definitely worth reading if you are looking at hosting multiple databases in Azure.
Believe it or not, there’s not an immediately obvious “Oh, you had an error in your Automation script, here’s how you alert someone” setting in the Azure portal. Now, you could simply put error handling in your PowerShell script. In fact, it’s probably not at all a bad idea to do that as well. However, what you would not get setting things up that way is a mechanism for managing the alerts, history, additional possible responses (like firing off another Runbook, although there is way to do that from the PowerShell too). Instead, what I want is way to manage alerts through the Azure fabric.
If you do a search, there is an Azure Alert service. However, it didn’t seem to be really what I was looking for. Further, I found it extremely difficult (OK, I couldn’t make it work) to connect the alerts directly to the Jobs related to my Runbooks. Instead, after quite a bit of research, what I found is a combination of Azure Log Analytics with the Operations Management Suite (OMS) will do exactly what I’m looking for.
Click through to read how to set this up.
Q: Is there any way to handle the execution of SSIS packages stored locally?
A: Azure Automation works on Azure resources. It cannot be used for executing local SSIS packages.
In some cases, you may still need a scheduling tool (which might be a VM with SQL Agent).
If you’re learning Azure and you research things using a search engine, then I strongly recommend you use the ability to limit your searches to the last year. Otherwise, you may be getting incomplete or incorrect data. At this precise moment, I’d say you need to limit your searches to Google (although I honestly hate recommending one of these tools over the other, let’s keep the competition fierce) because I was able to easily get the correct information within a couple of mouse clicks.
Grant’s post makes sense, and so does the search engine behavior: in Grant’s case, those older cmdlet documentation links have been around longer and older resources tend to have a larger number of relevant linkbacks and clicks. That’s also visible in SQL Server documentation, where sometimes you’ll land on the 2008R2 or 2012 version of documentation rather than 2016 or vNext.
Here are a whole set of links to kick start your learning of Microsoft Azure services.
Changes to computer thinking – Stephen Fry explains cloud computing
That’s a good set of starting links.
The tutorial covers many different techniques for training predictive models at scale, and deploying the trained models as predictive engines within production environments. Among the technologies you’ll use are Microsoft R Server running on Spark, the SparkR package, the sparklyr package and H20 (via the rsparkling package). It also touches on some non-Spark methods, like the bigmemory and ff packages for R (and various other packages that make use of them), and using the foreach package for coarse-grained parallel computations. You’ll also learn how to create prediction engines from these trained models using the mrsdeploy package.
Check out the post as well as the tutorial David links.