ADF V2 natively supports decompression of files as documented at https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#compression-support. With this functionality, ADF should change the extension of the file when it is decompressed, so 1234_567.csv.gz would become 1234_567.csv. However, I’ve noticed that this doesn’t happen in all cases.
In our particular case the file names and extensions of the source files are all uppercase and when ADF uploads them it doesn’t alter the file extension e.g. if I upload 1234_567.CSV.GZ I get 1234_567.CSV.GZ in blob storage rather than 1234_567.CSV.
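The renaming ADF should perform amounts to stripping the compression suffix from the file name. A minimal Python sketch of the difference between a case-sensitive and a case-insensitive check (the function name is mine, purely for illustration; this is not ADF's internal logic):

```python
def strip_compression_suffix(filename: str, suffix: str = ".gz") -> str:
    """Strip a compression suffix case-insensitively, preserving the base name.

    An exact endswith(".gz") check would miss uppercase extensions,
    which mirrors the behavior described above.
    """
    if filename.lower().endswith(suffix.lower()):
        return filename[: -len(suffix)]
    return filename

# Lowercase names are handled either way...
print(strip_compression_suffix("1234_567.csv.gz"))   # 1234_567.csv
# ...but only a case-insensitive check handles the uppercase variant.
print(strip_compression_suffix("1234_567.CSV.GZ"))   # 1234_567.CSV
```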
Click through for more details and be sure to vote on his Azure Feedback bug if this affects you.
The Segment operator, like all operators, is described at the Books Online page mentioned above. Here is the description, quoted verbatim:
Segment is a physical and a logical operator. It divides the input set into segments based on the value of one or more columns. These columns are shown as arguments in the Segment operator. The operator then outputs one segment at a time.
Looking at the properties of the Segment operator, we do indeed see the argument mentioned in this description, in the Group By property (highlighted in the screenshot). So this operator reads the data returned by the Index Scan (sorted by TerritoryID, which is required for it to work; this is why the Index Scan operator is instructed to perform an ordered scan), and divides it into segments based on this column. In other words, this operator is a direct implementation of the PARTITION BY specification. Every segment returned by the operator is what we would call a partition for the ROW_NUMBER() function in the T-SQL query. And this enables the Sequence Project operator to reset its counters and start at 1 for every new segment / partition.
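The interplay between Segment and Sequence Project can be sketched in a few lines of Python. This is an illustration of the idea, not engine code: iterate over rows already sorted by the partitioning column and reset a counter whenever that column's value changes.

```python
def row_number_with_partitions(rows, partition_key):
    """Yield (row, row_number) pairs, resetting the counter whenever the
    partition key changes. The input must already be sorted by that key,
    just as Segment requires an ordered input."""
    current_segment = object()  # sentinel that differs from any real key
    counter = 0
    for row in rows:
        key = row[partition_key]
        if key != current_segment:   # Segment: a new segment starts here
            current_segment = key
            counter = 0              # Sequence Project: reset the counter
        counter += 1
        yield row, counter

rows = [{"TerritoryID": 1}, {"TerritoryID": 1}, {"TerritoryID": 2}]
print([(r["TerritoryID"], n)
       for r, n in row_number_with_partitions(rows, "TerritoryID")])
# [(1, 1), (1, 2), (2, 1)]
```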
Read on to understand the issue and see Hugo’s proof.
A quick one to signal boost this issue and its solution, as I’m sure other people will run into it. If you’re on Standard Edition of SQL Server and upgrading to 2017, you might run into an issue where the database services portion of the upgrade fails. This seems to be related to SSIS.
If you experience this problem, mid-way through the upgrade you’ll receive this error in a pop-up:
Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
At the end of the upgrade, it will show that the database services section has failed. Checking the error log will show this:
Script level upgrade for database ‘master’ failed because upgrade step ‘ISServer_upgrade.sql’ encountered error 917, state 1, severity 15.
Read on for the answer and a workaround.
After a spot of head scratching and thinking that there was something wrong with my AG setup, it turns out that there’s a bug in SSMS. I was running SSMS 17.5 although this may well also affect earlier versions.
Looking at the release notes for SSMS 17.6, one of the bug fixes that this version addresses is…
Fixed an issue when the primary is down and manually failover to secondary, a NullReferenceException will be thrown.
David notes that upgrading fixed his issue; read on for more.
I tried the configuration a couple of times just to make sure it wasn’t a one-off problem. I installed the latest Cumulative Update (CU). I made sure nothing else was connected to the instance. I rebooted my machine. I restarted services. I banged my head against the wall. I asked a friend if I was insane or stupid. After confirming that I was both, my friend Aaron Bertrand (blog|twitter) confirmed it wasn’t a problem for him.
I discovered I could reproduce the problem simply by running the same simple statement that SSRS used when creating the ReportServer database. SSRS uses a non-standard collation, and specifying that collation seems to be the difference in causing the deadlock. Then I discovered that specifying ANY non-standard collation was causing the deadlock. This had nothing to do with SSRS, and everything to do with non-default collations.
Vote for his User Voice item too.
Last week I got involved with a customer issue. A refresh of the data imported to a PBIX always works in Power BI Desktop, but the refresh operation intermittently fails in the Power BI Service. Their workaround had been to refresh the PBIX in Desktop and re-upload the file to the Service. This post is about finding and fixing the root cause of the issue – this is as of March 2018, so this behavior may very well change in the future.
Turns out, the source of the problem was that the customer’s Open Orders table can contain invalid dates – not all rows, just some rows. Since Open Orders data can fluctuate, that explains why it presented as an intermittent refresh issue. Here’s a simple mockup that shows one row which contains an invalid date:
At this point, we have two open questions:
(1) What is causing the refresh error?
(2) Why is the refresh behavior different in the Service than the Desktop tool?
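To make "invalid date" concrete: the failing rows contain component values that don't form a real calendar date. A small Python sketch of a validation check along those lines (the function and the repair strategy of returning None are my assumptions, not the customer's fix):

```python
from datetime import date

def safe_date(year: int, month: int, day: int):
    """Return a date if the components form a valid calendar date, else None."""
    try:
        return date(year, month, day)
    except ValueError:       # e.g. February 30th -- an "invalid date"
        return None

# A valid row parses; an invalid one is flagged instead of failing the load.
print(safe_date(2018, 3, 15))   # 2018-03-15
print(safe_date(2018, 2, 30))   # None
```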
Read on for the explanation of the difference, as well as a fix to prevent refresh errors due to invalid dates.
Here’s a strange one that I’ve recently come across. I had a customer report that their log shipping restore jobs were chock-a-block with errors. The logs seem to have been restoring just fine, but before every restore attempt, the job was reporting this error:
Error: Failed to update database “DATABASE NAME” because the database is read-only.
Unfortunately I haven’t got any direct access to the server, but their log shipping is set up to disconnect users before each restore and leave the database in standby afterward. After a bit of to-ing and fro-ing, I asked the customer to send me a trace file covering the period that the restore job ran.
Read on for the details and keep those servers patched.
Notice the above gives an incorrect result: all of the x_i columns are identical, and all of the y_i columns are identical. I am not saying the above code is in any way desirable (though something like it does arise naturally in certain test designs). If this is truly “incorrect dplyr code” we should have seen an error or exception. Unless you can be certain you have no code like that in a database-backed dplyr project, you cannot be certain you have not run into the problem, producing silent data and result corruption.
The issue is: dplyr on databases does not seem to have strong enough order-of-execution guarantees for assignment statements. The running counter “delta” is taking only one value for the entire lifetime of the dplyr::mutate() statement (which is clearly not what the user would want).
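The failure mode — a value computed once and reused for every row because evaluation is deferred — has a familiar analogue in Python’s late-binding closures. This is only an illustration of the general lazy-evaluation pitfall, not dplyr’s translation mechanics:

```python
# Each lambda is meant to capture a different value of delta, mirroring a
# running counter incremented per assignment.
deferred = []
for delta in range(3):
    deferred.append(lambda: delta)   # late binding: delta is read at call time

# Evaluated lazily, every closure sees the final value -- one value for all.
print([f() for f in deferred])               # [2, 2, 2]

# Forcing the value at definition time restores per-step values.
eager = [lambda d=delta_i: d for delta_i in range(3)]
print([f() for f in eager])                  # [0, 1, 2]
```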
Read on for a couple of suggested solutions.
For now, the workaround I have is to restart the SQL Server service occasionally. You can see that I have done it twice in the above screenshot. Our application is resilient to short database downtimes, so this isn’t a bad workaround for us; it’s just a little bit of an annoyance.
One thing to keep in mind if you are in this scenario is that if you are running ML Services hundreds of thousands of times a day, your ExtensibilityData folders might have a lot of cruft, which may prevent the Launchpad service from starting as expected. I’ve had to delete all folders in \MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData\MSSQLSERVER01 after stopping the SQL Server service and before restarting it. The Launchpad service automatically does this, but if you have a huge number of folders in there, the service can time out trying to delete all of them. In my experience at least, the other folders didn’t have enough sub-folders inside to make it worth deleting, but that may just be an artifact of how we use ML Services.
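A manual cleanup along those lines (stop the service, clear the folder, restart) can be scripted. Here is a minimal Python sketch run against a throwaway directory standing in for the ExtensibilityData folder; the function name is mine, and stopping and restarting the SQL Server service is assumed to happen outside this script:

```python
import shutil
import tempfile
from pathlib import Path

def clear_subfolders(root: Path) -> int:
    """Delete every subfolder of root, returning how many were removed.
    Intended to run only while the SQL Server service is stopped."""
    removed = 0
    for child in root.iterdir():
        if child.is_dir():
            shutil.rmtree(child)
            removed += 1
    return removed

# Demonstrate against a temporary directory standing in for the real
# ExtensibilityData session folder (hypothetical stand-in, not the real path).
root = Path(tempfile.mkdtemp())
for name in ("session_a", "session_b", "session_c"):
    (root / name).mkdir()
print(clear_subfolders(root))   # 3
```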
It’s very unlikely to affect most shops, as we only notice it after running sp_execute_external_script millions of times, and that’s pretty abnormal behavior.
After years of having to deal with Connect, Microsoft’s feedback platform, a successor has been announced: feedback.azure.com. It’s not all about Azure; the link goes to the relevant portion for SQL Server. I’m glad for this change, as Connect could sometimes be a little … quirky. The search function in particular didn’t work properly. The new feedback site is based on UserVoice and it’s really easy to submit feedback. People submitting ideas for Power BI will be very familiar with the format. There are a couple of drawbacks:
You cannot specify many details (none, to be exact, unless you list them in the description): OS version, SQL Server version, bitness, et cetera. On the other hand, it makes the process of entering feedback a lot faster.
You cannot mark a feedback item as private so that only Microsoft can see it. This means it’s not exactly the place to dump your production data to show how a bug is bugging you (haha).
I’m not sure how much of an improvement this is, but at least it does serve the Power BI team well.