Press "Enter" to skip to content

July 2, 2019

Scala 2.13 Changes

Anmol Sarna takes us through what’s new in Scala 2.13:

Last but not least, the team has invested heavily in compiler speedups during the 2.13 cycle, which resulted in some major changes to the compiler.

Compiler performance in 2.13 is 5-10% better than in 2.12, thanks mainly to the new collections.

There are a lot of changes in this version. I wonder how long it will be before Spark supports it fully.

Thoughts on Hadoop’s Future

Mark Litwintschik ties together a set of thoughts on the present and future of Hadoop:

At no point in Hadoop’s history has there been such a rich variety of features being offered as today and never before has it been so stable and battle-tested.

Hadoop projects are made up of millions of lines of code written by thousands of contributors. In any given week, there are hundreds of developers working on the various projects. Most commercial database offerings are lucky to have a handful of engineers making any significant improvements to their code bases every week.

Mark takes a broad ecosystem approach (which I fully endorse) and so he sees the glass as more than half-full.

dbatools and Error Handling

Shane O’Neill takes us through some of the error-handling dynamics available in dbatools:

PowerShell errors normally contain useful information on what went wrong. With this information, the “what went wrong” can be fixed.

That being said, if you are writing PowerShell scripts and not raising valid error messages, then I highly advise you to go back and fix that.

dbatools raises these error messages as friendly warning messages since we’ve found people will read a warning message quicker than they will read an error message.

There are several options available for handling errors, including viewing them as warnings, viewing them as errors, and capturing the error text in variables.
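The warnings-by-default, errors-on-request pattern dbatools describes is worth borrowing for your own tooling. Here is a minimal sketch of the idea in Python; the function name and flag are hypothetical illustrations, not the dbatools API:

```python
import warnings

def run_command(work, enable_exception=False):
    """Run a unit of work, surfacing failures as a friendly warning by
    default and as a real, catchable error only on request.
    (Sketch only; names here are made up, not dbatools.)"""
    try:
        return work()
    except Exception as err:
        if enable_exception:
            raise  # opt in to a hard error the caller can trap
        warnings.warn(f"Command failed: {err}")  # friendly warning by default
        return None

run_command(lambda: 1 / 0)  # emits a warning and returns None
try:
    run_command(lambda: 1 / 0, enable_exception=True)
except ZeroDivisionError as err:
    print(f"caught: {err}")
```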

Handling Spatial Data in Cosmos DB

Hasan Savran gives us some tips on storing spatial data in Cosmos DB:

Importing spatial data into Cosmos DB can be a challenge. Cosmos DB is not a relational database, and you may need to change your data model structure to add spatial data. You cannot create a new container for spatial data and plan to join this container to your other containers. There are free tools which might help with GeoJSON conversion, but you may still need to add the converted GeoJSON data into your data models. Spatial data becomes very powerful when you find a way to join it with your application's data.

In the following example, I am going to download the hurricane data from the NOAA website. The data is in CSV format, so we may need to transform it to create a good data model for Cosmos DB. I downloaded all of the hurricane data for 2005, which was the year of Hurricane Katrina. The first thing I did was change the names of the columns to make them more user-friendly. I used the following names for the columns. Here is a sample row from the CSV file.

Click through for the example.
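As a rough sketch of the kind of reshaping Hasan describes (not his actual code; the file name and column names below are made up), converting each CSV row into a document with an embedded GeoJSON Point might look like this in Python:

```python
import csv
import json

def row_to_document(row):
    """Turn one CSV row (hypothetical column names) into a Cosmos DB-style
    document with an embedded GeoJSON Point for spatial queries."""
    return {
        "id": f"{row['StormName']}-{row['ObservationTime']}",
        "stormName": row["StormName"],
        "observationTime": row["ObservationTime"],
        "location": {
            "type": "Point",
            # GeoJSON coordinates are ordered [longitude, latitude]
            "coordinates": [float(row["Longitude"]), float(row["Latitude"])],
        },
    }

with open("hurricanes_2005.csv", newline="") as f:
    documents = [row_to_document(row) for row in csv.DictReader(f)]

print(json.dumps(documents[0], indent=2))
```

One common stumbling block: GeoJSON puts longitude before latitude, the opposite of the lat/long order most source files use.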

DAX Median & 2 Billion+ Rows

Chris Webb has bad news for people with tables holding 2 billion-plus rows in a Tabular model:

What’s more, the error will always occur even if you apply a filter to the table that returns less than two billion rows. The same problem occurs with some other functions, such as Percentile(), but it’s worth pointing out that the vast majority of DAX functions work as normal with tables with more than two billion rows – for example, in the pbix file used here the Sum() and CountRows() functions not only work fine but return instantly.

I haven’t seen that many Power BI examples with 2 billion or more rows in a table, but it can be an impediment when trying to use Analysis Services Tabular in cases with enormous amounts of data.

Power BI: Showing Only Slicers with Data

Kasper de Jonge shows us a few new tricks with Power BI:

As of this month, Power BI finally supports filtering slicers down to only show rows that have fact data. Before, the only thing you could do to achieve this was some workaround like the one I described here, where you filter down the dimension using a calc table. The other approach was to use bi-directional cross-filtering, which would filter down the dimension table appropriately. That leads to performance issues, though.

Now you can use a measure to filter down the slicer. 

Click through for an example as well as a few other tricks you can do as a result.

Executing Azure SSIS Packages from Blob Storage

Andy Leonard cranks it to the next level:

I confess: I have been waiting for this feature since I first learned of Azure-SSIS.

When I first saw Azure-SSIS (which creates an Azure Data Factory Integration Runtime and SSIS Catalog in the cloud), my first thought was a paraphrase of Ferris Bueller's question about dyeing the river green: "If we can execute SSIS packages from the SSIS Catalog in Azure Data Factory, why can't we execute SSIS packages from Azure Blob Storage?" Today, we can.

Read on to see how you can do it.

Column From Examples in Power Query

Matthew Roche shows off the “column from examples” feature in Power Query:

Here’s the quick overview:

1. In the Power Query editor in Power BI Desktop, choose “Column from Examples” from the “Add Column” tab.
2. Enter the values that the new column should have for rows that are already in your data set.
3. Review the values that Power Query is suggesting for the other rows, and when they are all correct, choose OK, and then say “Ooooooohhhhh” when Power Query does all the work for you.

It’s a little hard to see from Matthew’s image, but this is a great feature when you have a regular pattern but don’t want to put together a regular expression yourself.
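As a made-up analogy (Power Query actually generates M transformation steps behind the scenes, not regular expressions), this is the sort of extraction the feature spares you from writing by hand:

```python
import re

# Hypothetical sample values: pull the department out of
# "Last, First (Dept)" strings without authoring the pattern yourself.
rows = ["Smith, John (Sales)", "Jones, Mary (Marketing)"]
pattern = re.compile(r"\((?P<dept>[^)]+)\)$")
departments = [pattern.search(row)["dept"] for row in rows]
print(departments)  # ['Sales', 'Marketing']
```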
