Press "Enter" to skip to content

Author: Kevin Feasel

Microservice Communication Patterns

John Hammink shares a few ways that you can have microservices communicate with one another and argues that Kafka is a great platform for microservice communication:

Simply put, microservices are a software development method where applications are structured as loosely coupled services. The services themselves are minimal atomic units which, together, comprise the entire functionality of the app. Whereas in an SOA, a single component service may combine one or several functions, a microservice within an MSA does one thing — only one thing — and does it well.

Microservices can be thought of as minimal units of functionality which can be deployed independently, are reusable, and communicate with each other via various network protocols like HTTP (more on that in a moment).

Read the whole thing. I have a love-hate relationship with these but it’s a pattern worth understanding.
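To make the Kafka option concrete, here is a minimal sketch of two services exchanging an event through a topic, using the kafka-python package (my illustration, not code from the article; the broker address, topic, and event shape are all assumptions):

import json
from kafka import KafkaProducer, KafkaConsumer

# Service A publishes an "order created" event to the orders topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))
producer.send("orders", {"order_id": 1, "status": "created"})
producer.flush()

# Service B consumes from the same topic on its own schedule.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing-service",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")))
for event in consumer:
    print(event.value)  # {'order_id': 1, 'status': 'created'}
    break

The design point is that the producer never addresses the consumer directly; any number of services can subscribe to the topic later without the producer changing at all.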


Power BI IntelliSense For Python and R

David Eldersveld makes me wonder about the value of Power BI’s IntelliSense for R and Python:

If I type the letter a into the R Script editor, my code completion options are acts, always, and, and as. Power BI’s editor is not offering any IntelliSense options from a Python or R dictionary. Instead, it’s pulling from the text already in the editor. Note the comment in Line 1 and the inclusion of words beginning with the letter a — always, and, acts, as.

By comparison, the DAX editor contains a detailed function list and helpful annotations for code completion. Can we get something similar for R and Python? Not exactly… But there’s a workaround that I’m almost embarrassed to suggest. If you are a user who codes directly into the script editor, the following hack could be helpful. If you use the option to Edit script in External IDE, keep doing that and ignore the following guidance.

As-is, this is worse than no IntelliSense, because with no IntelliSense at least it’ll never steal a mouse click or keystroke. I wouldn’t expect RStudio-level quality out of the gate, but unless I’m missing something, that’s pretty bad.
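For what it’s worth, given that the completion list is built from text already in the editor, my guess at the shape of the workaround is to seed the script with a comment listing the identifiers you expect to use. A hypothetical Python example (the column name is made up; in a Python visual, Power BI supplies the data as a pandas DataFrame named dataset):

# Seed the text-based completion: every word below becomes a candidate.
# groupby describe merge pivot_table sort_values value_counts
import pandas as pd

# 'dataset' is provided by Power BI's Python script environment.
summary = dataset.groupby("category").describe()
print(summary)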


Copying Filestream Data Between Tables

Paul Randal takes us through some limitations on copying Filestream data between tables:

I was asked last week whether it’s possible to create a table with a FILESTREAM column and then populate that column by copying FILESTREAM files from another directory in the FILESTREAM data container.

The simple answer is no.

Paul explains why this isn’t possible and then gives you an alternative which does work.
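The excerpt doesn’t show the alternative, but the general principle is that SQL Server only tracks FILESTREAM files it creates itself, so the copy has to go through the engine rather than the file system. A hedged sketch of that idea via pyodbc (connection string, table, and column names are all hypothetical, and this is my illustration rather than Paul’s exact method):

import pyodbc

# Copy rows (including the FILESTREAM column) through the engine with DML;
# SQL Server writes and tracks the new files in the data container itself.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=FileStreamDemo;Trusted_Connection=yes;")
cur = conn.cursor()
cur.execute("""
    INSERT INTO dbo.DocumentArchive (DocId, DocData)
    SELECT DocId, DocData   -- DocData is the FILESTREAM varbinary(max) column
    FROM dbo.Documents
    WHERE Archived = 1;""")
conn.commit()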


Eye-Friendly Palettes

Shannon Holck has shared a Power BI theme using a color-safe, easy-to-view palette:

Edward Tufte recommended the use of soft colors that do not tire the eyes. I’ve actually never read his books (yet), but a former boss of mine was a devout disciple and produced some beautifully soft color palettes.

Stephen Few, in “Show Me the Numbers,” reiterated Tufte’s color theories and recommended three sets of hues:

Light – for large shapes, e.g. bars
Medium – for small shapes, e.g. points
Dark/Bright – for calling attention to data

Click through for more, including where you can get this Power BI theme. I’m not exactly the world’s biggest fan of the default palette, so I’ll have to check this one out.
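A Power BI theme is just a JSON document whose dataColors array replaces the default palette, so experimenting with one is cheap. A small sketch that writes such a file (the hex values are placeholders, not Shannon’s actual colors):

import json

# A minimal Power BI theme file: "dataColors" sets the default palette.
theme = {
    "name": "Soft palette (placeholder colors)",
    "dataColors": ["#8CBBD9", "#A8C69F", "#D9B48C", "#C9A0DC"],
}
with open("soft-theme.json", "w") as f:
    json.dump(theme, f, indent=2)
# Import the resulting file as a custom theme in Power BI Desktop.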


Blaming the Right Cardinality Estimator

Arthur Daniels helps us figure out which of SQL Server’s cardinality estimators your query used:

SQL Server 2008 is reaching end of support this year, so upgrading your SQL Server might be on your mind. One of the big changes when you upgrade your SQL Server is upgrading the compatibility level, which by default will upgrade the cardinality estimator (CE).

This can change query performance, for better or for worse. This post won’t focus on whether it’s good or bad, but instead I want to show you how you can check to see what CE was used by your queries.

It’s not a 100% guarantee, but generally I’ve found the new estimator to be superior.
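If you want the short version: the showplan XML records the model in a CardinalityEstimationModelVersion attribute, where 70 indicates the legacy CE and higher values (120, 130, and so on) track the new one. Here is one way to scan the plan cache for it, sketched in Python with pyodbc (not necessarily Arthur’s method; the connection string is an assumption):

import re
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;")
cur = conn.cursor()
# Pull cached plans with their text and showplan XML.
cur.execute("""
    SELECT TOP (50)
           st.text,
           CONVERT(nvarchar(max), qp.query_plan) AS plan_xml
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp;""")
for text, plan_xml in cur.fetchall():
    m = re.search(r'CardinalityEstimationModelVersion="(\d+)"', plan_xml or "")
    if m:
        ce = "legacy" if m.group(1) == "70" else "new"
        print(f"{ce} CE (v{m.group(1)}): {(text or '')[:60]}")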


Conditional Formatting in Power BI

Reza Rad shows us a few ways to perform conditional formatting in Power BI:

I have given many presentations and talks about data visualization, and still I am amazed by how many visualizations I see which are not following the basic rules. In this article, I want to focus on the table visual. A table is a visual that most of us use on many occasions; in fact, many users like to see the data in table format. However, a table can be visualized in a way that is not readable. In this article, I’m showing you the most common style of table which many report developers use, and then challenging it with a better style. The mystery is of course in conditional formatting. Like all my other articles, this article demonstrates the technique in Power BI. If you would like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.

Some of these formats are better than others, but you do have the power to do quite a bit with it in Power BI.


Creating Multi-Column Statistics From Missing Index DMVs

Max Vernon shows how you can use the missing index DMVs to find potential candidates for multi-column statistics:

SQL Server does have a fairly useful dynamic management view, or DMV, which provides insight that can be leveraged in this area. The DMV I’m talking about is the set of DMVs around missing indexes, consisting of sys.dm_db_missing_index_groups, sys.dm_db_missing_index_details, etc. I’m not saying the missing indexes DMVs are a panacea that will enable you to fix every performance situation you run into, but they can be useful if you know where to look. This post doesn’t go into a lot of depth about how to use those DMVs for the purpose of actually creating indexes; however, I will show you how you can create multi-column stats objects as an interim performance booster while evaluating the need for those indexes.

I’ve never had great luck with multi-column stats versus simply creating indexes, but that could be a case of me doing it wrong.
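The gist of the approach, sketched in Python with pyodbc (a rough illustration of the idea, not Max’s actual script; database name is hypothetical, and you should review anything it prints before running it): the missing-index DMVs hand you equality and inequality column lists, which you can stitch into CREATE STATISTICS statements instead of full indexes.

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDatabase;Trusted_Connection=yes;")
cur = conn.cursor()
# The DMV reports the table and the columns queries wished were indexed.
cur.execute("""
    SELECT d.statement, d.equality_columns, d.inequality_columns
    FROM sys.dm_db_missing_index_details AS d
    WHERE d.database_id = DB_ID();""")
for i, (table, eq_cols, ineq_cols) in enumerate(cur.fetchall()):
    # Use the equality plus inequality columns as the stats column list.
    cols = ", ".join(c for c in (eq_cols, ineq_cols) if c)
    print(f"CREATE STATISTICS stats_missing_{i} ON {table} ({cols});")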


Where Hadoop Is Going

Erik Krogen summarizes a recent Hadoop developer gathering at LinkedIn:

The day started with LinkedIn’s very own Jonathan Hung (left) and Anthony Hsu (right) discussing TensorFlow on YARN, or TonY, our home-grown and recently open-sourced solution for distributed deep learning via TensorFlow on top of YARN. They discussed its architecture and implementation, as well as future goals, such as support for additional runtimes like PyTorch. You can view their slides here and a recording of their presentation here.

Looks like there were several interesting talks and a lot of content showing where Hadoop will go over the next year or so.


Getting Started With Apache Flume

Mark Litwintschik takes us through installation and configuration of Apache Flume:

The following was run on a fresh Ubuntu 16.04.2 LTS installation. The machine I’m using has an Intel Core i5 4670K clocked at 3.4 GHz, 8 GB of RAM and 1 TB of mechanical storage capacity.

First, I’ve set up a standalone Hadoop environment following the instructions from my Hadoop 3 installation guide. Below I’ve installed Kafkacat for feeding and reading off of Kafka, libsnappy as I’ll be using Snappy compression on the Kafka topics, Python, Screen for running applications in the background, and ZooKeeper, which is used by Kafka for coordination.

From there, Mark has the configuration scripts and processes to get the entire pipeline built.
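To give a flavor of the Flume side, an agent is declared in a properties file that wires sources, channels, and sinks together. A generic sketch (not Mark’s actual configuration) that listens on a local TCP port and writes events to a Kafka topic:

# flume.conf: one agent with a source, a channel, and a sink.
agent.sources = netcat-src
agent.channels = mem-ch
agent.sinks = kafka-sink
# Source: accept lines of text on a local port.
agent.sources.netcat-src.type = netcat
agent.sources.netcat-src.bind = 127.0.0.1
agent.sources.netcat-src.port = 44444
agent.sources.netcat-src.channels = mem-ch
# Channel: buffer events in memory between source and sink.
agent.channels.mem-ch.type = memory
agent.channels.mem-ch.capacity = 10000
# Sink: publish each event to a Kafka topic.
agent.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafka-sink.kafka.bootstrap.servers = 127.0.0.1:9092
agent.sinks.kafka-sink.kafka.topic = flume-demo
agent.sinks.kafka-sink.channel = mem-ch

You’d start something like this with flume-ng agent -n agent -f flume.conf; Mark’s post covers the real pipeline end to end.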


Parsing HL7 Messages With Python

Cristian Satnic has HL7-formatted messages in SQL Server and wishes to parse them using Python:

Each line in the HL7 message is called a segment, and then each segment is split into individual fields by | (pipe) characters (typically). HL7 fields have well-defined names and meanings … for example, in the example above, PID-3 (the 3rd field in the PID segment, where the identifier ‘PID’ is not counted) is 12001, and that represents the patient identifier.

For this particular project I’m working on we have HL7 messages stored in a SQL Server 2016 database table where each row in the table contains the raw HL7 2.x message in a particular column. I need to be able to intelligently filter over this HL7 data by looking at values in particular HL7 fields (as shown above). Since this HL7 data is stored in a varchar(MAX) column I could certainly attempt to play games using LIKE comparisons in SQL but that would not get me very far. SQL simply does not understand the complex structure of HL7 and I have no native SQL Server functions at my disposal that I could quickly use to parse this data and filter it.

Cristian has a Jupyter Notebook which takes us through the solution. With SQL Server 2017, there’s the possibility of solving this in a stored procedure using Machine Learning Services.
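To make the field-addressing scheme concrete, here’s a standalone sketch of the parsing logic described above (my code, not Cristian’s notebook; the sample message is fabricated for illustration):

# Segments are separated by carriage returns, fields by pipes, and the
# segment ID itself doesn't count as a field, so PID-3 is index 3.
message = ("MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|202001010000||ADT^A01|MSG001|P|2.3\r"
           "PID|1||12001||DOE^JOHN")

def hl7_field(message: str, segment_id: str, field_no: int) -> str:
    """Return SEG-<field_no>, e.g. PID-3, from a raw HL7 2.x message."""
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == segment_id:
            return fields[field_no]  # fields[0] is the segment ID itself
    raise KeyError(f"segment {segment_id} not found")

print(hl7_field(message, "PID", 3))  # -> 12001

A real implementation has more edge cases (MSH counts its own field separator as MSH-1, for instance), which is where a library like python-hl7 earns its keep.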
