Ways To Crash Elasticsearch

Roi Ravhon shows how to take down an Elasticsearch instance:

Cardinality aggregation is used to count distinct values in a data set. For example, if you want to know the number of IPs used in your system, you can use this aggregation on an IP field and then count the results.

Despite the usefulness, cardinality can also be a touchy Elasticsearch feature to use. Performing a unique count on a field with a multitude of possible values when configuring a visualization, for example, can bring Elasticsearch to a halt.

Most of it comes down to writing good queries.  But if you don’t know what good Elasticsearch queries look like, read on.

Index Create Dates

Kenneth Fisher looks to see when his indexes were created (or at least updated):

SQL Server stores a create date and a change date for each object in the sys.objects system view.

Unfortunately while tables, views and even constraints are objects, indexes are not. Or at least they aren’t stored in the sys.objects system view. And the sys.indexes system view doesn’t have any dates associated with it. So how do we get the create/update date on an index? Well, short answer is you don’t. Long answer is that in some cases you can get some information.

These aren’t ideal answers, but they can be better than nothing.

Power BI Scroller

Devin Knight’s Power BI custom visuals series continues:

  • The Scroller the values move from left to right on an animated scroll bar.

  • The settings allow you to customize the speed, size, status indicator and colors used.

  • If you’d like to see the status indicator then you must have a deviation value.

Maybe it’s just me, but I’m not a fan of tickers.  They’re too visually noisy for my taste.

SSIS Fast Load

Chris Taylor runs into an issue with the OLE DB Destination’s fast load option in Integration Services:

What I do want to bring to your attention is the differences between the two when it comes to redirecting error rows, specifically rows that are truncated. One of the beauties of SSIS is the ability to output rows that fail to import through the error pipeline and push them into an error table for example. With fast load there is a downside to this, the whole batch will be output even if there is only 1 row that fails, there are ways to handle this and a tried and tested method is to push those rows into another OLE DB Destination where you can run them either in smaller batches and keep getting smaller or simply push that batch to run in row-by-row to eventually output the 1 error you want. Take a look at Marco Schreuder’s blog for how this can be done.

One of the issues we have exerienced in the past is that any truncation of a column’s data in fast load will not force the package to fail. What? So a package can succeed when in fact the data itself could potentially not be complete!?! Yes this is certainly the case, lets take a quick look with an example.

Read on for details and potential workarounds.

Showplan XML Updates

Pedro Lopes discusses new properties which show up on execution plans:

This caused some friction, and so we are happy to announce that in the latest (October) release of SSMS, these attributes are now readily available in the Properties window of an Actual Execution Plan, in the scope of each operator.

Here’s an example with the same query as the March post, where we can see the Actual Number of Rows returned by each of the two threads executing the index seek.

Click through for more details.

Network Shares And SQL Server R Services

Kevin Feasel

2016-11-14

R

John Pertell runs into an issue reading a file on a network share using SQL Server R Services:

What I’m trying to do should be pretty easy. I want to use R code inside a stored procedure to read all the log files created by my weather program and store the results in a database. I also want to read the current monthly file on a regular basis, at least once a day. Once the data is in the database I’ll create some mobile reports with data and charts I can read on my phone. I’ll also be able to use my own data to play with local weather predicting. I thought it would make a pretty cool demo for my R sessions.

However when I run my stored procedure to read the logs I receive an error that there is no such file if I map the share as a drive, or I’m using an invalid parameter if I try to access the share directly.

Read on for code and the specific error message.

Filtered Indexes

Kendra Little explains the two types of filtered indexes:

These two filtered indexes are very different – and the SQL Server optimizer can use them very differently!

While classic filtered nonclustered rowstore indexes must reliably “cover” parts of the query to be used to the optimizer, filtered nonclustered columnstore indexes may be combined with other indexes to produce a plan returning a larger range of data.

This sounds a little weird. I’ll show you what I mean using the WideWorldImporters database.

Kendra exposes an interesting difference in the two types of index and a case where filtered indexes simply fail  (though that’s not a situation you want to be in anyhow!).

Installing Zeppelin On Windows 10

Paul Hernandez shows how to install Apache Zeppelin on Windows 10:

There are several settings you can adjust. Basically, there are two main files in the ZEPPELIN_DIR\conf :

  • zeppelin-env
  • zeppelin-site.xml

In the first one you can configure some interpreter settings. In the second more aspects related to the Website, like for instance, the Zeppelin server port (I am using the 8080 but most probably yours is already used by another application)

This is a very clear walkthrough.  Jupyter is still easier to install, but Paul’s blog post lowers that Zeppelin installation learning curve.

Polybase With Compression

I have a post looking at Polybase support for different compression formats:

This is a very interesting set of results.  First, 7Zip archived files do not work with the default encoding.  I’m not particularly surprised by this result, as 7Zip support is relatively scarce across the board and it’s a niche file format (though a very efficient format).

The next failure case is tar.  Tar is a weird case because it missed the first row in the file but was able to collect the remaining 776 records.  Same goes for .tar.gz.  I unpackaged the .tar file and the constituent SecondBasemen.csv file did in fact have all 777 records, so it’s something weird about the codec.

Stick to BZip2 and GZip if you’re using flat files.

Categories

January 2019
MTWTFSS
« Dec  
 123456
78910111213
14151617181920
21222324252627
28293031