Press "Enter" to skip to content

Day: April 9, 2018

Learning About Spatial Data In R

Steph Locke has a compendium of resources for people wishing to learn more about working with spatial data in R:

I recently met up with someone who does geospatial stuff but uses the more traditional GIS software to do it. I showed him a few things in R but not being a person who does a lot of geospatial analysis I thought I’d ask the lovely #rspatial crowd what they’d recommend. Here are the compiled recommendations. Happy learning spatial R!

Feel free to comment or tweet your recommendations to get them added to this list.

There’s a lot of reading, watching, and doing there, so thanks to Steph for putting it together.

Comments closed

Transforming Data: ELT Or ETL?

Artyom Keydunov argues that Extract-Load-Transform is a better model than Extract-Transform-Load:

ETL arose to solve a problem of providing businesses with clean and ready-to-analyze data. We remove dirty and irrelevant data and transform, enrich, and reshape the rest. The example of this could be sessionization: the process of creating sessions out of raw pageviews and users’ events.

ETL is complicated, especially the transformation part. It requires at least several months for a small-sized (less than 500 employees) company to get up and running. Once you have the initial transform jobs implemented, never-ending changes and updates will begin because data always evolves with business.

The other problem of ETL is that during the transformation, we reshape data into some specific form. This form usually lacks some data’s resolution and does not include data that is useless for that time or for that particular task. Often, “useless” data becomes “useful.” For example, if business users request daily data instead of weekly, then you will have to fix your transformation process, reshape data, and reload it. That would take a few weeks more.

Read on for more, including his argument for why ELT is better.

Comments closed

Monitoring Performance Of Natively Compiled Stored Procedures

Jos de Bruijn announces a feature coming to the next version of SQL Server:

We just added new database-scoped configuration options that will help with monitoring performance of natively compiled stored procedures. The new options XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS are available now in Azure SQL Database, and will be available in the next major release of SQL Server. These options will improve your monitoring and troubleshooting experience for databases leveraging In-Memory OLTP with natively compiled stored procedures.

After enabling these options, you can monitor the performance of natively compiled stored procedures using Query Store, as well as the DMVs sys.dm_exec_query_stats and sys.dm_exec_procedure_stats. Note that there is a performance impact to enabling execution statistics collection, thus we recommend to disable stats collection when not needed.

That last sentence is important:  there’s an observer effect which slows down execution of natively compiled stored procedures, and considering that you’re implementing them specifically for the speed, that’s fairly unwelcome.

Comments closed

Using SQLCMD’s Exit Command

Louis Davidson has a quick tip on using :EXIT in SQLCMD mode:

The first use I will cover will help you stopping from running an entire file’s worth of SQL statements without meaning to. I use this mostly when doing demo code, but it certainly finds its way into some of my other code as well. The hotkeys in SSMS for hiding the results (Ctrl-R), and executing a query (Ctrl-E) are next to each other. In a demo, where you are showing code, statement by statement, accidentally executing the entire file of queries can cause you to spend unwanted time recovering with many eyes staring back at you. This is only slightly better than accidentally running code in your office system, and losing your job.

Read on for more.

Comments closed

Query Tuning With The APPLY Operator

Daniel Janik walks through using the APPLY operator to tune a couple of queries:

Recently we were doing a project that heavily focused on query tuning and many tables had various outer joins. My co-worker pointed out that many of these could be converted to an apply rather than a join.

Apply gives you both CROSS and OUTER. Think of CROSS APPLY like an INNER JOIN and OUTER APPLY like an OUTER JOIN.

Let’s compare some code to see how APPLY stacks up.

I like the APPLY operator so much that I created an entire presentation on it.  It’s not a cure-all by any means, but if you understand the intent, you can find places where it improves your code significantly.

Comments closed

Finding The Real Character Set: Unicode And SQL Server Identifiers

Solomon Rutzky wraps up his series on Unicode and regular identifiers:

The question that I’m trying to answer is: what are the valid “letters” and “decimal numbers” from other national scripts?

I tried using the online research tool “UnicodeSet”, but that gave slightly different results compared (using the “alphabetic” and “numeric_type = decimal” properties) to what I discovered SQL Server actually accepts.

I then loaded the actual Unicode 3.2 data files only to find that the number of characters having either the “alphabetic” or “numeric_type = decimal” properties was different than both the online search and what SQL Server actually accepts.

And so…..

Click through to find the real Unicode killer.

Comments closed

Wait Stats Related To Columnstore Indexes

Niko Neugebauer has some documentation on important wait stats around columnstore:

I split the known wait types into the following distinct groups:
– Batch Execution Mode (HTBUILD, HTDELETE, HTMEMO, HTREINIT, HTREPARTITION, PWAIT_QRY_BPMEMORY, BPSORT)
– Columnstore Indexes (ROWGROUP_VERSION, ROWGROUP_OP_STATS, SHARED_DELTASTORE_CREATION, COLUMNSTORE_BUILD_THROTTLE, COLUMNSTORE_MIGRATION_BACKGROUND_TASK, COLUMNSTORE_COLUMNDATASET_SESSION_LIST)

With an appearance in the next SQL Server version of the Batch Execution Mode for the RowStore Indexes, the first group of the waits will suddenly be becoming more important for every single SQL Server user and mixing it together with the Wait Types specific to the internal structures and functions of the Columnstore Indexes makes no sense.

Read on to learn more about these important wait types.

Comments closed

Finding System-Generated Constraint Names

Michael J. Swart has a script which helps you find system-generated constraint names:

Names for constraints are optional meaning that if you don’t provide a name when it’s created or cannot afford one, one will be appointed to you by the system.
These system provided names are messy things and I don’t think I have to discourage you from using them. Kenneth Fisher has already done that in Constraint names, Say NO to the default.

But how do you know whether you have any?

Check out the informative comments as well.

Comments closed