10,000 R Packages

Kevin Feasel

2017-01-31

R

David Smith notes that CRAN is now up to 10,000 packages:

Having so many packages available can be a double-edged sword though: it can take some searching to find the package you need. Luckily, there are some resources available to help you:

  • MRAN (the Microsoft R Application Network) provides a search tool for R packages on CRAN.

  • To find the most popular packages, Rdocumentation.org provides a leaderboard of packages by number of downloads. It also provides lists of newly-released and recently-updated packages.

R is a big language; having good heuristics for figuring out where to find appropriate packages is extremely important.

LOGMGR_RESERVE_APPEND

Paul Randal explains an uncommon wait stat:

Last week I was sent an email question about the cause of LOGMGR_RESERVE_APPEND waits, and in Monday’s Insider newsletter I wrote a short explanation. It’s a very unusual wait to see as the highest wait on a server, and in fact it’s very unusual to see it at all.

It happens when a thread is generating a log record and needs to write it into a log block, but there’s no space in the log to do so. The thread first tries to grow the log, and if it fails, and the database is in the simple recovery mode, then it waits for 1 second to see if log clearing/truncation can happen in the meantime and free up some space. (Note that when I say ‘simple recovery mode’, this also included a database in full or bulk_logged, but where a full backup has not been taken – i.e. the database is operating in what’s called pseudo-simple.)

Read on for more details and a repro script.

Azure Functions: Contact Form

Eli Weinstock-Herman explains how to use Azure Functions to create dynamic content on an otherwise-static page:

My personal website is a static site: 100% HTML, JS, and CSS files with no server-side processing. I have custom code that pulls data from a variety of sources and builds updated versions of the files from templates, which are then deployed to the host. I do this to move the CPU latency of building the pages to my time, instead of charging it to visitors on each page hit. While I have a host, a strategy like this means I could also choose to host for free via github or similar services.

So there’s a great benefit to the reader and our wallet, but no server-side execution makes things like contact forms trickier. Luckily, Azure Functions or AWS Lambda can be used as a webhook to receive the form post and process it, costing nothing near nothing to use (AWS and Azure both offer a free tier for 1M requests/month and 400,000 GB-seconds of compute time).

Eli has a working example in the post, which I recommend checking out.

Memory-Optimized Table Warnings

Robert Davis looks at messages in the error log related to memory-optimized tables:

The server on which we are running in-memory OLTP is a really hefty server with 128 logical cores and 1.5 TB of RAM (1.4 TB allocated to SQL Server). We are limiting in-memory’s memory usage with Resource Governor, which also makes it easy to see how much it is using. Needless to say, even with a limited percentage of 1.4 TB of RAM is still a lot of memory. The highest I have seen in-memory usage for this one database reach at peak activity levels is ~43 GB. In production, when the heavy in-memory OLTP processes complete, I see the system reclaim the in-memory buffers pretty quickly, though not completely. During a normal day, I often see the in-memory memory usage hovering between 1 and 3 GB even when there is virtually no traffic.

When testing in-memory on a dev server that only I was using before deploying to production, I noticed that the memory usage would stay at whatever high level it reached. This makes me believe that in-memory buffers are cleaned up and reclaimed as needed, and if not needed, they just hang around as in-memory buffers. And it appears that some of the buffers end up hanging around. Perhaps they wouldn’t if the server was memory starved. I have not tested that theory.

It’s a conjecture, but seems pretty solid.  Also worth reiterating is that they’re warnings, not errors.

Troubleshooting Connectivity Errors

The CSS SQL Server Engineers have a guide for solving connectivity issues:

In addition to providing a quick checklist of items that you can go through, the doc provides step by step troubleshooting procedures for the following error messages:

  • A network-related or instance-specific error occurred while establishing a connection to SQL Server

  • No connection could be made because the target machine actively refused it

  • SQL Server does not exist or access denied

  • PivotTable Operation Failed: We cannot locate a server to load the workbook Data Model

  • Cannot generate SSPI context

  • Login failed for user

  • Timeout Expired

  • The timeout period elapsed prior to obtaining a connection from the pool

Click through for the guide.  It’s in choose-your-own-adventure format, though without nice graphics.

Caching KPI Reports

Kathi Kellenberger discusses caching in SQL Server Reporting Services KPI reports:

Because these reports automatically show the data, the reports show cached data only. Imagine if hundreds or even thousands of report users brought the web portal page up each day causing the KPI reports to hit the database even when the report user was not interested in seeing the KPI reports at that time. That is why Microsoft decided to use cached data only in these reports.

When the data changes, the KPI report will continue to show the same information unless you configure a cache refresh plan on the dataset. Follow these instructions so that the KPI data will refresh on a scheduled basis.

Read on for a step-by-step guide on how to set up caching.

Appending Data In Power BI

Ginger Grant shows that data sets don’t need to be exactly the same for Power BI to combine their contents:

Recently I worked on a Power BI project where I needed to merge data provided in spreadsheets. The spreadsheets came from different vendors and while they contained mostly the same data, the columns were not in the same order. I wanted all of the data to reside in one table. In Query, that means that I wanted to Append the data. The files which I were merging were very wide, and I missed the fact until after I was done that some of the columns were in different order. Power BI is smart enough to figure out the order on its own. I didn’t need to change the order of the columns at all, as long as they have the same column names. Here’s an example using three different files.

That’s a sign of a smart tool.

Hierarchical Data Cleansing With Power BI

Cedric Charlier has started a series on dealing with hierarchical data in a not-so-hierarchical format:

To load this Excel file in Power BI, I’ll just use standard functions and define a first table “Source” that won’t be enabled to load in report.

My next task will be to create a table (or dimension) with the different questions. I also want to include a hierarchy in this dimension: I should be able to browse the questions by categories and sub-categories.

Let’s create a new table named “Question” by referencing the “Source” table. Then remove the other columns than A and B.

The curse of Excel is that it’s so easy to build a data set in strange ways that make it hard to integrate later.

Diagnosing Execution Plan Oddities

Kendra Little digs into an oddly complex execution plan:

Aha! This is a definite clue. Some sort of security wizardry has been applied to this table, so that when I query it, a bunch of junk gets tacked onto my query.

I have no shame in admitting that I couldn’t remember at all what feature this was and how it works. A lot of security features were added in SQL Server 2016, and the whole point of a sample database like this to kick the tires of the features.

Kendra’s post frames it as an impostor syndrome check, whereas I read it as a murder mystery.

SSRS Log File Location Change

Wolfgang Strasser points out that SSRS log files are in a new directory structure for vNext:

The log files can be found in the Logfiles directory (it was the same directory also for the older versions). In SSRS vNext there more different log files..

The logging information seems to be splitted into multiple log files – if you want for example dig into the Power BI on-premises logging I propose to have a look at the RSPower*.log files.

This happens every once in a while, so it’s good to know when the log files move somewhere else.

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031