R Internals: Data Sizes With Nullable Columns

Niels Berglund digs into the Binary Exchange Langage (BXL) and notices something weird about data sizes:

When looking at the data sent, the size of the packages and “drilling” into the TCP packets we could deduct that: :

  • Each column has an over-head of 32 bytes (at least for non nullable data)

  • The size of the column in one row is the size of the data type for numeric types.

  • For decimal and numeric an extra byte is added to each column, where this byte indicates the precision.

  • Columns of alpha numeric type all had 2 bytes pre-pended to the bytes, except max types.

  • For char and nchar the storage size was 2 bytes plus the size the column was defined as.

  • For varchar and nvarchar the storage size was 2 bytes plus the size of the data stored.

  • For the varmax data types the number of bytes that were pre-pended varied dependent on the data size.

Read the whole thing.

Related Posts

An Introduction To seplyr

John Mount guest blogs on the Revolutions blog about seplyr: seplyr is an R package that supplies improved standard evaluation interfaces for many common data wrangling tasks. The core of seplyr is a re-skinning of dplyr‘s functionality to seplyr conventions (similar to how stringr re-skins the implementing package stringi). Read on for a couple of examples of where seplyr can make it easier for you to […]

Read More

R In Linux For Windows

David Smith shows how to install and use R in the Windows Subsystem for Linux: R has been available for Windows since the very beginning, but if you have a Windows machine and want to use R within a Linux ecosystem, that’s easy to do with the new Fall Creator’s Update (version 1709). If you […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930