.replacing r with d3

01 Jul 2014

Introduction

I have always used R and in particular Rmd when doing any exploratory analysis in order to create reproducible results.

R was a major step forward from the process of having a large directory of scripts in multiple languages which output CSV information to be loaded into Excel… At this point I believe it is time to make another change due to some fundamental issues which are not being addressed in R workflows.

The major drawbacks I experience with R projects are:

There is no requirement to describe how the dataset was obtained. It is good practice to list the original location but there is no requirement.
R projects change often and are kept in sync with version numbers and dates appended to the file name. After a few revisions this becomes a nightmare to understand.
As more analysis is added to the R scripts there is no way to verify that the assumptions about the data being analysed are still true (there is no way to test them).
R scripts often end up in one large incomprehensible file.

These are all common problems in software development which I believe have been solved. In software development we write the program which downloads the original dataset, we use git (or other revision control systems), we rely on test coverage reports and we modularize code.

With these ideas in mind I am adjusting my R workflow to follow my development workflow.

A Different Approach

My new approach is to extract, transform, load and analyse information using D3.js. This allows me to have a quick turnaround for interactive illustrations while building the analysis using thorough testing.
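To make the extract, transform, load and analyse steps concrete, here is a small sketch in plain JavaScript. The inline CSV text, the column names and the helper functions (parseCsv, toTyped, totalByCity) are all hypothetical. In a real project D3's own CSV parsing helpers would likely replace the hand-rolled parser, which does not handle quoted fields.

```javascript
// Extract: in practice this text would be downloaded. It is inlined here
// so the sketch runs on its own. The data is made up for illustration.
const raw = 'city,visits\nOslo,10\nBergen,5\nOslo,7';

// Naive CSV parser (no quoted fields): one object per line, keyed by header.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split('\n');
  const columns = header.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    const row = {};
    columns.forEach((name, i) => { row[name] = cells[i]; });
    return row;
  });
}

// Transform: coerce the numeric column from string to number.
function toTyped(rows) {
  return rows.map((r) => ({ city: r.city, visits: Number(r.visits) }));
}

// Analyse: total visits per city, ready to hand to a chart.
function totalByCity(rows) {
  const totals = {};
  rows.forEach((r) => { totals[r.city] = (totals[r.city] || 0) + r.visits; });
  return totals;
}

const totals = totalByCity(toTyped(parseCsv(raw)));
console.log(totals); // { Oslo: 17, Bergen: 5 }
```

Keeping each step as a small function is what makes it possible to test them separately.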

My Setup

I like to have things run continually and live. Each time I change a line of code I like to see it reflected immediately. In order to accomplish this I rely on npm, grunt and bower.

NPM Setup

My base package.json looks something like this.

{
  // ... meta data about this package ...
  "devDependencies": {
    "bower": "*",
    "coffee-script": "*",
    "grunt": "*",
    "grunt-cli": "*",
    "grunt-contrib-coffee": "*",
    "grunt-contrib-connect": "^0.8.0",
    "grunt-contrib-watch": "^0.6.1",
    "q": "*"
  },
  "scripts": {
    "postinstall": "bower install"
  }
}

bower

I use bower to keep track of the JavaScript dependencies required for the web interface, like D3 and NVD3.

coffee-script

My personal preference is to use coffee-script over javascript. Either works fine.

grunt

I hate having to type in multiple commands while working. Grunt lets me run all my required commands whenever files change, so I can live edit.

grunt-cli

This package installs the grunt command. I usually install it globally as well:

npm install -g grunt-cli

grunt-contrib-coffee

Grunt module to compile coffee-script to javascript.

grunt-contrib-connect

In order to work with d3 I need a live site. This plugin gives me a quick development webserver to use.

grunt-contrib-watch

This is the grunt plugin which watches files for changes and will execute a command. In my case I will compile coffee-script and reload the page in my browser.

q

Not required, but I often need a promise instead of getting stuck in callback hell.
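The difference a promise makes can be sketched as follows. This example uses the Promise built into current JavaScript runtimes as a stand-in for q, and loadRows is a hypothetical callback-style loader, so treat it as an illustration of the idea rather than of q's API.

```javascript
// Hypothetical callback-style loader standing in for any async API.
function loadRows(name, callback) {
  setTimeout(() => callback(null, [{ name, value: 1 }]), 0);
}

// Wrap the callback API once so callers can chain instead of nest.
function loadRowsAsync(name) {
  return new Promise((resolve, reject) => {
    loadRows(name, (err, rows) => (err ? reject(err) : resolve(rows)));
  });
}

// Load two datasets in parallel and combine them without nested callbacks.
const combined = Promise.all([loadRowsAsync('a'), loadRowsAsync('b')])
  .then(([a, b]) => a.concat(b));

combined.then((rows) => console.log(rows.length)); // prints 2
```

With plain callbacks the second load would have to be nested inside the first, and error handling would be repeated at every level.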

This is the basic layout but grunt and bower both require additional configuration.

Bower Setup

Bower is powered by bower.json and is set up as follows.

{ 
  // ... project meta data ...
  "dependencies": {
    "jquery": "1.11.1",
    "d3": "latest",
    "nvd3": "latest"
  }
}

jquery

Maybe out of habit I include jQuery by default. It is useful for making $.getJSON requests, which return promises, instead of using the d3 callback approach.

d3

The latest version of d3 is used to build visualizations, parse information and convert different types.

nvd3

This library is useful when creating basic graphs for initial explorations. It lacks tests and is often broken between versions of d3 so I don’t completely trust it.

Grunt Setup

Grunt is powered by a Gruntfile; in my case I am using Gruntfile.coffee.

module.exports = (grunt) ->
  grunt.initConfig
    connect:
      server:
        # grunt-contrib-connect reads its settings from the options key
        options:
          port: 8000
          base: './web'
    coffee:
      compile:
        files:
          'web/js/main.js': ['web/app/**/*.coffee']
    watch:
      coffee:
        options:
          livereload: true
        files: ['web/app/**/*.coffee', 'web/**/*.html']
        tasks: ['coffee']

  grunt.loadNpmTasks 'grunt-contrib-connect'
  grunt.loadNpmTasks 'grunt-contrib-coffee'
  grunt.loadNpmTasks 'grunt-contrib-watch'

  grunt.registerTask 'default', ['connect', 'watch']

This Gruntfile sets up three different tasks.

connect

Initializes a basic webserver which hosts the directory specified by base.

coffee

Compiles the different coffee-script files into a single javascript file which can be included in a page.

watch

Whenever a .coffee or .html file changes this will compile the coffee-script and reload my open browser tab.

default

The line grunt.registerTask 'default', ['connect', 'watch'] says that by default I want to run both of those tasks. This allows me to run grunt with no parameters and let it do what I need.

Conclusion

This is a basic setup. I haven’t included any testing information, but this will get a project started. With it my workflow is more difficult than using R in RStudio, but it allows me a great deal of freedom. The process forces me to follow an approach which has reproducible results and is easily shared with others over the web.

Erik Erwitt
