Replacing R with D3
01 Jul 2014
Introduction
I have always used R and in particular Rmd when doing any exploratory analysis in order to create reproducible results.
R was a major step forward from keeping a large directory of scripts in multiple languages that output CSV files to be loaded into Excel… I now believe it is time to make another change, because R workflows leave some fundamental issues unaddressed.
The major drawbacks I experience with R projects are:
- There is no requirement to describe how the dataset was obtained. It is good practice to list the original location but there is no requirement.
- R projects change often and are kept in sync with version numbers and dates appended to the file name. After a few revisions the history becomes a nightmare to understand.
- As more analysis is added to the R scripts, there is no way to verify that the assumptions about the data being analysed still hold. (There is no way to test them.)
- R scripts often end up in one large incomprehensible file.
These are all common problems in software development which I believe have been solved. In software development we write the program which downloads the original dataset, we use git (or other revision control systems), we rely on test coverage reports and we modularize code.
With these ideas in mind I am adjusting my R workflow to follow my development workflow.
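As a rough sketch of what testing assumptions about the data could look like in JavaScript: the rows, field names and the `checkAssumptions` helper below are all hypothetical, invented for illustration and not part of any library.

```javascript
// Hypothetical rows; in a real project these would come from the downloaded dataset.
const rows = [
  { date: '2014-06-29', value: 12.5 },
  { date: '2014-06-30', value: 7.0 },
  { date: '2014-07-01', value: -3.2 }
];

// Each assumption is a name plus a predicate that every row must satisfy.
const assumptions = [
  { name: 'date is ISO formatted', holds: (r) => /^\d{4}-\d{2}-\d{2}$/.test(r.date) },
  { name: 'value is a finite number', holds: (r) => Number.isFinite(r.value) },
  { name: 'value is non-negative', holds: (r) => r.value >= 0 }
];

// Returns one entry per violated assumption, listing the offending row indexes.
function checkAssumptions(data, checks) {
  return checks
    .map((check) => ({
      name: check.name,
      failedRows: data
        .map((row, i) => (check.holds(row) ? -1 : i))
        .filter((i) => i >= 0)
    }))
    .filter((result) => result.failedRows.length > 0);
}

const violations = checkAssumptions(rows, assumptions);
console.log(violations); // the third row breaks the non-negative assumption
```

Running a check like this on every change means that a new analysis step cannot silently rely on an assumption the data no longer satisfies.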
A Different Approach
My new approach is to extract, transform, load and analyse information using D3.js. This allows me to have a quick turnaround for interactive illustrations while building the analysis using thorough testing.
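To make the extract, transform, load and analyse steps concrete, here is a dependency-free sketch. The CSV content, the column names and every function name are assumptions made up for this example; in the browser, D3's own CSV parsing would normally replace the hand-written `parseCsv`.

```javascript
// Extract: a hypothetical CSV payload standing in for a downloaded file.
const rawCsv = [
  'city,year,population',
  'Springfield,2013,1000',
  'Springfield,2014,1100',
  'Shelbyville,2013,800',
  'Shelbyville,2014,760'
].join('\n');

// Minimal parser for simple, unquoted CSV (no escaping or quoted commas).
function parseCsv(text) {
  const [headerLine, ...lines] = text.trim().split('\n');
  const headers = headerLine.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    const row = {};
    headers.forEach((h, i) => { row[h] = cells[i]; });
    return row;
  });
}

// Transform: coerce the string fields into numbers.
function transform(rows) {
  return rows.map((r) => ({
    city: r.city,
    year: Number(r.year),
    population: Number(r.population)
  }));
}

// Analyse: population change per city between its first and last year.
function growthByCity(rows) {
  const byCity = {};
  rows.forEach((r) => {
    (byCity[r.city] = byCity[r.city] || []).push(r);
  });
  return Object.keys(byCity).map((city) => {
    const sorted = byCity[city].slice().sort((a, b) => a.year - b.year);
    return { city, change: sorted[sorted.length - 1].population - sorted[0].population };
  });
}

// Load: keep the cleaned rows in memory, then run the analysis on them.
const loaded = transform(parseCsv(rawCsv));
const growth = growthByCity(loaded);
console.log(growth);
```

Because each step is a small pure function, each one can be tested on its own before any chart is drawn.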
My Setup
I like to have things run continually and live. Each time I change a line of code I like to see it reflected immediately. To accomplish this I rely on `npm`, `grunt` and `bower`.
NPM Setup
My base `package.json` looks something like this.

    {
      // ... meta data about this package ...
      "devDependencies": {
        "bower": "*",
        "coffee-script": "*",
        "grunt": "*",
        "grunt-cli": "*",
        "grunt-contrib-coffee": "*",
        "grunt-contrib-connect": "^0.8.0",
        "grunt-contrib-watch": "^0.6.1",
        "q": "*"
      },
      "scripts": {
        "postinstall": "bower install"
      }
    }

- bower
- I use bower in order to keep track of the javascript dependencies required for the web interface like D3 and NVD3.
- coffee-script
- My personal preference is to use coffee-script over javascript. Either works fine.
- grunt
- I hate having to type in multiple commands while working. Grunt lets me run all my required commands whenever files change, so I can edit live.
- grunt-cli
- This package installs the `grunt` command. I usually install it globally as well: `npm install -g grunt-cli`.
- grunt-contrib-coffee
- Grunt module to compile coffee-script to javascript.
- grunt-contrib-connect
- To work with d3 I need a live site. This plugin gives me a quick development webserver to use.
- grunt-contrib-watch
- This is the grunt plugin which watches files for changes and will execute a command. In my case I will compile coffee-script and reload the page in my browser.
- q
- Not required, but I often want a promise instead of getting stuck in callback hell.
This is the basic layout, but `grunt` and `bower` both require additional configuration.
Bower Setup
Bower is powered by `bower.json` and is set up as follows.

    {
      // ... project meta data ...
      "dependencies": {
        "jquery": "1.11.1",
        "d3": "latest",
        "nvd3": "latest"
      }
    }

- jquery
- Maybe by habit I include `jQuery` by default… It is useful for doing `.getJSON` requests which return promises instead of the d3 callback approach.
- d3
- The latest version of d3 is used to build visualizations, parse information and convert different types.
- nvd3
- This library is useful when creating basic graphs for initial explorations. It lacks tests and is often broken between versions of d3 so I don’t completely trust it.
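To illustrate the promise-versus-callback point made for jQuery above, here is a minimal sketch of wrapping a callback-style loader so it returns a promise. The `loadJson` function is a hypothetical stand-in for any loader that reports its result through a `callback(error, data)` argument, and the native `Promise` is used here in place of `q` or jQuery.

```javascript
// Hypothetical callback-style loader; it fakes the network with setTimeout
// and reports its result as callback(error, data).
function loadJson(url, callback) {
  setTimeout(() => {
    if (url.endsWith('.json')) {
      callback(null, { url: url, rows: [1, 2, 3] });
    } else {
      callback(new Error('not a JSON resource: ' + url));
    }
  }, 0);
}

// Wrap the loader so callers receive a promise instead of passing a callback.
function loadJsonAsPromise(url) {
  return new Promise((resolve, reject) => {
    loadJson(url, (error, data) => (error ? reject(error) : resolve(data)));
  });
}

// Steps are chained with .then() rather than nested inside one another.
const total = loadJsonAsPromise('data/sample.json')
  .then((data) => data.rows.reduce((sum, n) => sum + n, 0));

total.then((sum) => console.log('sum of rows:', sum));
```

Errors from any step travel down the same chain, so a single rejection handler at the end can replace an error check in every nested callback.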
Grunt Setup
Grunt is powered by a Gruntfile; in my case I am using `Gruntfile.coffee`.

    module.exports = (grunt) ->
      grunt.initConfig
        connect:
          server:
            # grunt-contrib-connect reads its settings from `options`
            options:
              port: 8000
              base: './web'
        coffee:
          compile:
            files:
              'web/js/main.js': ['web/app/**/*.coffee']
        watch:
          coffee:
            options:
              livereload: true
            files: ['web/app/**/*.coffee', 'web/**/*.html']
            tasks: ['coffee']

      grunt.loadNpmTasks 'grunt-contrib-connect'
      grunt.loadNpmTasks 'grunt-contrib-coffee'
      grunt.loadNpmTasks 'grunt-contrib-watch'

      grunt.registerTask 'default', ['connect', 'watch']

This Gruntfile sets up three tasks and a default alias.
- connect
- Initializes a basic webserver which hosts the directory specified by `base`.
- coffee
- Compiles the different coffee-script files into a single javascript file which can be included in a page.
- watch
- Whenever a `.coffee` or `.html` file changes this will compile the coffee-script and reload my open browser tab.
- default
- Calling `grunt.registerTask 'default', ['connect', 'watch']` declares that by default I want to run both of those tasks. This lets me run `grunt` with no parameters and have it do what I need.
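Since either language works, here is a sketch of how the same configuration might look as a plain-JavaScript `Gruntfile.js`. It is a translation for illustration rather than a file I actually use. Note that `port` and `base` sit under an `options` key, which is where grunt-contrib-connect reads its settings from.

```javascript
// Gruntfile.js: a plain-JavaScript sketch of the same three tasks.
function gruntfile(grunt) {
  grunt.initConfig({
    connect: {
      server: {
        // Development webserver hosting the ./web directory on port 8000.
        options: { port: 8000, base: './web' }
      }
    },
    coffee: {
      compile: {
        // Compile every coffee-script file into a single javascript file.
        files: { 'web/js/main.js': ['web/app/**/*.coffee'] }
      }
    },
    watch: {
      coffee: {
        // Recompile and trigger a live reload when sources or pages change.
        options: { livereload: true },
        files: ['web/app/**/*.coffee', 'web/**/*.html'],
        tasks: ['coffee']
      }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-connect');
  grunt.loadNpmTasks('grunt-contrib-coffee');
  grunt.loadNpmTasks('grunt-contrib-watch');

  // Running `grunt` with no arguments starts the server, then the watcher.
  grunt.registerTask('default', ['connect', 'watch']);
}

module.exports = gruntfile;
```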
Conclusion
This is a basic setup. I haven’t included any testing information, but this will get a project started. With it, my workflow is more difficult than using R in RStudio, but it gives me a great deal of freedom. The process forces me to follow an approach that produces reproducible results and is easily shared with others over the web.