Data hierarchy

Compared to ggplot2, ggvis has a much richer data hierarchy. In ggplot2, you could define a dataset and aesthetic mappings in the base plot and override them in each layer, but because layers could not contain other layers, there were only ever two levels in the tree. ggvis is more flexible because ggvis nodes (the equivalent of ggplot2 layers) can contain child nodes. This lets you use as many levels of hierarchy as best suit your plot.
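To make the idea of a deeper tree concrete, here is a minimal base R sketch. It is not how ggvis stores nodes: node() and leaf_data() are hypothetical helpers, and datasets are represented by their names only. Each node either sets its own data or inherits it from its parent, and the leaves stand in for marks.

```r
# Hypothetical node constructor: optional data, optional name, any number of children.
node <- function(data = NULL, ..., name = NULL) {
  list(data = data, name = name, children = list(...))
}

# Walk the tree and report which dataset each leaf ends up with.
# A node without its own data inherits the data of its parent.
leaf_data <- function(n, inherited = NULL) {
  data <- if (is.null(n$data)) inherited else n$data
  if (length(n$children) == 0) {
    return(setNames(list(data), n$name))
  }
  do.call(c, lapply(n$children, leaf_data, inherited = data))
}

# Three levels: root -> child node with its own data -> leaf.
tree <- node(
  "cities",
  node(name = "points"),
  node("troops", node(name = "paths")),
  node(name = "labels")
)

leaf_data(tree)
```

The "paths" leaf picks up the data set by its parent, while "points" and "labels" fall back to the data set at the root.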

This is also related to a confusion in ggplot2, where geoms were often actually aliases for a geom + stat combination. For example:

geom_histogram = geom_bar + stat_bin
geom_smooth = geom_smooth + stat_smooth
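The aliasing can be checked directly in ggplot2. The sketch below assumes a reasonably recent ggplot2 is installed and uses mtcars purely for illustration. It builds the same histogram once through the geom and once through the stat.

```r
library(ggplot2)

# Histogram requested through the geom (which uses stat_bin by default).
p1 <- ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth = 3)

# The same plot requested through the stat, drawn with the bar geom.
p2 <- ggplot(mtcars, aes(mpg)) + stat_bin(geom = "bar", binwidth = 3)

# Both layers compute the same binned counts.
identical(layer_data(p1)$count, layer_data(p2)$count)
```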

In ggvis, there is a clear demarcation between marks (pure drawing) and layers (drawing + transformation):

layer_histogram = mark_bar + transform_bin
layer_smooth = (mark_line + mark_area) + transform_smooth

ggplot2 needed special geoms like geom_smooth because the data hierarchy was not deep enough. There was no standard way for a stat to take the output of another stat as its input, and no way for a stat to feed data to multiple layers without either creating a custom geom or duplicating the stat multiple times.

Data pipeline

A ggvis specification creates a data pipeline that flows from the starting node to all of the leaves (marks). There are two ways to create a data pipeline:

explicitly with pipeline: pipeline(mtcars, transform_bin())
implicitly in ggvis/node: ggvis(mtcars, transform_bin())

The data pipeline is stored in the data element of the ggvis_node object.

To see how a pipeline works, we’ll create some pipelines explicitly, then use flow() to flow properties down each pipeline and see what happens. Since pipelines can also contain reactive data sources, this has to be done in a reactive environment. To avoid this complication, we’ll use sluice(), which sets up everything we need.

We’ll start with a very simple pipeline that contains only a dataset. Sluicing it returns the data unchanged.

df <- data.frame(x = 1:9, grp = rep(1:3, each = 3))

# A dataset has a name and a unique id (hash) in brackets
a <- pipeline(df)
a

#> |-> df (dd550670b5d6c60fb7cefceba4d1d876)

sluice(a, props(x = ~x))

#>   x grp
#> 1 1   1
#> 2 2   1
#> 3 3   1
#> 4 4   2
#> 5 5   2
#> 6 6   2
#> 7 7   3
#> 8 8   3
#> 9 9   3

The next two pipelines add a data transformation. b1 summarises the data by binning it. b2 splits the data into pieces according to the value of the grp variable. b2 returns a special data structure called a split_df, so transforms need to be written specifically to understand how to operate on this data structure. Sluicing the pipeline performs those transformations:

b1 <- pipeline(df, transform_bin(binwidth = 3))
b1

#> |-> df (dd550670b5d6c60fb7cefceba4d1d876)
#>  -> bin(binwidth = 3, right = TRUE)

b2 <- pipeline(df, by_group(grp))
b2

#> |-> df (dd550670b5d6c60fb7cefceba4d1d876)
#>  -> split_by(variables = grp, env = <environment: 0x6b44338>)

sluice(b1, props(x = ~x))

#>   count__    x xmin__ xmax__ width__
#> 1       0 -1.5     -3      0       3
#> 2       2  1.5      0      3       3
#> 3       3  4.5      3      6       3
#> 4       3  7.5      6      9       3
#> 5       1 10.5      9     12       3

sluice(b2, props(x = ~x))

#> [[1]]
#>   x grp
#> 1 1   1
#> 2 2   1
#> 3 3   1
#> 
#> [[2]]
#>   x grp
#> 4 4   2
#> 5 5   2
#> 6 6   2
#> 
#> [[3]]
#>   x grp
#> 7 7   3
#> 8 8   3
#> 9 9   3
#> 
#> attr(,"class")
#> [1] "split_df"
#> attr(,"variables")
#> attr(,"variables")[[1]]
#> grp

Finally, c combines the operation of b1 and b2, first splitting the data into pieces, then summarising each group with binned counts:

c <- pipeline(df, by_group(grp), transform_bin(binwidth = 3))
c

#> |-> df (dd550670b5d6c60fb7cefceba4d1d876)
#>  -> split_by(variables = grp, env = <environment: 0x6b44338>)
#>  -> bin(binwidth = 3, right = TRUE)

sluice(c, props(x = ~x))

#> [[1]]
#>   grp count__    x xmin__ xmax__ width__
#> 1   1       0 -1.5     -3      0       3
#> 2   1       2  1.5      0      3       3
#> 3   1       1  4.5      3      6       3
#> 
#> [[2]]
#>   grp count__   x xmin__ xmax__ width__
#> 1   2       0 1.5      0      3       3
#> 2   2       2 4.5      3      6       3
#> 3   2       1 7.5      6      9       3
#> 
#> [[3]]
#>   grp count__    x xmin__ xmax__ width__
#> 1   3       0  4.5      3      6       3
#> 2   3       2  7.5      6      9       3
#> 3   3       1 10.5      9     12       3
#> 
#> attr(,"class")
#> [1] "split_df"
#> attr(,"variables")
#> attr(,"variables")[[1]]
#> grp
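The "split, then bin" idea can be sketched in base R, without ggvis. This is not the ggvis implementation: bin_counts() is a hypothetical helper, the breaks are fixed by hand, and left-closed bins are an assumption chosen because they reproduce the per-group counts printed above.

```r
df <- data.frame(x = 1:9, grp = rep(1:3, each = 3))

# Hypothetical helper: count the values of x that fall into each bin.
# Bins are left-closed, i.e. [0,3), [3,6), [6,9), [9,12).
bin_counts <- function(d, binwidth = 3) {
  breaks <- seq(0, 12, by = binwidth)
  bins <- cut(d$x, breaks = breaks, right = FALSE)
  data.frame(bin = levels(bins), count = as.vector(table(bins)))
}

# Step 1: split into one data frame per group (the role of by_group(grp)).
pieces <- split(df, df$grp)

# Step 2: bin each piece separately (the role of transform_bin(binwidth = 3)).
binned <- lapply(pieces, bin_counts)
binned
```

Because the second step is applied to each piece in turn, it has to know that its input is a list of data frames rather than a single one. This is the same requirement the split_df structure places on ggvis transforms.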

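Note that pipeline() automatically removes redundant steps. In the examples below, everything that precedes a later data source is dropped, because the new source replaces it: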

pipeline(mtcars, sleep)

#> |-> sleep (00184d898f231f6433845f2c73f844e0)

pipeline(mtcars, by_group(cyl), sleep)

#> |-> sleep (00184d898f231f6433845f2c73f844e0)
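One way to picture this simplification, inferred only from the output shown here, is to keep the last data source and whatever follows it. The base R sketch below uses a hypothetical simplify_sketch() helper and stands in for transforms with plain strings. It is not the ggvis code.

```r
# Hypothetical helper: drop every step that comes before the last data frame.
simplify_sketch <- function(steps) {
  is_source <- vapply(steps, is.data.frame, logical(1))
  if (!any(is_source)) {
    return(steps)
  }
  steps[max(which(is_source)):length(steps)]
}

# mtcars and the grouping step are discarded, because sleep replaces them.
steps <- list(mtcars, "by_group(cyl)", sleep)
simplified <- simplify_sketch(steps)
length(simplified)
```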

Combining props

In ggplot2, layers had an inherit.aes property which controlled whether or not a layer would inherit properties from its parent. This is particularly useful when writing functions that add annotations to arbitrary plots, because you don’t want other properties that the user has set interfering with your layer. In ggvis, that’s now a property of props(): props(inherit = FALSE).

To see how ggvis combines properties, you can use the internal merge_props function:

merge_props <- ggvis:::merge_props
merge_props(props(x = ~x), props(y = ~y))

#> * x.update: <variable> x (scale: auto)
#> * y.update: <variable> y (scale: auto)

merge_props(props(x = ~a), props(x = ~b))

#> * x.update: <variable> b (scale: auto)

merge_props(props(x = ~a, y = ~a), props(x = ~b, inherit = FALSE))

#> * x.update: <variable> b (scale: auto)
#> inherit: FALSE
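The three results above follow a simple rule: the child's props override the parent's props of the same name, and inherit = FALSE discards the parent entirely. A rough base R analogy with named lists is sketched below. merge_sketch() is a hypothetical stand-in, not the ggvis function, and it ignores details such as scales and the .update suffix.

```r
# Hypothetical stand-in for merge_props, operating on plain named lists.
merge_sketch <- function(parent, child, inherit = TRUE) {
  if (!inherit) {
    return(child)            # parent props are ignored completely
  }
  modifyList(parent, child)  # child values win on name clashes
}

# Different names: both props are kept.
merge_sketch(list(x = "x"), list(y = "y"))

# Same name: the child's mapping replaces the parent's.
merge_sketch(list(x = "a"), list(x = "b"))

# inherit = FALSE: only the child's props survive.
merge_sketch(list(x = "a", y = "a"), list(x = "b"), inherit = FALSE)
```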

There is currently no way to remove a prop inherited from the parent. See https://github.com/rstudio/ggvis/issues/37 for progress.

Case studies

Minard’s march

ggplot(Minard.cities, aes(x = long, y = lat)) +
  geom_path(
    aes(size = survivors, colour = direction, group = group),
    data = Minard.troops
  ) +
  geom_point() +
  geom_text(aes(label = city), hjust=0, vjust=1, size=4)

In ggvis, we can make it a little clearer that we have one mark based on the troops dataset and two marks based on the cities dataset.

ggvis(props(x = ~long, y = ~lat)) +
  layer(
    Minard.troops,
    props(size = ~survivors, stroke = ~direction),
    layer_point()
  ) +
  layer(
    Minard.cities,
    layer_point(),
    layer_text(props(text := ~city, dx := 5, dy := -5))
  )