Flirt With Julia

Learn the Julia programming language through real-world examples.

Choropleth Map With VegaLite.jl

18 Feb 2019

In this post we’ll use the awesome VegaLite.jl package to build a choropleth (heat) map showing the unemployment rate in U.S. counties (and a little something extra at the end). VegaLite.jl makes it incredibly easy to build high-quality, publication-ready data visualizations in Julia. The map we’ll produce in this post will be rendered as an SVG, which is really nice when you want to scale your visualization without losing any quality!

Without further ado, let’s get it coded! 💻

Install/Load Dependencies


using Pkg
Pkg.add(["URIParser", "VegaLite", "VegaDatasets"])
using URIParser, VegaLite, VegaDatasets


  • URIParser will help us pull in some data from a URL (more on this in a few minutes)
  • As mentioned, we will be using VegaLite.jl to build this visual, and we’ll also use VegaDatasets for our unemployment data and for our shape file (our U.S. counties will be drawn from a TopoJSON file)

First, let’s define a couple of variables to hold our shape file and our unemployment data:

us10m = dataset("us-10m").path
unemployment = dataset("unemployment.tsv").path


Here we’ve simply called the dataset function from VegaDatasets, passing in the name of the data file that we want, and added .path to get the file path of the downloaded/saved file so that our plot knows where to get the info from.

To get a better idea of how the VegaLite package works, we’re going to build this visual iteratively. The first step is to simply define our plot area, like so:

@vlplot(width=800, height=600)


If you try to run this, you’ll get an error, because the plot has no mark or data yet. Let’s go ahead and add in our shape file so that we have something to look at:

@vlplot(width=800, height=600) + 
@vlplot(
    mark={ 
        :geoshape,
        fill=:lightgrey,
        stroke=:white
    },
    data={
        url=us10m,
        format={
            typ=:topojson,
            feature=:counties
        }
    },
    projection={
        typ=:albersUsa
    }
)


Note that we used the + sign to add another piece to our plot. The second “piece” that we added has three important items: a mark, a data item, and, because that data item represents geographical shapes, a projection.

For now, don’t get caught up in the strange syntax - you’ll get the hang of it and will eventually stop being bothered by the fact that it looks kind of like a JavaScript object, and kind of like some Julia code (wouldn’t it be cool if JavaScript and Julia had a baby?) 💑🍼

For the mark, we told our plot that it was going to be a :geoshape (note the Symbol syntax) and then we assigned a fill color as well as a stroke color (both Symbols). Then, we defined our data item by assigning our previously-defined us10m variable to url. Next, we needed to give our plot some more info about our data item, so we did that with the format bit that you see in the code, letting our plot know that this data object is in TopoJSON format and that we want the counties feature (a TopoJSON file can hold several named features, so we have to tell the plot which one to display). Finally, we chose the albersUsa projection, a composite Albers projection that also places Alaska and Hawaii on the map. Running this code produces the following output:

[Figure: SVG map of U.S. counties, filled light grey with white borders]
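If the feature=:counties part still feels a bit magical, here is a tiny sketch of the idea in plain Julia. The Dict below is a hypothetical, heavily trimmed stand-in for a TopoJSON topology (a real file carries its geometry in "arcs" and has far more entries), and select_feature is a helper name made up for this illustration, not something VegaLite.jl provides.

```julia
# Hypothetical, heavily trimmed stand-in for a TopoJSON topology.
# Real files store the actual geometry in "arcs"; these ids are placeholders.
topology = Dict(
    "type" => "Topology",
    "objects" => Dict(
        "counties" => Dict(
            "type" => "GeometryCollection",
            "geometries" => [Dict("id" => 1001), Dict("id" => 1003)],
        ),
        "states" => Dict(
            "type" => "GeometryCollection",
            "geometries" => [Dict("id" => 1)],
        ),
    ),
    "arcs" => [],
)

# Roughly what feature=:counties asks for: pick one named object to draw.
# (select_feature is a made-up helper for this sketch.)
select_feature(topo, name) = topo["objects"][String(name)]

available = sort(collect(keys(topology["objects"])))
counties = select_feature(topology, :counties)

println("available features: ", available)
println("county shapes: ", length(counties["geometries"]))
```

Swapping :counties for :states in the sketch picks the other object, which is the same switch you would make in the format block of the plot.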

Alright, at this point you’re thinking, “This is a really 💩 looking map. I thought you said we were going to produce something that’s publication quality!” Patience, young grasshopper, I will not fail you 🥋🐉

That reminds me, we have a problem with the way the code is currently written. We want to build a choropleth map, meaning we want each county to vary in color, based on the data (unemployment rates in our case). In order to make that happen, we have to get rid of the fill and stroke that we added, because we want our color to be dependent on the data. Let’s go ahead and delete those two lines and then add a transformation to our TopoJSON data so that the counties vary in color by their relative unemployment rates:

@vlplot(width=800, height=600) +
@vlplot(
    mark={
        :geoshape
    },
    data={
        url=us10m,
        format={
            typ=:topojson,
            feature=:counties
        }
    },
    transform=[{
        lookup=:id,
        from={
            data=unemployment,
            key=:id,
            fields=["rate"]
        }
    }],
    color={
        "rate:q",
        scale={scheme=:reds},
        legend={title="Unemployment Rate"}
    },
    projection={
        typ=:albersUsa
    }
)


So, as you can see, we removed the two lines of code that hard-coded a fill and stroke, added a transform block and then a color block to get the desired coloring effect, as well as a nice legend so that our viewers understand what they’re looking at.

The transform block tells our plot to take each county’s id from the TopoJSON data, look up the row with the matching id in our unemployment data, and grab the corresponding “rate” field, which represents the unemployment rate. Then, we wrote a color block in which we tell our plot that we want the color to be based on the value of the unemployment rate. The :q on the end tells our plot that the rate is a quantitative data type (other options are :t for temporal, :n for nominal, :o for ordinal). We defined a color scheme for our scale and then added a title to the automatically-generated legend. This produces the following output:

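[Figure: SVG choropleth map of U.S. counties shaded in reds by unemployment rate, with an “Unemployment Rate” legend]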
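To make the lookup transform a little less abstract, here is a rough plain-Julia sketch of the join it performs. The ids and rates are made-up stand-ins for the real data, and using missing for a county with no match is an assumption chosen to mirror a failed lookup, so treat this as an illustration rather than how Vega-Lite actually implements it.

```julia
# Made-up stand-ins: county ids from the map, and id => rate rows from the data file.
county_ids = [1001, 1003, 1005]
rates = Dict(1001 => 0.097, 1003 => 0.091)   # 1005 is deliberately absent

# lookup=:id with from={key=:id, fields=["rate"]} behaves roughly like this join:
# for every county, find the row with the same id and copy its rate across.
joined = [(id = id, rate = get(rates, id, missing)) for id in county_ids]

for row in joined
    println(row.id, " => ", row.rate)
end
```

Each county ends up with a rate attached, which is exactly the field that the color block then reads as "rate:q".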

TIP: To familiarize yourself with VegaLite, try changing the :q to :o and you’ll see how each county is now ranked ordinally in the visualization. Also, add a custom domain by changing scale={scheme=:reds} to scale={domain=[0, 0.15], scheme=:reds} and see how that changes the output.
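If you are wondering why the domain matters so much, this small plain-Julia sketch shows where a single rate lands on the color ramp under two different domains. The scale_position function and the rate value are hypothetical, and the linear mapping with clamping is a simplifying assumption for illustration, not a copy of Vega-Lite’s scale code.

```julia
# Position of a value within a scale domain, as a fraction from 0 (lightest) to 1 (darkest).
# The linear mapping and the clamping to [0, 1] are assumptions made for this sketch.
scale_position(x, domain) =
    clamp((x - domain[1]) / (domain[2] - domain[1]), 0.0, 1.0)

rate = 0.12  # hypothetical unemployment rate

for domain in ([0, 0.15], [0, 0.3])
    println("domain ", domain, " -> position ", scale_position(rate, domain))
end
```

With the narrower domain the same county sits much further along the ramp, so the whole map looks darker; widening the domain pushes everything back toward the light end.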

Alright, we now have a really nice, publication-ready visualization, but I don’t want to leave you with just this. In the real world, you’re probably not going to be publishing visualizations based on VegaDatasets data, so let’s add another layer onto our visualization that pulls in some data from an external source (this is the reason that we loaded URIParser at the beginning of this post!!). We’ll be loading in a CSV file that contains all major U.S. cities’ latitudes/longitudes, along with their population, and then drawing circles on top of our choropleth map, the sizes of which will correspond to the population of the city. Let’s look at the code first (added on to what we’ve already done):

@vlplot(width=800, height=600) +
@vlplot(
    mark={
        :geoshape
    },
    data={
        url=us10m,
        format={
            typ=:topojson,
            feature=:counties
        }
    },
    transform=[{
        lookup=:id,
        from={
            data=unemployment,
            key=:id,
            fields=["rate"]
        }
    }],
    color={
        "rate:q",
        scale={domain=[0, 0.3],scheme=:reds},
        legend={title="Unemployment Rate"}
    },
    projection={
        typ=:albersUsa
    }
) +
@vlplot(
    mark={
        :circle
    },
    data={
        url=URI("https://raw.githubusercontent.com/plotly/datasets/master/2014_us_cities.csv"),
        format={
            typ=:csv
        }
    },
    longitude="lon:q",
    latitude="lat:q",
    size={
        "pop:q",
        scale={range=[0, 700]},
        legend={title="City Populations"}
    },
    color={value=:purple}
)


We’ve added a mark block in which we tell our plot that we want to draw circles, and then we added a data block, setting the url variable to the parsed URL as seen in the code (parsed by URIParser). We also let the plot know that this data is in CSV format. Next, we assigned the appropriate columns in the CSV file to the latitude and longitude variables on our object, and then we wrote a size block that ensures the circle sizes vary with the relative populations of the cities (feel free to play with the range until the output suits what you’re seeking). Notice that it is within the size block that we defined our legend this time, rather than the color block as we did before. This is because we want our legend to be based on our circle sizes, not our color (as there is only one color). Finally, we colored our circles purple. Note also that the color scale of the county layer now has an explicit domain of [0, 0.3], in the spirit of the tip above.
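Since playing with the range is easier when you know what the numbers mean, here is a rough plain-Julia sketch of a linear size scale. For circle marks, Vega-Lite treats size as the area of the mark in square pixels, so the sketch also converts each area to a radius. The populations, the zero-based domain, and the linear_scale helper are all assumptions made for illustration.

```julia
# Linear map from a data domain onto an output interval (here: circle area in square pixels).
# linear_scale is a made-up helper; a zero-based, linear scale is assumed.
linear_scale(x, domain, out) =
    out[1] + (x - domain[1]) / (domain[2] - domain[1]) * (out[2] - out[1])

populations = [8_000_000, 2_000_000, 500_000]   # hypothetical city populations
domain = (0, maximum(populations))

areas = [linear_scale(p, domain, (0, 700)) for p in populations]
radii = sqrt.(areas ./ π)   # area = π r², so r = sqrt(area / π)

for (p, a, r) in zip(populations, areas, radii)
    println(p, " people -> area ", round(a, digits=1), ", radius ", round(r, digits=1))
end
```

Because the range describes area rather than radius, doubling the upper bound makes the biggest circle twice as large in area, not twice as wide.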

This is the final output:

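[Figure: final SVG map, the unemployment choropleth overlaid with purple circles sized by city population]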

The VegaLite.jl library is quite extensive, and to learn it thoroughly you’ll need to look through not only the VegaLite.jl documentation, but also the original Vega-Lite documentation.

Until next time, I hope you’ve enjoyed flirting with Julia! 💘