In part 1 we looked at some basic ways of manipulating the Land Registry sales data to create some pieces of analysis.
One of the issues with using a more basic tool like Access is that it soon starts to get a bit flaky once the database size gets above 1GB, with the full Land Registry dataset coming in at 3.5GB you obviously need something else to process it in a single go (you could arguably take the multiple yearly files, process each one in a separate database then join the results by year into a single results database if you were that way inclined but that's obviously a fair amount of work).
Alteryx is an excellent (if not cheap) tool that can create process flows of your data import/processing/analysis and outputs, for the Land Registry data it was a case of importing the file in,
One feature that works well in Alteryx is the ability to run a process on a sample of the data, in this case rather than having to go through all the 21m+ records in the file you can your process on a sample, test the flow works as you'd expect before running on the whole file.
Instead of code like you might see if using something like SQL or Base SAS, you have a graphical interface more similar to that in SAS Enterprise guide although the layout is much more user-friendly in Alteryx with modules/nodes relating to specific tasks easier to find.
One of the advantages of Alteryx over standard code is it can be easier for someone else to take over (although I know plenty of people who are still loyal to Base SAS and if they can't go through something line by line then they don't want to know).
The workflow above takes in the 21m sales records, creates some new fields (e.g., Postcode area, sales price to nearest thousand pounds etc) and then creates a number of aggregate tables (the main one being median sales price by year by postcode area).
This table is then matched to itself to get the previous years value (if there's an easier function for this let me know) to enable us to create a % change figure per year per postcode area which is then output for use in Tableau.
There's plenty you can do within Tableau in terms of calculated fields, sometimes it'll be best to do the work in processing (in this case Alteryx) although sometimes you'll want more dynamic calculations so would need to be done in Tableau or whatever presentation layer tool is being used.
The tableau side of things is relatively straightforward, after adding in the Geographic pack provided by The Information Lab, it's a fairly easy task to map % change by Postcode Area with each year being a separate page, which gives the video below:
This isn't meant to be some in-depth study of property prices, just a quick way of showing that it's fairly straightforward to get from data to insight. In later parts we'll be looking at other tools and other public sources of data.