I recently completed my master's thesis, Using Aggregated Demographic Data To Inform Electoral Boundary Redistributions: 2010 South Australian Election.
Lots of people have asked me what the thesis is about, so what follows is a summary that will hopefully answer that question. Once it has been examined, I'll be happy to publish the whole thing here, not that you'll actually want to read it.
Electoral district boundaries in South Australia are reviewed, and redrawn if necessary, after every state election. These redistributions are conducted by the Electoral Districts Boundaries Commission (EDBC), a statutory authority that is independent of the government.
The EDBC is required to ensure that electoral boundaries conform with a notion of fairness contained in the Constitution. In general terms, this means that the boundaries should allow the party that receives a majority of the votes (after the distribution of preferences) at an election to form government.
Since this fairness requirement came into effect in 1991, there have been six South Australian elections, and in three of these elections the party that received a majority of the State-wide two-party preferred votes (in all cases, the Liberal Party) was not able to form government. This indicates either that this characterisation of fairness is unworkable in practice, or that more information and more advanced techniques are required to implement it effectively.
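To make the fairness problem concrete, here is a minimal sketch of how a party can win a majority of the State-wide two-party preferred vote and still fall short of a majority of seats. The three districts, the party names and every vote count are invented for illustration; they are not South Australian results.

```python
# Hypothetical two-party preferred (2PP) vote counts per electoral district.
# All names and numbers are invented for illustration.
districts = {
    "A": {"Party X": 14000, "Party Y": 6000},
    "B": {"Party X": 9500, "Party Y": 10500},
    "C": {"Party X": 9600, "Party Y": 10400},
}


def statewide_share(districts, party):
    """Share of the State-wide 2PP vote won by `party`."""
    party_votes = sum(d[party] for d in districts.values())
    all_votes = sum(sum(d.values()) for d in districts.values())
    return party_votes / all_votes


def seats_won(districts, party):
    """Number of districts in which `party` has the most 2PP votes."""
    return sum(1 for d in districts.values() if max(d, key=d.get) == party)


share_x = statewide_share(districts, "Party X")
seats_x = seats_won(districts, "Party X")
majority_of_seats = seats_x > len(districts) / 2

print(f"Party X 2PP share: {share_x:.1%}, seats: {seats_x} of {len(districts)}")
```

In this toy example Party X piles up a large surplus in one district and narrowly loses the other two, so it wins most of the votes but only one of three seats.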
One key part of the EDBC's method of redistribution involves calculating estimates of the strength of support for each major party in small areas of geography called 'collection districts'. There are more than 3000 collection districts in South Australia. These estimates are then used to make decisions about which collection districts to move between electoral districts.
This thesis is chiefly concerned with the calculation of these estimates. We develop new methods of calculating them using new information in an attempt to improve the estimates, and hence improve the information available to the EDBC.
The new information we use is data about the demographics of each collection district and electoral district, sourced from the periodic Census of Population and Housing conducted by the Australian Bureau of Statistics (ABS). We use data from the 2006 Census, along with election returns from the 2010 state election. Principal Component Analysis techniques are used to explore and visualise the predictor datasets.
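For readers unfamiliar with Principal Component Analysis, the sketch below finds the first principal component of a tiny table of demographic proportions using power iteration on the covariance matrix. It is only an illustration of the general technique: the function name, the three made-up variables and their values are my assumptions, not data or code from the thesis, and a real analysis would normally use a statistics package and standardise variables measured on different scales.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, share of total variance) of the first component."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centred = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the centred columns.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along the direction v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(k))
                     for a in range(k))
    total_variance = sum(cov[a][a] for a in range(k))
    return v, eigenvalue / total_variance


# Hypothetical collection districts: three invented demographic proportions each.
cd_data = [
    [0.20, 0.30, 0.10],
    [0.35, 0.45, 0.12],
    [0.50, 0.55, 0.09],
    [0.65, 0.75, 0.11],
    [0.80, 0.85, 0.10],
]

direction, explained_share = first_principal_component(cd_data)
```

Because the first two invented variables rise together, a single component captures nearly all of the variation, which is the kind of structure PCA is used to reveal when exploring many correlated Census variables.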
The thesis develops a series of logistic regression models, with either two or three response categories. The models improve gradually over the course of the thesis. They are checked using standard statistical techniques, together with a set of summary statistics and visualisations.
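As a rough illustration of the two-category case, the sketch below fits a logistic regression to grouped vote counts using Newton's method, then predicts party support for smaller areas from the same predictor. This is not the model from the thesis: the single predictor, the function names and all the numbers are assumptions made up for the example, and the thesis models use many more predictors (and, in some cases, three response categories).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_binomial_logistic(x, successes, totals, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and negated Hessian (h) of the binomial log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, successes, totals):
            p = sigmoid(b0 + b1 * xi)
            w = ni * p * (1.0 - p)
            g0 += yi - ni * p
            g1 += (yi - ni * p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 linear system h * delta = g.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical electoral districts: one invented demographic proportion each,
# with invented two-party preferred votes for one party out of 20000.
district_x = [0.10, 0.20, 0.30, 0.40, 0.50]
district_votes = [7000, 8500, 10000, 11500, 13000]
district_totals = [20000] * 5

b0, b1 = fit_binomial_logistic(district_x, district_votes, district_totals)

# Predicted support in two hypothetical collection districts.
cd_x = [0.15, 0.45]
predictions = [sigmoid(b0 + b1 * xc) for xc in cd_x]
```

The point of the example is the workflow rather than the numbers: fit at the level where election results are known, then use the fitted relationship to estimate support in areas where only demographic data is available.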
The preferred model in the thesis is one that combines demographic information from the ABS with some 'spatial' information inherent in the system, taking advantage of the fact that collection districts are nested in electoral districts.
After settling on this model, the predictions for the support for each major party in each collection district are compared to the predictions that were actually used by the EDBC.
While further research is required to establish the improved accuracy of our predictions, we argue that they are credible and overcome some clear shortcomings in the EDBC predictions, and that these methods deserve further attention.