Inferential methods in the Populations project are used to estimate, summarise, and predict how software is developed, used, and re-configured. A key concern is how the results of the analysis can be visualised and interpreted by the people who influence the engineering and re-engineering of software structures. Two aspects of the data make the problem non-trivial: the information is inherently dynamic, and behaviour is highly heterogeneous within the population. Both issues are tackled using a range of well-developed inferential methods.

One set of inferential tools used in the project is the family of general admixture models, which represent each individual in the population as a vector of mixture weights over a set of data-generating distributions. While such models are often used in genetics and document analysis, their application to digital traces arising from software interactions is relatively recent. The project aims to develop general admixture models for such data.
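As a rough illustration of the description above, an admixture model can be written in the following generic form. The notation is an assumption made here for exposition and is not taken from the project: individual n has weights θ_n over K components, and f_k is the data-generating distribution of component k.

```latex
% Assumed notation: individual n, observation t, K components.
\[
  p(x_{n,t} \mid \theta_n) \;=\; \sum_{k=1}^{K} \theta_{n,k}\, f_k(x_{n,t}),
  \qquad \theta_{n,k} \ge 0, \quad \sum_{k=1}^{K} \theta_{n,k} = 1 .
\]
```

The weights θ_n differ between individuals while the components f_k are shared across the population, which is what distinguishes an admixture from a model that assigns each individual to a single component.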

Among the data currently being studied is a dataset for a population of users interacting with a smartphone application. User behaviour is unpredictable, and the way users explore the application can be described as an admixture of random walks. Parameter learning leads to a segmentation of users and a probabilistic abstraction of the user population. The key probability values of this abstract model help guide the construction of a formal model of the user population.
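The idea of an admixture of random walks can be sketched generatively in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's model: the screen names, the two transition matrices, the weights, and the choice to draw a component at every step are all hypothetical and chosen only for illustration. It covers simulation only, not the parameter learning described above.

```python
import random

# Hypothetical screen names; the real application's states are not given in the text.
STATES = ["home", "search", "settings"]

# Two assumed behavioural patterns, each a row-stochastic transition matrix
# (a random walk over screens). The values are illustrative only.
COMPONENTS = [
    [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]],  # mostly heads to "search"
    [[0.2, 0.1, 0.7], [0.3, 0.1, 0.6], [0.3, 0.1, 0.6]],  # mostly heads to "settings"
]


def simulate_trace(weights, n_steps, rng, start=0):
    """Sample one user trace from an admixture of random walks.

    At every step a component is drawn from the user's mixture weights,
    then the next state is drawn from that component's transition row.
    """
    state = start
    trace = [STATES[state]]
    for _ in range(n_steps):
        k = rng.choices(range(len(COMPONENTS)), weights=weights)[0]
        state = rng.choices(range(len(STATES)), weights=COMPONENTS[k][state])[0]
        trace.append(STATES[state])
    return trace


def marginal_transition(weights):
    """Per-user transition matrix implied by the weights: sum_k w_k * P_k."""
    n = len(STATES)
    return [
        [sum(w * P[i][j] for w, P in zip(weights, COMPONENTS)) for j in range(n)]
        for i in range(n)
    ]


rng = random.Random(0)           # fixed seed so the simulation is repeatable
user_weights = [0.75, 0.25]      # assumed weights for one hypothetical user
trace = simulate_trace(user_weights, 10, rng)
user_matrix = marginal_transition(user_weights)
```

Here `user_weights` plays the role of one user's vector of mixture weights, and `user_matrix` is the transition matrix that user would follow on average. Segmenting users then amounts to comparing their learned weight vectors.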

The datasets to be analysed come from currently live iPhone applications. This presents a rare opportunity to work with a large amount of current data. Additionally, the domain knowledge of the developers, and their deep understanding of their own applications, help to guide the exploratory analysis.