Data Clustering: Visualizing a Year of Phone Calls

John Wang
Mar 21, 2016

Data clustering is a powerful way of visualizing networks by grouping similar data points according to common attributes. In preparation for our upcoming project on the mobile phone networks of stroke patients, our lab has been piloting data clustering algorithms on our own cell phone data. The data presented here cover a year's worth of my own cell phone calls.

K-Means Clustering

To gather our data, we used a program called iExplorer, which can generate a text file containing an iPhone's call history. We then transformed and analyzed this data in the statistical analysis program R, using a clustering algorithm called K-means. Given a number of clusters, k, this algorithm places k centroids (cluster centers) in the data and assigns each data point to the closest one. Note that k is an input to the algorithm rather than something it computes, so it has to be chosen, for example by comparing results across several values. We ran the K-means algorithm to cluster my phone contacts according to three different attributes: frequency of calls, average duration of conversations, and total time spent talking to each contact.
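To make the two K-means steps concrete (assign each point to its nearest centroid, then move each centroid to the mean of its points), here is a minimal sketch in Python. It is an illustration only, not the R code our lab ran: the function name kmeans_1d, the evenly spaced starting centroids, and the call counts are all assumptions made up for this example.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Cluster one-dimensional values into k groups (Lloyd's algorithm).

    Returns (centroids, labels), where labels[i] is the index of the
    centroid closest to values[i].
    """
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialization: k starting centroids spread evenly across
    # the sorted data. Real implementations usually start randomly.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def nearest(value):
        # Index of the centroid closest to this value.
        return min(range(k), key=lambda j: abs(value - centroids[j]))

    for _ in range(max_iterations):
        # Assignment step: attach every point to its closest centroid.
        labels = [nearest(v) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # Centroids stopped moving, so the clustering is stable.
        centroids = updated

    labels = [nearest(v) for v in values]
    return centroids, labels


# Hypothetical number of calls per contact over a year.
calls_per_contact = [1, 2, 2, 3, 20, 22, 25, 90, 95]
centroids, labels = kmeans_1d(calls_per_contact, k=3)
print(centroids)
print(labels)
```

Each label says which cluster a contact falls into, and each centroid corresponds to one of the dashed lines in the figures below. In practice a library routine, such as R's built-in kmeans function, would replace this hand-written loop.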

The Data

Fig 1. Phone contacts clustered according to frequency of calls made from one year. X-axis shows ID numbers for each phone contact in my phone. Y-axis shows numbers of calls with that contact in a year. Cluster centroids are represented as dashed lines. Note that 6 clusters were formed.

Fig 2. Phone contacts clustered according to average duration of calls from one year. Y-axis shows average duration of calls, in seconds. Note that 6 clusters were formed.

Fig 3. Phone contacts clustered according to total communication time over one year. X-axis shows ID numbers for each phone contact in my phone. Y-axis shows total time spent talking to a contact, in seconds (i.e., frequency × average duration). Note that 8 clusters were formed.
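The three attributes plotted in the figures can all be derived from a plain list of calls. The sketch below shows one way to do that in Python. The record layout (a contact ID paired with a call duration in seconds), the function name summarize_calls, and the sample values are assumptions for illustration, since the actual iExplorer export format is not described here.

```python
from collections import defaultdict


def summarize_calls(calls):
    """Compute per-contact call frequency, average duration and total time.

    `calls` is an iterable of (contact_id, duration_in_seconds) pairs.
    """
    durations = defaultdict(list)
    for contact_id, seconds in calls:
        durations[contact_id].append(seconds)

    summary = {}
    for contact_id, secs in durations.items():
        frequency = len(secs)   # number of calls (Fig 1)
        total_time = sum(secs)  # total seconds talking (Fig 3)
        summary[contact_id] = {
            "frequency": frequency,
            "avg_duration": total_time / frequency,  # seconds per call (Fig 2)
            "total_time": total_time,
        }
    return summary


# Hypothetical call records: (contact ID, call duration in seconds).
calls = [(1, 60), (1, 120), (2, 600), (3, 30), (3, 30), (3, 90)]
for contact_id, stats in sorted(summarize_calls(calls).items()):
    print(contact_id, stats)
```

Because average duration is total time divided by frequency, multiplying frequency by average duration recovers the total time used in Fig 3.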

Observations and Future Work

The first two visualizations of the data paint an elegant picture: 6 groups of people clustered according to call frequency and duration, with each stratification representing a hypothetical difference in relationship "closeness". However, it is also clear that these graphs don't tell the whole story. People who called me more often did not necessarily speak to me for longer on average, and vice versa. The third graph, a measurement of total time spent talking, shows 8 clusters, with the first cluster holding the person that I talked to the most and for the longest that year. Regardless of which method captures relationship strength most accurately, all 3 graphs show us that mobile phone call history can be used to visualize a person's social network.

A key advantage of using data clustering to measure social networks is the objectivity of the resulting data. Our previous study on the social networks of stroke patients relied on interpersonal surveys, which are able to capture the complex relationships between nodes in each person's network. However, those surveys are also limited by their small size and subjectivity. Networks of mobile phone calls are a counterpart to survey data, providing us with an objective, socio-behavioral measurement of patient relationships on a broad scale. Our lab is eagerly working towards a study on the mobile networks of stroke patients that will hopefully allow us to visualize changes in their networks over time. Stay tuned on our website for how this project turns out!