Monday 1 December 2008

Datamining the streets

This site is a documentary. Normally that means photos and commentary, but today here's something different: six months' worth of scanned Bluetooth devices from somewhere near the Highbury Vaults. This is the "Highbury Dataset". It is legal to collect, provided you don't do things like take photographs at the same time as logging the phone "MAC addresses": numbers that are phone-specific and not related to the SIM card or phone number.

The Bath Experiment implied that one adult in six had a phone that could be logged: a Bluetooth phone set to "allow others to see my phone", that is, set to be discoverable. That would be interesting if true, as there were just under 30K phone sightings, which would map to just under 180K adult pedestrians (30K × 6). Regardless of that, the relative numbers provide insight of their own.
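The scaling above is simple enough to write down. This is only a back-of-envelope sketch: the function name is made up for illustration, 30,000 is the "just under 30K" figure rounded up, and it treats every sighting as a separate pedestrian, which overcounts anyone who walked past more than once.

```python
# Back-of-envelope scaling from logged sightings to pedestrians.
SIGHTINGS = 30_000   # "just under 30K" sightings, rounded up
ONE_IN = 6           # one adult in six discoverable, per the Bath Experiment

def estimate_pedestrians(sightings, one_in):
    """Scale sightings up, assuming only one adult in `one_in` can be logged.

    Assumption: each sighting is a distinct pedestrian, so this is an
    upper-end guess rather than a count of unique people.
    """
    return sightings * one_in

print(estimate_pedestrians(SIGHTINGS, ONE_IN))
```

Change `ONE_IN` to see how sensitive the headline number is to the discoverable fraction, which is the shakiest input.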

Here's the spread over the week.

Sunday is the least popular day; Tuesday to Friday are the busiest. Monday is quieter. Assuming that a big weekday group is commuters and students, there could also be a big evening group, in which case fewer people are going out on a Monday. Or fewer students go to lectures on a Monday.
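For anyone wanting to reproduce the per-day counts from their own log, something like the following would do. The log format here (ISO timestamp plus device address) and the sample rows are invented for illustration; the Highbury Dataset's real layout may differ.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical log rows: (ISO timestamp, device address). Made-up data.
sample_log = [
    ("2008-11-24T08:55:10", "00:11:22:33:44:55"),  # a Monday
    ("2008-11-25T09:05:00", "00:11:22:33:44:55"),  # a Tuesday
    ("2008-11-25T16:30:00", "66:77:88:99:AA:BB"),  # a Tuesday
    ("2008-11-30T12:00:00", "66:77:88:99:AA:BB"),  # a Sunday
]

def sightings_by_weekday(log):
    """Count sightings per day of the week, ignoring which device it was."""
    counts = Counter()
    for stamp, _address in log:
        day = datetime.fromisoformat(stamp).weekday()  # 0 = Monday
        counts[DAY_NAMES[day]] += 1
    return counts

for day in DAY_NAMES:
    print(day, sightings_by_weekday(sample_log)[day])
```

Swapping `weekday()` for `hour` gives the hour-of-day breakdown in the same way.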

More interestingly: breakdown by hour of day

Big rush after 0900 - students again? And look at that evening rush: peak visitors between 1600 and 1700, with a gradual drop-off.
Breaking down by half-hour slots would show more. The next step would be flagging which phones have already been seen the same day; this would let you identify who was heading back.
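Neither of those has been done here, but a rough sketch of both might look like this. The log format (ISO timestamp plus device address) is again an assumption, and the function names and short placeholder addresses are made up for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

def half_hour_slot(ts):
    """Return a label such as '16:30' for the half-hour slot containing ts."""
    return f"{ts.hour:02d}:{(ts.minute // 30) * 30:02d}"

def slots_and_repeats(log):
    """Count sightings per half-hour slot and flag same-day repeat sightings.

    `log` is an iterable of (ISO timestamp, address) pairs (assumed format).
    Returns (counts, repeats): counts maps slot label -> sightings, and
    repeats lists the rows whose address was already seen earlier that day.
    """
    counts = Counter()
    seen_today = defaultdict(set)  # date -> addresses seen so far that day
    repeats = []
    # ISO timestamps in one format sort chronologically as plain strings.
    for stamp, address in sorted(log):
        ts = datetime.fromisoformat(stamp)
        counts[half_hour_slot(ts)] += 1
        if address in seen_today[ts.date()]:
            repeats.append((stamp, address))
        else:
            seen_today[ts.date()].add(address)
    return counts, repeats

# Made-up rows: device "A" passes twice on the 25th and once on the 26th.
sample_log = [
    ("2008-11-25T09:05:00", "A"),
    ("2008-11-25T16:40:00", "A"),
    ("2008-11-25T16:45:00", "B"),
    ("2008-11-26T09:10:00", "A"),
]
counts, repeats = slots_and_repeats(sample_log)
print(dict(counts))
print(repeats)
```

A repeat a few hours after the first sighting is a plausible "heading back" candidate; one a minute later is more likely the same pass caught by two scans, so a minimum gap would be worth adding.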

For the paranoid, this experiment is no longer live. Disable Bluetooth to avoid participating in similar experiments. And do not feel fear, not from this. Central government's plans to track every phone call and email are far more invasive. This project, "Georgian", is a community police state. It also means that Bristol Traffic is rapidly becoming the holder of more accurate data than anyone else. We know who you are and where you are going - stop it.

3 comments:

Chris Hutt said...

It seems to me that the validity of any conclusions drawn from such data depends on the relationship between number of (switched-on) blue-tooth devices and the number of people, which is probably highly variable.

For example those passing late at night and in the early hours are more likely to be young adults and therefore perhaps more likely to have blue-tooth devices than those passing during daylight hours.

I suspect therefore the graph overstates the proportion of people passing during the night compared to the day. There may well be other less obvious factors creating a non-linear relationship between number of blue-tooth devices and people.

Nevertheless this is a very interesting experiment which I'm sure will attract more attention in due course. Perhaps you should be patenting the idea?

SteveL said...

Yes, it is very demographics-dependent. If you look at the Bath experiments, they measured numbers at a pub and on a street, the latter comparing the number of phones against the number of people. I could have tried this with a camera on the laptop, but then data protection rules engage.

Another factor is that devices are only detected if they pass through the scan bubble slowly enough to get picked up; scanning takes about 30s. Anyone walking in a hurry runs a chance of being missed; anyone on the other side of the road will probably be skipped too. If they walk down one side and up the other, they won't get picked up in both directions.

Anonymous said...

"... community police state ..."

You should get a grant for that from BCC. Then maybe you could also administer "community punishment" to some of the 4-wheeled offenders shown on this blog.