Thursday, 9 April 2009

Datamining the Traffic

Here's a list of search terms that have brought visitors to this site. These are what users typed into their favourite search engine before ending up visiting us at Bristol Traffic. The main ones are fairly predictable, and we are also slowly taking over key street names; these have been omitted. Some of the others are more entertaining.

People who did something they shouldn't
crash while reversing in one way street
parking on a pedestrian crossing
parking on school keep clear sign penalty points
i scraped a car that was parked on double yellow line on a private college campus
parking tickets on pavement near double yellow line
reversing the wrong way up a one way street
reversing in a one-way street legal uk?
reversing car driveway over pedestrian path dropped kerb right of way uk
rta while drunk
Penalty for parking in disabled space, bristol
It's important to remember that browser histories and server logs have both been used as evidence in criminal investigations. If you think you have done something you shouldn't, don't tell Yahoo! or Google what you have done, as it may be used as evidence against you.

Crime Questions
where do stolen scooters go?
where to buy stolen motorbikes in bristol
These two may be related.

More unusual Questions
why is oceania always at war
any disasters experiment in physics
We have no answers here.

bristol massage parlour
bristol massage parlours
avonmouth massage parlour
The thought of going to Avonmouth to visit a massage parlour may seem pretty odd, but anyone who has ever had to take a small child down to a birthday party at LaserQuest will know there is a limited number of ways to entertain yourself in Avonmouth Village for 90 minutes. A massage parlour is probably preferable to the Costco restaurant.

On the topic of datamining, can we draw people's attention to a paper and presentation from Cambridge University? It is a bit low-level, as the fact that solving certain graph problems is NP-complete is only of relevance to people who know what NP-completeness is and how it differs from NP-hardness. What is important is that by using the Facebook developer APIs, and Tor-relayed spidering of the web pages generated for all users unless they opt out, it is possible to build up the complete graph of who-knows-who. You can even start to infer locality and ideological preferences based on which groups people belong to. We aren't yet using these algorithms in the Bristol Traffic project, but it is only a matter of time. Once we know who you are, and how you park, we can start to determine your friends and your interests, and then go on to model you, them, and your and their behaviour.

Similarly, there is a new Google paper, The Unreasonable Effectiveness of Data, which argues that sufficiently large datasets, combined with sufficiently aggressive data mining algorithms, are a pretty good substitute for all the machine reasoning/artificial intelligence work of the past. Applied to the Bristol Traffic dataset, this implies that we will eventually be in a position to have the machines reliably predict where the congestion, parking and dangerous driving issues will be, without involving people in the analysis. Admittedly, we have a reasonable idea of some of the answers (near any school, at dropoff and pickup times), but the idea of automating all this has appeal.
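
To make the who-knows-who idea a little more concrete, here is a toy Python sketch. It is not the method from the Cambridge paper, and not anything we actually run: the names, the friend pairs, the group memberships and both helper functions (build_graph, guess_groups) are invented for illustration. It assumes some crawl has already produced a list of friend pairs, builds the graph from them, and then guesses the groups of someone who hides theirs by counting the groups their friends belong to.

```python
from collections import Counter, defaultdict

# Hypothetical crawl output: friend pairs and visible group memberships.
FRIEND_PAIRS = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "carol"),
]
GROUPS = {
    "alice": {"cycling"},
    "bob": {"cycling", "allotments"},
    "carol": {"cycling"},
    # dave has hidden his groups, so he has no entry here
}


def build_graph(pairs):
    """Return an undirected who-knows-who graph as an adjacency dict."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def guess_groups(person, graph, groups):
    """Rank groups by how many of the person's friends belong to them."""
    votes = Counter()
    for friend in graph.get(person, ()):
        votes.update(groups.get(friend, ()))
    return votes.most_common()


graph = build_graph(FRIEND_PAIRS)
print(sorted(graph["carol"]))               # everyone carol is linked to
print(guess_groups("dave", graph, GROUPS))  # groups inferred from dave's friends
```

Counting friends' groups is about the crudest inference possible, and it already says something about a user who has shared nothing. Anything resembling the real analysis would need far more data and far better algorithms.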

1 comment:

Forest Pines said...

Moreover, if you can graph a social network, you can then pretend to be a member of it.