We currently have three different versions of N-Screen running: one based on BBC Redux, one on TED talks and one on BBC iPlayer.
They all have the same basic design, with small tweaks for image size, and they interoperate – you can drag and drop between them. The main differences lie in how the data for the backends is collected and in how similarity between videos is calculated: we use a different similarity technique for each version, reflecting the different data available for each dataset.
The Redux version was our first experiment in this area. BBC Redux is a BBC research video-on-demand testbed, and we were lucky enough to obtain anonymised watching data for programmes in a five-month subset of the period it covers. Our first experiment, led by Dan Brickley, was to take that watching data – around 1.2 million observations over 12,000 programmes – and use open source tools to generate similarity indexes. We were able to use a standard function in Mahout, Apache’s machine learning and data mining software, to build the indexes using a Tanimoto Coefficient model. This technique essentially uses people as links between programmes (“Bob watched both ‘Cash in the Attic’ and ‘Bargain Hunt’”), scoring each pair of programmes by the proportion of people who watched either of them who watched both. With this dataset it produced some nice examples of clearly related clusters (for example what you might call ‘daytime home-related factual’, see picture below).
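As a rough illustration of the idea (not the Mahout implementation itself), here is a minimal Python sketch of Tanimoto similarity over a few invented watching events, treating each programme as the set of people who watched it:

```python
from collections import defaultdict
from itertools import combinations

# Invented watching data as (viewer, programme) pairs; the real dataset had
# around 1.2 million anonymised observations over 12,000 programmes.
watch_events = [
    ("alice", "Cash in the Attic"), ("alice", "Bargain Hunt"),
    ("bob", "Cash in the Attic"), ("bob", "Bargain Hunt"),
    ("carol", "Bargain Hunt"), ("carol", "Newsnight"),
]

# For each programme, build the set of people who watched it.
viewers = defaultdict(set)
for person, programme in watch_events:
    viewers[programme].add(person)

def tanimoto(a, b):
    """Shared viewers of a and b, divided by viewers of either."""
    return len(viewers[a] & viewers[b]) / len(viewers[a] | viewers[b])

# Rank programme pairs: the more overlap in audience, the more similar.
for a, b in sorted(combinations(viewers, 2),
                   key=lambda pair: tanimoto(*pair), reverse=True):
    print(f"{tanimoto(a, b):.2f}  {a} <-> {b}")
```

Mahout does this at scale; the point of the sketch is simply that two programmes score highly when a large share of the people who watched either of them watched both.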
It is quite rare to have access to this kind of data about what people have watched: it is both valuable and private, so it may not be readily available, and it may not exist at all if no-one has watched anything yet. For the TED dataset we therefore took a different approach. TED talks are a diverse set of talks by people prominent in their fields, licensed under the Creative Commons BY-NC-ND license. From our point of view, the advantage of this dataset was that transcripts were available for all the talks. To calculate similarity between the talks for N-Screen we were able to use a tf-idf algorithm. This technique treats each programme as a document, finds the most characteristic words for each document within the total corpus, and matches documents based on those words. To do this we used some Ruby software open sourced by a colleague at the BBC.
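The pipeline here used the Ruby software mentioned above; purely as an illustration of the tf-idf idea, here is a minimal Python sketch using scikit-learn, with short invented snippets standing in for the transcripts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented snippets standing in for full talk transcripts.
transcripts = {
    "A talk about drawing":   "pencil sketch line paper drawing art studio",
    "A talk about painting":  "paint colour canvas brush gallery art",
    "A talk about economics": "markets growth inflation trade policy",
}
titles = list(transcripts)

# tf-idf weights each word by how characteristic it is of one document
# relative to the whole corpus; cosine similarity then compares documents
# by the words that characterise them.
matrix = TfidfVectorizer(stop_words="english").fit_transform(transcripts.values())
similarity = cosine_similarity(matrix)

for i, title in enumerate(titles):
    others = [(similarity[i, j], titles[j]) for j in range(len(titles)) if j != i]
    print(title, "is closest to", max(others)[1])
```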
This technique produced clusters of clearly similar items within the 600-video dataset. In the selection below, for example, you can see items relating to women, and to drawing and art:
Our third example is an iPlayer version of N-Screen. On any given day there are about 1,000 TV and radio programmes available to UK viewers on iPlayer, the BBC’s on-demand service. This is an interesting dataset for us because of its high-quality metadata, including categories, formats, channel and people featured, and we were curious whether we could generate interesting programme-to-programme connections using metadata alone. Our first approach was to try Tanimoto similarity over the structured metadata, but the results were not particularly satisfactory – many programmes had no similar items. We then tried tf-idf over the metadata descriptions, but this seemed to pick up characteristics of the text rather than of the programmes (for example, repeated quirks in the phrasing of the descriptions). The best approach we have tried so far (evaluated only informally) is tf-idf over a combination of the metadata and the results of an entity-recognition technique.
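To make the problem with the first approach concrete: with invented category sets, Tanimoto over structured metadata alone gives a reasonable score where the metadata overlaps but a flat zero wherever it does not, which is one likely reason so many programmes ended up with no similar items.

```python
# Invented metadata sets; Tanimoto here is intersection size over union size.
meta = {
    "programme_a": {"BBC Two", "Factual", "Antiques"},
    "programme_b": {"BBC Two", "Factual", "Homes"},
    "programme_c": {"Radio 4", "Drama"},
}

def tanimoto(x, y):
    return len(x & y) / len(x | y)

print(tanimoto(meta["programme_a"], meta["programme_b"]))  # 0.5 - plausible neighbours
print(tanimoto(meta["programme_a"], meta["programme_c"]))  # 0.0 - no connection at all
```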
We used the existing metadata from /programmes in JSON format (for example http://www.bbc.co.uk/programmes/b00k7pvx.json or http://www.bbc.co.uk/programmes/b015ms3r.json). As you can see from those examples, some programmes have descriptions of the people featured in them, with mappings to DBpedia where available. We can get more of these by using a service to extract entities from the description text; for this we used Lupedia, which was developed by Ontotext in the NoTube project. We coupled this data with the channel and the categories to produce a list of keywords for each programme, and then ran tf-idf over the top of that (there is a rough sketch of this step after the examples below). The result can be variable:
but in many cases, reasonably good:
and occasionally throws up an interesting surprise:
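Here is a sketch of that keyword-plus-tf-idf step. The programme ids and all metadata values below are placeholders; in practice the channel and categories come from the /programmes JSON and the entity labels from Lupedia over the description text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder records standing in for real programme metadata.
programmes = {
    "pid1": {"channel": "BBC Two",
             "categories": ["Factual", "Antiques"],
             "entities": ["Auction", "United Kingdom"]},
    "pid2": {"channel": "BBC Two",
             "categories": ["Factual", "Homes"],
             "entities": ["Auction", "Interior design"]},
    "pid3": {"channel": "Radio 4",
             "categories": ["Drama"],
             "entities": ["Jane Austen"]},
}

def keywords(record):
    """Flatten channel, categories and entities into one keyword 'document'."""
    terms = [record["channel"]] + record["categories"] + record["entities"]
    return " ".join(term.replace(" ", "_") for term in terms)

pids = list(programmes)
docs = [keywords(programmes[p]) for p in pids]

# Run tf-idf over the keyword documents, just as for the TED transcripts,
# and compare programmes by cosine similarity.
matrix = TfidfVectorizer().fit_transform(docs)
similarity = cosine_similarity(matrix)

for i, pid in enumerate(pids):
    best = max((similarity[i, j], pids[j]) for j in range(len(pids)) if j != i)
    print(f"{pid} is most similar to {best[1]} ({best[0]:.2f})")
```

Compared with running tf-idf over the raw descriptions, these keyword documents reflect the channel, categories and recognised entities rather than quirks in the phrasing of the description text.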
The next stage is to evaluate these results formally.