Stephen W Draper and Mark D Dunlop
The field of information retrieval (IR) has traditionally addressed the problem of retrieving text documents from large collections by full-text indexing of their words. It has always been characterised by a strong focus on evaluation to compare the performance of alternative designs. The emergence into widespread use of both multimedia and interactive user interfaces has extensive implications for this field and for the evaluation methods on which it depends. This paper discusses what we currently understand about those implications. The "system" being measured must be expanded to include the human users, whose behaviour has a large effect on overall retrieval success, which now depends upon sessions of many retrieval cycles rather than a single transaction. Multimedia raises issues not only of how users might specify a query in the same medium (e.g. sketching the kind of picture they want), but also of cross-medium retrieval. Current explorations in IR evaluation show diversity along at least two dimensions: one between comprehensive models that have a place for every possible relevant factor and lightweight methods, the other between highly standardised workbench tests that avoid human users and studies in the workplace.
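The full-text indexing of words that the abstract identifies as traditional IR can be illustrated with a minimal sketch (not part of the paper itself): an inverted index mapping each word to the set of documents containing it, queried with conjunctive matching. All names here (build_index, search) are illustrative assumptions, not an API from the paper.

```python
from collections import defaultdict

def build_index(docs):
    """docs: dict of doc_id -> text. Returns word -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())  # intersect: conjunctive (AND) match
    return results
```

This single-transaction lookup is exactly the model the abstract argues is no longer sufficient on its own: an interactive session consists of many such query-refine cycles, and overall success depends on the user's behaviour across them, not on any one lookup.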