SumTime

From NLG Systems Wiki


Summary

Description

Our goal is to develop technology for producing English summary descriptions of a time-series data set. Currently there are many visualisation tools for time-series data, but techniques for producing textual descriptions of time-series data are much less developed. ...

From a natural-language generation perspective, we expect to focus on lexicalisation and user-modelling. For example, we need to write rules which determine when a particular data set can be described as "rising"; prototype or exemplar models of word meaning may be useful here. We also need to define user models which allow us to decide what information should be communicated in a space-limited text, given the huge number of things that could be said about a particular data set. This should be a very exciting project for anyone interested in the challenge of generating good English texts from non-linguistic input data.
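As an illustration of the kind of lexicalisation rule mentioned above, the following Python sketch labels a series as "rising", "falling", or "steady" from its least-squares slope. The function names and the threshold value are assumptions made for this example and are not taken from the SumTime system; a prototype or exemplar model of word meaning would replace the fixed threshold.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def describe_trend(values, threshold=0.5):
    """Choose a word for the overall trend.

    `threshold` is a hypothetical cut-off in data units per time step;
    a real system would tune it per parameter and per user group.
    """
    s = slope(values)
    if s > threshold:
        return "rising"
    if s < -threshold:
        return "falling"
    return "steady"


if __name__ == "__main__":
    wind_speed = [10, 12, 14, 16]  # made-up readings at regular intervals
    print(describe_trend(wind_speed))
```

A single slope is a deliberately crude criterion: it ignores noise, the length of the series, and whether readers would actually use the word for such data, which is where the word-meaning models discussed above come in.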

To take a concrete example, consider the following extract from a (real-life) weather report:

Saturday will be yet another generally dull day with early morning mist or fog and mainly cloudy skies being prevalent. There will be the odd bright spell here and there, but it will feel rather damp with patches of mainly light rain to be found across many parts, especially the west and south.

This was produced by a human forecaster from an underlying data set (generated by numerical modelling and prediction techniques) which gave predictions for meteorological parameters such as precipitation, temperature, wind speed, and cloud cover at various altitudes, at regular intervals, for various points in the area of interest. Phrases such as "the odd bright spell here and there" are places where the human forecaster has used short textual summaries to describe this complex data. Our goal is to perform similar types of summarisation in a computer system: for example, to take raw data on cloud cover and solar flux over the course of a day, and from this produce a phrase such as "odd bright spell here and there". This involves research in both time-series analysis (to detect patterns and phenomena in the data) and natural-language generation (to decide how best to describe these patterns and phenomena in language).
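The two stages described above (pattern detection, then wording) can be sketched roughly as follows. This is a minimal illustration rather than the SumTime method: the input is assumed to be cloud cover as a fraction between 0 and 1 at regular intervals, and the function names, the 0.5 clearness threshold, and the phrase boundaries are all hypothetical.

```python
def bright_spells(cloud_cover, clear_below=0.5):
    """Time-series analysis step: find runs of readings with low cloud cover.

    Returns a list of (start, end) index pairs, inclusive.
    """
    spells, start = [], None
    for i, c in enumerate(cloud_cover):
        if c < clear_below and start is None:
            start = i
        elif c >= clear_below and start is not None:
            spells.append((start, i - 1))
            start = None
    if start is not None:
        spells.append((start, len(cloud_cover) - 1))
    return spells


def describe_sky(cloud_cover, clear_below=0.5):
    """Generation step: map the detected pattern to a short phrase.

    The fraction boundaries (0.25 and 0.75) are assumptions for this sketch.
    """
    spells = bright_spells(cloud_cover, clear_below)
    if not spells:
        return "cloudy throughout"
    bright = sum(end - start + 1 for start, end in spells)
    fraction = bright / len(cloud_cover)
    if fraction <= 0.25:
        return "mainly cloudy with the odd bright spell"
    if fraction < 0.75:
        return "bright spells"
    return "mainly bright"


if __name__ == "__main__":
    # Made-up readings for one day, one value per interval.
    day = [0.9, 0.9, 0.3, 0.9, 0.9, 0.9, 0.9, 0.4, 0.9, 0.9, 0.9, 0.9]
    print(describe_sky(day))
```

Keeping detection and wording in separate functions mirrors the split between time-series analysis and natural-language generation, so that either side can be refined independently.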

References

  1. Sripada, S. G., Reiter, E., Hunter, J., Yu, J., & Davy, I. P. (2001). Modelling the Task of Summarising Time Series Data using KA Techniques. Paper presented at Applications and Innovations in Intelligent Systems IX: Proceedings of the International Conference on Knowledge Based Systems and Applied Artificial Intelligence (ES2001), Cambridge.
  2. Sripada, S. G., Reiter, E., Hunter, J., & Yu, J. (2001). SumTime: Observations from KA for Weather Domain (Technical Report No. AUCS/TR0102). Computing Science Department, University of Aberdeen.
  3. Yu, J., Hunter, J., Reiter, E., & Sripada, S. (2002). Recognising Visual Patterns to Communicate Gas Turbine Time-Series Data. Paper presented at Proceedings of The Twenty-second SGAI International Conference on Knowledge Based Systems and Applied Artificial Intelligence (ES2002), Cambridge, U.K.
  4. Yu, J., Hunter, J., Reiter, E., & Sripada, S. (2001). An approach to generating summaries of time series data in the gas turbine domain. Paper presented at Proceedings of IEEE International Conference on Info-tech & Info-net (ICII2001), Beijing.
  5. Reiter, E., & Sripada, S. G. (2002). Should Corpora Texts be Gold Standards for NLG? Paper presented at Proceedings of the International Natural Language Generation Conference 2002 (INLG2002).
  6. Sripada, S. G., Reiter, E., Hunter, J., & Yu, J. (2002). SUMTIME-METEO: Parallel Corpus of Naturally Occurring Forecast Texts and Weather Data (Technical Report No. AUCS/TR0201). Computing Science Department, University of Aberdeen.
  7. Reiter, E., & Sripada, S. G. (forthcoming). Human Variation and Lexical Choice. Computational Linguistics.
  8. Sripada, S. G., Reiter, E., Hunter, J., & Yu, J. (2002). Segmenting Time Series for Weather Forecasting. Paper presented at Proceedings of The Twenty-second SGAI International Conference on Knowledge Based Systems and Applied Artificial Intelligence (ES2002), Cambridge, U.K.
  9. Somayajulu, S., Reiter, E., Hunter, J., & Yu, J. (2001). A Two-Stage Model for Content Determination. Paper presented at Proceedings of ENLGW-2001.
  10. Reiter, E. (2007). An architecture for data-to-text systems. Paper presented at Proceedings of the 11th European Workshop on Natural Language Generation.

Facts about SumTime

Description: English summary descriptions of time-series data sets
Domain: weather and gas turbines
Language: English
Name: SumTime
Started: 2001
URL: http://www.csd.abdn.ac.uk/research/sumtime
Worker: Reiter, Sripada, Yu, and Hunter