Mining sequential patterns from probabilistic databases

Muzammal, Muhammad; Raman, Rajeev

journal contribution

posted on 2014-06-26, 13:15 authored by Muhammad Muzammal, Rajeev Raman

This paper considers the problem of sequential pattern mining (SPM) in probabilistic databases. Specifically, we consider SPM in situations where there is uncertainty in associating an event with a source, model this kind of uncertainty in the probabilistic database framework, and consider the problem of enumerating all sequences whose expected support is sufficiently large. We give an algorithm based on dynamic programming to compute the expected support of a sequential pattern. Next, we propose three algorithms for mining sequential patterns from probabilistic databases. The first two algorithms are based on the candidate generation framework: one uses a breadth-first exploration of the search space (similar to GSP) and the other a depth-first exploration (similar to SPAM). The third is based on the pattern growth framework (similar to PrefixSpan). We propose optimizations that mitigate the cost of the expensive dynamic programming step. We give an empirical evaluation of the probabilistic SPM algorithms and the optimizations, and demonstrate the scalability of the algorithms in terms of CPU time and memory usage. We also demonstrate the effectiveness of the probabilistic SPM framework in extracting meaningful sequences in the presence of noise.
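To make the expected-support computation mentioned in the abstract concrete, the following is a minimal sketch of one way such a dynamic program can be written. It is an illustration under stated assumptions, not necessarily the formulation used in the paper: each source is assumed to come with a time-ordered list of events, each event is assumed to belong to that source independently with a known probability, and the expected support is taken as the sum of the per-source support probabilities (by linearity of expectation). The names `support_probability` and `expected_support` are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

# Assumption: an event is (itemset, probability that the event belongs to this source).
Event = Tuple[FrozenSet[str], float]


def support_probability(pattern: Sequence[FrozenSet[str]],
                        events: Sequence[Event]) -> float:
    """Probability that `pattern` is a subsequence of the source's events,
    assuming each event is attributed to the source independently."""
    m = len(pattern)
    # prob[i]: probability that the first i pattern elements are matched
    # by the events processed so far. The empty pattern always matches.
    prob = [1.0] + [0.0] * m
    for itemset, p in events:
        # Iterate right to left so prob[i - 1] still describes the events
        # seen before the current one.
        for i in range(m, 0, -1):
            if pattern[i - 1] <= itemset:  # pattern element contained in event
                # Event absent: keep the old value. Event present: it can
                # serve as the match for element i.
                prob[i] = (1.0 - p) * prob[i] + p * prob[i - 1]
    return prob[m]


def expected_support(pattern: Sequence[FrozenSet[str]],
                     sources: Sequence[Sequence[Event]]) -> float:
    """Sum of per-source support probabilities."""
    return sum(support_probability(pattern, events) for events in sources)


if __name__ == "__main__":
    a, b = frozenset("a"), frozenset("b")
    source_1 = [(a, 0.5), (b, 0.5)]
    source_2 = [(a, 0.5), (a, 0.5), (b, 1.0)]
    print(expected_support([a, b], [source_1, source_2]))
```

The sketch runs in time proportional to the pattern length times the number of events per source, which hints at why this step dominates the cost when it is repeated for many candidate patterns.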

Citation

Knowledge and Information Systems, 2014.

Author affiliation

College of Science and Engineering, Department of Computer Science

Version

AM (Accepted Manuscript)

Published in

Knowledge and Information Systems

Publisher

Springer Verlag

issn

0219-1377;0219-3116

Copyright date

2014

Available date

2015-07-24

Publisher DOI

https://doi.org/10.1007/s10115-014-0766-7

Publisher version

http://link.springer.com/article/10.1007/s10115-014-0766-7

Notes

The file associated with this record is embargoed until 12 months after the date of publication. The final published version may be available through the links above.

Language

en

Keywords

Mining Uncertain Data; Sequential Pattern Mining; Probabilistic Databases
