Random Access to Grammar-Compressed Strings

conference contribution
posted on 11.07.2013, 12:05 by Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, Oren Weimann
Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n \cdot \alpha_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of the $k$-th row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length $m$ in the same complexity as a single random access query and additional $O(m)$ time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern $P$ with at most $k$ errors in time $O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of occurrences of $P$ in $S$. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.
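To make the random access problem concrete, here is a minimal Python sketch of the naive baseline, not the representations from the paper. It assumes the grammar is given as a straight-line program in which each rule is either a single character or a pair of indices of earlier rules; the names `rules`, `expansion_lengths`, and `access` are hypothetical. A query walks down from the start rule, so it costs time proportional to the grammar's height, which can be far larger than the $O(\log N)$ bound stated in the abstract.

```python
# Naive random access on a straight-line program (illustrative baseline only).
# Assumption: rule i is either a 1-character string (terminal) or a tuple
# (left, right) of indices of earlier rules; the last rule is the start rule.

def expansion_lengths(rules):
    """Return the length of the string each rule expands to."""
    lengths = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def access(rules, lengths, i):
    """Return character i (0-based) of the start rule's expansion.

    Descends from the start rule, choosing the left or right child by
    comparing i with the left child's expansion length. Time is
    proportional to the depth of the descent, not O(log N) in general.
    """
    node = len(rules) - 1
    if not 0 <= i < lengths[node]:
        raise IndexError("position out of range")
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left
        else:
            i -= lengths[left]
            node = right
    return rules[node]


# Example grammar: Fibonacci-like rules expanding to "abaababa".
rules = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
lengths = expansion_lengths(rules)
decoded = "".join(access(rules, lengths, i) for i in range(lengths[-1]))
```

Decoding a substring of length $m$ by repeating `access` in this way costs $m$ separate descents; the abstract's claim is that this can be reduced to one random access query plus $O(m)$ additional time.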

History

Citation

Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, 2011, pp. 373-389

Author affiliation

/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Computer Science

Source

The Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2011), San Francisco, California

Version

VoR (Version of Record)

Published in

Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms

Publisher

Society for Industrial and Applied Mathematics (SIAM)

isbn

0898719933;9780898719932

Copyright date

2011

Available date

11/07/2013

Publisher version

https://www.siam.org/proceedings/soda/2011/soda11.php

Temporal coverage: start date

23/01/2011

Temporal coverage: end date

25/01/2011

Language

en
