刚刚和 Leon 讨论了一下,发现我们对题目的了解有一点不同
原题目:
Given a document and a query of K words, how do u find the smallest window
that covers all the words at least once in that document? (given you know the
inverted lists of all K words, that is, for each word, you have a list of all
its occurrrences). This one is really hard. Could someone propose an
algorithm in O(n)?
我们可以确定:
1. document is given
2. 知道我们关心的是哪 K 个字
3. 这 K 个字的 occurrence list 也是给定的
问题是,n 指的是什么?
1. n = document size (in number of words)
在这种情况下, O(n) 是可能的,方法基本上就是 Leon 提出来的。
补充一个 O(document size) 的做法
// all indices starts from 1
Given occ (list of list of occurrences)
array = [-1, ..., -1] (size = document size)
i = 0
for list in occ
i = i + 1
for index in list
array[index] = i
count = [0, 0, ..., 0] (size = K)
nKinds = 0
start = 1
end = 1
minWindowSize = INF
while end <= document size
while end <= document size and nKinds < K
word = array[end]
end = end + 1
if word == -1 // we are not interested
continue
count[word] = count[word] + 1
if count[word] == 1
nKinds = nKinds + 1
while nKinds == K
word = array[start]
if word == -1 // we are not interested
start = start + 1
continue
count[word] = count[word] - 1
if count[word] == 0 // check this window
nKinds = nKinds - 1
if end - start < minWindowSize
minWindowSize = end - start
start = start + 1
return minWindowSizejk
2. n = 这 K 个字出现的总次数
Ex:
K = 3, 要找出包含 {a, b, c} 的最小 window
occurrences:
a = { 1, 1000, 2500}
b = {400, 1001, 2000}
c = {500, 1002, 1500}
document size = 2500 甚至更高 (其他的位置是其他的字)
n = 3 + 3 + 3 = 9 而不是 2500
在这种情况下,已知可以做到 O(n logK) (RockLee 写的方法)
不过,在这种状况下,给定原本的 document 似乎就没什么用了
而且在读入 input 时,
读入原本的 document 会花掉 O(document size) 的时间
所以在和 Leon 讨论时,他对这种解释方法有一点怀疑