Re: [问题] Google Interview Question (4) stimim PTT批踢踢实业坊

Re: [问题] Google Interview Question (4)

楼主: stimim (qqaa) 2013-03-09 17:19:46

推 scwg:我 (ledia 应该也是) 自动采用 2. 的解释, 因为 inverse 03/09 03:21
→ scwg:list 可以 offline 建好, 之后这个算法可以重复多次 03/09 03:22
我的想法和 scwg 一样，
因为原本的题目其实没有说明的很清楚，以下是我自己脑补的部份：
这个算法想要做什么？有一个猜测是，我觉得他的功能很像是搜寻时的预览功能，
比如说我们用 google search "harry potter spell" 这三个关键字，
我所找到的第一笔结果长这样：
Harry Potter Spell List
www.pojo.com/harrypotter/spelist.shtml - Cached
Harry Potter Spell List. Here I think I got all the spells, charms,
enchantments, curses, jinxes and all the other incantations. If there's
any missing, e-mail us on the ...
另一个比较后面的结果长这样：
HPL: Encyclopedia of Spells - The Harry Potter Lexicon
www.hp-lexicon.org/magic/spells/spells.html - Cached
Dec 27, 2000 –Descriptions of spells and various magical effects in the
Harry Potter universe.
第一行是网页的 title, 第二行是网址，第三行开始则是内文的预览，
那要如何产生一个好的预览呢？
我想这一题的目标：“找出一个最小的 window 来包含所有的关键字”可能就是一种方法
如果我们每次都是用原本的文件来产生这个 window 的话，
（也就是 occurrence lists 要自己产生）
那这题至少是 O(|document size| * O(check word))
O(check word) 指的是检查每个字到底是哪一个关键字所要花的时间
document size 以 word 为单位
突破这个限制的一种方法就是对文件做一些预处理，(所做的事和 indexing 应该差不多)
我们先对文件中的字产生他们的 occurrence list ，
这样的花费也是 O(|document size| * O(check word))
(假设我们有办法储存这些资讯，以供日后使用)
然后，未来有人来搜寻时，我们可以根据事先做好的 occurrence list 来求解
在这种情况下，至少不用再花时间做 check word 。
以长远的角度来看，或许可以省下不少时间。
另一方面，也因为在有 occurrence list 的状况下，
我们已经可以做到 O(|document size|) 了，
所以我会想要试试看能不能做到 O(n) (n = key words 出现的总次数)。
而 Leon 在上一篇中所提到的 show_list 我们其实是没有的，
因为我们在事前不知道使用者会搜寻哪一些字，
我们只能存每个字各自的 occurrence list ，
没办法产生 show_list 。
如果要把 occurrence list 合并成 show_list ，
这应该是没办法在 O(n) 的时间内做到，不然 merge sort 就会是 O(n) 了。

继续阅读

Re: [问题] Google Interview Question (4)Leon Re: [问题] Google Interview Question (4)stimim Re: [问题] Google Interview Question (4)Leon Re: [问题] Google Interview Question (4)Leon Re: [问题] Google Interview Question (4)stimim Re: [问题] Google Interview Question (4)stimim Re: [问题] Google Interview Question (4)Leon Re: [问题] Google Interview Question (4)RockLee Re: [问题] Google Interview Question (4)Leon [问题] Google Interview Question (4)RockLee