Suppose we have a set $D$, and each member of $D$ is a (data, key) pair. We want a data structure that supports the following operations:
- insert $e$ into $D$,
- delete member $e$ (no search is needed to find $e$; e.g., $e$ points to a member in $D$),
- MostFrequent, which returns a member $e$ such that $e.key$ is one of the most frequent keys in $D$ (note that the most frequent key need not be unique).
What would be an efficient implementation of this data structure?
My solution is a heap storing the keys together with their frequencies, prioritized by frequency, plus a hash table whose hash function maps members with the same key to the same slot in the hash table (with pointers from each part to the other).
This gives $O(\log n)$ worst-case time for the first two operations and $O(1)$ for the third.
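For concreteness, here is a minimal sketch of this idea in C++, showing only the frequency index (the hash table of members themselves is omitted). A `std::map`, i.e. a balanced BST, stands in for the frequency-ordered heap; the bounds are the same, and `Key` is an illustrative placeholder:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>

using Key = std::string;  // illustrative placeholder

struct FrequencyIndex {
    std::unordered_map<Key, std::size_t> freq;               // key -> number of members with that key
    std::map<std::size_t, std::unordered_set<Key>> byFreq;   // frequency -> keys with that frequency

    void memberInserted(const Key& k) {        // call after inserting a member with key k into D
        std::size_t f = freq[k];               // 0 if k is new
        if (f > 0) {
            byFreq[f].erase(k);
            if (byFreq[f].empty()) byFreq.erase(f);
        }
        freq[k] = f + 1;
        byFreq[f + 1].insert(k);               // O(log n)
    }

    void memberDeleted(const Key& k) {         // call after deleting a member with key k from D
        std::size_t f = freq.at(k);
        byFreq[f].erase(k);
        if (byFreq[f].empty()) byFreq.erase(f);
        if (f == 1) freq.erase(k);
        else { freq[k] = f - 1; byFreq[f - 1].insert(k); }
    }

    Key mostFrequent() const {                 // any key with maximal frequency (assumes non-empty)
        return *byFreq.rbegin()->second.begin();   // O(1)
    }
};
```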
I am wondering whether there is a more efficient solution (or a simpler solution with the same efficiency)?
Answers:
In the comparison-based model of computation, you may implement the priority queue using a Fibonacci heap instead of an ordinary heap. This gives you the following bounds: $O(1)$ amortized time for insert and $O(\log n)$ amortized time for delete operations.
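For example (assuming Boost.Heap is available), a Fibonacci heap with handles could replace the binary heap; erasing or updating an entry through its handle avoids any search:

```cpp
#include <boost/heap/fibonacci_heap.hpp>
#include <iostream>

int main() {
    // Max-heap of key frequencies (Boost heaps are max-heaps by default).
    boost::heap::fibonacci_heap<int> pq;

    auto h = pq.push(3);            // O(1) amortized; returns a handle for later updates
    pq.push(5);
    pq.increase(h, 4);              // bump a frequency: O(1) amortized
    std::cout << pq.top() << '\n';  // find-max in O(1): prints 5
    pq.erase(h);                    // delete via handle, no search: O(log n) amortized
    pq.pop();                       // remove the maximum: O(log n) amortized
}
```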
If you depart from the comparison-based model and adopt the RAM model, where keys are regarded as binary strings each contained in one or more machine words, you may implement your priority queue in $o(\log n)$ time. Indeed, you can achieve $O(\sqrt{\log\log n})$ time for both the insert and delete operations and $O(1)$ time for the find-min operation. Thorup proved that
If we can sort $n$ keys in time $S(n)$ per key, then we can implement a priority queue supporting find-min in constant time and updates (insert and delete) in $S(n)$ time.
See: M. Thorup. Equivalence between priority queues and sorting. In Proc. FOCS 2002.
Since we can sort in $O(n\sqrt{\log\log n})$ expected time and linear space, as shown in
Y. Han and M. Thorup. Integer sorting in $O(n\sqrt{\log\log n})$ expected time and linear space. In Proc. FOCS 2002.
the bound is proved.
You can do all of these in $O(1)$ expected amortized time.
The essential trick is that we don't need the full power of a priority queue, since key frequency only changes by 1 during each insert or delete.
My solution below is really just your solution with an "inefficient" priority queue that happens to work well for this case: a max priority queue implemented as a doubly-linked list of buckets of keys has $O(1)$ insertMin, deleteMax, removeFromBucket, and increaseKey.
Maintain a doubly-linked list of Buckets, where each Bucket has a non-empty hash set of keys (that I'll call a Cohort) and a positive integer (that I'll call the ValCount). In a Bucket b, each key k in the Cohort of b has the same number of unique values associated with it in the set you are maintaining. For example, if your set has the pairs (a,apple), (a,avocado), (b,banana), (c,cucumber), (d,dragon fruit) where the single letters are the keys and the fruits are the values, then you would have two Buckets: One Bucket would have a ValCount of 2 and a Cohort consisting only of one key: a. The other Bucket would have a ValCount of 1 and a Cohort consisting of the three keys b, c, and d.
The doubly-linked list of Buckets should be kept ordered by ValCount. It will be important that we can find the head and the tail of the list in $O(1)$ time and that we can splice in a new Bucket in $O(1)$ time if we know its neighbors.
Unimaginatively, I'll call the list of Buckets the BucketList.
In addition to the BucketList, we'll need a SetMap, which is a hash map mapping keys to ValueBuckets. A ValueBucket is a pair consisting of a ValueSet (a non-empty hash set of values) and a non-null pointer to a Bucket. The ValueSet associated with a key k contains all the unique values associated with k. The Bucket that the pointer refers to has a ValCount equal to the size of the ValueSet, and that Bucket's Cohort contains k, so the SetMap and the BucketList always agree about which Bucket a key belongs to.
In C++:
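A minimal sketch of the declarations (names follow the prose above; `Key` and `Value` are illustrative placeholders, and `std::list` provides the $O(1)$ splicing we need):

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <unordered_set>

using Key = std::string;     // illustrative placeholders
using Value = std::string;

struct Bucket {
    std::unordered_set<Key> cohort;  // keys that currently have exactly valCount values
    std::size_t valCount;            // number of unique values per key in this Bucket
};

// BucketList: doubly-linked list of Buckets, kept in decreasing valCount order,
// so the head always holds the most frequent keys.
using BucketList = std::list<Bucket>;

struct ValueBucket {
    std::unordered_set<Value> values;  // all unique values associated with the key
    BucketList::iterator bucket;       // the Bucket whose valCount == values.size()
};

// SetMap: key -> (its unique values, the Bucket it currently lives in).
using SetMap = std::unordered_map<Key, ValueBucket>;
```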
To find a max-frequency key-value pair, we just need to look at the head of the BucketList, find a key in the Cohort, look up that key in the SetMap, and find a value in the ValueSet of its ValueBucket. (phew!)
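Continuing the sketch above, the lookup is just three hash/pointer hops (this assumes the structure is non-empty):

```cpp
#include <utility>
// (uses the declarations from the sketch above)

std::pair<Key, Value> mostFrequent(const BucketList& buckets, const SetMap& setMap) {
    const Bucket& top = buckets.front();             // head Bucket: largest valCount
    const Key& k = *top.cohort.begin();              // any key in its Cohort
    const Value& v = *setMap.at(k).values.begin();   // any value in that key's ValueSet
    return {k, v};
}
```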
Inserting and deleting key-value pairs is trickier.
To insert or delete a key-value pair, we first insert or delete it in the SetMap. This changes the size of the ValueSet, so we need to change which Bucket the key is associated with. The only Buckets we need to look at to make this change are the immediate neighbors of the Bucket the key used to be in. There are several cases here, and they are probably not worth spelling out fully, though I'd be happy to elaborate if you're still having trouble.
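As an illustration of one of those cases, here is a sketch of insert, continuing the declarations above: the key moves from its current Bucket to the neighboring Bucket whose ValCount is one larger, which is created if it does not exist. Delete is symmetric in the other direction. Error handling is elided.

```cpp
#include <iterator>
// (uses the declarations from the sketch above)

void insert(BucketList& buckets, SetMap& setMap, const Key& k, const Value& v) {
    auto it = setMap.find(k);
    if (it == setMap.end()) {
        // New key: it belongs in a Bucket with valCount == 1 at the tail.
        if (buckets.empty() || buckets.back().valCount != 1)
            buckets.push_back(Bucket{{}, 1});
        auto tail = std::prev(buckets.end());
        tail->cohort.insert(k);
        setMap.emplace(k, ValueBucket{{v}, tail});
        return;
    }
    ValueBucket& vb = it->second;
    if (!vb.values.insert(v).second) return;   // value already present: nothing changes

    // The key's value count rose by exactly 1, so it moves to the Bucket just
    // before its current one (toward the head), which we create if needed.
    auto cur = vb.bucket;
    auto dest = buckets.end();                 // sentinel: "no suitable neighbor found yet"
    if (cur != buckets.begin() && std::prev(cur)->valCount == cur->valCount + 1)
        dest = std::prev(cur);
    if (dest == buckets.end())
        dest = buckets.insert(cur, Bucket{{}, cur->valCount + 1});

    dest->cohort.insert(k);
    cur->cohort.erase(k);
    if (cur->cohort.empty()) buckets.erase(cur);   // drop the Bucket once its Cohort is empty
    vb.bucket = dest;
}
```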
Worst-case complexity
- Insert: $O(1)$
- Find-min: $O(1)$
- Decrease-key: $O(1)$
- Delete: $O(\log\log n)$
- Space: $O(n)$
Proof
These bounds are obtained by combining results from the reference below.
Reference
Thorup, Mikkel. “Integer Priority Queues with Decrease Key in Constant Time and the Single Source Shortest Paths Problem.” In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, 149–158. STOC ’03. New York, NY, USA: ACM, 2003.