ProgrammingBeginners Guide to Hash Tables and Complexity
Posted:
ProgrammingBeginners Guide to Hash Tables and ComplexityPosted:
Status: Offline
Joined: Nov 05, 201311Year Member
Posts: 2,753
Reputation Power: 452
Status: Offline
Joined: Nov 05, 201311Year Member
Posts: 2,753
Reputation Power: 452
Keep reading if you want to know how to create a hash table and why they are efficient data types.
First things first. Why are hash tables important? Hash tables are essentially the same thing as declaring another variable. Sure variables are the best solution but what if you have an unknown amount of variables? Maybe your first thought is to store the values in an array, but that is not the best choice. Why are arrays bad for multiple values? Arrays are fine for small chunks of data. However, if you need to constantly filter/search through the array, then your program will be slow. Searching through an array is essentially trial and error until the correct item is found. Very slow. In big O notation, this is O(n) To understand the performance difference of hash tables, you need to know Big O notation. Loops, nested loops, and recursion all increase complexity. The goal is to be as close to O(1) as possible.
https://nedbatchelder.com/text/bigo_pix/011.png How does a hash table work? A hash table works by storing values in an array with a fixed size. Instead of appending the array, the index is calculated using a hash function. Essentially, this means the value will be stored randomly. When it's time the get the value, the index is recomputed and retrieved from the array. What is the complexity of a hash table? Hash tables have a complexity of O(1) except when there is a collision. Then the complexity is O(n). This is why collisions should be avoided or reduced. Just show me the code: The first step is to create a hashing algorithm. Make sure your hashing algorithm does not add complexity. (I'm increasing complexity for the sake of making the code easier to understand). A salt is used to increase entropy. It is best to pick prime numbers as a salt because 0 and multiples of two are more likely to cause collisions. The hash function can be anything that produces a pseudorandom number. I chose to multiply the decimal values of each character in the string. For more efficient algorithms refer to this StackOverflow post.
Next, we need a compression algorithm. Since the hash has a fixed size, we need a function that takes the big integer from the hash and produces another pseudorandom number that is smaller than the length of the array. Otherwise, we'll cause an overflow error. We'll use a common compression algorithm found in other languages. This is just math for now. Remember do not to add complexity!
Creating the hash table class Since the compression requires some random values, its best to calculate those values as little as possible. So, we'll calculate it when a hash is made. Depending on the scenario it may be better to define these values as constants.
Storing values and handling collisions Collisions will happen. To handle them we will store all duplicates and we store the key and value in an array. Remember, using these arrays barely affects the complexity because we are reducing collisions. Most likely there will only be a few items in the array in the worst-case scenario O(n). If there is only one item in the array then the complexity is still O(1).
Getting a value: Now use the lookup to get all the collisions from a hash. When we loop over the collisions we return the value if the key matches. Break to reduce complexity.
Removing a key: Removing a key is like getting a key. We search through the possible collisions and delete the key/value pair from the collision array if the key matches.
All together now. Here is all the code and how to use it.
Additional notes Remember hash tables can have a complexity of O(1) but after more data is added it has a higher chance of colliding which creates a complexity of O(n).
Since the hash is compressed based on the size of the internal array, then to reduce collisions you should increase the internal array size. Ideally, you should use a size that is 1.3 times the number of keys you plan to have. |
The following 1 user thanked CriticaI for this useful post:
FOV (12-19-2020)
#2. Posted:
Status: Offline
Joined: Aug 31, 201113Year Member
Posts: 1,883
Reputation Power: 8098
Motto: The extent of the observable world that is seen at any given moment.
Motto: The extent of the observable world that is seen at any given moment.
Status: Offline
Joined: Aug 31, 201113Year Member
Posts: 1,883
Reputation Power: 8098
Motto: The extent of the observable world that is seen at any given moment.
This is very informative. Thanks for the guide! |
- 0useful
- 0not useful
Users browsing this topic: None