From 1d8798c746ff321e463e75e17d96b2d9ce47ba04 Mon Sep 17 00:00:00 2001
From: nekosoffy <161766793+nekosoffy@users.noreply.github.com>
Date: Sun, 1 Dec 2024 16:24:15 -0300
Subject: [PATCH] Hash Map Lesson: Make use of terms more consistent

---
 .../hash_map_data_structure.md | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/javascript/computer_science/hash_map_data_structure.md b/javascript/computer_science/hash_map_data_structure.md
index 3578fb72bb..2a81773451 100644
--- a/javascript/computer_science/hash_map_data_structure.md
+++ b/javascript/computer_science/hash_map_data_structure.md
@@ -86,7 +86,7 @@ You might be thinking, wouldn't it just be better to save the whole name as a ha
 
 ### Buckets
 
-Buckets are storage that we need to store our elements. Simply, it's an array. For a specific key, we decide which bucket to use for storage through our hash function. The hash function returns a number that serves as the index of the array at which we store this specific key value pair. Let's say we wanted to store a person's full name as a key "Fred" with a value of "Smith":
+Buckets are the storage we need for our elements. We can think of each index of an array as a bucket. For a specific key, we decide which bucket to use for storage through our hash function. The hash function returns a number that serves as the index of the array at which we store this specific key-value pair. Let's say we wanted to store a person's full name as a key "Fred" with a value of "Smith":
 
 1. Pass "Fred" into the hash function to get the hash code which is `385`.
 1. Find the bucket at index `385`.
@@ -98,20 +98,20 @@ This is an oversimplified explanation; we'll discuss more internal mechanics lat
 
 Now if we wanted to get a value using a key:
 
-1. To retrieve the value, we hash the key and calculate its bucket number.
+1. To retrieve the value, we hash the key and calculate the index of its bucket.
 1. If the bucket is not empty, then we go to that bucket.
 1. Now we compare if the node's key is the same key that was used for the retrieval.
 1. If it is, then we can return the node's value. Otherwise, we return `null`.
 
 Maybe you are wondering, why are we comparing the keys if we already found the index of that bucket? Remember, a hash code is just the location. Different keys might generate the same hash code. We need to make sure the key is the same by comparing both keys that are inside the bucket.
 
-This is it, making this will result in a hash table with `has`, `set` and `get`.
+That's it: implementing these steps gives us a hash map with `has`, `set` and `get`.
 
 #### Insertion order is not maintained
 
 A hash map does not guarantee insertion order when you iterate over it. The translation of hash codes to indexes does not follow a linear progression from the first to the last index. Instead, it is more unpredictable, irrespective of the order in which items are inserted. That means if you are to retrieve the array of keys and values to iterate over them, then they will not be in order of when you inserted them.
 
-Some libraries implement hash tables with insertion order in mind such as JavaScript's own `Map`. For the coming project however we will be implementing an unordered hash table.
+Some libraries, such as JavaScript's own `Map`, implement hash maps with insertion order in mind. For the coming project, however, we will be implementing an unordered hash map.
 Example: if we insert the values `Mao`, `Zach`, `Xari` in this order, we may get back `["Zach", "Mao", "Xari"]` when we call an iterator.
 
 If iterating over a hash map frequently is your goal, then this data structure is not the right choice for the job, a simple array would be better.
@@ -152,9 +152,9 @@ Up until now, our hash map is a one-dimensional data structure. What if each `No
 
 You probably understand by this point why we must write a good hashing function which eliminates as many collisions as possible. Most likely you will not be writing your own hash functions, as most languages have it built in, but understanding how hash functions work is important.
 
-### Growth of a hash table
+### Growth of a hash map
 
-Let's talk about the growth of our buckets. We don't have infinite memory, so we can't have infinite number of buckets. We need to start somewhere, but starting too big is also a waste of memory if we're only going to have a hash map with a single value in it. So to deal with this issue, we should start with a small array for our buckets. We'll use an array of size `16`.
+Let's talk about how many buckets we have. We don't have infinite memory, so we can't have an infinite number of them. We need to start somewhere, but starting too big is also a waste of memory if we're only going to have a hash map with a single value in it. So to deal with this issue, we should start with a small buckets array. We'll use an array of size `16`.
 
@@ -168,17 +168,17 @@ For example, if we are to find the bucket where the value `"Manon"` will land, t
 
 As we continue to add nodes into our buckets, collisions get more and more likely. Eventually, however, there will be more nodes than there are buckets, which guarantees a collision (check the additional resources section for an explanation of this fact if you're curious).
 
-Remember we don't want collisions. In a perfect world each bucket will either have 0 or 1 node only, so we grow our buckets to have more chance that our nodes will spread and not stack up in the same buckets. To grow our buckets, we create a new buckets list that is double the size of the old buckets list, then we copy all nodes over to the new buckets.
+Remember, we don't want collisions. In a perfect world, each bucket would hold either 0 or 1 node, so we grow our buckets array to give our nodes a better chance to spread out instead of stacking up in the same buckets. To grow the array, we create a new one that is double the size of the old one and then copy all existing nodes over to the buckets of this new array. We have to hash their keys again, because a key's bucket index depends on the size of the array.
 
-#### When do we know that it's time to grow our buckets size?
+#### When do we know that it's time to grow our buckets array?
 
 To deal with this, our hash map class needs to keep track of two new fields, the `capacity` and the `load factor`.
 
 - The `capacity` is the total number of buckets we currently have.
-- The `load factor` is a number that we assign our hash map to at the start. It's the factor that will determine when it is a good time to grow our buckets. Hash map implementations across various languages use a load factor between `0.75` and `1`.
+- The `load factor` is a number that we assign to our hash map at the start. It's the factor that determines when it is a good time to grow our buckets array. Hash map implementations across various languages typically use a load factor between `0.75` and `1`.
 
-The product of these two numbers gives us a number, and we know it's time to grow when there are more entries in the hash map than that number. For example, if there are `16` buckets, and the load factor is `0.8`, then we need to grow the buckets when there are more than `16 * 0.8 = 12.8` entries - which happens on the 13th entry. Setting it too low will consume too much memory by having too many empty buckets, while setting it too high will allow our buckets to have many collisions before we grow them.
+Multiplying these two numbers gives us a threshold, and we know it's time to grow when there are more entries in the hash map than that threshold. For example, if there are `16` buckets and the load factor is `0.8`, then we need to grow the buckets array when there are more than `16 * 0.8 = 12.8` entries, which happens on the 13th entry. Setting the load factor too low will consume too much memory by leaving many buckets empty, while setting it too high will allow our buckets to have many collisions before we resize the array.
 
 ### Computation complexity
 
@@ -208,7 +208,7 @@ The following questions are an opportunity to reflect on key topics in this less
 - [What does it mean to hash?](#what-is-a-hash-code)
 - [What are buckets?](#buckets)
 - [What is a collision?](#collisions)
-- [When is it a good time to grow our table?](#when-do-we-know-that-its-time-to-grow-our-buckets-size)
+- [When is it a good time to grow our buckets array?](#when-do-we-know-that-its-time-to-grow-our-buckets-array)
 
 ### Additional resources
 
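For reference, the terms this patch settles on can fit together in code: a buckets array, `capacity`, `load factor`, and growing by re-hashing every key into an array of double the size. This is a minimal JavaScript sketch under stated assumptions. It is not the lesson's reference solution or the project's required API. Keys are assumed to be strings. Each bucket is a plain array of `{ key, value }` entries rather than the linked list of nodes the lesson describes. The bucket index is assumed to be a multiply-by-31 string hash reduced modulo `capacity`. The class name `HashMap`, its `size` field and its `grow` helper are hypothetical.

```javascript
// Hypothetical sketch: string keys only; buckets are plain arrays of
// { key, value } entries (the lesson uses linked lists of nodes instead).
class HashMap {
  constructor(capacity = 16, loadFactor = 0.75) {
    this.capacity = capacity; // total number of buckets
    this.loadFactor = loadFactor; // grow once size > capacity * loadFactor
    this.size = 0; // number of stored entries
    this.buckets = Array.from({ length: capacity }, () => []);
  }

  // Assumed hash: multiply-by-31 string hash, reduced modulo the capacity
  // at every step so the number stays small. Returns a bucket index.
  hash(key) {
    let hashCode = 0;
    const primeNumber = 31;
    for (let i = 0; i < key.length; i++) {
      hashCode = (primeNumber * hashCode + key.charCodeAt(i)) % this.capacity;
    }
    return hashCode;
  }

  set(key, value) {
    const bucket = this.buckets[this.hash(key)];
    const entry = bucket.find((e) => e.key === key);
    if (entry) {
      entry.value = value; // existing key: overwrite, size is unchanged
      return;
    }
    bucket.push({ key, value });
    this.size++;
    if (this.size > this.capacity * this.loadFactor) this.grow();
  }

  get(key) {
    // Compare keys, since different keys can share a bucket.
    const entry = this.buckets[this.hash(key)].find((e) => e.key === key);
    return entry ? entry.value : null;
  }

  has(key) {
    return this.buckets[this.hash(key)].some((e) => e.key === key);
  }

  // Double the buckets array and hash every key again, because each
  // key's bucket index depends on the capacity.
  grow() {
    const oldBuckets = this.buckets;
    this.capacity *= 2;
    this.buckets = Array.from({ length: this.capacity }, () => []);
    for (const bucket of oldBuckets) {
      for (const { key, value } of bucket) {
        this.buckets[this.hash(key)].push({ key, value });
      }
    }
  }
}

const map = new HashMap(16, 0.8);
for (let i = 1; i <= 13; i++) map.set(`key${i}`, i);
console.log(map.capacity); // 32
console.log(map.get("key7")); // 7
console.log(map.get("missing")); // null
```

With `16` buckets and a load factor of `0.8`, the 13th distinct key pushes the entry count past `12.8`, so `grow` doubles `capacity` to `32`. `get` still finds every key afterwards because each one was hashed again against the new capacity.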