Tuesday, 30 August 2016

Why HashSet is called Hash Set?

HashSet is one of concrete implementations for Set interface. Internally, it is implemented by a HashMap. So it uses a key-value pair to store elements, i.e. element's hash code and element itself. Please note here: hash code internally will be transferred into an index number by an equation. It is not hashcode directly become a key. So on this case, HashSet may not guarantee the order, and different hashcode might converted into the same index number. On that case, equality measurement will be incurred.

When adding an element into the HashSet collection, the element's hashcode method will be invoked, and hashcode will be used to determine which bucket allocated in the set. As two elements have the same hashcode, then the equals method will be called, in order to determine the uniqueness of the inserted elements. If they are not equal, then they both are added, even they have the same hash number.

The underlying implementing HashSet, is using hash number as its key in the map. This makes value retrieving efficient, meanwhile using equals method to guarantee the set won't have duplicated elements.

So, as using HashSet, you may need to override the equals and hashCode methods.  The following shows an experiment, when hashCode and equals method are not overridden.

public class Book {
    private String name;

    public Book() {
    }

    public Book(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
    
    @Override
    public String toString(){
        return this.name;
    }
    
}

When inserting several books with the same name, what happens? for hashCode method is not overridden, they give different hashcode, and therefor books will be considered different. 



public class HashSetHash {
    
    public static void main(String[] args) {
        
        Set myBooks = new HashSet<>();
        myBooks.add(new Book("Java Generics and Collections"));
        myBooks.add(new Book("Java Generics and Collections"));
        myBooks.add(new Book("Java Generics and Collections"));
        System.out.println(myBooks);
    }
   
}

From output, 3 same books are in the set.
  
run:
[Java Generics and Collections, Java Generics and Collections, Java Generics and Collections]
BUILD SUCCESSFUL (total time: 0 seconds)

If overriding equals method by adding the following code in the Book class

    @Override
    public boolean equals(Object object) {
        if (object instanceof Book) {
            return this.name.equals(((Book) object).getName());
        }
        return false;
    }

Then what is output? the three books should still be added. On this moment, for 3 different Book instances, so they still have different hashcode; for HashSet, it uses hashCode to determine uniqueness first, but not the equals methods. In another, the equals method will be called later, when instances get the same hash code; So we still see the same output.


run:
[Java Generics and Collections, Java Generics and Collections, Java Generics and Collections]
BUILD SUCCESSFUL (total time: 0 seconds)

Now let's overriding the hashCode method, and adding the following code in Book class.  


    @Override
    public int hashCode() {
        int hash = 7;
        hash = 41 * hash + Objects.hashCode(this.name);
        return hash;
    }

It shows that the hash code depends on the book name; so when the same book names, they gives the same hash code. The HashSet will put the first book in the bucket, then invoking equals method to determine rest two are duplicated.  Certainly, the rest elements may not equal, in other cases, then they will be put in bucket following the first.

run:
[Java Generics and Collections]
BUILD SUCCESSFUL (total time: 0 seconds)


No comments:

Can Jackson Deserialize Java Time ZonedDateTime

Yes, but must include JSR310. Thus ZonedDateTime can be deserialized directly from JSON response to POJO field. <dependency> <g...