How to identify similar images using hashing

Hi 👋,

In this article I would like to talk about image hashing.

Image hashing algorithms are specialized hash functions that compute an image’s hash from its visual properties. Duplicate images produce the same hash value, while visually similar images produce hash values that differ only slightly.

To simplify:

hash("white_cat") = "aaaa"
hash("brown_cat") = "aaba"
hash("car") = "xkjwe"
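Here is a minimal pure-Python sketch of the average hash technique that makes this idea concrete. It assumes the image has already been shrunk to a small grayscale grid; ImageHash’s average_hash does the grayscale conversion and resizing for you. Each bit is 1 when a pixel is brighter than the mean of all pixels. The function name average_hash_bits and the toy 2x2 grids are illustrative and are not part of the ImageHash API.

```python
def average_hash_bits(pixels):
    """Return the average hash of a grayscale grid as an integer.

    `pixels` is a list of rows of brightness values (0-255), e.g. an 8x8
    grid for a 64-bit hash. Bits are emitted row by row, most significant first.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Shift the previous bits left and append 1 if this pixel is brighter than the mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


# Two toy "images" that differ only slightly, and one with the bright half flipped.
bright_top = [[200, 210], [30, 40]]
bright_top_noisy = [[190, 215], [35, 45]]
bright_bottom = [[30, 40], [200, 210]]

print(format(average_hash_bits(bright_top), "04b"))        # 1100
print(format(average_hash_bits(bright_top_noisy), "04b"))  # 1100
print(format(average_hash_bits(bright_bottom), "04b"))     # 0011
```

The small brightness changes in the noisy grid do not move any pixel across the mean, so both grids hash to the same value.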

Some use cases for image hashing are:

  • Duplicate Image Detection
  • Anti-Impersonation / Image Stealing
  • Image filtering
  • Reverse image search

Let’s play around with image hashing techniques using Python and the ImageHash library. Install the library with:

pip install imagehash
pip install six

To get some sample images, I used Pexels and searched for terms like “white cat” and “firetruck”.

Here are the images I’m using: cat1, cat2, cat3 and firetruck1.

I’m going to import the necessary modules and add a function that converts the hexadecimal string returned by ImageHash into an integer.

from PIL import Image
import imagehash


def hash_to_int(img_hash: imagehash.ImageHash):
    return int(str(img_hash), 16)

The hash_to_int function exists because computations are much easier with integers than with strings. If we later build a service that uses image hashing and computes Hamming distances, we can store the integer hashes in an OLAP database such as ClickHouse and use its bitHammingDistance function to compute the Hamming distance.

The next snippet opens the images and computes the average hash and the color hash of each one. It then compares every image in the dataset against a source image. The distance is the Hamming distance between the average hashes plus the Hamming distance between the color hashes.

The lower the Hamming distance, the more similar the images. A Hamming distance of 0 means the hashes are identical, which usually means the images are duplicates.

def main():
    images = [
        Image.open("cat1.jpg"),
        Image.open("cat2.jpg"),
        Image.open("cat3.jpg"),
        Image.open("firetruck1.jpg")
    ]

    average_hashes = [hash_to_int(imagehash.average_hash(image)) for image in images]
    color_hashes = [hash_to_int(imagehash.colorhash(image)) for image in images]

    image_hashes = list(zip(images, average_hashes, color_hashes))

    source = image_hashes[0]

    for image in image_hashes:
        hamming_average_hash = bin(source[1] ^ image[1]).count("1")
        hamming_color_hash = bin(source[2] ^ image[2]).count("1")
        hamming_distance = hamming_average_hash + hamming_color_hash
        print("Hamming Distance between", source[0].filename, "and", image[0].filename, "is", hamming_distance)


if __name__ == '__main__':
    main()

To compute the Hamming distance, XOR the two integers and count the 1 bits in the result: bin(source[1] ^ image[1]).count("1"). That’s it.
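The XOR-and-count step can be wrapped in a small helper and checked on small numbers. This is a sketch, and the name hamming_distance is illustrative. If you keep the ImageHash objects instead of integers, the library also lets you subtract two hashes (hash1 - hash2) to get the same count of differing bits.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    # XOR leaves a 1 exactly where the two hashes disagree; count those 1s.
    return bin(a ^ b).count("1")


# 0b1011 ^ 0b1001 == 0b0010, so the hashes differ in exactly one bit.
print(hamming_distance(0b1011, 0b1001))  # 1
print(hamming_distance(0xFF, 0x00))      # 8
print(hamming_distance(42, 42))          # 0
```

On Python 3.10 and newer, (a ^ b).bit_count() performs the same popcount without building a string.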

If we run the program with the source variable set to cat1.jpg (source = image_hashes[0]), we get the following result:

Hamming Distance between cat1.jpg and cat1.jpg is 0
Hamming Distance between cat1.jpg and cat2.jpg is 36
Hamming Distance between cat1.jpg and cat3.jpg is 39
Hamming Distance between cat1.jpg and firetruck1.jpg is 33

Looking at the dataset, the first image, cat1, is somewhat visually similar to the firetruck image. This is why their distance (33) is lower than the distances to cat2 (36) and cat3 (39).

If we run the program with the source variable set to cat2.jpg, we can see that cat2 is most similar to cat3 (a distance of 23), since both images contain white cats.

Hamming Distance between cat2.jpg and cat1.jpg is 36
Hamming Distance between cat2.jpg and cat2.jpg is 0
Hamming Distance between cat2.jpg and cat3.jpg is 23
Hamming Distance between cat2.jpg and firetruck1.jpg is 47
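Instead of editing source and re-running the program for every image, you can compare every pair in one pass. The sketch below assumes the hashes have already been converted to integers with hash_to_int, as in main(). It uses made-up hash values and file names so that it runs without any images. The names combined_distance and closest_pairs are hypothetical helpers.

```python
from itertools import combinations


def combined_distance(a, b):
    """Hamming distance of the average hashes plus that of the color hashes.

    `a` and `b` are (name, average_hash_int, color_hash_int) tuples.
    """
    _, avg_a, color_a = a
    _, avg_b, color_b = b
    return bin(avg_a ^ avg_b).count("1") + bin(color_a ^ color_b).count("1")


def closest_pairs(image_hashes):
    """Return (distance, name_a, name_b) for every pair, most similar first."""
    pairs = [
        (combined_distance(a, b), a[0], b[0])
        for a, b in combinations(image_hashes, 2)
    ]
    return sorted(pairs)


# Made-up integer hashes for illustration only; real values come from ImageHash.
sample = [
    ("white_cat_a.jpg", 0b11110000, 0b1010),
    ("white_cat_b.jpg", 0b11110001, 0b1010),
    ("truck.jpg", 0b00001111, 0b0101),
]

for distance, first, second in closest_pairs(sample):
    print(distance, first, second)
```

Sorting by distance puts the most similar pair first. With the sample values above, that pair is the two cats, with a distance of 1.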

Conclusion

We used a Python image hashing library to compute the average and color hashes of some images. We then determined which images are similar to each other by computing the Hamming distance between their hashes.

Thanks for reading and build something fun! 🔨

References

Full Code

"""
pip install imagehash
pip install six
"""
from PIL import Image
import imagehash


def hash_to_int(img_hash: imagehash.ImageHash):
    return int(str(img_hash), 16)


def main():
    images = [
        Image.open("cat1.jpg"),
        Image.open("cat2.jpg"),
        Image.open("cat3.jpg"),
        Image.open("firetruck1.jpg")
    ]

    average_hashes = [hash_to_int(imagehash.average_hash(image)) for image in images]
    color_hashes = [hash_to_int(imagehash.colorhash(image)) for image in images]

    image_hashes = list(zip(images, average_hashes, color_hashes))

    source = image_hashes[0]

    for image in image_hashes:
        hamming_average_hash = bin(source[1] ^ image[1]).count("1")
        hamming_color_hash = bin(source[2] ^ image[2]).count("1")
        hamming_distance = hamming_average_hash + hamming_color_hash
        print("Hamming Distance between", source[0].filename, "and", image[0].filename, "is", hamming_distance)


if __name__ == '__main__':
    main()

Multiple Python versions on Windows

Hi 👋

In this short article I will show you two ways to switch between Python versions on Windows. This is useful when you have multiple Python versions installed and want to run a specific one from the terminal.

For example, suppose we have Python 3.10 and Python 3.7 installed.

We can run Python either with the Python Launcher, py, or with the python command.

Python Launcher

To list the installed Python versions with the Python Launcher, use the py -0 command.

@nutiu ➜ ~ py -0
Installed Pythons found by C:\WINDOWS\py.exe Launcher for Windows
 -3.10-64 *
 -3.7-64

@nutiu ➜ ~ py
Python 3.10.3 (tags/v3.10.3:a342a49, Mar 16 2022, 13:07:40) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

The default version is marked with a star. If we run py without arguments, we get a Python 3.10 prompt. To change the default version, set the PY_PYTHON environment variable to the desired version. The commands below use PowerShell, where $env:PY_PYTHON only applies to the current session.

@nutiu ➜ ~ $env:PY_PYTHON = "3.7"
@nutiu ➜ ~ py -0
Installed Pythons found by C:\WINDOWS\py.exe Launcher for Windows
 -3.10-64
 -3.7-64 *
@nutiu ➜ ~ py
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
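To confirm which interpreter the launcher actually picked, you can run a tiny script with it, for example py check_version.py. To select a version for a single invocation without touching PY_PYTHON, use py -3.7 check_version.py. The file name check_version.py and the function describe_interpreter are just examples.

```python
import platform
import sys


def describe_interpreter() -> str:
    """Return the version, bitness and executable path of the running interpreter."""
    bits = "64-bit" if sys.maxsize > 2**32 else "32-bit"
    return f"Python {platform.python_version()} ({bits}) at {sys.executable}"


if __name__ == "__main__":
    print(describe_interpreter())
```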

Using the Python command

If you prefer running Python with the python command, you’ll get the version whose directory comes first in your PATH. For example, if I run python on my machine, I get:

@nutiu ➜ ~ python
Python 3.10.3 (tags/v3.10.3:a342a49, Mar 16 2022, 13:07:40) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

We can change the order by going to My PC -> Advanced System Settings -> Environment Variables.

Select Path under User variables and click Edit…

On my machine, Python 3.10 has higher precedence because its entries are listed above the Python 3.7 ones. To change the order, select the folders referencing Python37 and click Move Up until they are above the Python 3.10 entries.

Restart your terminal. Running python again should now start the desired Python version.
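If you are unsure which entry wins, the following sketch lists every python executable found on PATH, in the order the directories are searched. The first one is what a plain python command will run. The sketch relies on shutil.which from the standard library and only approximates how the shell resolves commands. find_all_on_path is a hypothetical helper name.

```python
import os
import shutil


def find_all_on_path(name, path=None):
    """Return every executable called `name` found on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        # Restrict shutil.which to a single directory; on Windows it also tries PATHEXT suffixes such as .exe.
        found = shutil.which(name, path=directory)
        if found and found not in matches:
            matches.append(found)
    return matches


if __name__ == "__main__":
    for index, executable in enumerate(find_all_on_path("python")):
        note = "  <- runs when you type python" if index == 0 else ""
        print(executable + note)
```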

Thanks for reading! 🍻

A custom HomeKit accessory with Python

Hi 👋,

In this short article I want to show how I implemented a custom HomeKit accessory with Python.

My Home Assistant’s SD card died 🪦 a few days ago, and support for GPIO-based sensors will be removed in newer releases. This makes Home Assistant unsuitable for my needs, while giving me the perfect opportunity to try other things.

To continue monitoring temperature and humidity in my home, I’ve built a custom HomeKit accessory with HAP-Python.

The Sensor

A BME680 air quality sensor is used to monitor temperature and humidity. It is connected to the Pi according to the following diagram:

The sensor communicates with the Pi over I2C. If you want to use I2C in your own setup, enable it with raspi-config, since it isn’t enabled by default.

# Execute
sudo raspi-config
# Then select Interfacing options->I2C and enable it.

You can test the connection with the following commands:

sudo apt-get install build-essential libi2c-dev i2c-tools python-dev libffi-dev git
/usr/sbin/i2cdetect -y 1
pi@raspberrypi:~ $ i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- 76 -- 

The output shows the address the sensor is using. In our case, it is the I2C address 0x76.
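Besides i2cdetect, you can check from Python that the device at 0x76 really is a BME680 by reading its chip ID register. This sketch assumes the smbus2 package (pip3 install smbus2), which is not necessarily what the repository uses. It relies on the BME680 datasheet values: the chip ID lives in register 0xD0 and should read 0x61. The helper is_bme680 is hypothetical and takes the read function as a parameter, so the logic can be tried without hardware.

```python
BME680_ADDRESS = 0x76    # address reported by i2cdetect above
CHIP_ID_REGISTER = 0xD0  # BME680 chip ID register (datasheet)
EXPECTED_CHIP_ID = 0x61  # value a genuine BME680 returns


def is_bme680(read_byte_data, address=BME680_ADDRESS):
    """Return True if the device at `address` reports the BME680 chip ID.

    `read_byte_data(address, register)` is any callable shaped like
    smbus2.SMBus.read_byte_data, so a fake can be passed in for testing.
    """
    return read_byte_data(address, CHIP_ID_REGISTER) == EXPECTED_CHIP_ID


def main():
    try:
        from smbus2 import SMBus  # assumed dependency: pip3 install smbus2
    except ImportError:
        print("smbus2 is not installed; try: pip3 install smbus2")
        return
    try:
        with SMBus(1) as bus:  # bus 1, the same bus scanned by i2cdetect -y 1
            found = is_bme680(bus.read_byte_data)
    except OSError as error:
        print(f"Could not read from the I2C bus: {error}")
        return
    print("BME680 found" if found else "Unexpected chip ID")


if __name__ == "__main__":
    main()
```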

The Code for the Accessory

You can browse the full code for the accessory and the BME680 sensor in my git repo.

To run the program, clone the repository and make sure you’re running it as the pi user. Otherwise, you will need to adjust a few things, such as the /home/pi paths.

cd /home/pi && git clone git@github.com:dnutiu/bme680-homekit.git && cd bme680-homekit
sudo apt-get install libavahi-compat-libdnssd-dev
pip3 install -r requirements.txt

Verify that the program works by running python3 main.py. The first run will prompt you to add the accessory to the Home app. If you miss this step, you can repeat it by deleting the accessory.state file in pi’s home directory and running the program again.

After you’ve verified that it works, you can set up a systemd service to run the accessory’s Python script when the Pi boots.

Copy bme680-homekit.service to /etc/systemd/system, enable and start the service, and then check that it is running.

sudo cp bme680-homekit.service /etc/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable --now bme680-homekit
sudo systemctl status bme680-homekit

If you want to run this under another user rather than the pi, you’ll need to tweak the bme680-homekit.service file.

Congratulations on making it this far! 🎉

You can browse more code examples in the HAP-Python repository.

Thanks for reading and have fun! 👩‍💻👨‍💻 ⚙️

How to install a specific Python version on Linux

Hello, 👋

In this article I will show you how to install specific Python versions on Linux using three methods: compiling from source, the deadsnakes PPA and pyenv.

To make things easier, if you want to follow along in an environment that you can break, you can create a local Kubernetes cluster using Minikube.

Next, I’m going to use the following yaml file to create an Ubuntu pod:

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
  labels:
    app: ubuntu
spec:
  containers:
  - image: ubuntu
    command:
      - "sleep"
      - "604800"
    imagePullPolicy: IfNotPresent
    name: ubuntu
  restartPolicy: Always

Save the above YAML in a file named ubuntu_pod.yaml and run:

kubectl apply -f ./ubuntu_pod.yaml

To get a shell on the Ubuntu pod, run:

kubectl exec -it ubuntu -- /bin/bash

To start from scratch, simply delete the pod with kubectl delete pod/ubuntu and then recreate it.

Compiling Python from source

Before compiling Python, you need to set up the build environment. Thankfully, this is straightforward.

Pyenv has great instructions on it: https://github.com/pyenv/pyenv/wiki#suggested-build-environment.

On Ubuntu, to build Python, install the following packages:

apt-get update; apt-get install make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

Then, pick the desired Python version from https://www.python.org/ftp/python/. For example, to install Python 3.9, run:

wget https://www.python.org/ftp/python/3.9.9/Python-3.9.9.tgz
tar -xzf Python-3.9.9.tgz
cd Python-3.9.9

Then, run configure:

./configure --enable-optimizations

Finally, run make install if you want the new build to be installed as python3 (it can overwrite or shadow an existing python3). Otherwise, run make altinstall to install it only under the versioned binary name python3.9:

make altinstall

To test the installation run:

python3.9 --version
Python 3.9.9

pip3.9 --version
pip 21.2.4 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)

Installing Python via the deadsnakes third-party PPA

To install Python using the deadsnakes PPA, run:

apt-get update
apt-get install software-properties-common
add-apt-repository ppa:deadsnakes/ppa
apt-get update
apt install python3.9 python3-pip

Then, to test the installation run:

root@ubuntu:/# python3.9 --version
Python 3.9.10

root@ubuntu:/# python3.9 -m pip --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.9)
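If you try more than one of these methods, a small script can report which interpreters are actually reachable on PATH and what each one says about itself. This is a sketch that uses only the standard library. The candidate names are examples, and report_versions is a hypothetical helper.

```python
import shutil
import subprocess


def report_versions(candidates):
    """Map each command name to (resolved path, `--version` output), or None if not found."""
    report = {}
    for name in candidates:
        executable = shutil.which(name)
        if executable is None:
            report[name] = None
            continue
        # Some older interpreters print the version to stderr, so capture both streams.
        result = subprocess.run(
            [executable, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        report[name] = (executable, result.stdout.strip())
    return report


if __name__ == "__main__":
    for name, info in report_versions(["python3", "python3.9", "pip3.9"]).items():
        print(f"{name}: {info if info else 'not found'}")
```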

Installing Python via Pyenv

I’ve already written an article on how to install Python using Pyenv. Check it out if you wish:

https://nuculabs.wordpress.com//2020/06/27/pyenv-for-linux-users/

Thanks for reading! 📚