
calculate green view index for each image and attach score as attribute to point #8

Closed
danbjoseph opened this issue Feb 5, 2024 · 7 comments

@danbjoseph (Member) commented Feb 5, 2024

from the Treepedia project, the green view index is calculated in GreenView_Calculate.py
see also this paper: li2015.pdf

please see the previous steps as described in the README and follow those conventions for reading data in and out. we want the process to be geo-file agnostic to the extent possible. and modular so that in the future we can add or modify steps, etc.

the geofile used in the readme example can be downloaded here: https://drive.google.com/file/d/1fpI4I5KP2WyVD5PeytW_hoXZswOt0dwA/view?usp=sharing

we want to read in the main output file (if following the example in the readme, it will be a geopackage), which will have a collection of points and associated metadata.
we want to use the image_path to read in the image associated with each point, if available - note that some points will have a "None" value for image_path and "NULL" for image_id (however, please use a more robust check than watching for a string value of "None", as the indicator of no match may change in the future). calculate the GVI and then store the value back into the dataset in a new column.
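One way to implement that "more robust check" is a small predicate that handles both real missing values (None/NaN as read from the geopackage) and string sentinels. This is a sketch; `has_image` is a hypothetical helper name, not part of the existing codebase:

```python
import pandas as pd

def has_image(image_path):
    """Return True only if image_path looks like a usable path.

    Catches real missing values (None, NaN from the geopackage) as well
    as string sentinels like "None"/"NULL", so the check keeps working
    if the no-match indicator changes representation in the future.
    """
    if pd.isna(image_path):  # catches None and NaN
        return False
    path = str(image_path).strip()
    return path not in ("", "None", "NULL", "null", "nan")

print(has_image(None))             # False
print(has_image(float("nan")))     # False
print(has_image("None"))           # False
print(has_image("images/abc.jpg")) # True
```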

not needed if you're using the readme data, but some examples of images we want to use include these attached photos:

[attached example image 1]

[attached example image 2]

[attached example image 3]

@banjtheman

Was able to use the Claude Sonnet Vision API to provide a score; here is an example output:

{
"gvi": 10,
"reason": "The image shows a residential area with buildings, fences, and a road. There are some small trees and shrubs visible along the road and around the buildings, but overall, the green vegetation coverage appears to be minimal, likely around 10% of the visible area in the image."
}

@banjtheman

If we have some images with "known" GVI, I'd like to baseline it against the vision API.

@danbjoseph

Hey @banjtheman, thanks for exploring options! I should add some notes to the readme - we want to prioritize things that are transparent, efficient, open, low or no cost, and can be run locally if possible.

  • The GVI in Treepedia is a researched methodology - see this published paper - that will produce repeatable results. My understanding is that the same prompt to an LLM will not necessarily result in the same output every time. Additionally, in your result we can guess that the "10%" mentioned in the reason field is linked to the "10" in the gvi field, but that is only an assumption and not a documented process.
  • Access to the Claude Sonnet Vision API may have a free tier or be otherwise subsidized, but training and running LLMs has been noted as quite costly and I don't trust these companies to not raise prices in the near future.
  • Users of our tool may have slow and/or expensive internet. If we can avoid additional bandwidth requirements, it will make it easier for them.
  • A more specialized image segmentation model could possibly do the same (or better) analysis more efficiently and possibly be something that could run on a laptop.

For the MVP, I would like to match what the Treepedia project did. There are plans to add other analysis options into the tool chain. If you think the Claude Sonnet Vision API offers unique advantages, please post your thoughts in #10.

@banjtheman

Thanks for the added context!!!! Will add the Claude stuff to #10

Hmm, I couldn't access the paper - do you have a copy somewhere? Would want to see if we can codify it.

@danbjoseph

li2015.pdf

@banjtheman

banjtheman commented Mar 7, 2024

Was able to convert the old GreenView_Calculate.py to use some modern frameworks that all run locally.
Gets this for the first image

Green view score:
17.177629470825195

import cv2
import numpy as np
from skimage.filters import threshold_otsu


def get_gvi_score(image_path):
    """
    Calculate the Green View Index (GVI) for a given image file.

    Args:
        image_path (str): Path to the image file.

    Returns:
        float: The Green View Index (GVI) score for the given image.
    """
    # Load the image (cv2.imread returns None if the file can't be read)
    original_image = cv2.imread(image_path)
    if original_image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Convert to RGB color space
    rgb_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # Calculate ExG (Excess Green)
    r, g, b = cv2.split(rgb_image.astype(np.float32) / 255)
    exg = 2 * g - r - b

    # Apply Otsu's thresholding on ExG
    threshold = threshold_otsu(exg)
    green_pixels = (exg > threshold).sum()
    total_pixels = original_image.shape[0] * original_image.shape[1]

    # Calculate the Green View Index (GVI)
    gvi_score = (green_pixels / total_pixels) * 100

    return gvi_score


print("Green view score:")
print(get_gvi_score("example_greenview_image.jpg"))

@ioalexei
Contributor

Using the above sample I fleshed this out - had to do it on a sample of 10 points as I don't have enough space to download all the mapillary images for the dataset. Is this in the right area of what you're after?

import pandas as pd
import geopandas as gpd 
import os
import cv2
import numpy as np
from skimage.filters import threshold_otsu

def get_gvi_score(image_path):
    """
    Calculate the Green View Index (GVI) for a given image file.

    Args:
        image_path (str): Path to the image file.

    Returns:
        float: The Green View Index (GVI) score for the given image.
    """
    # Load the image (cv2.imread returns None if the file can't be read)
    original_image = cv2.imread(image_path)
    if original_image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Convert to RGB color space
    rgb_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # Calculate ExG (Excess Green)
    r, g, b = cv2.split(rgb_image.astype(np.float32) / 255)
    exg = 2 * g - r - b

    # Apply Otsu's thresholding on ExG
    threshold = threshold_otsu(exg)
    green_pixels = (exg > threshold).sum()
    total_pixels = original_image.shape[0] * original_image.shape[1]

    # Calculate the Green View Index (GVI)
    gvi_score = (green_pixels / total_pixels) * 100

    return gvi_score

# Set the directory with the mapillary images 
img_dir = "./data/raw/mapillary" # replace with path to mapillary images 

# Make an empty dataframe to hold the data
df = pd.DataFrame({"filename": [], "gvi_score": []})

# Loop through each image in the Mapillary folder and get the GVI score 
for i in os.listdir(img_dir):
    gvi_score = get_gvi_score(os.path.join(img_dir, i))

    temp_df = pd.DataFrame({"filename": [i], "gvi_score": [gvi_score]})

    print(i, "\t", str(gvi_score))

    df = pd.concat([df, temp_df], ignore_index=True)

# Create an image ID from the file name (strip the extension), to match to the point dataset
df['image_id'] = df['filename'].apply(lambda f: os.path.splitext(f)[0])

# Open the interim point data
gdf = gpd.read_file("./data/interim/sample10_images.gpkg") # replace with path to interim gpkg

# Join the GVI score to the interim point data using the `image_id` attribute
gdf = gdf.merge(df, how='left', on='image_id')

# Export as GPKG
gdf.to_file("./data/processed/sample10_images.gpkg", layer="gvi_scores")
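Since only a sample of images was downloaded, it may be worth checking how many points ended up without a score after the left join; pandas' merge with indicator=True makes that straightforward. The frames below are toy data for illustration:

```python
import pandas as pd

# Toy stand-ins for the point dataset and the GVI scores (hypothetical data)
points = pd.DataFrame({"image_id": ["a", "b", "c"]})
scores = pd.DataFrame({"image_id": ["a", "c"], "gvi_score": [17.2, 33.1]})

# indicator=True adds a _merge column marking rows that found no match
merged = points.merge(scores, how="left", on="image_id", indicator=True)
unmatched = int((merged["_merge"] == "left_only").sum())
print(unmatched)  # 1 -> point "b" has no GVI score
```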
