I have a CSV file (VV_AL_3T3_P3.csv) in which each row corresponds to a TIFF image of plankton. It looks like this:
Particle_ID  Diameter  Image_File                   Lenght ....etc
          1     15.36  VV_AL_3T3_P3_R3_000001.tif    18.09
          2     17.39  VV_AL_3T3_P3_R3_000001.tif    19.86
          3     17.21  VV_AL_3T3_P3_R3_000001.tif    21.77
          4      9.42  VV_AL_3T3_P3_R3_000001.tif     9.83
The images were originally all together in one folder and were then sorted by shape into subfolders. Each TIFF file name is formed from the Image_File value (without its .tif extension) plus the Particle_ID; for example, the image for the first row is VV_AL_3T3_P3_R3_000001_1.tiff
Now I want to use Python to add a new column called 'Class' to the existing CSV file (VV_AL_3T3_P3.csv), containing the name of the folder where each .tiff file is located (its class), like this:
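To make the naming rule concrete, here is a minimal sketch of how one image path could be split back into the Image_File and Particle_ID values. The helper name split_image_name is hypothetical, the folder name is taken from the desired output, and the sketch assumes the particle ID is always the part after the last underscore:

```python
import os

def split_image_name(path):
    """Split '<collage>_<id>.tiff' into ('<collage>.tif', id).

    Hypothetical helper; assumes the particle ID follows the last underscore.
    """
    # Drop the directory and the .tiff extension
    stem, _ext = os.path.splitext(os.path.basename(path))
    # Split once from the right, so underscores inside the collage name are kept
    collage_stem, particle_id = stem.rsplit('_', 1)
    return collage_stem + '.tif', int(particle_id)

example = split_image_name('./Spherical/VV_AL_3T3_P3_R3_000001_1.tiff')
print(example)
```

Splitting on the last underscore avoids hard-coding character positions, so it should still work if the particle ID has more than one digit.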
Particle_ID  Diameter  Image_File                   Lenght   Class
          1     15.36  VV_AL_3T3_P3_R3_000001.tif    18.09   Spherical
          2     17.39  VV_AL_3T3_P3_R3_000001.tif    19.86   Elongated
          3     17.21  VV_AL_3T3_P3_R3_000001.tif    21.77   Pennates
          4      9.42  VV_AL_3T3_P3_R3_000001.tif     9.83   Others
So far, I have a list with the name of the folder where each TIFF file is located. This list should become the new column. However, how can I match each folder to its row? In other words, how do I match 'Class' with 'Particle ID' and 'Image File'?
Here is what I have so far:
## Load modules:
import os
import pandas as pd
import numpy as np
import cv2
## Function to recursively list files in dir by extension
def file_match(path,extension):
    cfiles = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(extension):
                cfiles.append(os.path.join(root, file))
    return cfiles
## Collect all image files in all folders:
image_files = file_match(path='./',extension='.tiff')
## List of directories where each image was found:
img_dir = [os.path.dirname(one_img)[2:] for one_img in image_files]
len(img_dir)
## List of images:
# Image file column in csv files:
img_file = [os.path.basename(one_img)[:22] for one_img in image_files]
len(img_file)
# Particle id column in csv files:
part_id  = [os.path.basename(one_img)[23:][:-5] for one_img in image_files]
len(part_id)
## I now have the collage image name, the particle ID and the classification folder for every image.
# Now I need to merge this information with the csv rows...
## Load csv file:
data = pd.read_csv('VV_AL_3T3_P3.csv')
sample_file = data['Image File']  # Column name
sample_id   = data['Particle ID'] # Particle ID
I have seen a similar case here: Create new column in dataframe with match values from other dataframe
but I don't really know how to use map together with set_index, and that question involves two data frames whereas I have only one.
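For reference, this is a rough sketch of the kind of result I am after. It assumes the lists built from the file paths can be turned into a second data frame and joined on two key columns with pandas' merge. The in-memory data below only stands in for the real CSV and folders, and the column names Image_File and Particle_ID are assumptions based on the table above:

```python
import os
import pandas as pd

# Stand-in for the CSV contents (column names are an assumption)
data = pd.DataFrame({
    'Particle_ID': [1, 2, 3, 4],
    'Diameter': [15.36, 17.39, 17.21, 9.42],
    'Image_File': ['VV_AL_3T3_P3_R3_000001.tif'] * 4,
})

# Stand-in for the paths that os.walk would return
image_files = [
    './Spherical/VV_AL_3T3_P3_R3_000001_1.tiff',
    './Elongated/VV_AL_3T3_P3_R3_000001_2.tiff',
    './Pennates/VV_AL_3T3_P3_R3_000001_3.tiff',
    './Others/VV_AL_3T3_P3_R3_000001_4.tiff',
]

# Build one lookup row per image: key columns plus the folder name
rows = []
for path in image_files:
    stem = os.path.splitext(os.path.basename(path))[0]
    collage, particle_id = stem.rsplit('_', 1)
    rows.append({
        'Image_File': collage + '.tif',
        'Particle_ID': int(particle_id),
        'Class': os.path.basename(os.path.dirname(path)),
    })
classes = pd.DataFrame(rows)

# Left join keeps every CSV row; rows without a matching image get NaN in 'Class'
merged = data.merge(classes, on=['Image_File', 'Particle_ID'], how='left')
print(merged)
```

If this is a reasonable approach, the result could presumably be written back with merged.to_csv(..., index=False), but I am not sure whether merge or map is the better fit here.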