Wondering if someone could shed some light on a for loop to perform the following.
df1
col1:
A
B
C
D
E
df2
col2:
A
C
D
If a value in df1 also appears in df2, replace it with X, otherwise replace it with Y, and append the result as a new column.
Final df1
col3:
X
Y
X
X
Y
As Kraigolas commented, you can easily do this without looping.
Check whether each element is in another array with np.in1d (newer NumPy versions recommend np.isin for this), then map the truth values to "X" and "Y":
import pandas as pd
import numpy as np
df1 = pd.DataFrame()
df1["col1"] = ["A", "B", "C", "D", "E"]
df2 = pd.DataFrame()
df2["col2"] = ["A", "C", "D"]
df1["col3"] = list(map(lambda x: "X" if x else "Y", np.in1d(df1.col1, df2.col2)))
print(df1)
Output:
  col1 col3
0    A    X
1    B    Y
2    C    X
3    D    X
4    E    Y
Since the question has "dataframe" in its title and uses the variables df1 and df2 with columns col1 and col2, it is probably about pandas or numpy.
Without further context, such as code, we can only point to general options rather than a specific solution.
Following are some functions in the solution space:
numpy.in1d (explained below), pandas.Series.isin, set & other or set.intersection()
numpy.where (explained below), pandas.Series.where, map
See numpy's in1d(ar1, ar2, assume_unique=False, invert=False) function:
Test whether each element of a 1-D array is also present in a second array.
import numpy as np
array_1 = np.array(['A', 'B', 'C'])
print(array_1)
# ['A' 'B' 'C']
array_1_elements_exist = np.in1d(array_1, ['C', 'D'])
print(array_1_elements_exist)
# [False False  True]
The mapping can be done using Python's built-in map(mapping_function, array_or_list) as answered by rikyeah.
Or directly use numpy's where(condition, [x, y, ])
Return elements chosen from x or y depending on condition.
to map binary values (in statistics this is called binary classification):
import numpy as np
array_bool = np.array([True, False])
print(array_bool)
# [ True False]
array_str = np.where(array_bool, 'x', 'y')
print(array_str)
# ['x' 'y']
Because the question has not shown a reproducible example yet, it is unclear how the two functions should be combined in context.
Until such an example is provided, the exact combination is left open.
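That said, assuming the sample values listed in the question are representative, one possible way to chain the membership test and the mapping is sketched below. The variable names values and lookup are made up for illustration, and np.isin is used here as the newer counterpart of np.in1d:

```python
import numpy as np

# Sample values copied from the question; the variable names are assumptions.
values = np.array(["A", "B", "C", "D", "E"])
lookup = np.array(["A", "C", "D"])

# Step 1: boolean mask, True where an element of `values` occurs in `lookup`.
mask = np.isin(values, lookup)

# Step 2: map True to "X" and False to "Y" in one vectorized call.
labels = np.where(mask, "X", "Y")
print(labels)
# ['X' 'Y' 'X' 'X' 'Y']
```

This reproduces the col3 values from the question, but it is only a sketch and may need adjusting once the real data is known.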
Example applications of these functions to pandas are:
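A minimal pandas-only sketch, assuming the frames look like the ones described in the question (the helper name in_df2 is hypothetical), could use pandas.Series.isin for the membership test and Series.map with a dict for the relabelling:

```python
import pandas as pd

# Frames rebuilt from the values listed in the question.
df1 = pd.DataFrame({"col1": ["A", "B", "C", "D", "E"]})
df2 = pd.DataFrame({"col2": ["A", "C", "D"]})

# Boolean Series: True where a value of df1.col1 also occurs in df2.col2.
in_df2 = df1["col1"].isin(df2["col2"])

# Translate the booleans into the labels and append them as a new column.
df1["col3"] = in_df2.map({True: "X", False: "Y"})
print(df1)
```

Here no numpy import is needed, which may be convenient if the data already lives in a DataFrame.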
Or in built-in Python:
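Since the question asks about a for loop, here is a rough sketch without any library, assuming the columns are available as plain lists (the names col1, col2, col3 and lookup are illustrative only):

```python
# Column values copied from the question, held as plain Python lists.
col1 = ["A", "B", "C", "D", "E"]
col2 = ["A", "C", "D"]

# A set makes each membership test a constant-time lookup on average.
lookup = set(col2)

col3 = []
for value in col1:
    # "X" if the value also occurs in col2, otherwise "Y".
    col3.append("X" if value in lookup else "Y")

print(col3)
# ['X', 'Y', 'X', 'X', 'Y']
```

For large dataframes the vectorized variants are likely faster, but the loop shows the logic most explicitly.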