Combine two columns of text in pandas dataframe

Question

I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I'd like to create a variable called period that makes Year = 2000 and quarter= q2 into 2000q2.

Can anyone help with that?

Searchers: [here's a similar question with more answers](https://stackoverflow.com/questions/33098383/merge-multiple-column-values-into-one-column-in-python-pandas) — ᴍᴇʜᴏᴠ, Oct 18 '22 at 19:40

score 1184 · Answer 1 · edited Nov 11 '21 at 11:20

1184

If both columns are strings, you can concatenate them directly:

df["period"] = df["Year"] + df["quarter"]

If one (or both) of the columns are not string typed, you should convert it (them) first,

df["period"] = df["Year"].astype(str) + df["quarter"]

Beware of NaNs when doing this!

If you need to join multiple string columns, you can use agg:

df['period'] = df[['Year', 'quarter', ...]].agg('-'.join, axis=1)

Where "-" is the separator.

edited Nov 11 '21 at 11:20

iacob

20,084
6
92
119

answered Oct 15 '13 at 10:09

silvado

17,202
2
30
46

28

Is it possible to add multiple columns together without typing out all the columns? Let's say `add(dataframe.iloc[:, 0:10])` for example? – Heisenberg May 09 '15 at 19:15
9

@Heisenberg That should be possible with the Python builtin ``sum``. – silvado May 11 '15 at 11:06
8

@silvado could you please make an example for adding multiple columns? Thank you – c1c1c1 Oct 25 '16 at 16:45
16

Be careful, you need to apply map(str) to all columns that are not string in the first place. if quarter was a number you would do `dataframe["period"] = dataframe["Year"].map(str) + dataframe["quarter"].map(str)` map is just applying string conversion to all entries. – Ozgur Ozturk Feb 01 '17 at 21:17
@OzgurOzturk Based on the OP's example, it seems that the data in quarter are already strings. If not you need to convert of course. – silvado Feb 02 '17 at 12:27
4

Also facilitates the easy addition of a separator. For example the separator '__': ```dataframe["period"] = dataframe["Year"].map(str) + '__' + dataframe["quarter"]``` – H. Vabri Mar 28 '17 at 13:43
25

This solution can create problems iy you have nan values, e careful – Dec 27 '17 at 17:14
this can create problems with text which needs utf-8 encoding – Watt Jun 28 '18 at 18:49
2

I'm getting the `SettingWithCopyWarning` when I use this solution - how can I do this without triggering that warning? – Nate Jul 13 '18 at 18:55
This solution is much faster than using the apply function. – Rahul Jul 22 '18 at 05:28
Not sure which version of python this would work for, but this didn't work. It created a list of the joined list in every row cell of the new column. – xgg Oct 25 '18 at 16:40
5

In general, `astype` should be used for casting pandas.Series to another type. So instead of `.map(str)`, use `.astype(str)`. – flow2k May 07 '19 at 18:59
what to of NaNs? Is there a solution to ignore them? – lil-wolf Apr 16 '20 at 18:54
If both the column are string then df['year_qtr'] = df['year'].str.cat(df['qtr'],sep='_') – kamran kausar Sep 23 '20 at 13:56
3

Instead of `.agg...` use `.apply(lambda x: '_'.join(x), axis=1)`. This was 5 times faster than the .agg version. – agent18 Feb 06 '21 at 11:02
@c1c1c1 see here https://stackoverflow.com/a/57269756/6589617 – geher Jun 01 '21 at 12:51
^^ and you can combine it with str convertig: .apply(lambda x: '__'.join(x.astype(str)), axis=1) – eid May 13 '22 at 10:18

score 400 · Answer 2 · edited May 30 '22 at 17:37

400

Small data-sets (< 150rows)

[''.join(i) for i in zip(df["Year"].map(str),df["quarter"])]

or slightly slower but more compact:

df.Year.str.cat(df.quarter)

Larger data sets (> 150rows)

df['Year'].astype(str) + df['quarter']

UPDATE: Timing graph Pandas 0.23.4

Let's test it on 200K rows DF:

In [250]: df
Out[250]:
   Year quarter
0  2014      q1
1  2015      q2

In [251]: df = pd.concat([df] * 10**5)

In [252]: df.shape
Out[252]: (200000, 2)

UPDATE: new timings using Pandas 0.19.0

Timing without CPU/GPU optimization (sorted from fastest to slowest):

In [107]: %timeit df['Year'].astype(str) + df['quarter']
10 loops, best of 3: 131 ms per loop

In [106]: %timeit df['Year'].map(str) + df['quarter']
10 loops, best of 3: 161 ms per loop

In [108]: %timeit df.Year.str.cat(df.quarter)
10 loops, best of 3: 189 ms per loop

In [109]: %timeit df.loc[:, ['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 567 ms per loop

In [110]: %timeit df[['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 584 ms per loop

In [111]: %timeit df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)
1 loop, best of 3: 24.7 s per loop

Timing using CPU/GPU optimization:

In [113]: %timeit df['Year'].astype(str) + df['quarter']
10 loops, best of 3: 53.3 ms per loop

In [114]: %timeit df['Year'].map(str) + df['quarter']
10 loops, best of 3: 65.5 ms per loop

In [115]: %timeit df.Year.str.cat(df.quarter)
10 loops, best of 3: 79.9 ms per loop

In [116]: %timeit df.loc[:, ['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 230 ms per loop

In [117]: %timeit df[['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 230 ms per loop

In [118]: %timeit df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)
1 loop, best of 3: 9.38 s per loop

Answer contribution by @anton-vbr

edited May 30 '22 at 17:37

Sunderam Dubey

1
11
20
40

answered Apr 28 '16 at 10:02

MaxU - stand with Ukraine

205,989
36
386
419

What difference between 261 and 264 in your timing? – Anton Protopopov May 21 '16 at 19:57
@AntonProtopopov apparently 100ms out of nowhere :) – Dennis Golomazov Oct 10 '16 at 17:30
@AntonProtopopov, i guess it's a mixture of two timings - one used CPU/GPU optimization, another one didn't. I've updated my answer and put both timing sets there... – MaxU - stand with Ukraine Oct 10 '16 at 17:45
This use of .sum() fails If all columns look like they could be integers (ie are string forms of integers). Instead, it seems pandas converts them back to numeric before summing! – CPBL May 25 '17 at 13:06
@CPBL, try this approach: `df.T.apply(lambda x: x.str.cat(sep=''))` – MaxU - stand with Ukraine May 26 '17 at 07:38
1

@MaxU How did you go about the CPU/GPU optimization? Is that just a more powerful computer or is it something you did with code? – user3374113 Jul 07 '17 at 11:30
@user3374113, it has nothing to do with the code. My notebook has a switch which switches on GPU and enables Intel Turbo Boost Technology – MaxU - stand with Ukraine Jul 10 '17 at 15:11
@MaxU, have you tried comparing the `.str.cat()` method? – LeoRochael Jul 23 '18 at 21:22
Late to the party. But I'm confused as to how `df['Year'].map(str)` works. Does `str` serve as an `arg` and it basically just sets everything in the `df['Year']` as string? Thanks. – Bowen Liu Oct 18 '18 at 18:12
@BowenLiu, exactly, it does the same thing as df['Year'].astype(str) – MaxU - stand with Ukraine Oct 18 '18 at 20:27
And df['period'] = [''.join(x) for x in zip(df['Year'], df['quarter'])] ? – kevins_1 Nov 06 '18 at 04:43
@MaxU - I've found that `apply(str)` is typically faster than `astype(str)`, would be interesting to see how that stacks up here. – elPastor Feb 15 '19 at 16:33
@MaxU could you help the following question https://stackoverflow.com/questions/54865312/how-to-merge-csv-files-with-different-headers-but-same-data-on-condition – panda Feb 26 '19 at 06:43
I'm curious if there is an easy way to make any of these answers amenable to providing an arbitrary list of column names. – jflournoy Apr 09 '19 at 20:17
@jflournoy, pardon, I’m not sure I understood your question. Could you please explain what do you want to achieve? – MaxU - stand with Ukraine Apr 10 '19 at 05:43
I'm curious how you would implement the best performers, `[111]` and `[107]` above, in a function that takes as input a list of arbitrary length. I guess it's obvious one could loop through the list to iteratively build up the full concatenated strings, but thought there might be a better way. – jflournoy Apr 11 '19 at 12:27

score 323 · Answer 3 · edited Jan 08 '18 at 16:21

323

df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df['period'] = df[['Year', 'quarter']].apply(lambda x: ''.join(x), axis=1)

Yields this dataframe

   Year quarter  period
0  2014      q1  2014q1
1  2015      q2  2015q2

This method generalizes to an arbitrary number of string columns by replacing df[['Year', 'quarter']] with any column slice of your dataframe, e.g. df.iloc[:,0:2].apply(lambda x: ''.join(x), axis=1).

You can check more information about apply() method here

edited Jan 08 '18 at 16:21

kepy97

988
10
12

answered Sep 11 '15 at 17:36

Russ

3,644
1
13
15

32

`lambda x: ''.join(x)` is just `''.join`, no? – DSM Sep 19 '16 at 11:54
@DSM no. here we are taking each row of the df[['Year', 'quarter']] and passing it as series to join, and join concatenates the elements in the series. – Ozgur Ozturk Feb 01 '17 at 21:02
8

@OzgurOzturk: the point is that the lambda part of the `lambda x: ''.join(x)` construction doesn't do anything; it's like using `lambda x: sum(x)` instead of just `sum`. – DSM Feb 01 '17 at 21:07
@Russ, if you wanted to join these strings based on some criteria i.e. `join if string != "some_string"` how could you add that? – Chuck Jul 20 '17 at 09:01
6

Confirmed same result when using `''.join`, i.e.: `df['period'] = df[['Year', 'quarter']].apply(''.join, axis=1)`. – Max Ghenis Oct 10 '17 at 05:30
This solution gives me a `TypeError: ('sequence item 0: expected str instance, int found'`, as I want to combine an integer categorical variable with a string category. – Archie Jan 03 '18 at 14:56
1

@Archie `join` takes only `str` instances in an iterable. Use a `map` to convert them all into `str` and then use `join`. – John Strood Mar 27 '18 at 12:51
This solution ended up being reasonably slower for me than using the `df['Foo'].map(function) + '-' + df['Bar'].map(function)` approach. For ~48k rows applying a join along axis 1 took 1.15s whereas the other approach takes 100ms. Not unreasonable on a small dataset, but if you are working with a lot of data that can make a big difference. – Brandon Barney Apr 20 '18 at 12:16
24

'-'.join(x.map(str)) – Manjul Sep 03 '18 at 08:23
Similarly df['period'] = [''.join(x) for x in zip(df['Year'], df['quarter'])] works too which may be faster. – kevins_1 Nov 06 '18 at 04:43
This is the best and most versatile answer, given that the columns to be joined may need to be dynamic and not static and this solution makes that possible. – NL23codes Feb 24 '21 at 16:51

score 195 · Answer 4 · edited Sep 23 '18 at 08:49

The method cat() of the .str accessor works really well for this:

>>> import pandas as pd
>>> df = pd.DataFrame([["2014", "q1"], 
...                    ["2015", "q3"]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year Quarter
0  2014      q1
1  2015      q3
>>> df['Period'] = df.Year.str.cat(df.Quarter)
>>> print(df)
   Year Quarter  Period
0  2014      q1  2014q1
1  2015      q3  2015q3

cat() even allows you to add a separator so, for example, suppose you only have integers for year and period, you can do this:

>>> import pandas as pd
>>> df = pd.DataFrame([[2014, 1],
...                    [2015, 3]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year Quarter
0  2014       1
1  2015       3
>>> df['Period'] = df.Year.astype(str).str.cat(df.Quarter.astype(str), sep='q')
>>> print(df)
   Year Quarter  Period
0  2014       1  2014q1
1  2015       3  2015q3

Joining multiple columns is just a matter of passing either a list of series or a dataframe containing all but the first column as a parameter to str.cat() invoked on the first column (Series):

>>> df = pd.DataFrame(
...     [['USA', 'Nevada', 'Las Vegas'],
...      ['Brazil', 'Pernambuco', 'Recife']],
...     columns=['Country', 'State', 'City'],
... )
>>> df['AllTogether'] = df['Country'].str.cat(df[['State', 'City']], sep=' - ')
>>> print(df)
  Country       State       City                   AllTogether
0     USA      Nevada  Las Vegas      USA - Nevada - Las Vegas
1  Brazil  Pernambuco     Recife  Brazil - Pernambuco - Recife

Do note that if your pandas dataframe/series has null values, you need to include the parameter na_rep to replace the NaN values with a string, otherwise the combined column will default to NaN.

This seems way better (maybe more efficient, too) than `lambda` or `map`; also it just reads most cleanly. — dwanderson, May 22 '16 at 20:31
Which version of pandas are you using? I get ValueError: Did you mean to supply a `sep` keyword? in pandas-0.23.4. Thanks! — Qinqing Liu, Dec 05 '18 at 20:56
@QinqingLiu, I retested these with pandas-0.23.4 and they seem work. The `sep` parameter is only necessary if you intend to separate the parts of the concatenated string. If you get an error, please show us your failing example. — LeoRochael, Dec 10 '18 at 19:34
@LeoRochael can i do a newline instead of '-' with sep keyword? — Arun Menon, Jun 21 '21 at 02:52
@arun-menon: I don't see why not. In the last example above you could do `.str.cat(df[['State', 'City']], sep ='\n')`, for example. I haven't tested it yet, though. — LeoRochael, Jun 21 '21 at 12:08

Bill Gale · Answer 5 · 2016-03-16T16:57:26.003

Use of a lamba function this time with string.format().

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': ['q1', 'q2']})
print df
df['YearQuarter'] = df[['Year','Quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)
print df

  Quarter  Year
0      q1  2014
1      q2  2015
  Quarter  Year YearQuarter
0      q1  2014      2014q1
1      q2  2015      2015q2

This allows you to work with non-strings and reformat values as needed.

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': [1, 2]})
print df.dtypes
print df

df['YearQuarter'] = df[['Year','Quarter']].apply(lambda x : '{}q{}'.format(x[0],x[1]), axis=1)
print df

Quarter     int64
Year       object
dtype: object
   Quarter  Year
0        1  2014
1        2  2015
   Quarter  Year YearQuarter
0        1  2014      2014q1
1        2  2015      2015q2

This solution worked great for my needs since I had to do some formatting. `df_game['formatted_game_time'] = df_game[['wday', 'month', 'day', 'year', 'time']].apply(lambda x: '{}, {}/{}/{} @ {}'.format(x[0], x[1], x[2], x[3], x[4]), axis=1)` — Dan, Nov 26 '22 at 17:07

score 23 · Answer 6 · answered Jul 30 '19 at 10:38

23

generalising to multiple columns, why not:

columns = ['whatever', 'columns', 'you', 'choose']
df['period'] = df[columns].astype(str).sum(axis=1)

answered Jul 30 '19 at 10:38

geher

475
1
6
14

6

Looks cool but what if I want to add a delimiter between the strings, like '-'? – Odisseo Oct 02 '19 at 17:55
@Odisseo maybe create a delimiter column? – Dd H Sep 20 '21 at 08:15

score 19 · Answer 7 · edited Jun 01 '21 at 12:12

19

You can use lambda:

combine_lambda = lambda x: '{}{}'.format(x.Year, x.quarter)

And then use it with creating the new column:

df['period'] = df.apply(combine_lambda, axis = 1)

edited Jun 01 '21 at 12:12

buhtz

10,774
18
76
149

answered Feb 28 '21 at 16:25

Pobaranchuk

839
9
13

score 15 · Answer 8 · edited Jan 24 '19 at 09:38

15

Let us suppose your dataframe is df with columns Year and Quarter.

import pandas as pd
df = pd.DataFrame({'Quarter':'q1 q2 q3 q4'.split(), 'Year':'2000'})

Suppose we want to see the dataframe;

df
>>>  Quarter    Year
   0    q1      2000
   1    q2      2000
   2    q3      2000
   3    q4      2000

Finally, concatenate the Year and the Quarter as follows.

df['Period'] = df['Year'] + ' ' + df['Quarter']

You can now print df to see the resulting dataframe.

df
>>>  Quarter    Year    Period
    0   q1      2000    2000 q1
    1   q2      2000    2000 q2
    2   q3      2000    2000 q3
    3   q4      2000    2000 q4

If you do not want the space between the year and quarter, simply remove it by doing;

df['Period'] = df['Year'] + df['Quarter']

edited Jan 24 '19 at 09:38

cs95

379,657
97
704
746

answered Jul 22 '18 at 05:20

Samuel Nde

2,565
2
23
23

3

Specified as strings `df['Period'] = df['Year'].map(str) + df['Quarter'].map(str)` – Stuber Aug 07 '18 at 18:58
I'm getting `TypeError: Series cannot perform the operation +` when I run either `df2['filename'] = df2['job_number'] + '.' + df2['task_number']` or `df2['filename'] = df2['job_number'].map(str) + '.' + df2['task_number'].map(str)`. – Karl Baker Mar 03 '19 at 06:43
1

However, `df2['filename'] = df2['job_number'].astype(str) + '.' + df2['task_number'].astype(str)` did work. – Karl Baker Mar 03 '19 at 06:51
@KarlBaker, I think you did not have strings in your input. But I am glad you figured that out. If you look at the example `dataframe` that I created above, you will see that all the columns are `string`s. – Samuel Nde Mar 03 '19 at 17:31
What exactly is the point of this solution, since it's identical to the top answer? – AMC Mar 18 '20 at 01:22

score 14 · Answer 9 · answered Nov 25 '15 at 10:25

Although the @silvado answer is good if you change df.map(str) to df.astype(str) it will be faster:

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})

In [131]: %timeit df["Year"].map(str)
10000 loops, best of 3: 132 us per loop

In [132]: %timeit df["Year"].astype(str)
10000 loops, best of 3: 82.2 us per loop

score 12 · Answer 10 · answered Apr 03 '17 at 17:05

Here is an implementation that I find very versatile:

In [1]: import pandas as pd 

In [2]: df = pd.DataFrame([[0, 'the', 'quick', 'brown'],
   ...:                    [1, 'fox', 'jumps', 'over'], 
   ...:                    [2, 'the', 'lazy', 'dog']],
   ...:                   columns=['c0', 'c1', 'c2', 'c3'])

In [3]: def str_join(df, sep, *cols):
   ...:     from functools import reduce
   ...:     return reduce(lambda x, y: x.astype(str).str.cat(y.astype(str), sep=sep), 
   ...:                   [df[col] for col in cols])
   ...: 

In [4]: df['cat'] = str_join(df, '-', 'c0', 'c1', 'c2', 'c3')

In [5]: df
Out[5]: 
   c0   c1     c2     c3                cat
0   0  the  quick  brown  0-the-quick-brown
1   1  fox  jumps   over   1-fox-jumps-over
2   2  the   lazy    dog     2-the-lazy-dog

FYI: This method works great with Python 3, but gives me trouble in Python 2. — Alex P. Miller, Jul 31 '17 at 19:40

score 11 · Answer 11 · answered Jan 09 '18 at 02:13

more efficient is

def concat_df_str1(df):
    """ run time: 1.3416s """
    return pd.Series([''.join(row.astype(str)) for row in df.values], index=df.index)

and here is a time test:

import numpy as np
import pandas as pd

from time import time


def concat_df_str1(df):
    """ run time: 1.3416s """
    return pd.Series([''.join(row.astype(str)) for row in df.values], index=df.index)


def concat_df_str2(df):
    """ run time: 5.2758s """
    return df.astype(str).sum(axis=1)


def concat_df_str3(df):
    """ run time: 5.0076s """
    df = df.astype(str)
    return df[0] + df[1] + df[2] + df[3] + df[4] + \
           df[5] + df[6] + df[7] + df[8] + df[9]


def concat_df_str4(df):
    """ run time: 7.8624s """
    return df.astype(str).apply(lambda x: ''.join(x), axis=1)


def main():
    df = pd.DataFrame(np.zeros(1000000).reshape(100000, 10))
    df = df.astype(int)

    time1 = time()
    df_en = concat_df_str4(df)
    print('run time: %.4fs' % (time() - time1))
    print(df_en.head(10))


if __name__ == '__main__':
    main()

final, when sum(concat_df_str2) is used, the result is not simply concat, it will trans to integer.

+1 Neat solution, this also allows us to specify the columns: e.g. `df.values[:, 0:3]` or `df.values[:, [0,2]]`. — Snow bunting, Feb 09 '18 at 09:51

Anton vBR · Answer 12 · 2019-02-20T17:07:44.123

Using zip could be even quicker:

df["period"] = [''.join(i) for i in zip(df["Year"].map(str),df["quarter"])]

Graph:

import pandas as pd
import numpy as np
import timeit
import matplotlib.pyplot as plt
from collections import defaultdict

df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})

myfuncs = {
"df['Year'].astype(str) + df['quarter']":
    lambda: df['Year'].astype(str) + df['quarter'],
"df['Year'].map(str) + df['quarter']":
    lambda: df['Year'].map(str) + df['quarter'],
"df.Year.str.cat(df.quarter)":
    lambda: df.Year.str.cat(df.quarter),
"df.loc[:, ['Year','quarter']].astype(str).sum(axis=1)":
    lambda: df.loc[:, ['Year','quarter']].astype(str).sum(axis=1),
"df[['Year','quarter']].astype(str).sum(axis=1)":
    lambda: df[['Year','quarter']].astype(str).sum(axis=1),
    "df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)":
    lambda: df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1),
    "[''.join(i) for i in zip(dataframe['Year'].map(str),dataframe['quarter'])]":
    lambda: [''.join(i) for i in zip(df["Year"].map(str),df["quarter"])]
}

d = defaultdict(dict)
step = 10
cont = True
while cont:
    lendf = len(df); print(lendf)
    for k,v in myfuncs.items():
        iters = 1
        t = 0
        while t < 0.2:
            ts = timeit.repeat(v, number=iters, repeat=3)
            t = min(ts)
            iters *= 10
        d[k][lendf] = t/iters
        if t > 2: cont = False
    df = pd.concat([df]*step)

pd.DataFrame(d).plot().legend(loc='upper center', bbox_to_anchor=(0.5, -0.15))
plt.yscale('log'); plt.xscale('log'); plt.ylabel('seconds'); plt.xlabel('df rows')
plt.show()

score 7 · Answer 13 · answered Mar 15 '19 at 16:37

7

This solution uses an intermediate step compressing two columns of the DataFrame to a single column containing a list of the values. This works not only for strings but for all kind of column-dtypes

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df['list']=df[['Year','quarter']].values.tolist()
df['period']=df['list'].apply(''.join)
print(df)

Result:

   Year quarter        list  period
0  2014      q1  [2014, q1]  2014q1
1  2015      q2  [2015, q2]  2015q2

answered Mar 15 '19 at 16:37

Markus Dutschke

9,341
4
63
58

looks like other dtypes won't work. I got a TypeError: sequence item 1: expected str instance, float found – Prometheus Apr 10 '19 at 09:08
apply first a cast to string. The join operation works only for strings – Markus Dutschke Apr 10 '19 at 10:58
This solution won't work to combine two columns with different dtype, see my answer for the correct solution for such case. – Good Will May 16 '19 at 13:21
Instead of `.apply(''.join)` why not use `.str.join('')`? – Bill May 28 '21 at 00:45

score 7 · Answer 14 · answered May 16 '19 at 13:19

Here is my summary of the above solutions to concatenate / combine two columns with int and str value into a new column, using a separator between the values of columns. Three solutions work for this purpose.

# be cautious about the separator, some symbols may cause "SyntaxError: EOL while scanning string literal".
# e.g. ";;" as separator would raise the SyntaxError

separator = "&&" 

# pd.Series.str.cat() method does not work to concatenate / combine two columns with int value and str value. This would raise "AttributeError: Can only use .cat accessor with a 'category' dtype"

df["period"] = df["Year"].map(str) + separator + df["quarter"]
df["period"] = df[['Year','quarter']].apply(lambda x : '{} && {}'.format(x[0],x[1]), axis=1)
df["period"] = df.apply(lambda x: f'{x["Year"]} && {x["quarter"]}', axis=1)

At least your first solution does not work (any more?). I use: `df["period"] = (df["Year"].astype(str) + separator + df["quarter"].astype(str)).astype('category')` — Michel de Ruiter, Mar 16 '23 at 14:01

score 5 · Answer 15 · answered Aug 18 '20 at 04:13

5

my take....

listofcols = ['col1','col2','col3']
df['combined_cols'] = ''

for column in listofcols:
    df['combined_cols'] = df['combined_cols'] + ' ' + df[column]
'''

answered Aug 18 '20 at 04:13

leo

333
4
12

5

You should add an explanation to this code snippet. Adding only code answers encourages people to use code they don't understand and doesn't help them learn. – annedroiid Aug 18 '20 at 10:10

score 3 · Answer 16 · answered Jan 07 '23 at 14:12

3

When combining columns with strings by concatenating them using the addition operator + if any is NaN then entire output will be NaN so use fillna()

df["join"] = "some" + df["col"].fillna(df["val_if_nan"])

answered Jan 07 '23 at 14:12

Ax_

803
8
11

score 2 · Answer 17 · answered Oct 25 '17 at 03:21

2

As many have mentioned previously, you must convert each column to string and then use the plus operator to combine two string columns. You can get a large performance improvement by using NumPy.

%timeit df['Year'].values.astype(str) + df.quarter
71.1 ms ± 3.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['Year'].astype(str) + df['quarter']
565 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

answered Oct 25 '17 at 03:21

Ted Petrou

59,042
19
131
136

I'd like to use the numpyified version but I'm getting an error: **Input**: `df2['filename'] = df2['job_number'].values.astype(str) + '.' + df2['task_number'].values.astype(str)` --> **Output**: `TypeError: ufunc 'add' did not contain a loop with signature matching types dtype(' – Karl Baker Mar 03 '19 at 06:56
That's because you are combining two numpy arrays. It works if you combine an numpy array with pandas Series. as `df['Year'].values.astype(str) + df.quarter` – AbdulRehmanLiaqat Feb 10 '20 at 11:23

score 2 · Answer 18 · answered Dec 01 '18 at 10:55

2

One can use assign method of DataFrame:

df= (pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']}).
  assign(period=lambda x: x.Year+x.quarter ))

answered Dec 01 '18 at 10:55

Sergey

487
3
7

score 1 · Answer 19 · answered Dec 04 '21 at 12:43

1

Similar to @geher answer but with any separator you like:

SEP = " "
INPUT_COLUMNS_WITH_SEP = ",sep,".join(INPUT_COLUMNS).split(",")

df.assign(sep=SEP)[INPUT_COLUMNS_WITH_SEP].sum(axis=1)

answered Dec 04 '21 at 12:43

Marc Torrellas Socastro

419
3
8

BMW · Answer 20 · 2017-07-21T20:26:34.640

0

def madd(x):
    """Performs element-wise string concatenation with multiple input arrays.

    Args:
        x: iterable of np.array.

    Returns: np.array.
    """
    for i, arr in enumerate(x):
        if type(arr.item(0)) is not str:
            x[i] = x[i].astype(str)
    return reduce(np.core.defchararray.add, x)

For example:

data = list(zip([2000]*4, ['q1', 'q2', 'q3', 'q4']))
df = pd.DataFrame(data=data, columns=['Year', 'quarter'])
df['period'] = madd([df[col].values for col in ['Year', 'quarter']])

df

    Year    quarter period
0   2000    q1  2000q1
1   2000    q2  2000q2
2   2000    q3  2000q3
3   2000    q4  2000q4

edited Jul 21 '17 at 20:26

answered Jul 21 '17 at 19:38

BMW

509
3
15

NameError: name 'reduce' is not defined – rubengavidia0x Mar 08 '22 at 20:14
1

`from functools import reduce` – pauljohn32 May 25 '22 at 04:22

score 0 · Answer 21 · edited Feb 12 '18 at 07:54

0

Use .combine_first.

df['Period'] = df['Year'].combine_first(df['Quarter'])

edited Feb 12 '18 at 07:54

Keiku

8,205
4
41
44

answered Feb 10 '18 at 04:01

Abul

197
2
4
15

4

This is not correct. `.combine_first` will result in either the value from `'Year'` being stored in `'Period'`, or, if it is Null, the value from `'Quarter'`. It will not concatenate the two strings and store them in `'Period'`. – Steve G Jan 29 '19 at 20:48

Combine two columns of text in pandas dataframe

21 Answers21

Beware of NaNs when doing this!

Small data-sets (< 150rows)

Larger data sets (> 150rows)

Linked

Related