Shuffle two lists at once with same order

Question

I'm using the nltk library's movie_reviews corpus, which contains a large number of documents. My task is to measure predictive performance on these reviews with and without pre-processing of the data. The problem is that the lists documents and documents2 contain the same documents, and I need to shuffle them so that both lists end up in the same order. I cannot shuffle them separately, because each shuffle produces a different order. I need to shuffle both at once with the same order because I compare them at the end, and the comparison depends on the order. I'm using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

documents = [(['one of the guys dies'], 'neg'),
             (['they get into an accident . '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['plot : two teen couples go to a church party , '], 'neg')]

documents2 = [(['one guys dies'], 'neg'),
              (['they get accident . '], 'neg'),
              (['drink then drive . '], 'pos'),
              (['plot two teen couples church party'], 'neg')]

I have this code:

import random

import nltk
from nltk.corpus import movie_reviews, stopwords

def cleanDoc(doc):
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# Needed here: shuffle documents and documents2 in the same order,
# e.g. with random.shuffle(...) or in some other way.

Possible duplicate of [Better way to shuffle two numpy arrays in unison](https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison) — Rick Smith, Apr 30 '18 at 21:50

score 310 · Accepted Answer · edited Mar 29 '22 at 22:41


You can do it as:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

a, b = zip(*c)

print(a)
print(b)

[OUTPUT] (one possible result; zip(*c) yields tuples)
('a', 'c', 'b')
(1, 3, 2)

Of course, this was an example with simpler lists, but the adaptation will be the same for your case.
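For the lists in the question, the same pattern might look like the sketch below. It assumes Python 3 and uses a few hand-written stand-in entries instead of the real movie_reviews data. The name original_pairs is hypothetical and only exists so the pairing can be checked afterwards.

```python
import random

# Small stand-ins for the two corpora from the question (not the real data).
documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]
documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Remember which raw/cleaned entries belong together (for checking only).
original_pairs = list(zip(documents, documents2))

# Pair the entries, shuffle the pairs, then split them back into two lists.
paired = list(zip(documents, documents2))
random.shuffle(paired)
documents, documents2 = (list(side) for side in zip(*paired))

for raw, cleaned in zip(documents, documents2):
    print(raw[0], '<->', cleaned[0])
```

Converting each side with list() keeps both results as lists rather than tuples, which matters if later code appends to them or shuffles them again.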

edited Mar 29 '22 at 22:41 by Muhammad Dyas Yaskur

answered Apr 25 '14 at 09:45 by sshashank124

[+12] (noob question) - what does the * mean? – ᔕᖺᘎᕊ Apr 02 '15 at 16:18

[+4] @ᔕᖺᘎᕊ, It means unpack the values of c so it is called as `zip(1,2,3)` instead of `zip([1,2,3])` – sshashank124 Apr 03 '15 at 01:21

[+2] I used this solution before and `a` and `b` were lists at the end. With Python 3.6.8, at the end of the same example, I get `a` and `b` as tuples. – vvvvv Feb 23 '19 at 16:52

[+3] ...Tuples... so just a=list(a) and b=list(b) – RichardBJ May 09 '19 at 13:07

how would you do this for 3 or 4 arrays? – echan00 Aug 28 '19 at 20:01

Same approach, zip can take a variable number of arguments – sshashank124 Sep 02 '19 at 15:16
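To make the `*` question from the comments concrete, here is a small sketch of what unpacking does in a zip call. The variable names are illustrative only, and the printed values assume Python 3.

```python
pairs = [('a', 1), ('b', 2), ('c', 3)]

# zip(*pairs) is the same call as zip(('a', 1), ('b', 2), ('c', 3)):
# each tuple becomes its own argument, so zip regroups items by position.
letters, numbers = zip(*pairs)
print(letters)  # ('a', 'b', 'c')
print(numbers)  # (1, 2, 3)

# Without the star, zip receives a single argument (the whole list),
# so every result is a 1-tuple wrapping one of the original pairs.
wrapped = list(zip(pairs))
print(wrapped)  # [(('a', 1),), (('b', 2),), (('c', 3),)]
```

The unpacked results are tuples, which matches the later comments: wrap them in list() if lists are needed.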

score 73 · Answer 2 · answered Apr 10 '18 at 13:35

Here is an easy way to do this:

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])

indices = np.arange(a.shape[0])
np.random.shuffle(indices)

a = a[indices]
b = b[indices]
# Example result (varies per run):
# a -> array([3, 4, 1, 2, 0])
# b -> array([8, 9, 6, 7, 5])

answered Apr 10 '18 at 13:35 by hua wei

[+11] The original post is about normal lists in python, but I needed a solution for numpy arrays. You just saved my day! – finngu Nov 09 '18 at 16:39

[+2] It seems like `np.random.permutation(a.shape[0])` would be simpler – Itay Nov 09 '21 at 11:47

Saved my day to reverse it. Use inverse_perm = np.argsort(permutation) and np.array(my_array)[inverse_perm] to reverse it. – trinity420 Apr 04 '22 at 12:47
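The two suggestions from the comments can be combined as in the sketch below. It assumes NumPy is installed, and the names a_shuffled, b_shuffled, a_restored and b_restored are illustrative.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([5, 6, 7, 8, 9])

# One random permutation of the indices, applied to both arrays.
permutation = np.random.permutation(a.shape[0])
a_shuffled = a[permutation]
b_shuffled = b[permutation]

# argsort of a permutation gives its inverse, which undoes the shuffle.
inverse_perm = np.argsort(permutation)
a_restored = a_shuffled[inverse_perm]
b_restored = b_shuffled[inverse_perm]

print(a_shuffled, b_shuffled)
print(a_restored, b_restored)
```

Keeping the permutation array around is what makes the shuffle reversible. If the original order is never needed again, it can be discarded.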

score 21 · Answer 3 · edited May 26 '20 at 08:42

import numpy as np
from sklearn.utils import shuffle

a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]

a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)

#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]

edited May 26 '20 at 08:42

answered May 15 '19 at 09:08 by YScharf

score 7 · Answer 4 · answered Dec 12 '17 at 07:50

Shuffle an arbitrary number of lists simultaneously.

from random import shuffle

def shuffle_list(*ls):
    # Group items that share an index, shuffle the groups, then split them apart again.
    l = list(zip(*ls))
    shuffle(l)
    return zip(*l)

a = [0,1,2,3,4]
b = [5,6,7,8,9]

a1,b1 = shuffle_list(a,b)
print(a1,b1)

a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note:
objects returned by shuffle_list() are tuples.

P.S. shuffle_list() can also be applied to numpy.array()

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])

a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output:

$ (3, 1, 2) (6, 4, 5)
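If lists are preferred over tuples, a variant along these lines may help. It is only a sketch: the name shuffle_lists is hypothetical, and it assumes at least one non-empty input and inputs of equal length (zip would silently truncate to the shortest otherwise).

```python
from random import shuffle

def shuffle_lists(*sequences):
    """Shuffle any number of equal-length sequences in unison; return lists."""
    rows = list(zip(*sequences))   # one row per shared index
    shuffle(rows)                  # shuffle the rows in place
    return [list(column) for column in zip(*rows)]

a = [0, 1, 2, 3, 4]
b = [5, 6, 7, 8, 9]
c = [10, 11, 12, 13, 14]

a1, b1, c1 = shuffle_lists(a, b, c)
print(a1, b1, c1)
```

The inputs are left untouched, so a, b and c keep their original order and the shuffled copies are returned separately.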

score 5 · Answer 5 · answered Oct 05 '19 at 21:02

An easy and fast way to do this is to use random.seed() with random.shuffle(). Reseeding with the same value before each shuffle reproduces the same random order as many times as you want. It looks like this:

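import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()
random.seed(seed)
random.shuffle(a)  # shuffles a in place
random.seed(seed)  # reseed so the second shuffle repeats the same order
random.shuffle(b)
print(a)
print(b)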

>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

This also works when memory constraints prevent you from holding both lists at the same time.

answered Oct 05 '19 at 21:02 by Boris

[+4] shouldn't it be random.shuffle(a) ? – Khan Jun 02 '20 at 22:10

I think `getstate/setstate` would work better than assigning a particular seed. – pjs Mar 25 '21 at 15:54
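The getstate/setstate idea from the last comment could look roughly like this. It is a sketch using only the standard library, and it relies on both lists having the same length so that each shuffle consumes the generator in the same way.

```python
import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

state = random.getstate()  # snapshot the generator before the first shuffle
random.shuffle(a)
random.setstate(state)     # rewind to the snapshot
random.shuffle(b)          # replays exactly the same sequence of swaps

print(a)
print(b)
```

Compared with reseeding, this leaves the choice of seed to the generator itself and only replays a stretch of its output.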

score 0 · Answer 6 · answered Mar 25 '21 at 12:47

You can store the order of the values in a variable, then reorder both arrays with it:

import random

array1 = [1, 2, 3, 4, 5]
array2 = ["one", "two", "three", "four", "five"]

# Shuffle a list of indices once, then apply it to both arrays.
order = list(range(len(array1)))
random.shuffle(order)

newarray1 = []
newarray2 = []
for x in range(len(order)):
    newarray1.append(array1[order[x]])
    newarray2.append(array2[order[x]])

print(newarray1)
print(newarray2)

score 0 · Answer 7 · answered Jan 02 '22 at 10:50

This works as well:

import numpy as np

a = ['a', 'b', 'c']
b = [1, 2, 3]

rng = np.random.default_rng()

state = rng.bit_generator.state
rng.shuffle(a)
# use same seeds for a & b!
rng.bit_generator.state = state # set state to same state as before
rng.shuffle(b)

print(a)
print(b)

Output:

['b', 'a', 'c']
[2, 1, 3]

score -2 · Answer 8 · edited Jun 24 '16 at 00:41

You can use the second argument of the shuffle function to fix the order of shuffling. (This optional argument was deprecated in Python 3.9 and removed in Python 3.11.)

Specifically, you can pass a zero-argument function that returns a value in [0, 1) as the second argument of shuffle. The return value of this function fixes the order of shuffling. (By default, i.e. if you do not pass a function as the second argument, it uses random.random(). You can see it at line 277 here.)

This example illustrates what I described:

import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = random.random()            # randomly generating a real in [0,1)
random.shuffle(a, lambda : r)  # lambda : r is a zero-argument function which returns r
random.shuffle(b, lambda : r)  # using the same function as in the previous line so that the shuffling order is the same

print a
print b

Output:

['e', 'c', 'd', 'a', 'b']
[5, 3, 4, 1, 2]

The `random.shuffle` function calls the `random` function more than once, so using a `lambda` that always returns the same value may have unintended effects on the output order. — Blckknght, Jun 24 '16 at 00:55
You are right. This will be a biased shuffling, depending on the value of r. It may be practically good for many cases but not always. — Kundan Kumar, Jun 30 '16 at 00:26
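The bias described in these comments can be illustrated with the sketch below. The helper shuffle_with is hypothetical: it is a plain Fisher-Yates loop driven by a caller-supplied function, written on the assumption that this resembles how the optional second argument was consumed. It does not call random.shuffle with a second argument.

```python
from itertools import permutations

def shuffle_with(items, rand):
    """Fisher-Yates shuffle of a copy of items, driven by rand() in [0, 1)."""
    result = list(items)
    for i in reversed(range(1, len(result))):
        j = int(rand() * (i + 1))  # pick a position in 0..i
        result[i], result[j] = result[j], result[i]
    return result

letters = ['a', 'b', 'c', 'd', 'e']

# Try many constant values of r and collect every ordering they can produce.
reachable = {tuple(shuffle_with(letters, lambda r=k / 1000.0: r))
             for k in range(1000)}

total = len(list(permutations(letters)))  # number of orderings of 5 items
print(len(reachable), "of", total, "orderings are reachable with a constant r")
```

Because every swap position is derived from the same r, only a small fraction of the possible orderings can ever appear, which is why a constant function gives a consistent but not uniform shuffle.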
