
I was going through a college assignment on KNN given in Python, and in that assignment there was one block of code where they delete the X_train, Y_train, X_test and Y_test variables before assigning those variables to other data. In the comments they added that this prevents memory issues.

 x = large_dataset
 del x
 x = another_large_dataset   # block 1


 x = large_dataset
 x = another_large_dataset   # block 2

What would be the difference between the above two blocks of code?

Thanks :)

terrabyte
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). Stack Overflow is not intended to replace existing documentation and tutorials. It appears that you merely need to look up how Python handles garbage collection. – Prune Jun 14 '21 at 17:23

1 Answer


Both examples accomplish the same thing: they decrease the reference count of the object that x previously pointed to (large_dataset) by one. Using del does this explicitly; overwriting the variable does it implicitly. Once an object has zero references to it, it becomes eligible for garbage collection (in CPython it is typically freed immediately, except for objects caught in reference cycles, which the cyclic collector reclaims later).

This being the case, I can't see any "memory issues" being prevented by doing it one way or the other.
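Here is a minimal sketch of that point, assuming CPython's reference-counting behaviour and using a hypothetical Dataset class as a stand-in for your large datasets. A weakref lets us watch the original object without keeping it alive:

 import gc
 import weakref

 class Dataset:
     """Hypothetical stand-in for a large dataset object."""
     pass

 # Block 1: del before rebinding.
 x = Dataset()
 old = weakref.ref(x)       # watch the original object without keeping it alive
 del x                      # reference count drops to zero here
 print(old() is None)       # True in CPython: the old object is already freed
 x = Dataset()

 # Block 2: rebinding only.
 x = Dataset()
 old = weakref.ref(x)
 x = Dataset()              # the old object loses its last reference here
 print(old() is None)       # True: same outcome as block 1

 gc.collect()               # only needed for objects trapped in reference cycles

In both cases the old object becomes unreachable at the del (or at the rebinding) and is reclaimed the same way, so neither pattern keeps the extra copy alive longer than the other.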


jfaccioni
  • If the data present in x is very large, will there be any effect on the process? – terrabyte Jun 14 '21 at 17:28
  • Like I said, both cases accomplish the exact same thing. If Python decides not to garbage-collect the old dataset before loading the new dataset, you *may* be starved for memory, depending on your machine, the dataset size etc. But this does not depend on the examples in your question. – jfaccioni Jun 14 '21 at 17:32