I have a data frame with 840 columns that I read from a .sav file. I convert all columns to factors using data <- haven::as_factor(data)
Here is an example of the data just after reading the file, before converting to factors:
| tenureType | localityType | monthlyRent | 
|---|---|---|
| 1 | 1 | 200 | 
| 1 | 2 | 140 | 
| 1 | 3 | 500 | 
| 2 | 2 | 100 | 
| 1 | 3 | 700 | 
| 2 | 3 | 20 | 
After data <- haven::as_factor(data):
| tenureType | localityType | monthlyRent | 
|---|---|---|
| Full ownership | Rural | 200 | 
| Full ownership | Urban | 140 | 
| Full ownership | Camp | 500 | 
| For free | Urban | 100 | 
| Full ownership | Camp | 700 | 
| For free | Camp | 20 | 
I have to convert the data to its labels because I want to do some processing on the text.
I want to build a decision tree using the C50 library, so I want to convert every column whose values are still numeric after the conversion to factor, like monthlyRent, into a factor of intervals.
For example, I want the data to look like this:
| tenureType | localityType | monthlyRent | 
|---|---|---|
| Full ownership | Rural | 156-292 | 
| Full ownership | Urban | 20-156 | 
| Full ownership | Camp | 428-564 | 
| For free | Urban | 20-156 | 
| Full ownership | Camp | 564-700 | 
| For free | Camp | 20-156 | 
- I need each numeric column to be converted to 5 categories.
- The interval width is calculated as (max - min) / 5.
In the sample above: (700 - 20) / 5 = 136.
The intervals are: [20-156], [156-292], [292-428], [428-564], [564-700].
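The interval calculation above can be sketched with base R's seq() and cut(). This is only an illustration on the six sample monthlyRent values; the names n_bins, breaks, labels and rent_binned are made up for the example.

```r
# Sample values from the monthlyRent column above
monthlyRent <- c(200, 140, 500, 100, 700, 20)

n_bins <- 5

# Equal-width break points from min to max: 20, 156, 292, 428, 564, 700
breaks <- seq(min(monthlyRent), max(monthlyRent), length.out = n_bins + 1)

# Labels such as "20-156", built from consecutive break points
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# cut() uses right-closed intervals by default; include.lowest = TRUE
# keeps the minimum (20) inside the first interval instead of making it NA
rent_binned <- cut(monthlyRent, breaks = breaks, labels = labels,
                   include.lowest = TRUE)

print(rent_binned)
```

Because the intervals are right-closed, a value that falls exactly on a break point (for example 156) goes into the lower interval.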
I have 840 columns, so I can't list the column names by hand. I want the intervals to be computed dynamically for each column, since some columns range from 0 to 10 and others from 0 to 10000.
What is the best approach for this? If there is a better approach than equal-width intervals of (max - min) / 5, I'd like to know.
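One possible way to do this without naming any columns is to detect the numeric-looking columns automatically and compute the breaks per column. The sketch below is an assumption about how the data looks after haven::as_factor(): monthlyRent is taken to be a factor whose levels are numbers stored as text. The helpers is_numeric_like() and bin_equal_width() are hypothetical names, and the small data frame is a stand-in for the real data.

```r
# Hypothetical helper: TRUE if a column is numeric, or a factor/character
# column whose non-missing values all parse as numbers (e.g. "20", "100")
is_numeric_like <- function(x) {
  if (is.numeric(x)) return(TRUE)
  vals <- as.character(x)
  vals <- vals[!is.na(vals)]
  length(vals) > 0 && !anyNA(suppressWarnings(as.numeric(vals)))
}

# Hypothetical helper: cut one column into n_bins equal-width intervals
bin_equal_width <- function(x, n_bins = 5) {
  # Go through character so factor levels are used, not the internal codes
  x <- as.numeric(as.character(x))
  if (all(is.na(x))) return(factor(x))       # nothing to bin
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(factor(x))    # constant column: nothing to cut
  breaks <- seq(rng[1], rng[2], length.out = n_bins + 1)
  labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

# Toy stand-in for the real data after haven::as_factor()
data <- data.frame(
  tenureType   = factor(c("Full ownership", "Full ownership", "Full ownership",
                          "For free", "Full ownership", "For free")),
  localityType = factor(c("Rural", "Urban", "Camp", "Urban", "Camp", "Camp")),
  monthlyRent  = factor(c(200, 140, 500, 100, 700, 20))
)

# Find the numeric-looking columns, then bin each with its own min and max
num_cols <- vapply(data, is_numeric_like, logical(1))
data[num_cols] <- lapply(data[num_cols], bin_equal_width)

str(data)
```

Each column gets its own break points, so a 0 to 10 column and a 0 to 10000 column are handled by the same call. For columns with non-integer ranges the labels may carry many decimals; rounding the breaks before building the labels is one option.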
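One commonly used alternative to equal-width intervals is equal-frequency (quantile) binning, where the break points are quantiles so that each category holds roughly the same number of rows. This is a sketch under assumptions, not a recommendation for this data set: bin_quantile() is a hypothetical helper, and the vector x is invented to show what happens with a skewed column.

```r
# Hypothetical helper: cut into (up to) n_bins groups with similar counts
bin_quantile <- function(x, n_bins = 5) {
  probs <- seq(0, 1, length.out = n_bins + 1)
  # unique() drops repeated break points, which occur when many values tie
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  if (length(breaks) < 2) return(factor(x))  # constant column: nothing to cut
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Invented skewed column: mostly small values plus a few very large ones
x <- c(1:95, 1000, 2000, 3000, 4000, 5000)

equal_freq  <- bin_quantile(x)      # quantile-based breaks
equal_width <- cut(x, breaks = 5)   # five equal-width intervals

# Compare how many values land in each category
print(table(equal_freq))
print(table(equal_width))
```

With equal-width intervals, a few large outliers push almost all values into the first category, while quantile breaks spread the rows evenly. The trade-off is that quantile intervals have unequal widths and can yield fewer than 5 categories when many values are tied.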