Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.
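As a minimal sketch of what that means in practice (pure Python, not tied to any library; the helper name `dummy_encode` and the sample data are hypothetical), a categorical column with k levels can be turned into k-1 indicator columns of 0/1, with the omitted level acting as the reference category:

```python
def dummy_encode(values, reference=None):
    """Return (column_names, rows) of 0/1 indicators, omitting the reference level.

    Hypothetical helper for illustration only; not part of any library.
    """
    levels = sorted(set(values))
    if reference is None:
        # Assumption: default to the first level in sorted order.
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    # One indicator column per non-reference level.
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Made-up sample data.
colors = ["red", "green", "blue", "green"]
columns, rows = dummy_encode(colors, reference="blue")
print(columns)
for row in rows:
    print(row)
```

Dropping one level is the usual choice when the model also has an intercept, since keeping all k indicators would make them sum to the intercept column.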
Questions tagged [dummy-variable]
868 questions
                    
160 votes · 6 answers
        How to force R to use a specified factor level as reference in a regression?
How can I tell R to use a certain level as reference if I use binary explanatory variables in a regression?
It's just using some level by default. 
lm(x ~ y + as.factor(b)) 
with b {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the zero that…
         
    
    
        Matt Bannert
        
143 votes · 5 answers
        What are the pros and cons between get_dummies (Pandas) and OneHotEncoder (Scikit-learn)?
I'm learning different methods to convert categorical variables to numeric for machine-learning classifiers.  I came across the pd.get_dummies method and sklearn.preprocessing.OneHotEncoder() and I wanted to see how they differed in terms of…
         
    
    
        O.rka
        
66 votes · 11 answers
        Dummy variables when not all categories are present
I have a set of dataframes where one of the columns contains a categorical variable. I'd like to convert it to several dummy variables, in which case I'd normally use get_dummies.
What happens is that get_dummies looks at the data available in each…
         
    
    
        Berne
        
53 votes · 5 answers
        Pandas: Get Dummies
I have the following dataframe:
   amount catcode        cid  cycle        date di  feccandid type
0    1000   E1600  N00029285   2014  2014-05-15  D  H8TX22107  24K
1    5000   G4600  N00026722   2014  2013-10-22  D  H4TX28046  …
         
    
    
        Collective Action
        
52 votes · 7 answers
        Keep same dummy variable in training and testing data
I am building a prediction model in Python with two separate training and testing sets. The training data contains numeric categorical variables, e.g., zip code [91521, 23151, 12355, ...], and also string categorical variables, e.g., city…
         
    
    
        nimning
        
23 votes · 2 answers
        Converting pandas column of comma-separated strings into dummy variables
In my dataframe, I have a categorical variable that I'd like to convert into dummy variables. This column however has multiple values separated by commas:
0    'a'
1    'a,b,c'
2    'a,b,d'
3    'd'
4    'c,d'
Ultimately, I'd want to have binary…
         
    
    
        breakbotz
        
23 votes · 2 answers
        How to get pandas get_dummies to emit N-1 variables to avoid collinearity?
pandas.get_dummies emits a dummy variable per categorical value. Is there some automated, easy way to ask it to create only N-1 dummy variables (just get rid of one "baseline" variable arbitrarily)?
Needed to avoid collinearity in our dataset.
         
    
    
        ihadanny
        
21 votes · 1 answer
        Creating dummy variables in R data.table
I am working with an extremely large dataset in R. I have been operating on data frames and have decided to switch to data.tables to help speed up operations. I am having trouble understanding the J operations; in particular, I'm trying to…
         
    
    
        user2792957
        
17 votes · 1 answer
        Handling unknown values for label encoding
How can I handle unknown values for label encoding in scikit-learn?
The label encoder will only blow up with an exception that new labels were detected.
What I want is the encoding of categorical variables via one-hot-encoder. However, scikit-learn does not…
         
    
    
        Georg Heiler
        
16 votes · 2 answers
        Linear regression with dummy/categorical variables
I have a set of data. I have used pandas to convert them into dummy and categorical variables, respectively. Now I want to know how to run a multiple linear regression in Python (I am using statsmodels). Are there some considerations or maybe I…
         
    
    
        Héctor Alonso
        
15 votes · 4 answers
        How to summarize data by-group, by creating dummy variables as the collapsing method
I'm trying to summarize a dataset by groups, to have dummy columns for whether each group's values appear among the data's ungrouped most frequent values.
As an example, let's take flights data from nycflights13.
library(dplyr, warn.conflicts =…
         
    
    
        Emman
        
13 votes · 1 answer
        How to create dummy variable columns for thousands of categories in Google BigQuery?
I have a simple table with 2 columns: UserID and Category, and each UserID can repeat with a few categories, like so:
UserID   Category
------   --------
1         A
1         B
2         C
3         A
3         C
3         B
I want to "dummify"…
         
    
    
        wubr2000
        
11 votes · 6 answers
        Split a string column into several dummy variables
As a relatively inexperienced user of the data.table package in R, I've been trying to process one text column into a large number of indicator columns (dummy variables), with a 1 in each column indicating that a particular sub-string was found…
         
    
    
        user2262318
        
10 votes · 3 answers
        R: create dummy variables based on a categorical variable *of lists*
I have a data frame with a categorical variable holding lists of strings, with variable length (it is important because otherwise this question would be a duplicate of this or this), e.g.:
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"),…
         
    
    
        Giora Simchoni
        
9 votes · 2 answers
        Multiple-seasonality time series analysis in Python
I have a daily time series dataset that I am forecasting with the SARIMAX method in Python, but I do not know how to write Python code that accounts for multiple seasonalities. As far as I know, SARIMAX takes care of only one seasonality but…
         
    
    
        Samuel1985
        