Both the pandas.crosstab and the Pandas pivot table seem to provide the exact same functionality. Are there any differences?
5 Answers
The main difference between the two is the pivot_table expects your input data to already be a DataFrame; you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With cross_tab, you don't necessarily need to have a DataFrame going in, as you just pass array-like objects for index/columns/values.
Looking at the source code for crosstab, it essentially takes the array-like objects you pass, creates a DataFrame, then calls pivot_table as appropriate.
In general, use pivot_table if you already have a DataFrame, so you don't have the additional overhead of creating the same DataFrame again. If you're starting from array-like objects and are only concerned with the pivoted data, use crosstab. In most cases, I don't think it will really make a difference which function you decide to use.
- 32,715
- 6
- 74
- 87
-
1I timed a few options, and turns out pivot_table is one order of magnitude slower than crosstab, and even that is slower than a simple but clunky groupby approach, [here](https://stackoverflow.com/questions/51503717/alternative-to-groupby-for-generating-a-summary-table-from-tidy-pandas-dataframe) – MPa Oct 31 '18 at 05:09
-
1@Mpa this makes no sense. crosstab calls pivot_table, how is crosstab going to be faster? crosstab adds overhead. This article shows crosstab is the slowest between groupby, pivot_table and crosstab https://ramiro.org/notebook/pandas-crosstab-groupby-pivot/ – Guzman Ojero Aug 13 '21 at 13:39
Is it the same, if in pivot_table use aggfunc=len and fill_value=0:
pd.crosstab(df['Col X'], df['Col Y'])
pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
EDIT: There is more difference:
Default aggfunc are different: pivot_table - np.mean, crosstab - len.
Parameter margins_name is only in pivot_table.
In pivot_table you can use Grouper for index and columns keywords.
I think if you need simply frequency table, crosstab function is better.
- 822,522
- 95
- 1,334
- 1,252
-
-
pivot_table has a bit more 'analytical' flexibility, such as filling NA values, adding subtotals, etc., but absolutely at the cost of more overhead, as mentioned above. – rocksteady Feb 20 '19 at 20:33
-
1regarding overhead it is just the other way round (if there is any performance difference at all), as `crosstab` first creates a dataframe and then [calls `pivot_table`](https://github.com/pandas-dev/pandas/blob/v0.25.0/pandas/core/reshape/pivot.py#L570). – Stef Aug 07 '19 at 14:54
The pivot_table does not have the normalize argument, unfortunately.
In crosstab, the normalize argument calculates percentages by dividing each cell by the sum of cells, as described below:
normalize = 'index'divides each cell by the sum of its rownormalize = 'columns'divides each cell by the sum of its columnnormalize = Truedivides each cell by the total of all cells in the table
- 909
- 9
- 8
-
3For me this seems the most notorius difference. I think it is very strange that this functionality is not included in `pivot_table`. – rocarvaj Apr 24 '22 at 03:04
Pivot table shows the values from data. Crosstab represent frequency of the data .
- 11
- 1
-
Please provide some example to explain your answer using `crosstab` and `pivot_table`. That way, the your answer will be much clearer. – Azhar Khan Dec 09 '22 at 05:09
-
You can do multiple types of aggregation with crosstabs not just counts by specifying the aggfunc argument. – NathanLite Jan 10 '23 at 14:11
Crosstab utilized count() aggregation to fill the values while pivot_table would use any other aggregation such as sum().
- 125
- 11