A crosstabulation (crosstab for short) is a combination of two or more frequency tables arranged so that each cell in the resulting table represents a unique combination of specific values of the included variables. Crosstabs allow us to identify relationships between variables. To keep the table easy to analyze and interpret, only categorical (nominal or ordinal) variables, or numeric variables with a small number of distinct values, should be crosstabulated.
Crosstabs are a great way to familiarize yourself with the data you are working with and to get a rough idea of how the variables in your data set are related, if at all. They are useful for exploring your data, spotting possible relationships between variables, and deciding which analyses to run next.
The simplest form of a crosstab is the 2x2 table in which two variables, each with only two distinct values, are crossed. For example, if we conducted a survey and asked people which they would rather have for a pet, a dog or a cat, and then analyzed the data by gender, the crosstab would look something like this:
            Dog          Cat
Female      20 (40%)     30 (60%)
Male        30 (60%)     20 (40%)
Each cell represents a unique combination of the two crosstabulated variables (in this case, gender and pet preference), and the numbers in each cell tell us how many observations fall into that combination of values. The percentages in parentheses are row percentages: the share of each gender choosing each pet. For example, this table shows that more males than females prefer dogs as pets and more females than males prefer cats. Thus, gender and pet preference may be related, although we would need to conduct a statistical test to see whether this relationship is significant.
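As a minimal sketch of how a table like this is built from individual survey responses, the Python below counts each (gender, pet) combination and converts the counts to row percentages. The response list is hypothetical: it simply reproduces the counts in the 2x2 table above. The helper names `crosstab` and `row_percentages` are illustrative, not a library API.

```python
from collections import Counter

def crosstab(records):
    """Count how many records fall into each (row value, column value) pair."""
    return Counter(records)

def row_percentages(table, rows, cols):
    """Express each cell as a percentage of its row total."""
    result = {}
    for r in rows:
        row_total = sum(table[(r, c)] for c in cols)
        for c in cols:
            result[(r, c)] = 100 * table[(r, c)] / row_total
    return result

# Hypothetical raw responses that reproduce the 2x2 table above.
responses = (
    [("Female", "Dog")] * 20 + [("Female", "Cat")] * 30
    + [("Male", "Dog")] * 30 + [("Male", "Cat")] * 20
)

rows, cols = ["Female", "Male"], ["Dog", "Cat"]
counts = crosstab(responses)
pcts = row_percentages(counts, rows, cols)

# Print each row as "count (row %)" for Dog and Cat.
for r in rows:
    cells = "  ".join(f"{counts[(r, c)]} ({pcts[(r, c)]:.0f}%)" for c in cols)
    print(f"{r:<7} {cells}")
```

Row percentages are used here because the question is how each gender splits its preferences; dividing by column totals instead would answer how dog lovers and cat lovers split by gender.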
You can also use crosstabs to analyze relationships among more than two variables, or between variables with more than two response categories. Using the same example, we could add a third variable to the crosstab for the type of dwelling respondents live in (house vs. apartment/condominium).
            House            Apartment/Condo
            Dog     Cat      Dog     Cat
Female      20      30       10      40
Male        30      20       40      10
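The same counting idea extends to three variables: each cell is now a (gender, dwelling, pet) combination. The sketch below is illustrative only. Its responses are hypothetical and reproduce the counts in the three-variable table, assuming the first Dog/Cat column pair refers to houses and the second to apartments/condominiums.

```python
from collections import Counter

# Hypothetical raw responses as (gender, dwelling, pet) triples. The counts
# reproduce the three-variable table above, ASSUMING the first Dog/Cat
# column pair is House and the second is Apartment/Condo.
responses = (
    [("Female", "House", "Dog")] * 20 + [("Female", "House", "Cat")] * 30
    + [("Male", "House", "Dog")] * 30 + [("Male", "House", "Cat")] * 20
    + [("Female", "Apartment/Condo", "Dog")] * 10
    + [("Female", "Apartment/Condo", "Cat")] * 40
    + [("Male", "Apartment/Condo", "Dog")] * 40
    + [("Male", "Apartment/Condo", "Cat")] * 10
)

# Each key of the Counter is one cell: a unique (gender, dwelling, pet) combination.
counts = Counter(responses)

genders = ["Female", "Male"]
dwellings = ["House", "Apartment/Condo"]
pets = ["Dog", "Cat"]

# Print the table as one gender-by-pet layer per dwelling type.
for d in dwellings:
    print(d)
    for g in genders:
        cells = ", ".join(f"{p}={counts[(g, d, p)]}" for p in pets)
        print(f"  {g:<7} {cells}")
```

Printing one two-way layer per value of the third variable keeps the table readable, and comparing the layers shows whether the gender/pet relationship looks different for each type of dwelling.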
Theoretically, you could create a crosstab with an unlimited number of variables. However, it is usually difficult to examine and understand tables that involve more than three or four variables. Once you surpass about four variables, it is best to use other statistical or modeling techniques to examine relationships.
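A quick calculation shows why large crosstabs become unwieldy: the number of cells is the product of the numbers of categories of the crossed variables. For $m$ variables, where variable $i$ has $k_i$ categories:

$$
\text{number of cells} = \prod_{i=1}^{m} k_i = k_1 \times k_2 \times \cdots \times k_m
$$

The three-variable table above has $2 \times 2 \times 2 = 8$ cells, but five variables with three categories each would produce $3^5 = 243$ cells, so even 200 respondents would average fewer than one observation per cell.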
Although crosstabs can give us a general idea of whether a relationship exists between two or more variables, a statistical test is needed to determine whether any relationship is statistically significant. Several tests can do this; the Pearson chi-square test is the most commonly used.
Using our example above, if there were no relationship between gender and pet preference, we would expect males and females to choose a dog over a cat in roughly the same proportions. With 50 respondents of each gender and 50 dog choices overall, that means about 25 of each gender choosing each pet. The chi-square statistic grows as the observed counts deviate from these expected counts. That is, the greater the difference between males' and females' pet preferences, the larger the chi-square statistic and the smaller its p-value, and the more likely the relationship is to be judged statistically significant.
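As a minimal sketch, the Python below computes the Pearson chi-square statistic for the gender-by-pet table. Each expected count is (row total × column total) / grand total, which is what independence predicts. The p-value helper uses the identity P(X > x) = erfc(√(x/2)) for a chi-square variable with 1 degree of freedom, so it is only valid for 2x2 tables. The function names are illustrative. This is the uncorrected Pearson statistic. Some software, for example SciPy's `scipy.stats.chi2_contingency`, applies Yates' continuity correction to 2x2 tables by default and will therefore report a smaller value.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts. The expected count for a
    cell is (row total * column total) / grand total, i.e. what we would
    see if the row and column variables were independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    Uses P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1); only valid
    when df == 1, as for a 2x2 table.
    """
    return math.erfc(math.sqrt(chi2 / 2))

# Gender (rows: Female, Male) by pet preference (columns: Dog, Cat).
observed = [[20, 30],
            [30, 20]]

chi2, df = pearson_chi_square(observed)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.4f}")
```

Every expected count here is 25, so each cell contributes (5²)/25 = 1 and the statistic is 4.0 with 1 degree of freedom, giving p ≈ 0.046, just under the conventional 0.05 threshold. The approximation behind the test is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per cell).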