Quantitative Methods in Geography: Correlation and Spatial Autocorrelation

Part 1

The first part of this exercise was focused on correlation. Below are the tasks with the appropriate responses.

Correlation between Distance and Sound Levels

1. Using the data below as well as Excel and SPSS do the following things:

a. In Excel:

i. Create a scatterplot (with Distance on the X axis)

ii. On the scatterplot place the trend line

b. In SPSS find the Pearson Correlation

c. Show the results of the Pearson Correlation

d. What is the hypothesis (remember the Sig. level)

e. Summarize your findings below. Make sure to explain the strength and the direction of the results as well as explain your hypothesis test.

Scatter Plot:

Correlation Chart:

Hypothesis: The null hypothesis would be that there is no linear relationship between the distance and sound level variables. The significance level will be set at 0.05.

Results: The correlation chart indicates that the null hypothesis would be rejected. The null hypothesis would be rejected because the significance (Sig.) value reported for the correlation is below the selected significance level of 0.05. In fact, even if the significance level were set at 0.01, the correlation would still be significant.

It appears that there is a strong negative correlation between the distance and sound level variables, based on the Pearson Correlation result of -.896. The scatter plot confirms this. Because the points are packed so tightly around the trend line, the correlation is strong. The direction is also clearly negative, because one variable decreases as the other increases.
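The hypothesis test above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SPSS procedure: the distance and sound values are made up (the assignment data is not reproduced here), the function names are hypothetical, and the critical value 2.306 is the standard two-tailed t-table value for a 0.05 significance level with 8 degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for H0: no linear relationship (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical values for illustration only (NOT the assignment data).
distance = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sound_db = [98, 95, 90, 88, 80, 79, 72, 70, 66, 60]

r = pearson_r(distance, sound_db)
t = t_statistic(r, len(distance))

# Two-tailed critical t for alpha = 0.05 with 10 - 2 = 8 degrees of freedom.
T_CRITICAL = 2.306
reject_null = abs(t) > T_CRITICAL

print(f"r = {r:.3f}, t = {t:.2f}, reject null: {reject_null}")
```

The sign of r gives the direction of the relationship, its size gives the strength, and comparing the t statistic with the critical value plays the same role as comparing the Sig. value with 0.05 in the SPSS output.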

Census Tracts and Population in Milwaukee County

Create a correlation matrix with all the data in it:

Perwhite = Percent White Pop. for the 307 Census Tracts in Milwaukee County

PerBlack = Percent Black

PerHis = Percent Hispanic

NO_HS = Percent with No High School Diploma

BS = Percent with a Bachelor's Degree

Below_Pove = Percent below the Poverty Level

Walk = Percent that walk to work

Results: This correlation matrix contains data representing Milwaukee County. Running correlation tests on all of the variables shows which variables are associated with each other. It is very important to remember that just because two variables correlate with each other, it does not mean one causes the other. In this example, every time two stars are shown, there is a significant correlation between the two variables. For example, there is a strong negative correlation between the variables percent white and percent black. This means that, in general, tracts with a higher percentage of black residents have a lower percentage of white residents. Another example of correlation found in this data is between the variables percent with a bachelor's degree and percent below the poverty level. In this case there is a fairly strong negative correlation, indicating that tracts with a higher percentage of bachelor's degree holders tend to have a lower percentage of residents below the poverty level. An example of a strong positive correlation between two variables is percent below the poverty level and percent black. This does not mean that if you are black you will be poor; it just means that there is statistically significant evidence that tracts with a higher percentage of black residents also have a higher percentage of people living in poverty.
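A correlation matrix is simply the Pearson correlation computed for every pair of variables. The sketch below shows the idea for three of the variable names used above. The six rows of percentages are invented for illustration and are not the Milwaukee County tract data, and the helper names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Correlate every column with every other column (and with itself)."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical tract-level percentages (NOT the Milwaukee County data).
tracts = {
    "Perwhite":   [85, 70, 40, 20, 10, 60],
    "PerBlack":   [5, 15, 45, 70, 85, 25],
    "Below_Pove": [6, 10, 22, 35, 40, 15],
}

matrix = correlation_matrix(tracts)

# Print one row per variable, in the same column order as the dictionary.
for name, row in matrix.items():
    print(f"{name:<11}" + " ".join(f"{row[other]:6.2f}" for other in tracts))
```

The diagonal is always 1 because each variable correlates perfectly with itself, and the matrix is symmetric, which is why only half of it needs to be read.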

Part 2

Introduction

The second part of this assignment takes a look at not only correlation, but also spatial auto-correlation. The difference between the two is the element of space. Spatial auto-correlation works like correlation, but it takes into account how the values of a variable are geographically related. It examines whether similar values of a variable occur in areas that are near each other. In other words, spatial auto-correlation is looking for a clustering effect of a certain variable. Spatial auto-correlation can be extremely useful in determining not only which areas are different from one another, but how they are different.

To learn the importance of spatial auto-correlation and how it can potentially be used, a scenario has been created related to voting patterns in the state of Texas. In the scenario, the Texas Election Commission (TEC) wants analysis performed on the presidential elections of 1980 and 2008 to see if election patterns have changed over those 28 years. Given the voting data, the task is to use correlation and spatial auto-correlation to determine what the voting patterns are in the different areas of Texas.

Methods

Spatial auto-correlation can be performed in the software program called OpenGeoDa. This free software is designed to help with spatial data analysis, and one of its strong points is running spatial auto-correlation. In order to run spatial auto-correlation in OpenGeoDa, a weights file must first be created. A weights file records which of the features being analyzed border or touch one another. For this example, the weights file recorded which counties within the state of Texas touch or border each other. This means that a county bordering many other counties has more neighbors in the weights file. The weights file is essentially what factors in the spatial portion of spatial auto-correlation.
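To make the idea of a weights file more concrete, here is a minimal sketch of contiguity weights in Python. It assumes a simple scheme in which every bordering area counts as a neighbor and each row is scaled to sum to 1; the settings actually used in OpenGeoDa are not shown in this post, so this is an assumption. The four areas "A" to "D" and the function name are hypothetical, not real Texas counties.

```python
# Hypothetical adjacency list: which areas share a border (not real counties).
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

def row_standardized_weights(neighbors):
    """Return w[i][j] = 1 / (number of neighbors of i) if j borders i, else 0."""
    names = sorted(neighbors)
    weights = {}
    for i in names:
        k = len(neighbors[i])
        weights[i] = {j: (1.0 / k if j in neighbors[i] else 0.0) for j in names}
    return weights

weights = row_standardized_weights(neighbors)

# Each row shows how much every other area counts as a neighbor of area i.
for i, row in weights.items():
    print(i, [round(w, 2) for w in row.values()])
```

Scaling each row to sum to 1 means the neighbors of an area are averaged, so an area with many neighbors is not automatically given more influence than an area with few.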

With the weights file created, the actual data and variables can then be analyzed based on the weights. There are two useful analysis techniques in the OpenGeoDa software. The first is Moran's I, which compares the value of the selected variable at each location with the values at the other locations to determine whether there is any spatial auto-correlation. The result is a value that generally falls between -1 and 1. Values near 1 indicate that similar values are clustered together, values near -1 indicate that dissimilar values sit next to each other, and values near 0 indicate no spatial pattern. This technique also uses a quadrant system, much like a typical X,Y graph. On this graph a scatter plot is generated and a trend line can be placed. The trend line is a telling sign of the direction in which the data may be spatially auto-correlated.
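The Moran's I calculation can be sketched directly from its textbook formula: the number of areas divided by the sum of all weights, times the weighted sum of cross-products of deviations from the mean, divided by the sum of squared deviations. The example below is an illustration only. The six areas in a row, their values, and the function names are hypothetical, and the code is not a description of how OpenGeoDa computes the statistic internally.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)

def chain_weights(n):
    """Binary contiguity for n areas in a row: each area borders the next one."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

w = chain_weights(6)

# Hypothetical values for illustration only.
clustered = [10, 10, 10, 90, 90, 90]    # similar values sit next to each other
alternating = [10, 90, 10, 90, 10, 90]  # neighbors always differ

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)

print(f"clustered:   {i_clustered:.2f}")
print(f"alternating: {i_alternating:.2f}")
```

The clustered arrangement gives a positive Moran's I and the alternating arrangement gives a negative one, even though both lists contain exactly the same numbers. Only the spatial arrangement differs, which is the part ordinary correlation cannot see.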

The second spatial auto-correlation technique is called local indicators of spatial auto-correlation (LISA). A LISA essentially outputs a map version of the X,Y graph produced by Moran's I. On this map, highlighted areas represent one of four cases:

Areas that have a high value of the selected variable and are surrounded by other areas that have high values
Areas that have a high value but are surrounded by areas with low values
Areas that have a low value but are surrounded by areas with high values
Areas that have a low value and are also surrounded by other areas with low values

Using this technique will give a visual representation of what the spatial auto-correlation patterns actually look like.
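The four categories listed above can be sketched as a simple classification: compare each area's value with the overall mean, then compare the average of its neighbors with the same mean. The sketch below does only that. It leaves out the significance testing that a full LISA performs before highlighting an area, and the eight areas in a row, their values, and the function name are all hypothetical.

```python
def lisa_quadrants(values, neighbors):
    """Label each area by its own value and its neighbors' average (vs. the mean)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    labels = []
    for i in range(n):
        # Spatial lag: average deviation of the areas bordering area i.
        lag = sum(dev[j] for j in neighbors[i]) / len(neighbors[i])
        own = "High" if dev[i] > 0 else "Low"
        around = "High" if lag > 0 else "Low"
        labels.append(f"{own}-{around}")
    return labels

# Hypothetical values for eight areas arranged in a row (illustration only).
values = [90, 85, 88, 12, 10, 92, 8, 11]

# Each area borders the area before it and the area after it.
neighbors = {
    i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
    for i in range(len(values))
}

quadrants = lisa_quadrants(values, neighbors)
for value, label in zip(values, quadrants):
    print(value, label)
```

High-High and Low-Low areas are the clusters, while High-Low and Low-High areas are outliers that differ from their surroundings.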

Results

Figure 1: Percent Hispanic Moran's I

Figure 2: Percent Hispanic LISA

Running the spatial auto-correlation tools on the percent Hispanic data for the state of Texas gives a strong idea of where the Hispanic population is located within the state. The southern part of Texas is where the highest percentages of Hispanic residents are clustered, while the northeast contains the strongest clustering of non-Hispanic populations. Interpreting the Hispanic population pattern is important because it may shed some light on why some of the voting patterns in Texas exist.

Figure 3: Percent Democratic Vote in 1980 Moran's I

Figure 4: Percent Democratic Vote in 1980 LISA

Figure 5: Percent Democratic Vote in 2008 Moran's I

Figure 6: Percent Democratic Vote in 2008 LISA

The first takeaway from the spatial auto-correlation results on the percent Democratic vote in both 1980 and 2008 is the Moran's I result: the percent Democratic vote was more clustered in 2008 than in 1980. Reading the LISA maps alongside the Moran's I results shows that the stronger clustering looks similar to the clustering of the Hispanic population. The southern counties of Texas are highly Democratic, which goes along with the tendency of Hispanic and other minority voters to vote Democratic. One interesting difference between the 1980 and 2008 LISA maps is in the northeast part of the state. There does not appear to be significant clustering of high or low Democratic vote in that area, despite its strong clustering of non-Hispanic populations.

Figure 7: Voter Turnout 1980 Moran's I

Figure 8: Voter Turnout 1980 LISA

Figure 9: Voter Turnout 2008 Moran's I

Figure 10: Voter Turnout 2008 LISA

The spatial auto-correlation results for voter turnout in 1980 and 2008 show noticeably less of a clustering effect. Based on the Moran's I results, both years showed relatively weak signs of clustering throughout the state. Looking at the LISA maps, the few areas that do show signs of clustering are quite interesting. The areas with clustering of high voter turnout happen to be two of the major metropolitan areas in the state of Texas: both the Dallas/Fort Worth area and the San Antonio/Austin area have clusters of high voter turnout. This makes sense because the higher the population of an area, the higher the voter turnout will most likely be. What may be even more significant is the blue portion of the LISA map in the southern part of the state. This shows that there is a clustering of low voter turnout counties in this part of Texas. This is a telling sign, because that is where the stronghold of the percent Democratic vote is clustered. That means that even though these areas largely vote Democratic, their voter turnout is relatively low, so they most likely will not sway an election. This fits with the state of Texas being considered a dominantly Republican state.

Figure 11: Correlation Matrix with Percent Democratic Vote and Voter Turnout for both 1980 and 2008

-VTP80 = Voter Turnout 1980

-VTP08 = Voter Turnout 2008

-PRES80D = % Democratic Vote 1980

-PRES08D = % Democratic Vote 2008

This correlation matrix echoes what was stated in the previous paragraph. Looking at the voter turnout variables and the percent Democratic vote variables, there is an apparent negative correlation. This indicates a pattern where the larger the voter turnout, the smaller the Democratic share of the vote. The correlation is moderately strong, indicating that a higher voter turnout does not always come with a lower Democratic percentage, but that it is still likely.

Conclusion

Texas is generally considered to be a predominantly Republican state. The voting data from both 1980 and 2008 back this up by showing that areas of high voter turnout generally have a lower Democratic percentage and areas of low voter turnout generally have a higher Democratic percentage. This is a telling sign for the Texas Election Commission and clearly shows significant patterns in the voting data. Depending on the party affiliation of the TEC and the current governor (who requested that the analysis be done through the TEC), this analysis can be used in different ways. If the governor is a Republican, which judging by these voting patterns he or she is, this data would indicate the major population hubs in the state. As long as the major population centers continue to turn out to vote, the Republicans will most likely continue to win elections. However, if the governor were a Democrat, the data would be used in a completely different way. Instead of focusing on major population areas, the Democratic candidate would focus on the southern portion of the state, and specifically on getting the Hispanic portion of the population to get out and vote. If the Democrats could turn the southern counties into not just strongholds of the Democratic vote, but also strongholds of high voter turnout, their chances in elections would greatly increase.

This assignment shows the importance of not just correlation and spatial auto-correlation, but of statistics in general. In politics specifically, running statistics is essential to any campaign strategy, which also means a lot of money is sure to be spent on political statistical analysis. However, regular statistics is not enough. Incorporating space and geography into the mix, such as with spatial auto-correlation, is essential to understanding which specific areas are important. Relating statistics to space allows for a more in-depth analysis than the numbers alone. This allows the people performing the analysis to ask deeper questions: not just what the data is, but why it is the way it is.

Thursday, April 9, 2015
