What is this Blog About?



This quantitative methods in geography blog will showcase the skills and techniques learned in the course GEOG 328 at the University of Wisconsin-Eau Claire. The focus of this class is on relating quantitative methods and statistics to geography.

Tuesday, February 24, 2015

Using Z-Scores, Mean Center, and Standard Distance to Study Tornadoes in Kansas and Oklahoma

Introduction

Tornadoes are a common occurrence in Kansas and Oklahoma. Because they are so common, building tornado shelters is an increasingly popular idea. However, not all citizens of these states agree that shelters are needed, and those who do disagree about where they should be placed. Using GIS and quantitative methods such as standard distance, mean center, and z-scores, a logical analysis of the area can be performed. The outcome of this analysis should help predict where tornado shelters should be located, if they are necessary at all.

Methods

To examine the placement of tornado shelters, I must analyze related data. The data I will use are:
  • Shapefile of the counties of Kansas and Oklahoma with the total number of tornado occurrences per county.
  • Tornado locations with their widths from 1995-2006.
  • Tornado locations with their widths from 2007-2012.
Note: For the purpose of this exercise, I will assume that the wider a tornado is, the more powerful it is.

The analysis tools I will apply to these data are as follows:
  • Z-Scores- A z-score expresses how many standard deviations a value lies above or below the mean: z = (value - mean) / standard deviation. A positive z-score is above the mean and a negative z-score is below it. Z-scores show how far a value is from the mean and, if the data are roughly normally distributed, can be used to estimate the probability of events.
  • Mean Center- The mean center of a set of data is its spatial center, found by averaging the x-coordinates and the y-coordinates of all the points. It is the same concept as an average or mean, except that it describes a spatial location rather than a single number, and it can be used in much the same way. The mean center can be found with a tool within ArcMap.
  • Weighted Mean Center- A weighted mean center is the same as a mean center, except that each point is weighted by another attribute. Instead of giving the exact spatial center of the points, it gives their center weighted by that factor, so points with larger weights pull the center toward them.
  • Standard Distance- If the mean center is a spatial mean, then the standard distance is the spatial equivalent of the standard deviation. It measures how dispersed the points are around the mean center and is drawn as a circle centered on the mean center with a radius of one standard distance, roughly analogous to one standard deviation of the data points. This tool can also be found within ArcMap.
  • Weighted Standard Distance- The weighted standard distance is the same as the normal standard distance, except that it is centered on the weighted mean center and each point's distance from that center is weighted by the same attribute.
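For readers who want to see what the ArcMap tools compute, here is a minimal Python sketch of the four spatial statistics above. It is not ArcMap's implementation; it follows the standard textbook formulas, assumes the points are in a projected coordinate system (for example, kilometers) rather than latitude/longitude, and uses made-up coordinates and widths purely for illustration. The function names are hypothetical.

```python
import math

def mean_center(points):
    """Unweighted mean center: the average x and the average y."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def weighted_mean_center(points, weights):
    """Mean center in which each point counts in proportion to its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

def standard_distance(points, weights=None):
    """Radius of the one-standard-distance circle.

    With no weights every point counts equally. With weights, both the
    center and each point's squared deviation from it are weighted.
    """
    if weights is None:
        weights = [1.0] * len(points)
    cx, cy = weighted_mean_center(points, weights)
    total = sum(weights)
    var_x = sum(w * (px - cx) ** 2 for (px, _), w in zip(points, weights)) / total
    var_y = sum(w * (py - cy) ** 2 for (_, py), w in zip(points, weights)) / total
    return math.sqrt(var_x + var_y)

# Hypothetical tornado locations (projected km; y grows northward) and widths.
points = [(0, 0), (10, 0), (0, 10), (10, 10)]
widths = [300, 300, 100, 100]  # the two southern tornadoes are wider

print("Mean center:", mean_center(points))
print("Weighted mean center:", weighted_mean_center(points, widths))
print("Standard distance:", round(standard_distance(points), 2))
print("Weighted standard distance:", round(standard_distance(points, widths), 2))
```

With the heavier (wider) weights on the two southern points, the weighted mean center moves from (5, 5) to (5, 2.5), the same kind of southward pull described in the results below, and the standard distance shrinks from about 7.07 to about 6.61.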

Results


Figure 1: 6 maps representing the tornadoes in Kansas and Oklahoma
The above results were found by running the standard distance and mean center tools within ArcMap. The maps with red data points represent the years 1995-2006, and the maps with blue data points represent the years 2007-2012.

Mean Center Results

Comparing the mean center of the data with the weighted mean center was a useful way of analyzing the data. The weight applied to the mean center was the width of each tornado; following the assumption above, wider tornadoes are treated as bigger and more destructive. When this weight was applied, the average spatial location shifted slightly toward the south. This suggests that more of the dangerous tornadoes occur toward Oklahoma than in Kansas.

The other interesting thing to note from the mean center calculations is the difference between the two time periods.

Figure 2: Mean Center and Weighted Mean Centers of Tornadoes from the two different sets of years

Although both mean centers move southward when weighted, the weighted mean center for 2007-2012 lies slightly farther north and east than the one for the earlier period. This indicates a stronger presence of wider tornadoes to the north and east in recent years.
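To quantify a shift like the one described above, the movement between two mean centers can be expressed as a distance and a compass bearing. The sketch below assumes projected coordinates in kilometers, with x increasing to the east and y increasing to the north; the two centers and the helper names are hypothetical placeholders, not the values from Figure 2.

```python
import math

def center_shift(old, new):
    """Return (distance, bearing) from the old center to the new center.

    Bearing is in degrees clockwise from north (0 = N, 90 = E).
    Assumes projected x/y coordinates: x grows east, y grows north.
    """
    dx = new[0] - old[0]
    dy = new[1] - old[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    return distance, bearing

def compass_point(bearing):
    """Round a bearing to the nearest of eight compass directions."""
    labels = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return labels[round(bearing / 45) % 8]

# Hypothetical weighted mean centers (km), NOT the real Figure 2 values.
center_1995_2006 = (500.0, 300.0)
center_2007_2012 = (503.0, 304.0)

dist, bearing = center_shift(center_1995_2006, center_2007_2012)
print(f"Shifted {dist:.1f} km toward {compass_point(bearing)} ({bearing:.0f} degrees)")
```

With these placeholder centers the shift is 5 km at a bearing of about 37 degrees, which rounds to northeast. Running the same calculation on the real centers would show how "slight" the shift actually is.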

Standard Distance Results

The standard distance reinforced what the mean center analysis showed. Again, the circle for 2007-2012 was shifted slightly to the northeast. This means that, compared with the earlier period, more of the strongest tornadoes were recorded in that direction. The reason for this shift is unclear.

What is clear is that the mean centers and standard distance circles are all centrally located within the study area. Applying the weight moved the mean center only slightly. This shows that no particular area tends to be affected much more by tornadoes than the others, so these results are inconclusive in determining where tornado shelters should be built.

Standard Deviation Mapping Results

Figure 3: Standard deviation of total number of tornadoes per county

Because the initial analysis was inconclusive, it is important to look deeper for patterns within the data. Above is a map that represents the standard deviation of the total number of tornadoes per county from 2007 to 2012. It highlights the counties that are well above the mean in total number of tornadoes. The counties in the darkest shade of red and with the most yellow dots are the counties that would benefit most from tornado shelters.
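The standard deviation map in Figure 3 classifies each county by how many standard deviations its tornado count lies from the mean. The sketch below reproduces that idea with one-standard-deviation bands starting at the mean. ArcMap's own class breaks may be arranged differently (for example, centered on the mean or in half-standard-deviation steps), the use of the population standard deviation is an assumption, and the county names and counts are hypothetical.

```python
import math
import statistics

def std_dev_bands(counts):
    """Band for each count: the floor of its z-score.

    -1 = from 1 SD below the mean up to the mean, 0 = mean to +1 SD,
    1 = +1 to +2 SD, and so on.
    """
    mean = statistics.mean(counts)
    sd = statistics.pstdev(counts)  # population SD (assumption)
    return [math.floor((c - mean) / sd) for c in counts]

# Hypothetical tornado counts per county, NOT the real 2007-2012 data.
counts = {"County A": 2, "County B": 2, "County C": 6, "County D": 6, "County E": 14}

bands = dict(zip(counts, std_dev_bands(list(counts.values()))))
shelter_candidates = [name for name, band in bands.items() if band >= 1]
print(bands)
print("At least 1 SD above the mean:", shelter_candidates)
```

Counties in band 1 or higher correspond to the darker red classes on the map, which is where this post suggests shelters would do the most good.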

It is also important to compare these results with the mean center and standard distance results. Because weighting by width did not substantially change where the more dangerous tornadoes appeared to occur, it is fairly safe to presume that dangerous tornadoes occur throughout both states. Therefore, the counties with more tornadoes should be the ones that receive the most tornado shelters.

Z-Scores Results

Z-scores can also be used to determine which counties experience more tornadoes. To find a z-score, the mean and standard deviation must first be found. In this case, I found the mean and standard deviation of the total number of tornadoes per county from 2007 to 2012.

Mean = 4 (the average county had 4 tornadoes)
Standard Deviation = 4

As examples, z-scores were calculated for three counties:

Russell = 5.25
Caddo = 2.25
Alfalfa = 0.25

These z-scores can be compared with those of other counties to determine which counties experience a high or low number of tornadoes. In this case, all three counties are above the mean, since all three z-scores are positive. Russell County's z-score is so high that it is an outlier in the data. It would be one of the darkest red counties on the map above and could definitely use tornado shelters.
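The z-score arithmetic above is easy to check. The sketch below uses the mean and standard deviation reported in this post (both 4). Working backward from the three reported z-scores gives the tornado counts they imply; those counts are derived here, not taken from the original data. The cutoff of 3 for flagging an outlier is a common rule of thumb, not something the exercise specified.

```python
MEAN = 4.0  # mean tornadoes per county, 2007-2012 (from the post)
SD = 4.0    # standard deviation (from the post)

def z_score(count, mean=MEAN, sd=SD):
    """How many standard deviations a count lies above (+) or below (-) the mean."""
    return (count - mean) / sd

def count_from_z(z, mean=MEAN, sd=SD):
    """Invert the z-score formula: count = mean + z * sd."""
    return mean + z * sd

reported = {"Russell": 5.25, "Caddo": 2.25, "Alfalfa": 0.25}

implied_counts = {name: count_from_z(z) for name, z in reported.items()}
outliers = [name for name, z in reported.items() if abs(z) > 3]  # rule-of-thumb cutoff

print(implied_counts)
print("Outliers:", outliers)
```

The implied counts are 25, 13, and 5 tornadoes, and z_score(25) returns 5.25, so Russell County sits more than five standard deviations above the mean.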

Z-scores can also be used to determine probability, such as the likelihood that a certain number of tornadoes will occur within the same time frame in the future. In this example the time frame is 5 years, so predictions can be made for the next 5 years.

For example:

70% of the counties would be expected to experience more than 1.92 tornadoes over the next 5 years.
20% of the counties would be expected to experience more than 7.36 tornadoes over the next 5 years.

This can be calculated for any percentage desired, as long as you know the mean and standard deviation of the data. However, with this data set these numbers do not mean much: they do not address the research question and do not help determine where tornado shelters should be placed.
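The percentages above come from running the z-score formula in reverse: look up the z-value that cuts off the desired share of a normal curve, then convert it back to a count with count = mean + z × standard deviation. Here is a small sketch using Python's built-in statistics.NormalDist (Python 3.8 or later), assuming, as the hand calculation does, that tornado counts per county are normally distributed with mean 4 and standard deviation 4. The helper name is hypothetical.

```python
from statistics import NormalDist

# Assumes counts per county follow a normal curve with the post's mean and SD.
tornadoes = NormalDist(mu=4, sigma=4)

def count_exceeded_by(share):
    """Count that the given share of counties is expected to exceed."""
    # A share p of counties lies above x exactly when the cumulative
    # probability below x is 1 - p.
    return tornadoes.inv_cdf(1 - share)

for share in (0.70, 0.20):
    print(f"{share:.0%} of counties: more than {count_exceeded_by(share):.2f} tornadoes")
```

This gives roughly 1.90 and 7.37. The small differences from 1.92 and 7.36 come from the rounded table z-values (-0.52 and 0.84) used in the hand calculation. The same model also places about 16% of its probability below zero tornadoes (tornadoes.cdf(0)), which is impossible for counts; that is one concrete reason these predictions carry little weight for this data set.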

Conclusion

The results, specifically the standard deviation map, showed that some areas definitely had a higher volume of tornadoes. If the states decided to build tornado shelters, those counties would be the best place to start. However, it is also important to consider other factors, such as population, when building tornado shelters. A shelter built where history shows many tornadoes occur, but where no people live, would be pointless. It is always important to dig deeper into your data and results and understand exactly what they mean. In this exercise, some z-scores were useful while other predictions were not. Understanding your results is more important than just knowing how to find them.







