Instuctions:
Task 1: Downloading Data
1. The first thing you need to do is to download some data with which to work. In your browser window, click on File and New (or ctrl + N) to launch a second browser window. You'll need this window open in one of the two browsers. Remember that Alt + Tab allows you to quickly move between open applications.
2. Point one of the two browsers to: Fake Illinois Web Site.
3. The dialog window in figure 1 below will appear in your browser window (if you are using Internet Explorer. Netscape and Firefox dialog boxes will look different). This window is prompting you to download data. It is generally a good idea to save the file to a location you can easily find, and to scan it quickly for viruses before you open it. You may want to download it to the desktop first, because it's easy to find there.
* If your browser just opens the file without the download, then either use Internet Explorer, or copy and paste the data straight to Excel.
**Please note that much government data is now available in .pdf (Adobe) format. Unfortunately, .pdf format is generally NOT easily translated into a useable format. Try to avoid .pdf files if you can. .pdf files are very difficult to convert into spreadsheets without specialized software especially if data rows and columns are not separated by commas or tabs or other characters.
4. You may choose to rename it something meaningful to you, such as g300_excel_smith. The underscores in the file name ( _ ) may prove useful later as you seek to use the file in non-Microsoft applications. “Smith” or whatever your last name is will help me grade your assignment.

Figure 1: File Download Dialog Box
7. After you have saved the file to an easy to find location, you’ll need to launch Microsoft Excel and prepare to open the file.
8. Since .csv files are not native to Microsoft, you’ll need to change the file type that Excel looks for. Click on the down arrow to the right of the text box to the right of “Files of type”. Search for the “Text Files” type, which include . prn files and .txt files. See Figure 2 below.
9. This will make visible all .csv files. Double click to open the desired file.

Figure 2: Open File Dialog Box.
10. Once you open the file in Excel, you should see the data somewhat as it appears in Figure 3 below.

Figure 3: Initial Spreadsheet Window
11. The other thing you should do at this point is to save the file in excel format. Click on file and save (Ctrl + S) but not "Save As". This will bring up the dialog window in Figure 4 below. Since in this instance you want to save the file as an .xls file, click No and another dialog window will appear prompting you to save the file as a .xls file. Click Save in that window. Notice that the file name displayed across the top of the screen in Excel now refers to the file as a .xls file.
*note I will be giving you the key stroke method I use to speed my work in excel in parentheses...so in addition to spelling out the mouse actions. For example...to open a file you could click on File on the menu bar, then select New...or you could just press control and N at the same time (Ctrl + N)
12. You may also save a copy of the file under a different name (such as G300_excel_original) so you have a backup available in case you do irreparable damage to the file you’re using.

Figure 4: Save Dialog Window displayed when file format is not .xls format.
Task 2: INDEXING DATA SETS
13. A good habit to get into when you are working with data such as this, especially when it’s a large data set, is to create an index column that allows you to reorient/or re-sort the data just the way you found it should that become necessary. Essentially you will give the list of entries a number from 1 to X, giving each row of data a unique value.
14. To create an index column : Place your cursor in cell A1 (column A, row 1) of the spreadsheet. This will place a bold box around the cell indicating that cell A1 is active.
15. Click on Insert on the top menu and choose columns from the drop-down list. (or press Alt + I +C).
16. In cell A1, which should now be blank type in the letters ID (for index). Move down one cell into cell A2 and enter the number 1.
17. Move your cursor to cell B2 and press Ctrl + DOWN ARROW at the same time. This allows you to move to the first cell down column B without data (cell B12). Experiment with the Ctrl + (arrow) function. It becomes absolutely NECESSARY to navigate data this way when you are working with very large data sets. Scrolling down or mousing down a large data set with 20,000 rows takes way too much time.
****NOTE: FOR SOME REASON, THE SYMBOL FOR DOWN ARROW (¯) , UP ARROW (↑) AND RIGHT AND LEFT ARROWS (¬,®) DON'T APPEAR CORRECTLY IN SOME BROWSER WINDOWS. THEREFORE I WILL INDICATE WHEN YOU SHOULD KEY ONE OF THESE ARROW SYMBOLS BY TYPING "UP ARROW" ETC. TO THE RIGHT OF THE SYMBOL ITSELF.
18. Back at cell B12, move with your left arrow button to cell A12. Now press (Ctrl+Shift+↑ UP ARROW )all at the same time. This will highlight all cells from A12 up to A2.
19. With the cells A2 to A12 highlighted, click Edit, Fill, Series (Alt, E, F, S) and the Dialog Box in Figure 5 below will appear. This box allows you to indicate by what increment you wish your list to increase by and in what direction.
20. Since an increment of one is fine and you have highlighted a column of cells, you only need to click OK, but you should make note that other increment values are available and other types of increments (such as dates) can also be used.

Figure 5: Series Fill Dialog Box
21. Column A should now have a column header of ID and numbers 1 through 11 in the cells below.
**MY FAVORITE SHORTCUT KEY IN EXCEL IS CTRL+Z (WHICH IS UNDO). Control + Y is redo. F4 also is good for “do that again”.
TASK 3: VIEWING and SORTING DATA
22. Scroll or Arrow over to column P headed with the word ”areaname”. Notice that much of the data values in this column is hidden. To widen the column, place your cursor on the vertical line between letters P and Q. Your cursor should change from a boxy plus sign to a skinny plus sign where there are arrow points on each end of the horizontal crossbar. Once your cursor has changed, click and drag to the right your mouse. This will expand the width of the column to the width you like best. You can also format column widths and row heights using by clicking Format from the top tool bar and choosing the options there.
23. Once you’ve widened the column of data under areaname you’ll notice it is a list of the large cities (MSAs for metropolitan statistical areas) in Illinois. Perhaps you would like the data sorted alphabetically by the name of the MSAs in your list.
24. Click on Data on the menu bar and select Sort from the drop down menu (Alt+D, S). The dialog box in Figure 6 will appear.
25. This is fantastically useful spreadsheet function. Here the dialog box is asking you which column you want to sort by. The drop down arrow (triangle) allows you to select from a list of column headings (generally row A) in your spreadsheet. It also allows you to sort it Alphabetically (ascending) or in reverse alphabetical order (Descending). If you have numbers in the column you are sorting by, you can sort from biggest to smallest or vice verse.
26. You can also sort using data in several columns in sequence, which is useful if you have multiple data values that are equal in one column, but different in the next. You should experiment a bit with this function.
27. Click OK and your data will sort by alphabetically by the data in column “areaname”. Bloomington-Normal now should be at the top of the list and St. Louis should now be at the bottom of the column headed by “areaname”. It's important that your columns have names, such as "areaname" or "total pop". If there is no column name in row 1, then make sure that the "No header row" option is checked (see figure 6 below).

Figure 6: Data Sort Dialog Box
28. Next sort the data by total pop, but chose the “Descending” option in the dialog box. You should find that the Chicago-Gary-Kenosha CMSA tops your list.
AN IMPORTANT NOTE ON DATA
29. Before you go on to the next step, please notice that numeric values are aligned on the right side of cells and the text or string variables (like the city names are aligned against the left side of their cells. On occasion, you will find that some numbers, like ZIP codes are also left aligned. THIS INDICATES that these numbers are NOT numbers in the normal sense, but nominal data (name data) where numbers are used as identifiers, but you can not (or should not) try to perform mathematical calculations on this data.
30. If your nominal data (like FIPS, SIC codes or ZIP codes) are acting like integers in the software, perhaps getting aligned on the right instead of where they should be on the left side, then you’ll need to tell Excel to make them text. This is exactly what has happened with the data in the column labeled msacmsa, statefips, etc. These are Federal Information Processing (FIPS) numbers, and like ZIP codes, make no sense to add.
31. MORE IMPORTANTLY!!!! If you are going to JOIN this data with a database in a GIS program like ARCVIEW, ZIP codes, FIPS codes, and other geographic identifiers, MUST BE RECOGNIZED BY THE GIS AS NOMINAL DATA or which GIS calls TEXT!! Otherwise it won’t work in the GIS.
32. Here’s the BEST way to convert these numeric values into text values.
NOTE: There are other ways of converting cells from numeric values to text, but thanks to a glitch in Excel, it doesn’t always work correctly AND this method seems to work without problem.
1. Highlight the column of data you want to convert. A good choice is column J (if you've created the new index column in column A). This data is interpreted by Excel as numbers...but it shouldn't be because these are numeric identifiers for places. You should not be able to sum this data...so, place your cursor in cell J2 and press Ctrl + Shift + ¯ DOWN ARROW
2. Click on Data on the menu bar, and select “Text to Columns” from the drop down list or (Alt D, e)
3. Click Next and then Next again.
4. In the Step 3 of 3, click on the Text format from the list of Column data format list. (See Figure 7 below)
5. Click Finish and note that the data is now left justified and can no longer be summed, which means it is text or nominal data.

Figure 7: Text to Column Wizard: Step 3 of 3.
TASK 4: SIMPLE STATISTICAL CALCULATIONS IN EXCEL
Mean/Average
33. Excel allows you to perform a number of simple (and a few more complex) statistical procedures on the data.
34. To calculate the mean (average) for the column of data you sorted in step 26 (total pop) place your cursor two rows below the last row of data in the “total pop” column (Q14). You need to skip a row so that your calculations do not get entangled with your data should you sort the data again. Without the blank row of data, Excel treats your calculations as part of the data and will sort it along with the rest of the data.
35. Once you have placed your cursor in the appropriate cell, type in an equals sign (=). The equals sign is how all formulas must begin in Excel.
36. Next type in the word ‘average” and a left parentheses sign, so you have in the cell =average(
37. Now place your cursor in the bottom most cell of the column you want to derive an average from (in this case it should be the Q12 cell).
38. Next press Ctrl + Shift +↑ UP ARROW. This should highlight the column, but also the column heading which cannot be added, so lift your finger off the Ctrl button and press the ¯ DOWN ARROW once.
39. You should notice that the formula now appears both in the cell where you are placing the formula and in the formula bar just under the tool bars. The formula needs to have a right parentheses “)” to complete the formula, but you can just press enter at this point and Excel will add it in for you.
You could click and drag the cursor to highlight the column and that would be fine in this instance, but again if you have a large data set it would take too long, so please practice using the key stroke method as it has been outlined above.
40. The average population of these cities is 1719846. You should type the word “MEAN” in the cell left of this value (cell P14) so that you can remember what the values in the row are.
41. To calculate the average for each column of data, you need to copy the formula in each of the cells two below the last datum each of the columns.
42. To copy and paste the formula, place your cursor back in the cell bearing the 1719846 and press Ctrl + C, to put the formula in the clipboard. You should notice the “blinking” or “chasing” dashed lines that now outline the cell that has been placed in the clipboard.
43. Press the ↑ UP ARROW twice to move your cursor to cell Q12, two rows above the cell you just copied.
44. Press Ctrl + ® RIGHT ARROW to move to cell DH12 (or you can just scroll over there).
45. Move your cursor down two cells to DH14. (or ¯ DOWN ARROW twice)
46. Press Shift + Ctrl + ¬ LEFT ARROW to highlight all the cells you want to paste the formula into (the formula that is in cell Q14).
47. Press Enter. Notice that the average or mean for each column of data has now been calculated. YOU copied and pasted the FORMULA, which is the default process in EXCEL.
-
If you wanted to only copy and paste just the VALUE in cell Q14, then you need to follow the steps to copy, but when you are ready to paste, you would instead click on Edit and select Paste Special from the options (Alt, E, S). This brings up a dialog box with various options (several of which are very useful) and you would select “Values” from the list of options.
TASK 4: STANDARD DEVIATION
48. Next you are going to calculate the standard deviation of the variables. Standard Deviation tells you how much variation there is around the mean.
-
For example, if there were 100 students in a class and 50 of them weighed 148 pounds and 50 of them weighed 152 pounds, the average weight for the class would be 150 lbs. The average (mean) would be the still be the same if 50 students weighed 100 lbs and the other 50 weighed 200 pounds. The Standard Deviation would be very different though. In the second scenario the distribution of values around the mean is much greater than in the first.
49. Type the word STDEVP in cell Q15 (just under where you typed “MEAN”)
50. There are several ways of calculating standard deviation. One way you should find valuable is via the function key. It may be on the top toolbar and it will look like a “fx”, though it may not be there on your version of Excel. You can also insert a function by clicking on Insert from the menu bar and selecting function from the drop down menu (Alt, I, F). Any procedure should produce a dialog window that looks like the figure 8a below.
Do this step now.

Figure 8a:Insert Function Dialog Window (default). Figure 8b: Insert Function Dialog Window (Statistical)
51. From the category drop down list (Most Recently Used is the default) , select "Statistical". A new list of functions should appear in the box below. See Figure 8b.
52. Scroll down the list of statistical functions and select STDEVP (for Standard Deviation of an entire Population). Select STDEVP in this instance, because this is not a sample of the CSMA cities in Illinois, it is all of them.
53. Click OK.
54. Next a dialog window (figure 9 below) will appear. In the dialog box to the right of Number 1 you need to enter the cells you wish to have calculated by Excel. Since you want to calculate cells Q2 to Q12, you could type those in the box, but better yet is to place your cursor in cell Q12 (the last cell in the column headed Total Pop. Notice the dialog window just has Q12 in it.
55. Next press Ctrl+Shift+↑ UP ARROW. That will highlight the entire column Q, but since you don’t want the column heading (Q1) in the formula, lift your finger off the Ctrl button and press the ¯ DOWN ARROW once. Your cell reference will no longer include the topmost cell in the column and the dialog window should look like Figure 9 below.

Figure 9: Function Arguments Dialog Window.
56. Press Enter and you will have the standard deviation for Total Population among the large cities in Illinois.
57. Repeat the steps from above where you copied the formula and pasted it in the adjoining cells to the right. (see steps 42-47).
TASK 5: CORRELATION
58. The last of the simple statistical functions you will produce is a correlation coefficient. Correlation coefficients are numbers that tell you the degree of relationship between two sets of numbers. For example, you may find a list of dates with daily high temperatures. You may have another set of dates that indicate water use by a city. If you suspect that water usage goes up when the temperature is higher you can use a correlation coefficient. You may find this to be the case, whereupon you would find that a positive correlation. You may find that water usage actually goes down when temperatures are higher, then you would have a negative correlation. You may also find that there is not really much relationship between temperature and water usage (weak positive or weak negative correlation). Eventually you can do regression analysis, maybe with multiple variables, which is similar to correlation and more powerful.
59. Type in the word “PEARSON” in cell P16....so you'll know what that row of data contains.
60. The first step in calculating a correlation coefficient, is to again activate the function dialog box. (Alt, Insert, Function).
61. Again make sure the function category that you have selected is “statistical” (see figure 8a and 8b above)
62. Select from the list of statistical functions “PEARSON” and click OK. The dialog window in Figure 10 should appear.
· Pearson’s R, is a very common correlation coefficient, one used by Geographers frequently, but you should consult a statistics text or manual and know why you are choosing any statistic since many have very specific uses. You could not, for example, use Pearson’s if your data is badly skewed or in ordinal format, whereupon you would probably want to use Spearman’s R for ranked data.
· Click on “Help on this function” in any dialog window to see the formula Microsoft uses to calculate the value you wish to calculate.

Figure 10: Pearson’s Function Dialog Window
63. The first step is to enter the array (or list) of cells that comprise your list of independent variables. The independent variable you will be using is Total Population, and the values are in cells Q2 to Q12. You should highlight them as you have been.
64. After you have highlighted the cells, Q2:Q12 should be in the Array 1 box, but you need to “freeze” the reference to these cells by placing ‘$’ in front of each cell reference. Do this most easily by pressing the F4 button (or just type them in). Note how it looks in Figure 10 above.
65. Use the tab key to move the second function in the dialog box…Array 2.
66. Repeat the process for Array 2, but DO NOT “freeze” the cell reference. Click OK.
67. The correlation that you must get is 1.000. If you just have a 1, then you need to press the increase decimal button on the tool bar “¬.000” which increases the decimal places that are visible in the cell. Sometimes you’ll want to reduce the number of decimals by clicking on the other “decrease decimal” button, which rounds the values.
68. The reason you have a 1.000 value is that each value in the cells in Array 2 change in exact proportion to each change in a corresponding cell in Array 1. This is a perfect correlation because the values in Arrays 1 are equal to the values in array 2. If the values in array 2 were exactly twice (or 3x or 25x) the values in array 1, there would still be a correlation coefficient of 1.000
69. Copy the formula across to all the cells in the row 16. Excel will then calculate the statistical relationship between the total population listed for each city in question and the other variables listed under the other headings.
70. Many of the values you will find have a very high correlation. This makes perfect sense, since there should be a strong relationship between the total number of people and the total number of males, or total number of females, total people under 5 years old, etc.
71. However, you should notice that there is little correlation between the size of the cities and the median age of the populations. In fact, the correlation, though weak, suggests that on average the population is slightly younger in the larger cities than it is in smaller cities in Illinois. There is a slightly stronger, but still negative correlation between vacancy rates and population. It appears that larger cities have fewer vacant properties.
Save your excel file and send it as an attachment to your instructor. For 5 bonus points, submit it via a livetext.com attachment.
The spreadsheet that you produced will be the criteria upon which you are graded for this assignment. The results of your calculations will be used to evaluate your ability to demonstrate basic competency with these few functions of Microsoft Excel.