Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Excel Data Analysis: Calculating Ocean Temperature Stats - Prof. Richard Ord, Lab Reports of Computer Science

Instructions for university students on how to use microsoft excel to analyze a large dataset of ocean temperatures. It covers downloading and importing the data, converting temperatures from celsius to fahrenheit, creating a date column, calculating statistics such as mean, min/max, and standard deviation, and creating line charts with trendlines. The document also includes instructions for dealing with missing data.

Typology: Lab Reports

2009/2010

Uploaded on 03/28/2010

koofers-user-5p9
koofers-user-5p9 🇺🇸

10 documents

1 / 9

Toggle sidebar

Related documents


Partial preview of the text

Download Excel Data Analysis: Calculating Ocean Temperature Stats - Prof. Richard Ord and more Lab Reports Computer Science in PDF only on Docsity! A. Basic Statistics There are some very basic statistical concepts that every college educated person should understand. These are useful for understanding your grade distributions in classes here at UCSD, current events, and all manner of information in your lives. Because this isn't a math class, we are going to focus on the intuition and use of a very few of the most common statistical metrics. Mean In most cases, when people say "Average", they are referring to the arithmetic mean. For a set of data points, the arithmetic mean or "Average" is the sum of those points divided by the number of points. For example, if we have points 4, 5, and 6, the mean is calculated as . This fits with most people's intuition of the word "Average". In this lab, we will use the words "mean" and "average" interchangeably, though Excel has only one function to calculate it: =average() Min/Max The min/max of a set of data points is fairly intuitive and needs no explanation. Excel has two functions: =min() and =max() Standard Deviation The standard deviation is one of the most commonly used statistics, and the least commonly understood by non-statisticians. While the mathematical definition is somewhat complicated, the basic intuition is that the standard deviation describes how wide-spread the data is - the bigger the spread, the higher the standard deviation. The size of the dataset also affects the standard deviation. If we only have a few numbers ranging between 1 and 10 for example, the standard deviation will likely be higher than if we have thousands of numbers ranging between 1 and 10. Example: A B 4, 5, 6, 5, 5 4, 1, 7, 11, 2 Both these sets A and B have an average of 5. But set A has a standard deviation of 0.7 while B has a standard deviation of 4.1. Excel has a simple function to calculate the standard deviation of a set of points: =stdev() Lab 5 - Part 1 CSE 3, Fall 2009 In this lab we will learn some more advanced data manipulation and analysis techniques using Microsoft Excel 2007. B. Download and Import the Data into Excel We're going to be working with and analyzing a large dataset describing ocean temperatures in La Jolla collected automatically from a buoy over the last 8 years. Every hour, the buoy takes two temperature readings. Over the last 8 years, that adds up to a LOT of data! Step 0: Create a Lab5 folder in your CSE folder. Step 1: Go to http://cdip.ucsd.edu/?nav=historic&sub=data&stn=073&stream=p1&xyrmo=200907&xmap_id=17 to see a map of the San Diego area data buoys. Most of the buoys have been decommissioned and are no longer gathering data. Step 2: Click on buoy number 100 in the upper left hand corner of your map. Select "pm download table" underneath "Monthly products" on the left-hand side of the next screen. You should now see a chart of all the information available from that buoy over the last 8 years by month. Click on the "All Data" link in the upper left-hand corner of the chart. This may take several minutes due to the enormous quantity of data. Step 3: Once the data has finished downloading (you will be able to tell because you should see today's date as the very last entry) we need to save it. • On the menu bar, click Edit => Select All • Hit Ctrl-C to copy the data • Open a new Notepad++ document and Ctrl-V to paste the data (this may take a few minutes) • Save the data as Buoy100.txt (watch your spelling!!!!) into your Lab5 folder. Step 4: Now that we have the data saved in text format, we need to import it into Microsoft Excel. If you had tried to save directly into Excel, it would not have worked - it would have tried to save everything into a single column rather into many columns as we want it. • Open Microsoft Excel 2007. This dataset is so large, that Excel 2003 could not hold it all. • Click on the Microsoft Office Button in the upper left-hand corner of the screen and choose "Open" • Navigate to your Lab5 folder and look for files of type: "All Files" and then select Buoy100.txt • An Import Wizard window will pop up and ask you how you want to import the data into Excel D. Analyzing the Data Now that we've got our data cleaned up a bit, let's start playing with it. We'd like to create a graph of the Temperature over time, but right now we have more data points than Excel can chart. Besides, we don't need to have almost 50 datapoints per day! Let's take the average temperature for each day in our dataset instead. Step 1: • At the bottom of the spreadsheet, click the "New Tab" button just to the right of the current tab. This will create a new tab called "Sheet1". • Right click on the word "Sheet1" and click "Rename" • Name this tab "Daily Mean" • In cell A1 and B1 type the labels "Day" and "Mean" or "Average" • In cell A2, type 1/30/01 and underneath it in cell A3 type 1/31/01 (this is where our data starts) • Select both cells A2 and A3 • Grab the fill handle in the lower right-hand corner of the selected cells and drag down until you reach today's date. • In cell B2, use the function =averageif(range,condition,[ave-range]) to calculate the daily mean range = A column in spreadsheet tab "Buoy100" condition = cell A2 in the "Daily Mean" spreadsheet [ave-range] = Temp (F) column (column B) in spreadsheet tab "Buoy100" • Either drag this formula down or copy/select/paste to replicate it for each day Step 2: Now that we've got our daily averages, let's graph it! • Select columns A and B in the Daily Mean spreadsheet tab • On the Insert tab, choose the first option under the Line chart menu in the Charts group • Put your chart on its own tab using the Move Chart button on the Design tab. Give it an appropriate name. • Format the resulting chart by modifying the Title, legend, and axis labels appropriately. Your chart may look something like this: Step 3: Now that we've got our chart, let's try to interpret it. • First, notice how the temperature oscillates between the low winter temperatures and the high summer temperatures. This oscillation helps explain why the Standard Deviation is so big. • Next, note that there are three horizontal lines from the temperature line to the bottom axis. If you zoom in on these areas, you'll find that there is data missing at these spots o 8/14/03 is missing from the original spreadsheet entirely o 10/31/05 to 11/3/05 is missing o 12/6/07 to 12/11/07 is also missing The data may be missing through some accident, or perhaps the buoy was broken on those days, but whatever the cause, we need to do something intelligent to deal with the problem. Step 4: The data may be missing in the "Buoy100" spreadsheet tab, but it's not missing in the "Daily Mean" tab because we just generated dates for every day over those 8 and a half years. If we just deleted those rows from the "Daily Mean" chart as well, it would fix the problem temporarily, but we'd run into more problems later. Instead, let's use the average of the days on either side of the missing data to fill in the gaps. • Using a calculator or Excel, calculate the average value of the temperatures on either side of all three gaps. • Replace the missing values in the "Daily Mean" spreadsheet with these average values. • This kind of data replacement is a perfectly reasonable thing to do AS LONG AS YOU describe it in whatever report or paper you describe your data analysis in. Step 5: Now that you have a nice graph displaying average daily temperatures over a period of eight and a half years, let's look at some trendlines and see if we can see any effects of Global Warming in our Ocean Temperature data. Excel has a number of different built-in methods to analyze data. We're going to use Linear Regression to create a trendline for our data. Trendlines can be used to understand existing data and to predict future data. • In your finished chart, right-click on some data point (the blue line in the example shown above) • Select "Add Trendline" • In the Trendline window that shows up, you'll see lots of options for different types of regression models. We want to choose a "Linear" model. Our data is clearly nonlinear - however a linear model will help us see trends in where the data is headed - it will effectively average out the high's and the low's of a year and create a line that represents the "center" of the data. The direction of this center line will tell us if the water in this particular buoy location has been changing temperature over the last several years or not. • At the bottom of the Trendline window, also check the "Display Equation on chart" checkbox and hit Close. • Your chart will now look something like this:
Docsity logo



Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved