Archive for the ‘Statistics’ Category

The Probability of Winning the Lottery

Wednesday, September 30th, 2009

Probability concepts are very relevant in our decision-making. We consider the probability of a stock price increasing before we buy a stock, the probability depending on our assumptions about the future course of the firm’s profitability and the strength of the broader market. We consider the probability that we will arrive at an important meeting on time when determining the time we should leave for that meeting. These forms of subjective probability can be fairly elusive to deal with, but are important components in our considerations towards making a decision.

Gambling games, on the other hand, are an example where probability is clearly evident in our winning and losing. For example, we can determine the probability of winning the lottery very concretely. Suppose there are 42 numbers to select from, and you must pick 6 different numbers. What is the probability of selecting the right combination of numbers? Using the combinations formula (see the text for more about this) you can determine the number of possible combinations as follows:

C = 42!/((42-6)! × 6!) = 5,245,786

The probability, then, of selecting the right six numbers is only 1/5,245,786 or about .000000191. This is a very bad use of the $1 it costs to buy the lottery ticket. You could do far better by putting it in the bank. Here is an interesting link related to these concepts (http://members.cox.net/mathmistakes/rawdata.htm).
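The count and the probability can be checked in a few lines of Python. This is a minimal sketch using the standard library's math.comb; the variable names are illustrative and are not from the text.

```python
import math

# Number of ways to choose 6 different numbers from 42 when order does not matter
total_numbers = 42
numbers_picked = 6
combinations = math.comb(total_numbers, numbers_picked)

# Probability that a single ticket matches the one winning combination
probability = 1 / combinations

print(f"{combinations:,}")    # 5,245,786
print(f"{probability:.9f}")   # 0.000000191
```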

Of course, Las Vegas understands probability very well. As a result, they profit and habitual gamblers lose.

 

 

 

 

 


 

Statistical Inference and the Public Option

Tuesday, September 29th, 2009

Surveys and research generate data, and the statistics derived from that data can very effectively support efforts to analyze and interpret it systematically and arrive at smart decisions. These decisions follow from statistical inference or the steps associated with hypothesis testing. This should support market research efforts, political polling, statistical quality control research, and many other functions. That being said, survey design and the questions used can influence outcomes, and unless the goal of your research effort is to skew the results of the research, this should be avoided. How can you make a smart and rational decision with poorly collected and analyzed data?

 

An example where survey design and the questions being asked have strongly affected survey results is research related to health reform in 2009. For example, different surveys have found different opinions related to the views of doctors on a public health insurance option. Here is a link to an article discussing this, particularly the importance of disclosing research methodology in order to clarify points (http://www.pollster.com/blogs/a_tale_of_two_doctor_polls.php). Here is a link to the New England Journal of Medicine for their information related to this (http://healthcarereform.nejm.org/?cat=72). This report from Kaiser Health News illustrates varying poll results as well (http://www.kaiserhealthnews.org/Daily-Reports/2009/September/29/Poll-and-politics-health.aspx). There have also been indications that how questions are worded affects survey results, and so does the order of the questions. This blog post from ABC News does a very good job of delineating the differences in these analyses and is a good resource for other issues (http://blogs.abcnews.com/thenumbers/2009/08/views-on-a-public-option-let-the-fur-fly.html). What we are seeing through this is that survey design, the wording of questions, the order of the questions, and the population sampled all influence outcomes. The choices related to these in this case may be political, deliberate, or accidental. The point is that it is hard to make smart decisions through statistical inference based on poorly constructed and analyzed surveys, and poorly constructed reports.

 

There are many interesting blogs related to numbers, statistics, and data analysis that illustrate applications of data analysis and research, good and bad, or the proper use of numbers. One is the Numbers Guy in the Wall Street Journal (http://blogs.wsj.com/numbersguy/). Freakonomics on the New York Times is a good resource for interesting uses and misuses of economics and statistics (http://freakonomics.blogs.nytimes.com/). Charles Blow, a columnist with the New York Times, also focuses on statistics and the uses of statistics (http://blow.blogs.nytimes.com/). Another resource, from Professor Andrew Gelman of Columbia University, is (http://www.stat.columbia.edu/~gelman/blog/).

 

I hope you have fun with these and other resources, and think critically about the data and reports you read.

 

 

 

 


 

Regression Analysis and Economic Forecasts

Thursday, September 24th, 2009

The recession of 2007 to 2009 and its impact on households, businesses, and government indicate the importance of anticipating changes in the direction of the economy and having a model to support this analysis. The following discussion details a proposed model for forecasting real GDP and also an approach for monitoring changing economic conditions. The model is based on economic logic, and was developed using regression analysis. The independent or explanatory variables that will be focused on include new private housing unit permits, the bond market, and the stock market. The advantage associated with focusing on these is that they are actively monitored and reported on in the press and by analysts. In addition, professional forecasts and data sources are readily available that can be used to analyze changing economic conditions. Measures of risk are also identified that serve as helpful indicators for forecasting.

 

The model is specified as follows:

 

Dependent Variable

RGDP – Real Gross Domestic Product (GDP) in a given quarter

 

Independent Variables

PERMIT – new private housing unit permits in a given quarter.

RVARBOND – real or inflation adjusted variance (that is, the spread) between the Baa Corporate Bond yield and the 10-year Treasury in a given quarter.

RS&P500 – the annual percent change in the real or inflation adjusted S&P 500 stock index in a given quarter.

RGDPt-1 – lagged dependent variable, that is, real GDP in the previous quarter.

Q2 – indicator variable for the second quarter of the year, accounting for seasonal variations in economic activity.

Q3 – indicator variable for the third quarter of the year, accounting for seasonal variations in economic activity.

Q4 – indicator variable for the fourth quarter of the year, accounting for seasonal variations in economic activity.

 

New private housing unit permits (PERMIT) is an important leading indicator as new permits trigger a lot of new economic activity. Housing permits lead to substantial development and construction activity. In addition, the relative strength in the housing market tends to indicate the perception of households related to the strength of the economy. The importance of this indicator has only been reinforced during the most recent period of economic growth, and even more during the recent economic recession.

 

Corporate bonds which are rated Baa are judged by Moody’s to be medium-grade obligations. Moody’s indicates that this means that they are neither highly protected nor poorly secured. Moody’s further judges that interest payments and principal security appear to be adequate for the present but certain protective elements may be lacking or may be characteristically unreliable over any great length of time. These corporate bonds do not have outstanding investment characteristics and reflect speculative characteristics as well. As a result, Moody’s Seasoned Baa Corporate Bond Yield is an indicator related to the relative risk of the current and future business climate. On the other hand, investments in the 10-year Treasury are generally treated as free of default risk, these bonds are held by very cautious long-term investors, and their yield is judged to indicate a risk-free rate of return that other investments are judged against. As a result, the real or inflation adjusted variance (that is, the spread) between the Baa Corporate Bond yield and the 10-year Treasury (RVARBOND) in a given quarter is an indicator of business expectations. The bond market is indicating greater risk to the business climate and the economy when the difference between the Baa Corporate Bond yield and the 10-year Treasury increases.

 

The annual percent change for the most recent twelve month period of the real or inflation adjusted S&P 500 stock index (RS&P500) is another indicator related to the perceived strength of the economy in the future. The stock market, over a long period of time, is generally considered to be an important leading indicator in that it reflects expectations related to the future strength of the economy. It is true that in the short-term, the stock market can fluctuate seemingly randomly due to changes in market psychology and business and economic news. For this reason, forecasters look at long-term adjustments in the stock market as an economic indicator because over time the stock market better reflects the evaluation and analysis by investors of data related to future business conditions. The S&P 500 stock index is used because it represents a broad range of stocks.

 

The lagged dependent variable (RGDPt-1) is included as an independent variable in order to resolve a potential statistical problem, positive first-order autocorrelation, a condition that is common in time series models. The consequence of this issue is that the validity of the model would be uncertain and the estimated coefficients could not be accepted as reliable. In addition, it is logical that the current level of real GDP would to some degree depend on the prior level of real GDP.

 

The forecasting model is:

 

RGDP = -18.7227 + 0.0588 PERMIT - 21.8687 RVARBOND + 1.3874 RS&P500 + 1.0024 RGDPt-1 + 22.7832 Q2 + 5.7021 Q3 + 12.7139 Q4

 

All coefficients have the expected signs. An increase in the number of new private housing unit permits should have a positive impact on the economy. On the other hand, an increase in the spread or difference between the real or inflation adjusted Baa Corporate Bond yield and the 10-year Treasury should indicate a decline in the economy. In other words, the bond market indicating an increase in the relative risk associated with the business climate is an indicator that economic growth is declining or will decline. It is logical that additional strength in the stock market indicates future strength in the economy. Two reasons can be presented for this. First, it means that the stock market is anticipating stronger economic growth. Second, the wealth effect will be at work in that higher stock prices lead to an increase in wealth and this supports additional spending and economic activity.
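To see how the equation is applied, here is a minimal Python sketch that plugs values into the fitted coefficients. The coefficients are copied from the equation above; the function name, the dictionary keys (RSP500 stands in for RS&P500), and the input values are hypothetical, and the units of each variable are an assumption because the post does not state them.

```python
# Coefficients copied from the forecasting equation in the post
INTERCEPT = -18.7227
COEF = {
    "PERMIT": 0.0588,
    "RVARBOND": -21.8687,
    "RSP500": 1.3874,      # RS&P500 in the post
    "RGDP_LAG": 1.0024,    # RGDPt-1 in the post
}
# Seasonal indicators; the first quarter is the baseline and adds nothing
SEASONAL = {1: 0.0, 2: 22.7832, 3: 5.7021, 4: 12.7139}


def forecast_rgdp(permit, rvarbond, rsp500, rgdp_lag, quarter):
    """Return the model's real GDP forecast for one quarter."""
    return (
        INTERCEPT
        + COEF["PERMIT"] * permit
        + COEF["RVARBOND"] * rvarbond
        + COEF["RSP500"] * rsp500
        + COEF["RGDP_LAG"] * rgdp_lag
        + SEASONAL[quarter]
    )


# Hypothetical inputs, chosen only to show the direction of each effect
base = forecast_rgdp(permit=300, rvarbond=2.0, rsp500=5.0, rgdp_lag=13000.0, quarter=2)
wider_spread = forecast_rgdp(permit=300, rvarbond=3.0, rsp500=5.0, rgdp_lag=13000.0, quarter=2)

# A one-unit wider bond spread lowers the forecast by the RVARBOND coefficient
print(round(base - wider_spread, 4))   # 21.8687
```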

 

The following tables compare actual changes in real GDP with the predictions that would have been produced by the model for the period leading up to, and including, the current recession. We see that economic growth slowed significantly in the third quarter of 2008, and then real GDP declined after this point. The model would have also predicted the same outcomes, with some small differences in the rates of change. More interestingly, the model would have predicted much lower rates of growth from the second quarter of 2007 to the second quarter of 2008, signaling the underlying weakness of the real economy that we now accept as a fact. The recession is officially dated as having started in December 2007.

 

Year (QTR)    Actual Real GDP    Actual 12-Month % Change
2007 (1)      13,099.90           1.425%
2007 (2)      13,204.00           1.863%
2007 (3)      13,321.10           2.739%
2007 (4)      13,391.20           2.530%
2008 (1)      13,366.90           2.038%
2008 (2)      13,415.30           1.600%
2008 (3)      13,324.60           0.026%
2008 (4)      13,141.90          -1.862%
2009 (1)      12,925.40          -3.303%
2009 (2)      12,892.50          -3.897%

 

Year (QTR)    Predicted Real GDP    Predicted 12-Month % Change
2007 (1)      13,143.14              2.242%
2007 (2)      13,204.40              1.310%
2007 (3)      13,273.49              1.730%
2007 (4)      13,371.51              2.363%
2008 (1)      13,384.23              1.834%
2008 (2)      13,377.40              1.310%
2008 (3)      13,382.77              0.823%
2008 (4)      13,210.92             -1.201%
2009 (1)      13,008.25             -2.809%
2009 (2)      12,843.60             -3.990%
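The 12-month percent changes for 2008 and 2009 can be reproduced from the levels in the two tables. The sketch below assumes each change is measured against the same quarter one year earlier within the same series, which matches the published figures; the 2007 rows cannot be recomputed here because the 2006 levels are not shown. The function and variable names are illustrative.

```python
# Real GDP levels copied from the two tables, keyed by (year, quarter)
actual = {
    (2007, 1): 13099.90, (2007, 2): 13204.00, (2007, 3): 13321.10, (2007, 4): 13391.20,
    (2008, 1): 13366.90, (2008, 2): 13415.30, (2008, 3): 13324.60, (2008, 4): 13141.90,
    (2009, 1): 12925.40, (2009, 2): 12892.50,
}
predicted = {
    (2007, 1): 13143.14, (2007, 2): 13204.40, (2007, 3): 13273.49, (2007, 4): 13371.51,
    (2008, 1): 13384.23, (2008, 2): 13377.40, (2008, 3): 13382.77, (2008, 4): 13210.92,
    (2009, 1): 13008.25, (2009, 2): 12843.60,
}


def twelve_month_pct_change(series, year, quarter):
    """Percent change from the same quarter one year earlier."""
    return (series[(year, quarter)] / series[(year - 1, quarter)] - 1) * 100


# Only quarters whose year-earlier level appears in the tables
periods = [(2008, 1), (2008, 2), (2008, 3), (2008, 4), (2009, 1), (2009, 2)]
for year, quarter in periods:
    a = twelve_month_pct_change(actual, year, quarter)
    p = twelve_month_pct_change(predicted, year, quarter)
    print(f"{year} ({quarter})  actual {a:7.3f}%  predicted {p:7.3f}%  gap {p - a:6.3f}")
```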

 

A forecasting model is only as accurate as the analyst’s ability to estimate the future direction of the independent variables. This means that, in this case, it is necessary to monitor the strength of the real estate market, the corporate and U.S. Treasury bond markets, and the stock market. Resources that will be helpful with monitoring the national housing market include the National Association of Home Builders, Mortgage Bankers Association, and Freddie Mac. In addition, the National Association for Business Economics, Congressional Budget Office, Federal Reserve Bank of Philadelphia, Livingston Survey, and the Federal Reserve Bank of St. Louis Economic Research are all helpful resources for economic forecasts and data. Measures of risk in the financial markets will be helpful with assessing the current condition of the financial markets.

 

Measures of risk that can be monitored include the 3-month Treasury bill rate, the three-month Libor, the CBOE Volatility (VIX) Index, and inflation expectations.

The safest investment is the 3-month Treasury bill. The financial markets signaled extreme distress as the 3-month Treasury bill rate approached 0.00% during the recent financial crisis. This sort of extreme movement in this rate indicates a flight from any sort of risk and indicates that financing any initiatives would be very problematic. The result will be adverse impacts on the stock market and a widening of the spread between the inflation adjusted Baa Corporate Bond yield and the 10-year Treasury yield.

The three-month Libor is a measure of what banks pay each other to borrow for three months. This rate rose significantly in September and October 2008 at the start of the recent financial crisis. The significant increase in the three-month Libor reflected the perspective of banks that there was greater risk in lending to one another. This is characteristic of a credit crunch, and when credit is choked off and lending between banks becomes constrained, the economy will decline.

The CBOE Volatility (VIX) Index is a measure of volatility given by S&P 500 stock index options prices, and as a result illustrates the perspectives of those who speculate in the stock market. The volatility demonstrated by this index starting in September 2008 indicated real difficulty for firms seeking to raise funds in the equity markets, and it discourages those who are more risk averse from investing in equities. As a result, it can be used as an indicator related to the future strength of the stock market.

Finally, inflation is associated with growing economies. Of course, the exceptionally high rates of inflation seen in the 1970s indicated significant stress on the economy. Very low rates of inflation can also be very problematic as they indicate lower rates of economic growth and potential difficulty for businesses seeking to prudently raise the prices of their products. The difference between the 10-Year Treasury Constant Maturity Rate and the 10-Year Treasury Inflation-Indexed Security is a very good indicator of the bond market’s expectations related to inflation.

 

What is indicated for the future? Assumptions consistent with current economic conditions, and reasonable assumptions about the future, point towards relatively slow economic growth in calendar year 2010, and stronger but still moderate economic growth in 2011. Continued small improvements in the housing market, strengthening of the corporate bond market, relatively low rates of inflation, and marginal increases in the S&P 500 stock market index from here would point towards 1.8% growth in real GDP in 2010 and 2.6% growth in real GDP in 2011. The low level of growth in 2010 indicates a fragile economy. The moderate level of growth expected for 2011 indicates an economy that has stabilized. Stronger growth after 2011 is logical. This indicates moderate increases in retail sales in 2010 and additional improvements after this. The potential does exist for stronger economic growth if the housing market improves faster and households gain confidence about the future more quickly.

Uses of Nonparametric Tests

Monday, April 27th, 2009

Suppose that you work in the marketing department for a major automaker. Your goal was originally to evaluate the prospective strength of the demand for three new cars under development: an SUV, a four-door hybrid, and a sports car. You have surveyed a large number of people across a number of demographic groups and with various purchasing habits. You have estimated the proportion of people who will consider purchasing one of these new vehicles based on these surveys and found that the proportion exceeds a required benchmark for pursuing development. This was established through hypothesis testing using the process demonstrated in Week 3. For example, suppose that the benchmark is that at least 40% of prospective buyers must indicate an interest in buying one of the specific new vehicles. Then you test to see if this benchmark has been met based on the results of the survey (H0: π ≥ 0.40 versus H1: π < 0.40). You find that you cannot reject the null hypothesis, so you proceed on the basis that the benchmark has been met. Now you want to apply what you learned from the surveys to marketing analyses.
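The benchmark test can be sketched as a one-sided z-test for a proportion. This is only an illustration under stated assumptions: the survey counts (215 interested out of 500) and the 5% significance level are hypothetical, the normal approximation is assumed to be adequate for the sample size, and the function names are not from the course notes.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability, built from math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def lower_tail_proportion_test(successes, n, benchmark, alpha=0.05):
    """Test H0: pi >= benchmark against H1: pi < benchmark."""
    p_hat = successes / n
    # Standard error is computed under the null hypothesis value
    standard_error = math.sqrt(benchmark * (1 - benchmark) / n)
    z = (p_hat - benchmark) / standard_error
    p_value = normal_cdf(z)          # lower-tail probability
    return z, p_value, p_value < alpha


# Hypothetical survey result: 215 of 500 respondents are interested
z, p_value, reject = lower_tail_proportion_test(successes=215, n=500, benchmark=0.40)
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

With these made-up counts the sample proportion is above 0.40, so the null hypothesis is not rejected.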

A next step is to look for statistically significant relationships between the characteristics of those who filled out the surveys and their responses about their interest in purchasing one of the new cars. This supports marketing efforts because knowing the extent to which there is a relationship between age and interest in one of the vehicles, or income and interest in one of the vehicles, or for example family size and interest in one of the vehicles, helps with targeting your marketing efforts. This is established by setting up contingency or cross tabulation tables as illustrated in the notes, and then using the chi-square statistic as the test statistic for your hypothesis test.

Here is an example that briefly describes this application. Let’s say that you find that the benchmark is met for the four-door hybrid and you want to determine if there is a relationship between interest in this car and family size. You will set up a contingency table with these variables. Then your null hypothesis will be that there is no relationship between family size and interest in purchasing the four-door hybrid. The alternative hypothesis is that there is a relationship between family size and interest in purchasing the four-door hybrid. If you cannot reject the null hypothesis then you will not market towards family size. Next you might focus on income, as this contingency table seems to show a higher probability of higher income households indicating interest in the four-door hybrid. Here your null hypothesis will be that there is no relationship between household income and interest in purchasing the four-door hybrid. The alternative hypothesis is that there is a relationship between household income and interest in purchasing the four-door hybrid. If you can reject the null hypothesis then you will focus your marketing efforts on higher income households as that group appears to be your market.
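The income example can be sketched as a chi-square test of independence on a 2 x 2 contingency table. The counts below are invented for illustration, and so are the function and variable names. The critical value 3.841 is the standard 5% cutoff for one degree of freedom, and the erfc shortcut for the p-value is valid only when there is exactly one degree of freedom.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts. Rows: lower income, higher income.
# Columns: interested in the hybrid, not interested.
table = [[40, 160],
         [80, 120]]

chi2 = chi_square_statistic(table)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)   # 1 for a 2 x 2 table
p_value = math.erfc(math.sqrt(chi2 / 2))                      # df = 1 only
CRITICAL_VALUE_DF1 = 3.841                                    # 5% level, df = 1
reject = chi2 > CRITICAL_VALUE_DF1

print(f"chi-square = {chi2:.3f}, df = {degrees_of_freedom}, reject H0: {reject}")
```

With these counts the null hypothesis of no relationship is rejected, which in the scenario above would point marketing efforts towards higher income households.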

Fun Applications of Statistics

Friday, April 17th, 2009

We know that statistics is used to make predictions, apply sample data to arrive at inferences about the populations we are interested in, and assess uncertainty. We see statistics used as a research and decision-making tool in business, economics and finance. We also see statistics applied in the fields of engineering, medicine, sociology, psychology, and communications. What follows is a brief discussion about fun applications of statistics for decision-making purposes.

My first exposure to statistics as an interesting topic to focus on was through my love of baseball when I was a kid. I loved baseball cards not for the picture on the front of the card. The attraction for me was the statistics on the back. For example, I learned how to calculate each player’s batting average (hits/official at bats) and on base average ((hits + walks)/(at bats + walks)) and other statistics. In time I learned how the team’s manager can use these statistics as an application of probability. By knowing a player’s batting average when facing a left-handed pitcher versus a right-handed pitcher, and when another teammate is already on second or third base (in scoring position), the manager can make a strategic decision about whether or not to use a different or pinch hitter. Other more detailed data is available for batters, and also pitchers. The manager can use these statistics as subjective probability tools to select the player best suited to succeed in the particular situation confronted by the team. There are great sources for this sort of baseball related data. See The Baseball Encyclopedia and books written by Bill James. An interesting book about Billy Beane, the General Manager of the Oakland A’s who has managed to be successful despite the team playing in a small market, is Moneyball: The Art of Winning an Unfair Game by Michael Lewis. This book is about how Billy Beane used statistics to put together a team of affordable players that could succeed, even if it did not win the World Series, despite working with a small budget.
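The two card-back calculations can be written out directly. This sketch uses the simplified on base formula given above and a made-up season line; the function names and the numbers are illustrative only.

```python
def batting_average(hits, at_bats):
    """Hits divided by official at bats."""
    return hits / at_bats


def on_base_average(hits, walks, at_bats):
    """Simplified on base average as described in the post."""
    return (hits + walks) / (at_bats + walks)


# Hypothetical season: 150 hits and 60 walks in 500 official at bats
print(f"{batting_average(150, 500):.3f}")       # 0.300
print(f"{on_base_average(150, 60, 500):.3f}")   # 0.375
```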

There are other fun and interesting books about applications of statistics, and examples of the poor use or interpretation of statistics. Here is a link through the BBC illustrating this (http://www.bbc.co.uk/dna/h2g2/A1091350). Here is another interesting blog about the misuse of statistics and data you might want to read (http://blogs.wsj.com/numbersguy/). Here is a blog that addresses interesting applications of economics and statistics in unconventional situations that is fun to read (http://freakonomics.blogs.nytimes.com/).

Have fun with these resources and search for other fun applications of statistics, and practical ones as well.

 

 

 


 

Descriptive Statistics

Friday, September 5th, 2008

Descriptive statistics are used to help us understand and interpret the data that we collect, and as a result support both the analysis of sample data and statistical inference. The point of sample data is to better understand the population of interest. We use the mean, median, and mode to better understand the middle or average of a series of data. We use the range, variance, and standard deviation as measures of dispersion or to better understand how spread out data is, or in other words to get a sense of how well the middle describes the data. What follows is a brief description of how these measures of central tendency and dispersion are used to better understand the population of interest.

The mean, median, and mode are used to describe the middle or average. Each measure indicates different things, and as a result is used for different purposes. The sample mean is the average, and as long as the data used to calculate it comes from a random sample it is used as the best unbiased estimate of the population mean. It works well as long as the data it represents is not skewed either by an exceptionally large or small observation. The reason for this is that an observation that is much lower than what is typical of the majority of the data will make the mean smaller than what is represented by the majority of the data. On the other hand, an observation that is much higher than what is typical of the majority of the data will make the mean larger than what is represented by the majority of the data. We will use the median if this is the case.

The median is the 50th percentile. In other words, fifty percent of the observations will be greater than the median and fifty percent of the observations will be smaller than the median. It is the middle observation, and is not skewed by outliers in the data like the mean is. We see this used to describe demographic data like the median household income or the median home price because the means of these measures can be skewed by exceptionally large or small observations. An example where this is important would be that of a business person looking to locate a store. The business person would be interested in household incomes and home prices in zip codes because these measures are indicators of buying power. Looking at the mean as a measure of central tendency can be a mistake because it can misrepresent the characteristics of potential buyers. Imagine the owner of antique shops who sells expensive products. Suppose this business person used the mean or average household income and home price as an indicator to select the zip code to locate a store, and a zip code included a handful of very wealthy homeowners. This could lead to the mistaken assumption that the store should be located in that zip code, and this could be a disastrous decision because the only potential customers would be the few wealthy homeowners.
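A small made-up example shows how far apart the two measures can be. The household incomes below are hypothetical; the calls are from Python's standard statistics module.

```python
import statistics

# Hypothetical household incomes in one zip code, including one very wealthy household
incomes = [42_000, 48_000, 51_000, 55_000, 60_000, 2_500_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean:   {mean_income:,.0f}")     # pulled far upward by the single outlier
print(f"median: {median_income:,.0f}")   # close to what a typical household earns
```

Here the mean is roughly 459,000 while the median is 53,000, so a store owner relying on the mean would badly overestimate the typical household's buying power.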

The mode or modal observation is the number that occurs most often. It is not used as often as the mean or median because it is not useful for statistical inference or hypothesis testing, and is not as robust an indicator of the middle as the median. One place where it is used is in retail as businesses want to identify the items that sell the most. Knowing this allows businesses to strategically locate items in the store in places that facilitate the sales of other items. For example, the grocery store places staple items in the back of the store and separates them from one end of the store to another, forcing shoppers to walk through the store and increasing the probability that other items will be purchased.

The range, variance, and standard deviation are measures of dispersion. The range is rarely used because it is based on only two observations, the biggest and the smallest. It can also be affected by exceptionally large or small observations. Where we do see this used is in statistical quality control. Firms use performance measures to track productivity, production targets, and whether or not the products that are produced meet design guidelines. The range can be used as a measure that indicates whether or not targets are being met, and the firm will want to reduce the range, indicating increasing consistency. For example, a call center will want to reduce the range of hold times in order to ensure consistent customer service.

The variance and standard deviation are other measures of dispersion, or how spread out data is, that are used far more often. The variance is the average squared difference between each observation and the mean. The standard deviation is the square root of the variance. We use the variance and standard deviation as measures of consistency, reliability, and risk. An example would be evaluating the productivity of two groups of employees each performing the same task. If the average or mean rate of production of the two groups is about equal then one may conclude the groups are equally productive. However, if the standard deviation of the productivity of one group is significantly greater than that of the second group, then its productivity measures would be far more dispersed. This indicates greater inconsistency or unreliability in their productivity. You also see variance and standard deviation used as a measure of risk in finance, where they are used to evaluate the variability of the returns generated by an investment portfolio.
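The two-group comparison can be sketched with invented productivity figures. The population versions (pvariance, pstdev) match the "average squared difference" definition above; with sample data one would usually divide by n - 1 instead, which is what statistics.variance and statistics.stdev do. The group data and names are hypothetical.

```python
import statistics

# Hypothetical units produced per day by five employees in each group
group_a = [48, 50, 52, 49, 51]
group_b = [30, 70, 45, 55, 50]

for name, group in [("A", group_a), ("B", group_b)]:
    mean = statistics.mean(group)
    variance = statistics.pvariance(group)   # average squared difference from the mean
    std_dev = statistics.pstdev(group)       # square root of the variance
    spread = max(group) - min(group)         # the range, for comparison
    print(f"group {name}: mean {mean}, variance {variance}, "
          f"std dev {std_dev:.2f}, range {spread}")
```

Both groups average 50 units, but group B's variance is 170 against 2 for group A, so its output is far less consistent.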

These examples were meant to illustrate key points, and reinforce concepts related to statistical measures we see used every day. Statistics is a powerful tool used to support decision-making, but its use is most profitable if the measures that are used are well understood and are properly targeted. Measures of central tendency like the mean, median and mode are each interpreted differently and have their own strengths and weaknesses. Similarly, the measures of dispersion we use each have their own interpretations and applications.

 

Challenges with Surveys

Thursday, August 28th, 2008

Surveys are a very valuable resource for organizations whether they are businesses, non-profits, government agencies, or political parties. They help the surveyor when they assess the population of interest by identifying the interests, characteristics, perceptions, likes and dislikes, and expectations of those who fill out the surveys as long as the research process is well designed and executed. This means that a random sampling process is used in order to avoid biased results, the survey questions are well thought out with an eye on the hypothesis tests that will be conducted, and the analysis and interpretation of the data is carried out correctly. Problems do arise, though, if the researcher does not completely have access to the population they are interested in. Here is an important example from the 2008 election campaign.

I have selected two articles in the New York Times about the impact of increasing cell phone usage on the ability of political pollsters to assess the opinions of voters. The problem is that these pollsters are not able to contact voters who use cell phones as easily, and as a result this raises issues about whether or not they are clearly surveying the population of interest. One article, written by Megan Thee, appeared in the December 7, 2007 issue of the New York Times (http://www.nytimes.com/2007/12/07/us/07polling.html?scp=14&sq=political%20polls%20&%20cell%20phones&st=cse). The author points out the fact that public opinion researchers have historically relied on home land-line telephones, which have the advantage of being geographically defined by their area codes. Cell phones are not as clearly geographically based, and those who use cell phones are less likely to participate in a survey. After all, you pay to use your cell phone by the minute.

The problems associated with cell phone usage go beyond these points. Few drivers on the road would want to see those around them on their cell phones completing a survey, including political polls. Those who primarily use cell phones tend to be younger and they are usually less likely to vote possibly making the issue less important, but is that true in 2008? The Pew Research Center completed a survey showing the differences in the political views between land-line users and cell phone users are not significant and as a result not adversely affecting the results of political polls. The issue is that the number of people, especially younger people who rely on cell phones, is growing. The problem is constructing a survey design to remedy these concerns. An article by the same author that appeared on July 23, 2008 (http://thecaucus.blogs.nytimes.com/2008/07/23/cellphone-only-it-holds-little-sway-in-polls/?scp=6&sq=political%20polls%20&%20cell%20phones&st=cse) follows up on these points. She reported that at that time pollsters were regularly undertaking cell phone surveys to increase the accuracy of their polls. The Pew Research Center has found that the “cell phone only” cohort differs greatly from the general public, but the “cell phone mostly” cohort does not. The “cell phone only” cohort more strongly supported Barack Obama than the general voting public, but is considered to be less likely to vote.

To learn more about how opinion pollsters work, see the following examples: Pew Research Center (http://people-press.org/), Gallup (http://www.gallup.com/home.aspx), Harris Interactive (http://www.harrisinteractive.com/), Kaiser Family Foundation (http://www.kff.org/), Survey Research Center (http://www.src.isr.umich.edu/), and Zogby (http://www.zogby.com/).

 

 

Uses of Regression Analysis

Saturday, May 31st, 2008

Regression analysis is a valuable tool for modeling, forecasting, analyzing trends, and estimating. Along with analysis of variance (ANOVA), it is one of the most often used statistical tools in business. The following is a brief description of some examples.

One example is the use of regression analysis in analyzing the pricing decisions of businesses and consumers. This is called hedonic pricing: a price determination model in which the price of a product reflects the value of the attributes of that product as determined by consumers. Regression analysis facilitates this work because it allows analysts to quantify the relationship between a dependent variable and the independent variables that determine or affect its value. In other words, regression analysis produces a mathematical formula that defines the relationship between the dependent variable and the independent variables. A good illustration of hedonic pricing is the value or price of a house.

The value of a home depends on many factors, including the size of the livable area of the home, the size of the garage, the size of the lot, the number of bedrooms, the number of bathrooms, whether or not there is a pool, the quality of the school district, etc. It is reasonable to assume that each of these should positively impact the value of a home. For example, it is logical to assume that the greater the livable area of the home, the higher its price will be. It is also reasonable to assume that a pool adds value and so do better schools. The question is how much each additional square foot of livable area adds to the price of a home. Multiple regression analysis allows the analyst to use sample data to produce a regression equation in which the value of the home is the dependent variable and each of the factors listed above is an independent variable. The coefficients of that equation give the analyst an estimate of, for example, how much a pool adds to value.
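To make the idea concrete, here is a minimal sketch of a hedonic regression in plain Python. Everything in it is an assumption made for illustration: the six homes, the prices, and the helper names `solve` and `ols` are made up, and the prices are built from a known formula (50,000 plus 120 per square foot plus 15,000 for a pool) so the recovered coefficients can be checked. Real sales data would be noisy, and an analyst would normally use a statistics package rather than hand-written code.

```python
# Hedonic pricing sketch: ordinary least squares via the normal equations.
# All figures are hypothetical; they are not taken from real home sales.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimize the sum of squared errors."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Independent variables: (livable area in square feet, pool: 1 = yes, 0 = no)
homes = [(1200, 0), (1500, 1), (1800, 0), (2100, 1), (2400, 0), (3000, 1)]
# Dependent variable: hypothetical prices generated from a known formula.
prices = [50_000 + 120 * sqft + 15_000 * pool for sqft, pool in homes]

intercept, per_sqft, pool_premium = ols(homes, prices)
print(f"each extra square foot adds about ${per_sqft:,.2f}")
print(f"a pool adds about ${pool_premium:,.2f}")
```

The coefficient on livable area answers the question of how much each additional square foot adds to the price, and the coefficient on the pool indicator is the estimated value of a pool, holding livable area constant.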

Another example is found in finance. It is reasonable to assume that the price per share of stock issued by a particular firm is dependent on the firm’s earnings per share, interest rates, and the overall performance of the stock market as measured by the S&P 500. It is logical to assume that when a firm’s earnings per share increase, the stock price should increase. Higher interest rates will usually lead to lower stock prices because investors may think they will earn more by owning bonds and because the present value of future earnings and dividends will be lower. Finally, the prices of the shares of stock issued by firms will tend to increase when the stock market performs better. The question for analysts is how much each of these factors contributes to changes in share prices. The Beta, or the coefficient measuring the effect of the overall stock market on the price per share, is a key value finance professionals focus on. A positive Beta means that a stronger stock market positively affects the price per share of the stock issued by a firm.
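As a rough sketch of how such a coefficient is estimated, the snippet below fits a simple regression with the market as the only independent variable. It works with periodic returns, which is a common way to estimate Beta, and it leaves earnings per share and interest rates aside for simplicity. The return figures and the helper name `slope_and_intercept` are hypothetical, and the stock series is built with a slope of 1.5 so the result can be checked.

```python
from statistics import mean

def slope_and_intercept(x, y):
    """Least-squares line y = intercept + slope * x for one independent variable."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-movement of x and y
    sxx = sum((xi - mx) ** 2 for xi in x)                     # variation in x
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical monthly returns, in percent, for the overall market.
market = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
# Hypothetical stock returns built as 0.2 + 1.5 * market return.
stock = [0.2 + 1.5 * m for m in market]

beta, alpha = slope_and_intercept(market, stock)
print(f"Beta: {beta:.2f}")
```

A Beta above 1 in this sketch means the stock tends to move more than the market in the same direction, while a Beta between 0 and 1 means it moves in the same direction but by less.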

There are many other applications of regression analysis. I use it to create models that forecast the amount of water the City of Phoenix sells through its water utility, and I forecast water revenues as well. Social scientists use regression analysis to explain changes in the crime rate. It seems reasonable that a better economy with a strong labor market leads to less crime, and regression analysis allows the analyst to quantify these relationships. Now, it is easy to poke fun at the quality of forecasts, but it is true that no person can anticipate everything. Who could have predicted the events of September 11 or the excessively high stock prices of the late 1990s and 2000 that led to the stock market crash? The point is that these forecast models allow for sensitivity analysis, in which the analyst can evaluate the impact of changes in the values of the independent variables, and they also provide a detailed analytical framework for evaluating deviations from forecasts.
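Sensitivity analysis with a fitted regression equation can be sketched in a few lines. The coefficients, variable names, and figures below are entirely hypothetical; they are not taken from any actual water demand model and serve only to show the mechanics of changing one independent variable and of comparing a forecast with an actual result.

```python
# Hypothetical fitted equation: demand = 40 + 1.2 * temperature + 0.5 * accounts.
# These coefficients are assumptions for illustration only.
COEF = {"intercept": 40.0, "avg_high_temp_f": 1.2, "accounts_thousands": 0.5}

def forecast(avg_high_temp_f, accounts_thousands):
    """Forecast water demand (in made-up units) from the regression equation."""
    return (COEF["intercept"]
            + COEF["avg_high_temp_f"] * avg_high_temp_f
            + COEF["accounts_thousands"] * accounts_thousands)

# Baseline scenario and a hotter scenario with everything else unchanged.
base = forecast(avg_high_temp_f=100, accounts_thousands=400)
hotter = forecast(avg_high_temp_f=105, accounts_thousands=400)
impact = hotter - base  # effect of five extra degrees on forecast demand

# Deviation of a hypothetical actual result from the baseline forecast.
actual = 352.0
deviation = actual - base

print(f"baseline forecast: {base:.1f}")
print(f"impact of +5 degrees: {impact:.1f}")
print(f"actual minus forecast: {deviation:.1f}")
```

Because the equation is explicit, the analyst can ask how much of a deviation is explained by the independent variables turning out differently than assumed and how much is left unexplained.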