‘Big’ Data Science and Scientists

‘BIG’ DATA SCIENCE

If you could travel back in time and tell people that today a child can interact with others from anywhere and query trillions of records across the globe with a simple click on his/her computer, they would have called it science fiction!

Today more than 2.9 million emails are sent across the internet every second. 375 megabytes of data are consumed by households each day. Google processes 24 petabytes of data per day. Now that's a lot of data! With each click, like and share, the world's data pool is expanding faster than we can comprehend. Data is being created every minute of every day without us even noticing it. Businesses today are paying attention to scores of data sources to make crucial decisions about the future. The rise of digital and mobile communication has made the world more connected, networked and traceable, which has resulted in the availability of such large-scale data sets.

So what is this buzzword "Big Data" all about? Big data may be defined as data sets whose size is beyond the ability of typical database software tools to capture, store, manage and process. The definition can differ by sector, depending on what kinds of software tools are commonly available and what sizes of data sets are common in a particular industry.

The explosion in digital data, bandwidth and processing power, combined with new tools for analyzing the data, has sparked massive interest in the emerging field of data science. Big data has now reached every sector in the global economy and has become an integral part of solving the world's problems. It allows companies to know more about their customers, their products and their own infrastructure. More recently, attention has shifted to the monetization of that data.

According to a McKinsey Global Institute report[1] in 2011, simply making big data more easily accessible to relevant stakeholders in a timely manner can create enormous value. For example, in the public sector, making relevant data more easily accessible across otherwise separated departments can sharply cut search and processing time. Big data also allows organizations to create highly specific segmentations and to tailor products and services precisely to the needs of each segment. This approach is well known in marketing and risk management but can be revolutionary elsewhere.

Big data is improving transportation and power consumption in cities, making our favorite websites and social networks more efficient, and even preventing suicides. Businesses are collecting more data than they know what to do with. Big data is everywhere; the volume of data produced, saved and mined is startling. Today, companies use data collection and analysis to formulate more cogent business strategies. Manufacturers use data obtained from the use of real products to improve and develop new products and to create innovative after-sale service offerings. This will continue to be an emerging area for all industries. Data has become a competitive advantage and a necessary part of product development.

Companies succeed in the big data era not simply because they have more or better data, but because they have good teams that set clear objectives and define what success looks like by asking the right questions. Big data is also creating new growth opportunities and entirely new categories of companies, such as those that collect and analyze industrial data.

One of the most impressive areas where big data is making a mark is machine learning. Machine learning can be defined as the study of computer algorithms that improve automatically through experience. It is a branch of artificial intelligence, which is itself a branch of computer science. Applications range from data mining programs that discover general rules in large data sets to information filtering systems that automatically learn a user's interests.

Rising alongside the relatively new technology of big data is the new job title of data scientist. An article by Thomas H. Davenport and D.J. Patil in Harvard Business Review[2] describes 'Data Scientist' as the 'Sexiest Job of the 21st Century'. You have to buy the logic that what makes a career "sexy" is demand for your skills exceeding supply, allowing you to command a sizable paycheck and options. The Harvard Business Review actually compares these data scientists to the quants of the 1980s and 1990s on Wall Street, who pioneered "financial engineering" and algorithmic trading. The need for data experts is growing, and demand is on track to hit unprecedented levels in the next five years.

Who are Data Scientists?

Data scientists are people who know how to ask the right questions to get the most value out of massive volumes of data. In other words, a data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Good data scientists will not just address business problems; they will choose the right problems that have the most value to the organization. They combine the analytical capabilities of a scientist or an engineer with the business acumen of the enterprise executive.

Data scientists have changed, and keep changing, the way things work. They integrate big data technology into both IT departments and business functions. Data scientists must also understand the business applications of big data and how it will affect the organization, and be able to communicate with both IT and business management. The best data scientists are comfortable speaking the language of business and helping companies reformulate their challenges.

Data science, due to its interdisciplinary nature, requires an intersection of three abilities: hacking skills, math and statistics knowledge, and substantive expertise in a scientific field. Hacking skills are necessary for working with the massive amounts of electronic data that must be acquired, cleaned and manipulated. Math and statistics knowledge allows a data scientist to choose appropriate methods and tools in order to extract insight from data. Substantive expertise in a scientific field is crucial for generating motivating questions and hypotheses and for interpreting results. Traditional research lies at the intersection of math and statistics knowledge with substantive expertise in a scientific field. Machine learning stems from combining hacking skills with math and statistics knowledge, but does not require scientific motivation. Science is about discovery and building knowledge, which requires motivating questions about the world and hypotheses that can be brought to data and tested with statistical methods. Hacking skills combined with substantive scientific expertise, without rigorous methods, can beget incorrect analysis.

A good scientist can understand the current state of a field, pick challenging questions where success will actually lead to useful new knowledge, and push the field further through their work.

How to become a Data Scientist?

No university programs in India have yet been designed to develop data scientists, so recruiting them requires creativity. You cannot become a data scientist overnight, and learning data science can be really hard. Data scientists need to know how to code and should be comfortable with mathematics and statistics. They also need to know machine learning and software engineering, how to organize large data sets, and how to use visualization tools and techniques.

Data scientists need to know how to code in at least one of SAS, SPSS, Python or R. Statistical Package for the Social Sciences (SPSS) is a software package, currently developed by IBM, that is widely used for statistical analysis in social science. The Statistical Analysis System (SAS) software suite, developed by SAS Institute, is mainly used in advanced analytics; SAS is the largest market-share holder for advanced analytics. Python is a high-level programming language that is very widely used in the data science community. Finally, R is a free software programming language for statistical computing and graphics. R has become a de facto standard among statisticians for developing statistical software and is widely used for statistical software development and data analysis. R is part of the GNU Project, a collaboration that supports open source projects.

A few online courses can help you learn the main coding languages. One option currently available is through the popular MOOC website coursera.org: a specialization offered by Johns Hopkins University through Coursera helps you learn R programming, data visualization, machine learning and how to develop data products. A few more courses on data science are also available through Coursera. Udacity is another popular MOOC website that offers courses on data science, machine learning and statistics. Codecademy also offers similar courses for learning data science and coding in Python.

When you start operating with data at the scale of the web, the fundamental approach and process of analysis must and will change. Most data scientists are working on problems that cannot be run on a single machine: they have large data sets that require distributed processing. Hadoop is an open-source software framework for storing and processing large data sets on clusters of commodity hardware. MapReduce is the programming paradigm that allows for massive scalability across the servers in a Hadoop cluster. Apache Spark is Hadoop's speedy Swiss Army knife: a fast-running data analysis system that provides real-time data processing functions to Hadoop. It is also important that a data scientist be able to work with unstructured data, whether from social media, videos or even audio.
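To give a flavour of the MapReduce paradigm, the sketch below counts words with PySpark. It is a minimal illustration under stated assumptions: a working Spark installation, and a hypothetical input file docs.txt.

```python
# Minimal MapReduce-style word count with PySpark (illustrative sketch).
# Assumes a local Spark installation; "docs.txt" is a hypothetical input file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

lines = spark.sparkContext.textFile("docs.txt")      # distributed read
counts = (lines.flatMap(lambda line: line.split())   # "map": emit words
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))     # "reduce": sum per word

print(counts.take(10))   # peek at ten (word, count) pairs
spark.stop()
```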

KDnuggets is a popular website among data scientists that focuses on the latest updates and news in the fields of business analytics, data mining and data science. KDnuggets also offers a free Data Mining Course: the teaching modules for a one-semester introductory course on data mining, suitable for advanced undergraduates or first-year graduate students.

Kaggle is a platform for predictive modeling and analytics competitions, on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. Kaggle hosts many data science competitions where you can practice, test your skills with complex real-world data and tackle actual business problems. Many employers take Kaggle rankings seriously, as they can be seen as pertinent, hands-on project work. Kaggle aims at making data science a sport.

Finally, to be a data scientist you'll need a good understanding of the industry you're working in and of the business problems your company is trying to solve. Being able to identify which problems are crucial for the business, and new ways the business should be leveraging its data, is critical.

A study by Burtch Works[3] in April 2014, finds that data scientists earn a median salary that can be up to 40% higher than other Big Data professionals at the same job level. Data scientists have a median of nine years of experience, compared to other Big Data professionals who have a median of 11 years. More than one-third of data scientists are currently in the first five years of their careers. The gaming and technology industries pay higher salaries to data scientists than all other industries.

LinkedIn, a popular business-oriented social networking website, rated "statistical analysis and data mining" the top skill that got people hired in 2014. Data science has a bright future ahead: there will only be more data, and more of a need for people who can find meaning and value in that data. Despite the growing opportunity, demand for data scientists has outpaced the supply of talent and will continue to do so for the next five years.

A Study On Business Forecasting

The aim of this report is to show my understanding of business forecasting using data drawn from the UK national statistics. It is a quarterly series of total consumer credit gross lending in the UK from the second quarter of 1993 to the second quarter of 2009.

The report answers four key questions that are relevant to the coursework.

In this section the data will be examined, looking for seasonal effects, trends and cycles. Each time period represents a single observation, which must be split into a trend-cycle and a seasonal effect. The line graph in Figure 1 identifies a clear upward trend-cycle, which must be removed so that the seasonal effect can be estimated.

Figure 1 displays long-term credit lending in the UK, which has recently been hit by an economic crisis. Figure 2 also provides evidence of a trend, because the ACF values do not come down to zero. Even though the trend is clear in Figures 1 and 2, the seasonal pattern is not. Therefore, it is important that the trend-cycle is removed so that the seasonal effect can be estimated clearly. A process called differencing will remove the trend whilst keeping the seasonal pattern.

Drawing scatter plots and calculating correlation coefficients on the differenced data will reveal the pattern repeat.

Scatter Plot correlation

The following diagram (Figure 3) represents the correlation between the original credit lending data and four lags (quarters). A strong correlation is indicated by a straight-line relationship.

As depicted in Figure 3, the scatter plots show that the credit lending data against lag 4 gives the best straight line. Even though this diagram represents the straightest line, the seasonal pattern is still unclear; therefore differencing must be used to resolve this issue.
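This kind of lag analysis is easy to reproduce with pandas, as sketched below; the file and column names are hypothetical placeholders rather than the actual coursework data.

```python
# Sketch: correlation of a quarterly series with its own lags.
# "credit_lending.csv" and the "lending" column are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

lending = pd.read_csv("credit_lending.csv")["lending"]

# Correlation coefficient between the series and each of four lags
for lag in range(1, 5):
    print(lag, round(lending.autocorr(lag=lag), 3))

# Scatter plot of the series against lag 4 (cf. Figure 3)
pd.plotting.lag_plot(lending, lag=4)
plt.show()
```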

Differencing

Differencing is used to remove a trend-cycle component. Figure 4 displays an ACF graph, which indicates a four-point pattern repeat. Moreover, Figure 5 shows a line graph of the first difference. The graph displays a four-point repeat, but the trend is still clearly apparent. To remove the trend completely the data must be differenced a second time.

First differencing is a useful tool for removing non-stationarity. However, first differencing does not always eliminate non-stationarity, and the data may have to be differenced a second time. In practice, it is not essential to go beyond second differencing, because real data generally involve non-stationarity of only the first or second level.

Figures 6 and 7 display the second difference data. Figure 6 displays an ACF graph of the second difference, which reinforces the idea of a four-point repeat. Figure 7 confirms that the trend-cycle component has been completely removed and that there is in fact a four-point pattern repeat.
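The differencing steps described above take only a few lines in code; here is a sketch with pandas and statsmodels, continuing the hypothetical lending series from the previous sketch.

```python
# Sketch: first and second differencing with ACF plots to expose the
# four-point seasonal repeat (continues the hypothetical `lending` series).
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

first_diff = lending.diff().dropna()       # removes most of the trend
second_diff = first_diff.diff().dropna()   # removes the remaining trend

fig, axes = plt.subplots(2, 1)
plot_acf(first_diff, ax=axes[0], lags=20)    # cf. Figures 4 and 5
plot_acf(second_diff, ax=axes[1], lags=20)   # cf. Figure 6: four-point repeat
plt.show()
```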

Question 2

Multiple regression involves fitting a linear expression by minimising the sum of squared deviations between the sample data and the fitted model. Regression can fit several kinds of models, and multiple regression can be implemented using linear and nonlinear forms. The following section explains multiple regression using dummy variables.

Dummy variables are used in a multiple regression to fit trends and pattern repeats in a holistic way. As the credit lending data is seasonal, a common method of handling the seasonality in a regression framework is to use dummy variables. The following section includes dummy variables indicating the quarters, which will show whether there are any quarterly influences on lending. Three new variables can be defined:

Q1 = first quarter
Q2 = second quarter
Q3 = third quarter
Trend and seasonal models using dummy variables

The following equations are used by SPSS to create the different outputs. Each model is judged in terms of its adjusted R².

Linear trend + seasonal model

Data = a + c × time + b1 × Q1 + b2 × Q2 + b3 × Q3 + error

Quadratic trend + seasonal model

Data = a + c1 × time + c2 × time² + b1 × Q1 + b2 × Q2 + b3 × Q3 + error

Cubic trend + seasonal model

Data = a + c1 × time + c2 × time² + c3 × time³ + b1 × Q1 + b2 × Q2 + b3 × Q3 + error

Initially, data and time columns were input to display the trends, and the credit lending data was regressed against time and the dummy variables. Due to multi-collinearity (i.e. at least one of the variables being completely determined by the others) there was no need for all four quarterly variables, just Q1, Q2 and Q3.
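In SPSS this is done through the regression dialogue; an equivalent sketch in Python with statsmodels follows, continuing the hypothetical series from the earlier sketches.

```python
# Sketch: linear trend + seasonal dummies, mirroring the SPSS regression.
# `lending` is the hypothetical quarterly series from the earlier sketches.
import numpy as np
import pandas as pd
import statsmodels.api as sm

n = len(lending)
df = pd.DataFrame({"data": lending.values, "time": np.arange(1, n + 1)})
quarter = (df["time"] - 1) % 4 + 1   # assumes the series starts in Q1
for q in (1, 2, 3):                  # Q4 omitted to avoid multi-collinearity
    df[f"Q{q}"] = (quarter == q).astype(int)

X = sm.add_constant(df[["time", "Q1", "Q2", "Q3"]])
model = sm.OLS(df["data"], X).fit()
print(model.summary())               # compare with Figures 8 and 9
```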

Linear regression

Linear regression is used to define a line that comes closest to the original credit lending data. It finds values for the slope and intercept that minimize the sum of the squared vertical distances between the points and the line.

Model Summary

Model | R    | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .971 | .943     | .939              | 3236.90933

Figure 8. SPSS output displaying the adjusted coefficient of determination R squared

Coefficients

Model      | B         | Std. Error | Beta  | t      | Sig.
(Constant) | 17115.816 | 1149.166   |       | 14.894 | .000
time       | 767.068   | 26.084     | .972  | 29.408 | .000
Q1         | -1627.354 | 1223.715   | -.054 | -1.330 | .189
Q2         | -838.519  | 1202.873   | -.028 | -.697  | .489
Q3         | 163.782   | 1223.715   | .005  | .134   | .894

Figure 9

The adjusted coefficient of determination R squared is 0.939, which is an excellent fit (Figure 8). The coefficient of the variable 'time', 767.068, is positive, indicating an upward trend. However, not all the coefficients are significant at the 5% level (0.05), so variables must be removed. Initially, Q3 is removed because it is the least significant variable (Figure 9). Once Q3 is removed, Q2 is the least significant variable, and even after Q3 and Q2 are removed, Q1 is still not significant. All the quarterly variables must therefore be removed, leaving time as the only variable, which is significant.

Coefficients

Model      | B         | Std. Error | Beta | t      | Sig.
(Constant) | 16582.815 | 866.879    |      | 19.129 | .000
time       | 765.443   | 26.000     | .970 | 29.440 | .000

Figure 10

The following table (Table 1) analyses the original forecast against the holdback data using data in Figure 10. The following equation is used to calculate the predicted values.

Predicted values = 16582.815 + 765.443 × time

Original Data | Predicted Values
50878.00 | 60978.51
52199.00 | 61743.95
50261.00 | 62509.40
49615.00 | 63274.84
47995.00 | 64040.28
45273.00 | 64805.72
42836.00 | 65571.17
43321.00 | 66336.61

Table 1

This model is clearly ineffective at predicting future values: while the original holdback data decreases from quarter to quarter, the predicted values keep increasing with time, so the model completely misses the downturn.
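The comparison in Table 1 can be reproduced directly from the fitted equation; in the sketch below, time indices 58 to 65 correspond to the eight holdback quarters and recover the Table 1 predictions.

```python
# Sketch: predictions from the reduced linear model against the holdback data.
import numpy as np

time = np.arange(58, 66)                 # time indices of the holdback quarters
predicted = 16582.815 + 765.443 * time   # equation from Figure 10
holdback = np.array([50878, 52199, 50261, 49615, 47995, 45273, 42836, 43321])

for actual, pred in zip(holdback, predicted):
    print(f"{actual:10.2f} {pred:12.2f}")
```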

Non-Linear regression

Non-linear regression aims to find a relationship between a response variable and one or more explanatory variables in a non-linear fashion.
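Fitting the quadratic and cubic trends only requires adding powers of time as extra regressors; a brief sketch, continuing the hypothetical data frame from the earlier regression sketch, follows.

```python
# Sketch: quadratic and cubic trend terms as extra columns (cf. Figures 11-16).
df["time2"] = df["time"] ** 2
df["time3"] = df["time"] ** 3

quad = sm.OLS(df["data"],
              sm.add_constant(df[["time", "time2", "Q1", "Q2", "Q3"]])).fit()
cubic = sm.OLS(df["data"],
               sm.add_constant(df[["time", "time2", "time3", "Q1", "Q2", "Q3"]])).fit()
print(quad.rsquared_adj, cubic.rsquared_adj)   # compare with Figures 11 and 14
```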

Non-linear model (Quadratic)

Model Summary

Model | R    | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .986 | .972     | .969              | 2305.35222

Figure 11

Coefficients

Model      | B         | Std. Error | Beta  | t      | Sig.
(Constant) | 11840.996 | 1099.980   |       | 10.765 | .000
time       | 1293.642  | 75.681     | 1.639 | 17.093 | .000
time2      | -9.079    | 1.265      | -.688 | -7.177 | .000
Q1         | -1618.275 | 871.540    | -.054 | -1.857 | .069
Q2         | -487.470  | 858.091    | -.017 | -.568  | .572
Q3         | 172.861   | 871.540    | .006  | .198   | .844

Figure 12

The adjusted coefficient of determination R squared for the quadratic model is 0.969 (Figure 11), a slight improvement on the linear model (Figure 8). The coefficient of the variable 'time', 1293.642, is positive, indicating an upward trend, whereas the coefficient of 'time2', -9.079, is negative. Together, the positive and negative values indicate a curve in the trend.

Not all the coefficients are significant at the 5% level, so variables must again be removed. Initially, Q3 is removed because it is the least significant variable (Figure 12). Once Q3 is removed, Q2 is the least significant variable. Once Q2 and Q3 have been removed, Q1's significance value is under the 5% level, meaning it is significant (Figure 13).

Coefficients

Model      | B         | Std. Error | Beta  | t      | Sig.
(Constant) | 11698.512 | 946.957    |       | 12.354 | .000
time       | 1297.080  | 74.568     | 1.643 | 17.395 | .000
time2      | -9.143    | 1.246      | -.693 | -7.338 | .000
Q1         | -1504.980 | 700.832    | -.050 | -2.147 | .036

Figure 13

Table 2 displays analysis of the original forecast against the holdback data using data in Figure 13. The following equation is used to calculate the predicted values:

QuadPredicted values = 11698.512 + 1297.080 × time - 9.143 × time² - 1504.980 × Q1

Original Data | Predicted Values
50878.00 | 56172.10
52199.00 | 56399.45
50261.00 | 55103.53
49615.00 | 56799.29
47995.00 | 56971.78
45273.00 | 57125.98
42836.00 | 55756.92
43321.00 | 57379.54

Table 2

Compared to Table 1, Table 2 presents predicted values that are closer to the holdback data, but still not accurate enough.

Non-linear model (Cubic)

Model Summary

Model | R    | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .997 | .993     | .992              | 1151.70013

Figure 14

Coefficients

Model      | B         | Std. Error | Beta   | t       | Sig.
(Constant) | 17430.277 | 710.197    |        | 24.543  | .000
time       | 186.531   | 96.802     | .236   | 1.927   | .060
time2      | 38.217    | 3.859      | 2.897  | 9.903   | .000
time3      | -.544     | .044       | -2.257 | -12.424 | .000
Q1         | -1458.158 | 435.592    | -.048  | -3.348  | .002
Q2         | -487.470  | 428.682    | -.017  | -1.137  | .261
Q3         | 12.745    | 435.592    | .000   | .029    | .977

Figure 15

The adjusted coefficient of determination R squared is 0.992, which is the best fit so far (Figure 14). The coefficients of the variables 'time', 186.531, and 'time2', 38.217, are positive, indicating an upward trend, while the coefficient of 'time3', -.544, indicates a curve in the trend. Not all the coefficients are significant at the 5% level, so variables must be removed. Initially, Q3 is removed because it is the least significant variable (Figure 15). Once Q3 is removed, Q2 is the least significant variable. Once Q3 and Q2 have been removed, Q1 is significant but the 'time' variable is not, so it must also be removed.

Coefficients

Model      | B         | Std. Error | Beta   | t       | Sig.
(Constant) | 18354.735 | 327.059    |        | 56.120  | .000
time2      | 45.502    | .956       | 3.449  | 47.572  | .000
time3      | -.623     | .017       | -2.586 | -35.661 | .000
Q1         | -1253.682 | 362.939    | -.042  | -3.454  | .001

Figure 16

Table 3 displays analysis of the original forecast against the holdback data using data in Figure 16. The following equation is used to calculate the predicted values:

CubPredicted values = 18354.735 + 45.502 × time² - 0.623 × time³ - 1253.682 × Q1

Original Data | Predicted Values
50878.00 | 49868.69
52199.00 | 48796.08
50261.00 | 46340.25
49615.00 | 46258.51
47995.00 | 44786.08
45273.00 | 43172.89
42836.00 | 40161.53
43321.00 | 39509.31

Table 3

The cubic model displays the most accurate predicted values of the linear, quadratic and cubic models: Table 3 shows the original data and the predicted values both decreasing gradually.

Question 3

Box-Jenkins modelling is used to find a suitable formula such that the residuals are as small as possible and exhibit no pattern. The model is built in only a few steps, which may be repeated as necessary, resulting in a specific formula that replicates the patterns in the series as closely as possible and also produces accurate forecasts.

The following section will show a combination of decomposition and Box-Jenkins ARIMA approaches.

For each of the original variables analysed, the Seasonal Decomposition procedure creates four new variables for the modelling data:

SAF: Seasonal factors
SAS: Seasonally adjusted series, i.e. de-seasonalised data, representing the original series with seasonal variations removed.
STC: Smoothed trend-cycle component, which is smoothed version of the seasonally adjusted series that shows both trend and cyclic components.
ERR: The residual component of the series for a particular observation
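These four series can also be produced outside SPSS; below is a sketch using statsmodels' seasonal_decompose on the hypothetical quarterly series from the earlier sketches, with a multiplicative model matching the MUL setting above (the trend component of seasonal_decompose is a centred moving average, which only approximates SPSS's smoothed trend-cycle).

```python
# Sketch: multiplicative seasonal decomposition, mirroring the four
# SPSS SEASON outputs (SAF, SAS, STC, ERR) for a quarterly series.
from statsmodels.tsa.seasonal import seasonal_decompose

dec = seasonal_decompose(lending, model="multiplicative", period=4)
saf = dec.seasonal             # SAF: seasonal factors
sas = lending / dec.seasonal   # SAS: seasonally adjusted (de-seasonalised) series
stc = dec.trend                # STC: (approximate) smoothed trend-cycle
err = dec.resid                # ERR: residual component
```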

Autoregressive (AR) models can be effectively coupled with moving average (MA) models to form a general and useful class of time series models called autoregressive moving average (ARMA) models. However, these can only be used when the data is stationary. The class can be extended to non-stationary series by allowing differencing of the data series; these are called autoregressive integrated moving average (ARIMA) models.

The variable SAS will be used in the ARIMA models because it is the de-seasonalised version of the original credit lending data. Although the data in Figure 19 is de-seasonalised, the trend still remains; therefore, as mentioned before, the data must be differenced to remove the trend and obtain a stationary series.
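A sketch of the corresponding fit in Python follows: the order (3,2,0) reported below encodes the double differencing from Question 1 in d = 2, and `sas` is the hypothetical seasonally adjusted series from the previous sketch.

```python
# Sketch: ARIMA on the seasonally adjusted series; the differencing (d=2)
# is handled inside the model rather than by hand.
from statsmodels.tsa.arima.model import ARIMA

fit = ARIMA(sas.dropna(), order=(3, 2, 0)).fit()
print(fit.summary())              # cf. the Model Statistics tables below
print(fit.forecast(steps=8))      # predictions for the holdback quarters
```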

Model Statistics

Model: Seasonal adjusted series for creditlending from SEASON, MOD_2, MUL EQU 4-Model_1
Number of Predictors: 0
Model Fit statistics - Stationary R-squared: .485
Model Fit statistics - Normalized BIC: 14.040
Ljung-Box Q(18): Statistics 18.693, DF 15, Sig. .228
Number of Outliers: 0

Model Statistics

Model: Seasonal adjusted series for creditlending from SEASON, MOD_2, MUL EQU 4-Model_1
Number of Predictors: 0
Model Fit statistics - Stationary R-squared: .476
Model Fit statistics - Normalized BIC: 13.872
Ljung-Box Q(18): Statistics 16.572, DF 17, Sig. .484
Number of Outliers: 0

ARIMA (3,2,0)

Original Data | Predicted Values
50878.00 | 50335.29843
52199.00 | 50252.00595
50261.00 | 50310.44277
49615.00 | 49629.75233
47995.00 |

Application of Regression Analysis

Chapter-3

Methodology

In the application of regression analysis, the data set often contains unusual observations, which are either outliers (noise) or influential observations. These observations may have large residuals, affect the estimates of the regression coefficients and the whole regression analysis, and become a source of misleading results and interpretations. It is therefore very important to consider these suspected observations very carefully and to decide whether they should be included in or removed from the analysis.

In regression analysis, a basic step is to determine whether one or more observations can influence the results and interpretations of the analysis. If the regression analysis has one independent variable, it is easy to detect unusual observations in the dependent and independent variables using a scatter plot, box plot, residual plot, etc. But the graphical method of identifying outliers and/or influential observations is a subjective approach. It is also well known that in the presence of multiple outliers there can be a masking or swamping effect. Masking (false negative) occurs when an outlying subset remains undetected due to the presence of another, usually adjacent, subset. Swamping (false positive) occurs when a usual observation is incorrectly identified as an outlier in the presence of another, usually remote, subset of observations.

In the present study, some well known diagnostics are compared for identifying multiple influential observations. For this purpose, robust regression methods are first used to identify influential observations in Poisson regression. Then, to confirm that the observations identified by the robust regression methods are genuine influential observations, some diagnostic measures based on the single case deletion approach are considered: the Pearson chi-square, the deviance residual, the hat matrix, the likelihood residual test, Cook's distance, the difference of fits, and the squared difference in beta. However, in the presence of masking and swamping, diagnostics based on single case deletion fail to identify outliers and influential observations. Therefore, to remove or minimize the masking and swamping phenomena, some group deletion approaches are taken: the generalized standardized Pearson residual, the generalized difference of fits, and the generalized squared difference in beta.

3.2 Diagnostic measures based on single case deletion

This section presents the details of the single case deletion measures which are used to identify multiple influential observations in the Poisson regression model. These measures are the change in Pearson chi-square, the change in deviance, the hat matrix, the likelihood residual test, Cook's distance, the difference of fits (DFFITS), and the squared difference in beta (SDFBETA).

Pearson chi-square

To show the amount of change in the Poisson regression estimates that would occur if the $k$th observation were deleted, the Pearson $\chi^2$ statistic is proposed to detect outliers. Such diagnostic statistics examine the effect of deleting a single case on the overall summary measures of fit.

Let $\chi^2$ denote the Pearson statistic and $\chi^2_{(-k)}$ the statistic after case $k$ is deleted. Using the one-step linear approximation given by Pregibon (1981), the decrease in the value of the statistic due to deletion of the $k$th case is

$\Delta\chi^2_{(-k)} = \chi^2 - \chi^2_{(-k)}$,  $k = 1, 2, \ldots, n$    (3.1)

$\chi^2$ is defined as

$\chi^2 = \sum_{i=1}^{n} r_i^2$, with Pearson residuals $r_i = (y_i - \hat{\mu}_i)/\sqrt{\hat{\mu}_i}$    (3.2)

and for the $k$th deleted case

$\chi^2_{(-k)} = \sum_{i \neq k} r_{i(-k)}^2$    (3.3)

Deviance residual

The one-step linear approximation for the change in deviance when the $k$th case is deleted is

$\Delta D_{(-k)} = D - D_{(-k)}$    (3.4)

Because the deviance is used to measure the goodness of fit of a model, a substantial decrease in the deviance after the deletion of the $k$th observation indicates that this observation is a misfit. The deviance of the Poisson regression with the $k$th observation included is

$D = 2\sum_{i=1}^{n}\left[y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\right]$    (3.5)

where $\hat{\mu}_i = \exp(x_i^T\hat{\beta})$, and

$D_{(-k)} = 2\sum_{i \neq k}\left[y_i \log(y_i/\hat{\mu}_{i(-k)}) - (y_i - \hat{\mu}_{i(-k)})\right]$    (3.6)

A larger value of $\Delta D_{(-k)}$ indicates that the $k$th observation is an outlier.

Hat matrix:

The hat matrix is used in residual diagnostics to measure the influence of each observation. The hat values, $h_{ii}$, are the diagonal entries of the hat matrix, which is calculated using

$H = V^{1/2}X(X^TVX)^{-1}X^TV^{1/2}$    (3.7)

where $V = \mathrm{diag}\{[\mathrm{var}(y_i)\,g'(\mu_i)^2]^{-1}\}$ and, for the Poisson model, $\mathrm{var}(y_i) = E(y_i) = \mu_i$.

In the Poisson regression model $g(\mu_i) = x_i^T\beta$, where the function $g$ is usually called the link function. With the log link, $\mu_i = \exp(x_i^T\beta)$, so that

$V = \mathrm{diag}(\hat{\mu}_i)$    (3.8)

$(X^TVX)^{-1}$ is an estimated covariance matrix of $\hat{\beta}$, and $h_{ii}$ is the $i$th diagonal element of the hat matrix $H$. The diagonal elements of the hat matrix, i.e. the leverage values, satisfy

$0 \le h_{ii} \le 1$ and $\sum_{i=1}^{n} h_{ii} = k$,

where $k$ indicates the number of parameters of the regression model including the intercept term. An observation is said to be influential if $h_{ii} \ge ck/n$, where $c$ is a suitably chosen constant such as 2 or 3. Using the twice-the-mean rule of thumb suggested by Hoaglin and Welsch (1978), an observation with $h_{ii} \ge 2k/n$ is considered influential.

Likelihood residual test

For the detection of outliers, Williams (1987) introduced the likelihood residual. The squared likelihood residual is a weighted average of the squared standardized deviance and standardized Pearson residuals:

$r_{L_i}^2 = h_{ii}\, r_{SP_i}^2 + (1 - h_{ii})\, r_{SD_i}^2$    (3.9)

It is approximately equal to the likelihood ratio test statistic for testing whether an observation is an outlier, and is also called the approximate studentized residual. $r_{SP_i}$ is the standardized Pearson residual, defined as

$r_{SP_i} = r_i/\sqrt{1 - h_{ii}}$    (3.10)

and $r_{SD_i}$ is the standardized deviance residual, defined as

$r_{SD_i} = d_i/\sqrt{1 - h_{ii}}$    (3.11)

with $d_i = \mathrm{sign}(y_i - \hat{\mu}_i)\sqrt{2\left[y_i\log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\right]}$,

where $d_i$ is called the deviance residual, another popular residual because the sum of squares of these residuals is the deviance statistic.

Because the average value, $k/n$, of $h_{ii}$ is small, $r_{L_i}$ is much closer to $r_{SD_i}$ than to $r_{SP_i}$, and is therefore also approximately normally distributed. An observation is considered to be an outlier if $|r_{L_i}|$ exceeds the appropriate normal (or $t$) critical value at the chosen significance level.

Difference of fits test (DFFITS)

The difference of fits test for Poisson regression is defined as

$\mathrm{DFFITS}_i = \dfrac{\hat{y}_i - \hat{y}_{i(-i)}}{\hat{\sigma}_{(i)}\sqrt{h_{ii}}}$,  $i = 1, 2, \ldots, n$    (3.12)

where $\hat{y}_{i(-i)}$ and $\hat{\sigma}_{(i)}$ are respectively the $i$th fitted response and an estimated standard error when the $i$th observation is deleted. DFFITS can be expressed in terms of standardized Pearson residuals and leverage values as

$\mathrm{DFFITS}_i = \sqrt{\dfrac{h_{ii}}{1 - h_{ii}}}\; r_{SP_i}$    (3.13)

An observation is said to be influential if $|\mathrm{DFFITS}_i| \ge 2$.

Cook’s Distance:

Cook (1977) suggests a statistic which measures the change in the parameter estimates caused by deleting each observation, defined as

$CD_i = \dfrac{(\hat{\beta} - \hat{\beta}_{(-i)})^T (X^TVX)(\hat{\beta} - \hat{\beta}_{(-i)})}{k}$    (3.14)

where $\hat{\beta}_{(-i)}$ is the estimated parameter vector without the $i$th observation. There is also a relationship between the difference of fits test and Cook's distance, which can be expressed as

$CD_i = \dfrac{\mathrm{DFFITS}_i^2}{k}$    (3.15)

Using the approximation suggested by Pregibon, Cook's distance can be expressed as

$CD_i = \dfrac{h_{ii}}{k(1 - h_{ii})}\, r_{SP_i}^2$    (3.16)

An observation with a CD value greater than 1 is treated as influential.

Squared Difference in Beta (SDFBETA)

The measure originates from the idea of Cook's distance (1977) based on single case deletion and modifies DFBETA (Belsley et al., 1980); it is defined as

$\mathrm{SDFBETA}_i = (\hat{\beta} - \hat{\beta}_{(-i)})^T (X^TVX)(\hat{\beta} - \hat{\beta}_{(-i)})$    (3.17)

After some necessary calculation, SDFBETA can be related to DFFITS as

$\mathrm{SDFBETA}_i = \dfrac{\mathrm{DFFITS}_i^2}{1 - h_{ii}}$    (3.18)

The $i$th observation is declared influential if $\mathrm{SDFBETA}_i$ exceeds its cut-off value.
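A sketch of how these single-case measures can be computed for a Poisson fit is given below; it follows the formulas above directly and uses simulated data purely for illustration.

```python
# Sketch: single-case deletion diagnostics for a Poisson regression,
# computed from the formulas above (simulated data for illustration).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3, -0.2])))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = fit.mu
k = X.shape[1]

# Hat values: H = V^(1/2) X (X'VX)^(-1) X' V^(1/2) with V = diag(mu) (3.7-3.8)
W = np.sqrt(mu)[:, None] * X
h = np.einsum("ij,jk,ik->i", W, np.linalg.inv(X.T @ (mu[:, None] * X)), W)

r = (y - mu) / np.sqrt(mu)              # Pearson residuals (3.2)
r_sp = r / np.sqrt(1 - h)               # standardized Pearson residuals (3.10)
dffits = np.sqrt(h / (1 - h)) * r_sp    # DFFITS (3.13)
cd = h * r_sp**2 / (k * (1 - h))        # Cook's distance, Pregibon form (3.16)

print("high leverage (2k/n rule):", np.where(h > 2 * k / len(y))[0])
print("influential (CD > 1):", np.where(cd > 1)[0])
```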

3.3 Diagnostic measures based on group deletion approach

This section includes the details of the group deletion measures which are used to identify multiple influential observations in the Poisson regression model. Multiple influential observations can misfit the data and can create masking or swamping effects. Diagnostics based on group deletion are effective for the identification of multiple influential observations and are free from masking and swamping effects in the data. These measures are the generalized standardized Pearson residual (GSPR), the generalized difference of fits (GDFFITS) and the generalized squared difference in beta (GSDFBETA).

3.3.1 Generalized standardized Pearson residual (GSPR)

Imon and Hadi (2008) introduced GSPR to identify multiple outliers; it is defined as

$\mathrm{GSPR}_i = \dfrac{y_i - \hat{\mu}_i^{(R)}}{\sqrt{\hat{v}_i^{(R)}\,(1 - h_{ii}^{(R)})}}$ for $i \in R$    (3.19)

$\mathrm{GSPR}_i = \dfrac{y_i - \hat{\mu}_i^{(R)}}{\sqrt{\hat{v}_i^{(R)}\,(1 + h_{ii}^{(R)})}}$ for $i \in D$    (3.20)

where $\hat{v}_i^{(R)}$ and $h_{ii}^{(R)}$ are respectively the diagonal elements of $V$ and of the hat matrix $H$ computed from the remaining group $R$ (with $D$ denoting the deleted group). Observations with $|\mathrm{GSPR}_i| > 3$ are considered outliers.

3.3.2 Generalized difference of fits (GDFFITS)

The GDFFITS statistic can be expressed in terms of GSPR (the generalized standardized Pearson residuals) and GWs (generalized weights).

The generalized weight is denoted by $h_{ii}^{(R)}$ and is defined as

$h_{ii}^{(R)} = \hat{v}_i\, x_i^T (X_R^T V_R X_R)^{-1} x_i$ for $i \in R$    (3.21)

$h_{ii}^{(R)} = \hat{v}_i\, x_i^T (X_R^T V_R X_R)^{-1} x_i$ for $i \in D$    (3.22)

A value of $h_{ii}^{(R)}$ larger than Median$(h_{ii}^{(R)}) + 3\,$MAD$(h_{ii}^{(R)})$ is considered to be influential, i.e.

$h_{ii}^{(R)} > \mathrm{Median}(h_{ii}^{(R)}) + 3\,\mathrm{MAD}(h_{ii}^{(R)})$

Finally, GDFFITS is defined as

$\mathrm{GDFFITS}_i = \sqrt{h_{ii}^{(R)}/(1 - h_{ii}^{(R)})}\;\mathrm{GSPR}_i$ for $i \in R$, and $\sqrt{h_{ii}^{(R)}/(1 + h_{ii}^{(R)})}\;\mathrm{GSPR}_i$ for $i \in D$    (3.23)

We consider the observation as influential if

$|\mathrm{GDFFITS}_i| \ge 3\sqrt{k/(n-d)}$

where $d$ is the number of cases in the deleted group $D$.

3.3.3 Generalized squared difference in Beta (GSDFBETA)

In order to identify multiple outliers in a data set and to overcome the masking and swamping effects, GSDFBETA is defined through the change in the estimated coefficients when the $i$th case is added to or removed from the remaining group $R$, for $i \in R$ (3.24) and for $i \in D$ (3.25).

GSDFBETA can be re-expressed in terms of GSPR and the GWs:

$\mathrm{GSDFBETA}_i = \dfrac{h_{ii}^{(R)}}{(1 - h_{ii}^{(R)})^2}\;\mathrm{GSPR}_i^2$ for $i \in R$    (3.26)

$\mathrm{GSDFBETA}_i = \dfrac{h_{ii}^{(R)}}{(1 + h_{ii}^{(R)})^2}\;\mathrm{GSPR}_i^2$ for $i \in D$    (3.27)

Observations with GSDFBETA values larger than the suggested cut-off are declared influential.

Analysis of variance models

Abstract: Analysis of variance (ANOVA) models have become a widely used tool and play a fundamental role in much of the application of statistics today. Two-way ANOVA models involving random effects have found widespread application to experimental design in varied fields such as biology, econometrics, quality control, and engineering. This article is a comprehensive presentation of methods and techniques for point estimation, interval estimation, estimation of variance components, and hypothesis tests for the two-way analysis of variance with random effects.

Key words: Analysis of variance; two-way classification; variance components; random effects model

1. Introduction

The random effects model is not fraught with questions about assumptions as is the mixed effects model. Concerns have been expressed over the reasonableness of assuming that the interaction term $ab_{ij}$ is tossed into the model independently of $a_i$ and $b_j$. However, uncorrelatedness, which with normality becomes independence, does seem to emerge from finite sampling models that define the interaction to be a function of the main A and B effects. The problem usually of interest is to estimate the components of variance.

The model (1) is referred to as a cross-classification model. A slightly different and equally important model is the nested model. For this latter model see (5) and the related discussion.

2. Estimation of variance components

The standard method of moments estimators for a balanced design (i.e., $n_{ij} = n$) are based on the expected mean squares for the sums of squares. The credentials of the estimators (4) are that they are uniform minimum variance unbiased estimators (UMVUE) under normal theory, and uniform minimum variance quadratic unbiased estimators (UMVQUE) in general. They do, however, suffer the embarrassment of sometimes being negative, except for $\hat{\sigma}^2_e$, which is always positive. The actual maximum likelihood estimates would occur on a boundary rather than being negative. The best practice is always to adjust an estimate to zero rather than report a negative value. It should certainly be possible to construct improved estimators along the lines of the Klotz-Milton-Zacks estimators used in the one-way classification; however, the details of these estimators have not been worked out for the two-way classification. Estimating variance components from unbalanced data is not as straightforward as from balanced data. This is so for two reasons. First, several methods of estimation are available (most of which reduce to the analysis of variance method for balanced data), but no one of them has yet been clearly established as superior to the others. Second, all the methods involve relatively cumbersome algebra; discussion of unbalanced data can therefore easily deteriorate into a welter of symbols, a situation we do our best (perhaps not successfully) to minimize here¹.
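As a concrete illustration of the balanced method-of-moments estimators, the sketch below computes them for a simulated balanced two-way layout. Since equations (2)-(4) are not reproduced in this excerpt, the standard expected-mean-square relations for the crossed random model are stated in the comments as an assumption.

```python
# Sketch: method-of-moments variance components for a balanced two-way
# crossed random model; simulated data, with the standard EMS relations
# assumed:  E[MS_A]  = s2_e + n*s2_ab + J*n*s2_a
#           E[MS_B]  = s2_e + n*s2_ab + I*n*s2_b
#           E[MS_AB] = s2_e + n*s2_ab,   E[MS_E] = s2_e
import numpy as np

rng = np.random.default_rng(2)
I, J, n = 6, 5, 4
a = rng.normal(0, 2.0, I)[:, None, None]     # A effects (true s2_a = 4)
b = rng.normal(0, 1.0, J)[None, :, None]     # B effects (true s2_b = 1)
ab = rng.normal(0, 0.5, (I, J))[:, :, None]  # interaction (true s2_ab = 0.25)
y = 10 + a + b + ab + rng.normal(0, 1.0, (I, J, n))

m, mi, mj, mij = y.mean(), y.mean((1, 2)), y.mean((0, 2)), y.mean(2)
MS_A = J * n * ((mi - m) ** 2).sum() / (I - 1)
MS_B = I * n * ((mj - m) ** 2).sum() / (J - 1)
MS_AB = n * ((mij - mi[:, None] - mj[None, :] + m) ** 2).sum() / ((I - 1) * (J - 1))
MS_E = ((y - mij[:, :, None]) ** 2).sum() / (I * J * (n - 1))

s2_e = MS_E
s2_ab = max((MS_AB - MS_E) / n, 0)           # negative estimates set to zero
s2_a = max((MS_A - MS_AB) / (J * n), 0)
s2_b = max((MS_B - MS_AB) / (I * n), 0)
print(s2_a, s2_b, s2_ab, s2_e)
```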

On the other hand, extremely unbalanced designs are a horror story. A number of different methods have been proposed for handling them, but all involve extensive algebraic manipulations. The technical detail required to carry out these analyses exceeds the limitations set for this article. On occasion factors A and B are such that it makes no sense to postulate the existence of interactions, so the terms $ab_{ij}$ should be dropped from (1). In this case $\sigma^2_{ab}$ disappears from (3), and the estimators for $\sigma^2_a$ and $\sigma^2_b$ are modified accordingly.

1. Djordjevic V., Lepojevic V., Henderson's approach to Variance Components estimation for unbalanced data, Facta Universitatis, Vol. 2 No. 1, 2004, pg. 59.

Another variation on the model (1) gives rise to the nested model. In general, the nested model for components of variance problems occurs more frequently in practice than does the cross-classification model. In the nested model the main effects for one factor, say B, are missing from (1). The reason is that the entities creating the different levels of factor B are not the same for different levels of factor A. For example, the levels (subscript i) of factor A might represent different litters, and the levels (subscript j) of factor B might be different animals, which are a different set for each litter. The additional subscript k might denote repeated measurements on each animal.

To be specific, the formal model for the nested design is

$y_{ijk} = \mu + a_i + b_{ij} + e_{ijk}$,    (5)

with $a_i \sim N(0, \sigma^2_a)$, $b_{ij} \sim N(0, \sigma^2_b)$, $e_{ijk} \sim N(0, \sigma^2_e)$ and independence between the different lettered variables. It is customary with this model to use the symbol $b$ rather than $ab$ because the interpretation of this term has changed from synergism or interaction to one of a main effect nested inside another main effect. For a balanced design the method of moments estimators are based on the sums of squares

$SS(A) = Jn\sum_i(\bar{y}_{i..} - \bar{y}_{...})^2$, $SS(B) = n\sum_i\sum_j(\bar{y}_{ij.} - \bar{y}_{i..})^2$, $SS(E) = \sum_i\sum_j\sum_k(y_{ijk} - \bar{y}_{ij.})^2$,    (7)

which have degrees of freedom $I-1$, $I(J-1)$, and $IJ(n-1)$, respectively. The mean squares corresponding to (7) have the expectations

$E[MS(A)] = \sigma^2_e + n\sigma^2_b + Jn\sigma^2_a$, $E[MS(B)] = \sigma^2_e + n\sigma^2_b$, $E[MS(E)] = \sigma^2_e$.    (8)

The increasing tier phenomenon exhibited in (8) holds for nested designs with more than two effects. The only complication arises when one or more of the estimates are negative. This is an indication that the corresponding variance components are zero or negligible. One might want to reset any negative estimates to zero, combine the adjacent sums of squares, and subtract the combined mean squares from the mean squares higher in the tier.

Extension of these ideas to the unbalanced design does not represent as formidable a task for the nested design as it does for the crossed design. The sums of squares (7), appropriately modified for unbalanced designs, form the basis for the analysis. It is even possible to allow for varying numbers $J_i$ of levels of factor B for different levels of factor A.

3. Tests for variance components

The appropriate test statistics for the various hypotheses of interest can be determined by examining the expected mean squares in the analysis of variance table. However, we encounter the difficulty that, even under the normality assumption, exact F tests may not be available for some of the hypotheses. An analogous F statistic provides a test for $H_0: \sigma^2_b = 0$. Under the alternative (non-null) hypotheses, these ratios are distributed as the appropriate ratios of multiplicative constants from (10) times central F random variables. Thus power calculations are made from central F tables as for fixed effects models. The F tests of $H_0: \sigma^2_{ab} = 0$ and $H_0: \sigma^2_a = 0$ mentioned in the preceding paragraph are uniformly most powerful similar tests.

However, they are not likelihood ratio tests, which are more complicated because of boundaries to the parameter space. Although their general use is not recommended because of their extreme sensitivity to non-normality, confidence intervals can be constructed based on the distribution theory in [10]. The complicated method of Bulmer (1957), which is described in Scheffe [11, pg. 27-28], is available. However, the approximate method of Satterthwaite [10, pg. 110-114] may produce just as good results.

The distribution theory for the sums of squares (7) used in conjunction with nested designs is straightforward and simple. To test the hypothesis $H_0: \sigma^2_b = 0$ one uses the F ratio MS(B)/MS(E), and to test $H_0: \sigma^2_a = 0$ the appropriate ratio is MS(A)/MS(B). In all nested designs the higher line in the tier is always tested against the next lower line. If a conclusion is reached that $\sigma^2_b = 0$, then the test of $H_0: \sigma^2_a = 0$ could be improved by combining SS(B) and SS(E) to form a denominator sum of squares with $I(J-1) + IJ(n-1)$ degrees of freedom. Under alternative hypotheses these F ratios are distributed as central F ratios multiplied by the appropriate ratio of variances. This can be exploited to produce confidence intervals on some variance ratios. However, one still needs to rely on the approximate Satterthwaite [10, pg. 110-114] approach for constructing intervals on individual components.
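A sketch of these nested-design F tests follows, using scipy for the central-F p-values; the mean square inputs are assumed to have been computed as in (7), and the numbers in the call are made up for illustration.

```python
# Sketch: F tests for the nested random model, testing each line of the
# tier against the next lower line (MS values assumed computed as in (7)).
from scipy import stats

def nested_f_tests(MS_A, MS_B, MS_E, I, J, n):
    F_b = MS_B / MS_E    # tests H0: s2_b = 0
    F_a = MS_A / MS_B    # tests H0: s2_a = 0
    p_b = stats.f.sf(F_b, I * (J - 1), I * J * (n - 1))
    p_a = stats.f.sf(F_a, I - 1, I * (J - 1))
    return (F_a, p_a), (F_b, p_b)

# Illustrative mean squares only, not real data
print(nested_f_tests(MS_A=120.0, MS_B=15.0, MS_E=4.0, I=6, J=5, n=4))
```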

4. Estimations of individual effects and overall mean
For the two-way crossed classification with random effects, interest may centre on estimating the individual cell means.

The classical approach would be to use the estimates $\hat{\mu}_{ij} = \bar{y}_{ij.}$. An alternative idea would be to shrink the individual estimates toward the common mean, where the shrinking factor $S$ depends on the sums of squares SS(E), SS(AB), SS(B), and SS(A). Unfortunately, the specific details of the construction of an appropriate $S$ have not been worked out for the two-way classification as they have been for the one-way classification. Alternatively, attention might center on estimating $a_1, \ldots, a_I$, or, equivalently, on the levels of factor B. Again, specific estimators have not been proposed to date for handling this situation.

In the nested design one sometimes wants an estimate and confidence interval for $\mu$. One typically uses $\hat{\mu} = \bar{y}_{...}$. In the balanced case the variance of this estimator can be estimated by MS(A)/(IJn). In the unbalanced case an estimate of the variability of $\bar{y}_{...}$ can be obtained by substituting the estimates $\hat{\sigma}^2_a$, $\hat{\sigma}^2_b$ and $\hat{\sigma}^2_e$ into the expression for the variance of $\bar{y}_{...}$. Alternative estimators using different weights may be worth considering in the unbalanced case.

5. Conclusion

Analysis of variance (ANOVA) models have become widely used tools and play a fundamental role in much of the application of statistics today. In particular, ANOVA models involving random effects have found widespread application to experimental design in a variety of fields requiring measurements of variance, including agriculture, biology, animal breeding, applied genetics, econometrics, quality control, medicine, engineering, and the social sciences. With a two-way classification there are two distinct factors affecting the observed responses. Each factor is investigated at a variety of different levels in an experiment, and the combinations of the two factors at different levels form a cross-classification. In a two-way classification each factor can be either fixed or random. If both factors are random, the model is called a random effects model.

Various estimators of variance components in the two-way crossed classification random effects model with one observation per cell are compared under the standard assumptions of normality and independence of the random effects. Mean squared error is used as the measure of performance. The estimators being compared are: the minimum variance unbiased, the restricted maximum likelihood, and several modifications of the unbiased and the restricted maximum likelihood estimators.

Rainfall Pattern in Enugu State, Nigeria

CHAPTER ONE

1.0 INTRODUCTION

Enugu State, located in the southeastern part of Nigeria, was created in 1991 from the old Anambra State; the principal cities in the state are Enugu, Agani, Awgu, Udi, Oji-River and Nsukka. The state shares borders with Abia and Imo States to the south, Ebonyi State to the east, Benue State to the northeast, Kogi State to the northwest and Anambra State to the west.

Enugu, the capital city of Enugu State, is approximately two and a half hours' drive from Port Harcourt, where coal shipments exited Nigeria. The word "Enugu" (from Enu Ugwu) means "the top of the hill". The first European settlers arrived in the area in 1909, led by a British mining engineer named Albert Kitson. In his quest for silver, he discovered coal in the Udi Ridge. The colonial Governor of Nigeria, Frederick Lugard, took a keen interest in the discovery, and by 1914 the first shipment of coal was made to Britain. As mining activities increased in the area, a permanent cosmopolitan settlement emerged, supported by a railway system. Enugu acquired township status in 1917 and became strategic to British interests.

Foreign businesses began to move into Enugu, the most notable of which were John Holt, Kingsway Stores, British Bank of West Africa and United Africa Company. From Enugu the British administration was able to spread its influence over the southern province of Nigeria. The colonial past of Enugu is today evidenced by the Georgian building types and meandering narrow roads within the residential area originally reserved for the whites, an area which is today called the Government Reserved Area (GRA).

The state government and local governments are the levels of government in Enugu State, which has 17 Local Government Areas. Economically, the state is predominantly rural and agrarian, with a substantial proportion of its working population engaged in farming, although trading (18.8%) and services (12.9%) are also important. In the urban areas trading is the dominant occupation, followed by services. A small proportion of the population is also engaged in manufacturing activities, the most pronounced of which are located in Enugu, Oji, Ohebedim and Nsukka. The state boasts a number of markets, especially at each of the divisional headquarters, the most prominent of which is the Ogbete Main Market in the state capital. Electricity supply is relatively stable in Enugu and its environs; the Oji River power station (which used to supply electricity to all of Eastern Nigeria) is located in Enugu State. The state had a population of 3,267,837 people at the census held in 2006 (estimated at over 3.8 million in 2012), and it is home to the Igbo of southeastern Nigeria.

The average temperature in the city is mild (around 60 degrees Fahrenheit) in its cooler months and gets warmer to hot in its warmer months (upper 80s Fahrenheit), which is very good for outdoor activities with family and friends or just for personal leisure. Enugu has good soil and climatic conditions all year round, sitting at about 223 metres (732 ft) above sea level, and the soil is well drained during its rainy seasons.

The mean temperature in Enugu State in the hottest month of February is about 87.16 °F (30.64 °C), while the lowest temperatures occur in the month of November, reaching 60.54 °F (15.86 °C). The lowest rainfall, of about 0.16 cubic centimetres (0.0098 cu in), is normal in February, while the highest, about 35.7 cubic centimetres (2.18 cu in), occurs in July.

Differences in altitude and relief create a large variation in climate in the various regions of the country. In places characterized as semi-arid zones, the climate shows wide fluctuations from year to year and even between seasons within the year. Semi-arid regions receive very small, irregular, and unreliable rainfall (Workneh, 1987).

The annual cycle of the climatology of rainfall over tropical Africa, and in particular over Nigeria, is strongly determined by the position of the Inter-Tropical Convergence Zone (ITCZ) (Griffiths, 1971). Variations in rainfall pattern throughout the country are the result of differences in elevation and seasonal changes in the atmospheric pressure systems that control the prevailing winds. The climate of Nigeria is characterized by high rainfall variation (Yilma et al., 1994). In Nigeria, several regions receive rainfall throughout the year, but in some regions rainfall is seasonal and low, making irrigation necessary (Alemeraw and Eshetu, 2009). Rainfall is the most critical and key variable in both the atmospheric and hydrological cycles. Rainfall patterns usually have spatial and temporal variability. This variability affects agricultural production, water supply, transportation, the environment and urban planning, and thus the entire economy of a country and the existence of its people. Rainfall variability is assumed to be the main cause of frequently occurring climate extreme events such as drought and flood. These natural phenomena badly affect agricultural production and hence the economy of the nation. In regions where the year-to-year variability is high, people often suffer great calamities due to floods or droughts. Even though damage due to extremes of rainfall cannot be avoided completely, a forewarning could certainly be useful (Nicholls, 1980). Nigeria is one of the countries whose economy is highly dependent on rain-fed agriculture and which also faces recurring cycles of flood and drought. Current climate variability is already imposing a significant challenge to Nigeria in general and Enugu in particular, by affecting food security, water and energy supply, poverty reduction and sustainable development efforts, as well as by causing natural resource degradation and natural disasters. Recurrent floods in the past have caused substantial loss of human life and property in many parts of the country.

Methods for predicting extreme rainfall events have often been based on studies of the physical effects of rainfall or on statistical studies of rainfall time series. Rainfall forecasting is relevant to the agricultural sector, since agriculture contributes significantly to the economy of countries like Nigeria. In order to model and predict hydrologic events, one can use stochastic methods such as time series methods. Numerous attempts have been made to predict the behavioural pattern of rainfall using various techniques (Yevjevich, 1972; Dulluer and Kavas, 1978; Tsakiris, 1998). Awareness of the characteristics of the rainfall over an area, such as the source, quantity, variability, distribution and frequency of rainfall, is essential for its utilization and associated problems. Assessing rainfall variability is practically useful in decision making, risk management and the optimum usage of the water resources of countries. Thus, it is important to obtain accurate rainfall forecasts at the various geographic levels of Nigeria and to work towards identifying periodicities, in order to help policy makers improve their decisions by taking into consideration the available and future water resources. In this study, the univariate Box-Jenkins methodology for building ARIMA models is used to assess the rainfall pattern in Enugu State, based on data from the Nigerian Meteorological Agency.
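As an indication of the workflow, the sketch below fits a seasonal ARIMA to a monthly rainfall series with statsmodels. The file and column names are hypothetical placeholders, and the (1,0,1)(1,1,1,12) order is purely illustrative, not the order identified in this study.

```python
# Sketch: Box-Jenkins modelling of monthly rainfall (hypothetical CSV input;
# the SARIMA order is illustrative, not the one identified in this study).
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rain = pd.read_csv("enugu_rainfall.csv", index_col="month",
                   parse_dates=True)["rainfall"]   # Jan 1999 - Dec 2013

model = SARIMAX(rain, order=(1, 0, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)
print(fit.summary())
print(fit.forecast(steps=12))   # rainfall forecast for the next twelve months
```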

1.1 Weather and Climate

Weather and climate over the earth are not constant with time: they change on different time scales, ranging from the geological to the diurnal through the annual, seasonal and intra-seasonal; the difference between weather and climate is a measure of time. Weather is the condition of the atmosphere over a short period of time, while climate is how the atmosphere behaves over relatively long periods of time. Such variability is an inherent characteristic of the climate. The study of climatic fluctuations involves the description and investigation of the causes and effects of these fluctuations in the past and their statistical interpretation. Much of the work done concerns the variability of the two most important meteorological parameters: rainfall and temperature. Rainfall is a term used to refer to water falling in drops after condensation of atmospheric vapour. Rainfall is also the resultant product of a series of complex interactions taking place within the earth-atmosphere system. Rainfall is only water that falls from the sky, whereas precipitation is anything wet that falls from the sky, including snow, frozen rain, etc. Water in all its forms and in all its various activities plays a crucial role in sustaining both the climate and life. It is also a major factor in the planning and management of water resource projects and agricultural production. Even though Nigeria enjoys a fairly good amount of rainfall, wide variability in its distribution with respect to space and time is responsible for the two extreme events (floods and droughts) (Yilma et al., 1994).

1.2 Rainfall Characteristics

Rainfall varies with latitude, elevation, topography, seasons, distance from the sea, and coastal sea-surface temperature. Nigeria enjoys the humid tropical climate type: because of its location just north of the equator, Nigeria has a truly tropical climate characterized by the hot and wet conditions associated with the movement of the Inter-Tropical Convergence Zone (ITCZ) north and south of the equator.

While there is a general decrease in rainfall in Nigeria, the coastal area is experiencing a slight increase. Apart from the general southward shift in rainfall patterns, the duration has also reduced from 50-360 (1993-2003) to 30-280 (2003-2013) rainy days per year. This has created ecological destabilization and altered the pattern of the vegetation belt, especially in the northern part of the country. The rainfall pattern has also enhanced wind erosion and desertification, soil erosion, and coastal flooding in the north, east and coastal areas of Nigeria respectively.

The country experiences consistently high temperatures all year round. Since temperature varies only slightly, rainfall distribution, over space and time, becomes the single most important factor in differentiating the seasons. The climatic distribution is, however, dependent on the two air masses that prevail over the country, whose influences are directly linked to the movement of the ITCZ north and south of the equator. The two air masses are the Tropical maritime (Tm) and the Tropical continental (Tc). The former is associated with the moisture-laden south-west winds (south-westerlies) which blow from the Atlantic Ocean, while the latter is associated with the dry and dusty north-east winds (easterlies) which blow from the Sahara Desert.

With the movement of the ITCZ into the Northern Hemisphere, the rain-bearing south-westerlies prevail far inland, bringing rainfall during the wet season. The implication is that there is a prolonged rainy season in the far south, while the far north undergoes long dry periods annually. Nigeria, therefore, has two major seasons, the lengths of which vary from north to south. The mean annual rainfall along the coast in the south-east is 4,000 mm, while it is 500 mm in the north-east.

Nigeria can, thus be broadly divided into the following climatic regions:

the humid sub-equatorial, in the southern lowlands
the hot tropical continental, in the far north
the moderated sub-temperate in the high plateaus and mountains
the hot, wet tropical, in the hinterland (the middle-belt )

1.3 The main effects of Rainfall

Trends in rainfall extremes have enormous implications. Extreme rainfall events cause significant damage to agriculture, ecology and infrastructure; they also cause disruption to human activities, injury and loss of life. Socioeconomic activities including agriculture, power generation, water supply and human health are also very sensitive to climate variations. Nigeria's economy is heavily dependent on rainfall for generating employment, income and foreign currency. Thus, rainfall is considered the most important climatic element influencing Nigerian agriculture. The severity and frequency of occurrence of extreme rainfall events (meteorological, hydrological, and agricultural) vary across different parts of the country.

Drought: Drought is an insidious hazard of nature. It is often referred to as a "creeping phenomenon" and its impacts vary from region to region. Drought can therefore be difficult for people to understand; it is equally difficult to define, because what may be considered a drought in, say, Bali (six days without rain) would certainly not be considered a drought in Libya (annual rainfall less than 180 mm). Some drought years have coincided with El Niño (EN) events, while others have followed them. According to DDAEPA (2011), the trend of decreasing annual rainfall and increased rainfall variability is contributing to drought conditions. The average annual rainfall patterns of Abuja for the periods 1999 to 2008 and 1984 to 1991 show two important trends. First, annual average rainfall declined from the mean value by about 8.5% and 10% respectively. Secondly, the variability of rainfall shows an overall increasing trend, suggesting greater rainfall unreliability. These rainfall patterns have led to serious drought and flood episodes throughout the region.

Flood: Floods are known as the most frequent and devastating natural disasters in both developed and developing countries (Osti et al., 2008). Between 2000 and 2008 East Africa experienced many episodes of flooding, almost all of which significantly affected large parts of Ethiopia. Ethiopia's topographic characteristics have made the country highly vulnerable to floods and the resulting destruction of and damage to life, the economy, livelihoods, infrastructure, services and the health system (FDPPA, 2007). Flooding is common in Ethiopia during the rainy season between June and September, and the major types of flooding the country experiences are flash floods and river floods (FDPPA, 2007).

Like other regions of Nigeria, the issue of flooding continues to be of growing concern in Enugu, especially to people residing in lowlands, along or near flood courses, and in villages located at the foot of hills and mountains. Flood disasters are occurring more frequently and are having an ever more dramatic impact on Enugu in terms of the costs to lives, livelihoods and environmental resources. The topography of Enugu mainly consists of mountains and hills with steep slopes, valleys, and river basins. These catchment characteristics, their large area coverage, and torrential rainfall during the short and long rainy seasons have been the main factors contributing to previous flood events.

Soil Erosion: When soil moves from one location to another, it is referred to as soil erosion. The impact of rainfall striking the surface can cause soil erosion; erosion is a concern for farmers, as their valuable, nutrient-rich topsoil can be washed away by rainfall. Erosion can also weaken structures such as bridges or wash out roads. Vegetation can decrease the amount of soil that is eroded during a rain. Natural erosion has always gone on, producing river valleys and shaping hills and mountains; it is generally slow, but human activity can cause a rapid increase in the rate at which soil is eroded (i.e. a rate faster than natural weathering of bedrock can produce new soil). This has resulted in a loss of productive soil from crop and grazing land, layers of infertile soil being deposited on formerly fertile croplands, the formation of gullies, the silting of lakes and streams, and landslips.

1.4 Aim and Objectives of the study

The main aim of this study is to analyze the rainfall pattern in Enugu State using appropriate time series methods, based on 15 years of data (January 1999 to December 2013) recorded by the Nigerian Meteorological Agency (Enugu State).

Specific Objectives

1. To fit appropriate time series model to the monthly rainfall data.

2. To forecast the rainfall pattern in the study area.

1.5 Data source

The monthly rainfall data in millimeters for the period January 1999 to December 2013, collected from the Nigerian Meteorological Agency (Enugu State), were used in the study. The site was chosen due to the availability of a relatively long series of meteorological data; the data are secondary data.
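As an illustration of the two objectives above, the sketch below shows one way a seasonal time series model could be fitted to such monthly rainfall data and used to forecast. It is a minimal sketch only: the file name, column names and model orders are hypothetical assumptions, not taken from the study, and in practice the orders would be chosen from ACF/PACF plots or information criteria.

```python
# A minimal sketch of fitting a seasonal ARIMA model to monthly rainfall.
# The file "enugu_rainfall.csv" and its column names are hypothetical.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Monthly rainfall in mm, January 1999 - December 2013 (180 observations)
series = pd.read_csv("enugu_rainfall.csv", index_col="month",
                     parse_dates=True)["rainfall_mm"]

# Seasonal ARIMA with a 12-month cycle; the (p,d,q)(P,D,Q,s) orders here
# are illustrative, not the ones the study settled on.
model = SARIMAX(series, order=(1, 0, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)
print(fit.summary())

# Forecast rainfall for the next 24 months
print(fit.forecast(steps=24))
```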

1.6 Significance of the Study

Knowledge of what happens to the water that reaches the earth's surface will assist the study of many surface and subsurface water problems and support the efficient control and management of water resources. For a country like Nigeria, whose welfare depends very much on rain-fed agriculture, quantitative knowledge of a region's water requirements, and of the availability of water for plant growth and supplemental irrigation, on a monthly or seasonal basis is an essential requirement for agricultural development. In this regard, an increased capacity to manage future climate change and weather extremes can also reduce the magnitude of economic, social and human damage and eventually lead to greater resilience. Assessing seasonal rainfall characteristics based on past records is essential for evaluating the risk of rainfall extremes and for developing mitigation strategies. Reliable rainfall forecasting and assessment of rainfall behavior at station, regional and national levels is therefore very important.

Statistics Essays | Analysis of Data

Consider and discuss the required approach to analysis of the data set provided.

As part of this, explore also how you would test the hypotheses below and explain the reasons for your decisions. Hypothesis 1: Male children are taller than female children. Null hypothesis: There is no difference in height between male children and female children. Hypothesis 2: Taller children are heavier. Null hypothesis: There is no relationship between how tall children are and how much they weigh.

Analysis of data set

The data set is a list of 30 children's gender, age, height, weight, upper and lower limb lengths, eye colour, liking of chocolate or not, and IQ.

There are two main things to consider before analysing the data. These are the types of data and the quality of the data as a sample.

Types of data could be nominal, ordinal, interval or ratio. Nominal is also known as categorical. Coolican (1990) gives more details of all of these, and his definitions have been used to decide the types of data in the data set.

It is also helpful to distinguish between continuous numbers, which could be measured to any number of decimal places, and discrete numbers, such as integers, which move in finite jumps (1, 2, etc.).

Gender

This variable can only distinguish between male or female. There is no order to this and so the data is nominal.

Age

This variable can take integer values. It could be measured to decimal places, but is generally only recorded as integer. It is ratio data because, for example, it would be meaningful to say that a 20 year old person is twice as old as a 10 year old.

In this data set, the ages range from 120 months to 156 months. This needs to be consistent with the population being tested.

Height

This variable can take values to decimal places if necessary. Again it is ratio data because, for example, it would be meaningful to say that a person who is 180 cm tall is 1.5 times as tall as someone 120 cm tall. In this sample it is measured to the nearest cm.

Weight

Like height, this variable could be measured to decimal places and is ratio data. In this sample it is measured to the nearest kg.

Upper and lower limb lengths

Again this variable is like height and weight and is ratio data.

Eye colour

This variable can take a limited number of values, which are eye colours. The order is not meaningful. This data is therefore nominal (categorical).

Like of chocolate or not

As with eye colour, this variable can take a limited number of values, which are the sample members' preferences. In distinguishing merely between liking and disliking, the order is not meaningful. This data is therefore nominal (categorical).

IQ

IQ is a scale measurement found by testing each sample member. It is interval rather than ratio data, because it would not be meaningful to say, for example, that someone with a score of 125 is 25% more intelligent than someone with a score of 100.

There is another level of data mentioned by Coolican into which none of the data set variables fit: ordinal data. This means that the data have an order or rank which makes sense. An example would be if 10 students tried a test and you recorded who finished quickest, 2nd quickest, etc., but not the actual times.
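To make the classification concrete, the sketch below encodes a few rows of a data set like this one using types that match the levels of measurement just described. The values are invented purely for illustration, not taken from the real sample.

```python
# A minimal sketch (hypothetical values) matching each variable to its
# level of measurement: categorical dtype for nominal data, numeric
# dtypes for ratio data, and plain integers for the interval-scale IQ.
import pandas as pd

df = pd.DataFrame({
    "gender": pd.Categorical(["M", "F", "M"]),                 # nominal
    "eye_colour": pd.Categorical(["blue", "brown", "green"]),  # nominal
    "likes_chocolate": pd.Categorical([True, False, True]),    # nominal
    "age_months": [120, 156, 134],        # ratio, recorded as integers
    "height_cm": [150.0, 147.0, 152.0],   # ratio, continuous
    "weight_kg": [41.0, 39.0, 44.0],      # ratio, continuous
    "iq": [100, 112, 96],                 # interval scale
})
print(df.dtypes)
```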

The data set is intended to be a sample from a population about which we can make inferences. For example, in the hypothesis tests we want to know whether the sample results are indicative of population differences. Results can be generalized only to the population from which the sample is drawn; inference beyond that population would not be valid.

Details of sampling methods were found in Bland (2000). To accomplish the required objectives, the sample has to be representative of the defined population. It would also be more accurate if the sample is stratified by known factors like gender and age. This means that, for example, the proportion of males in the sample is the same as the proportion in the population.

Sample size is another consideration. In this case it is 30. Whether this is adequate for the hypotheses being tested is examined below.

Hypothesis 1: Male children are taller than female children.

Swift (2001) gives a very readable account of the hypothesis testing process and the structure of the test.

The first step is to set up the hypotheses:

The Null hypothesis is that there is no difference in height between male children and female children.

If the alternative were, as Coolican describes it, that "we do not predict in which direction the results will go", then a two-tailed test would be needed. In this case the alternative is that males are taller; a specific direction is predicted, and so a one-tailed test is required.

To test the hypothesis we need to set up a test statistic and then either match it against a pre-determined critical value or calculate the probability of achieving the sample value based on the assumption that the null hypothesis is true.

The most commonly used significance level is 0.05. According to Swift (2001) the significance level must be decided before the data is known. This is to stop researchers adjusting the significance level to get the result that they want rather than accepting or rejecting objectively.

If the test statistic probability is less than 0.05, we would reject the null hypothesis that there is no difference between males and females, in favour of males being taller, on the one-sided basis.
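Both tests described in this essay can be run in a few lines, as the sketch below shows using scipy. The heights and weights are randomly generated stand-ins for the real sample; the group means and standard deviations are assumptions chosen only for illustration.

```python
# A minimal sketch of the two tests, on synthetic data (not the essay's
# actual data set of 30 children).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
male_height = rng.normal(150, 6, 15)    # 15 boys, heights in cm (assumed)
female_height = rng.normal(147, 6, 15)  # 15 girls, heights in cm (assumed)

# Hypothesis 1: one-tailed two-sample t-test (alternative: males taller).
t_stat, p_one = stats.ttest_ind(male_height, female_height,
                                alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_one:.3f}")
if p_one < 0.05:
    print("Reject the null hypothesis at the 0.05 level.")

# Hypothesis 2: Pearson correlation between height and weight.
height = np.concatenate([male_height, female_height])
weight = 0.5 * height - 40 + rng.normal(0, 3, 30)  # synthetic weights (kg)
r, p_corr = stats.pearsonr(height, weight)
print(f"r = {r:.2f}, p = {p_corr:.3f}")
```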

However it is possible for the test statistic to be in the rejection zone when in fact the null hypothesis is true. This is called a Type I error.

It is also possible for the test statistic to be in the acceptance zone when the alternative hypothesis is true (in other words, the null hypothesis is false). This is called a Type II error. Power is 1 minus the probability of a Type II error, and is therefore the probability of correctly rejecting a false null hypothesis. Whereas the Type I error rate is set at the desired level, the Type II error rate depends on the actual value under the alternative hypothesis.

Coolican (1990) sets out the possible outcomes along the following lines:

Decision                  Null hypothesis true    Null hypothesis false
Retain null hypothesis    Correct decision        Type II error
Reject null hypothesis    Type I error            Correct decision
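Because power depends on the true (unknown) effect size, it is often estimated by simulation. The sketch below estimates the power of the one-tailed height test under an assumed true difference of 3 cm; every number in it is a hypothetical assumption, not a property of the essay's data set.

```python
# A minimal sketch estimating power by simulation: the fraction of
# simulated samples in which an assumed true 3 cm difference is
# detected at the 0.05 significance level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
trials, detections = 5000, 0
for _ in range(trials):
    males = rng.normal(150, 6, 15)    # assumed true mean 3 cm higher
    females = rng.normal(147, 6, 15)
    _, p = stats.ttest_ind(males, females, alternative="greater")
    if p < 0.05:
        detections += 1

# Power = 1 - P(Type II error)
print(f"estimated power: {detections / trials:.2f}")
```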

Concept of Randomness in Statistics

Part I Introduction

Introduction on Freshman Seminar

Freshman seminar 1205M offers great opportunities for students to work closely with professors from the Science faculty on various areas of mathematics. The seminar aimed to encourage us to open our minds to creative ideas and to develop curiosity about influential mathematical theories and various subfields of contemporary mathematics. In addition to exposure to selected subtopics in contemporary mathematics, we had valuable opportunities to develop our presentation and academic essay writing skills.

1.2 Important roles of Analogy and Intuition

The historical development of mathematics has been significantly influenced by intuition acquired from real-life experience and by analogy drawn from various other areas (Harrison & Treagust, 1993). Analogy has been an extraordinary method for developing new concepts throughout the history of science. In this module, famous topics in contemporary mathematics, including geometry, number theory, set theory, randomness and game theory, were discussed. Among these topics, our team worked on analogy and intuition in randomness. In this seminar, various creative analogies and intuitive and counter-intuitive lines of thinking were presented, based on specific cases in modern mathematics.

1.3 Method on Research and Presentation

Our team collected relevant source materials on randomness, including books, journals, and websites. In presenting applications of randomness, we focused in particular on the historical development of randomness theory, simplified key concepts in randomness, counter-intuitive stories, overlaps with other fields in nature, and some significant and influential applications of randomness theory in daily life. We omitted complicated theories, technical formulas and rigorous proofs. Throughout the semester, our team conducted two informal presentations on randomness. In order to illustrate randomness clearly and intuitively, we adopted various methods: problem solving, in-class quizzes, presentations and engaging stories. Subtopics included biology, quantum physics, finance, audio engineering, statistics and so on.

Part II Report on Randomness

2.1 Randomness in Communication Theory

2.1.1 Introduction of Noise in Communication Theory

In statistics, irrelevant or meaningless data is considered noise (random error); in communication theory, random disturbance in a signal is called "noise". In essence, noise consists of a large number of disturbances with a statistically randomized time distribution.

Noise signals are commonly assumed to have a power spectral density proportional to 1/f^β, where f stands for the frequency. For example, white noise has β = 0, while pink noise has β = 1. This characteristic exponent is widely used for distinguishing among the colors of noise.
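This spectral characterization suggests a direct way to synthesize the different colors of noise: shape the spectrum of white noise by f^(-β/2) so that the power density falls as 1/f^β. The sketch below is an illustration of that idea, not material from the seminar.

```python
# A minimal sketch: generate noise with power spectral density ~ 1/f^beta
# by reshaping the spectrum of white Gaussian noise.
import numpy as np

def colored_noise(n, beta, rng=None):
    """Return n samples with PSD proportional to 1/f^beta.
    beta = 0 gives white noise, beta = 1 pink, beta = 2 brown."""
    rng = rng or np.random.default_rng()
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                # avoid dividing by zero at DC
    spectrum *= freqs ** (-beta / 2)   # power ~ amplitude^2 ~ 1/f^beta
    return np.fft.irfft(spectrum, n)

pink = colored_noise(8192, beta=1.0)   # e.g. a buffer of pink noise samples
```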

2.1.2 Laws and Criteria Used to Distinguish Colors and Characteristics of Noise

The color names for noise are derived from an analogy between the spectrum of a noise signal and the spectrum of light of different visible colors. For instance, if we translated the sound wave of "white noise" into light waves, the resulting light would appear white. In electronics, physics, and many other areas, the color of a noise signal is usually understood as a description of its power spectrum. Different colors of noise have significantly different properties, so each kind of noise is matched with a specific "color".

People name different noises after colors, starting with the most well-known one, "white noise". This is in analogy with white light, which has a flat power spectrum over its frequency range. Other colors, such as violet, blue, red and pink, are then given to different noises with analogous spectral characteristics.

Although most of them have standardized definitions within specific disciplines, there are also plenty of noise spectra with imprecise and informal definitions, like black noise, green noise, brown noise and so on.

The sections below were summarized from the Wikipedia article Noise (electronics): http://en.wikipedia.org/wiki/Noise_%28electronics%29

2.1.3 Inner Sources of Noise

Thermal noise is generated from the random thermal motion of charges (usually electrons) inside electrical conductors. The amplitude of the signal has a probability density function similar to the Gaussian (Normal) distribution. The amplitude of thermal noise depends on the temperature of the circuit.

Shot noise results from unavoidable random fluctuations as charges (such as electrons) jump across a gap in an electric circuit. It sounds rather similar to the noise created by rain falling on a tin roof.

Flicker noise has a frequency spectrum that falls off steadily toward higher frequencies.

Burst noise consists of sudden step-like transitions between two or more levels at random and unpredictable times. It sounds like eating popcorn.

2.1.4 Outer sources of Noise

Atmospheric noise is the natural disturbance caused by electrical discharges in thunderstorms and other natural disturbances occurring in nature, such as disruptions of high-voltage wires.

Industrial noises are produced by automobiles, aircraft and so on. These disturbances are likewise produced by discharge processes in such operations, similar to atmospheric noise.

Extraterrestrial noises come from the universe. They include solar noise, which is radiation from the sun due to its intense nuclear reactions and the consequent high temperature, and cosmic noise, radiation and cosmic rays that reach almost everywhere.

2.1.5 Classification of Different Colors of Noise

This part was adapted and summarized from an online introductory article: “White, pink, blue and violet: The colors of noise” from the Wired Magazine Science Column, Author: Duncan Geere, Date: Apr. 07, 2011

White noise

White noise has a constant power density across its spectrum. It is named after white light, which has a flat spectrum. The term is widely applied in many scientific and technical areas, including physics, audio engineering, telecommunications, statistical forecasting and many others. In particular, white noise is used as a generator of random numbers, and weather forecasting websites also use white noise to generate random digit patterns and simulate real weather.

Pink noise

The power density of pink noise decreases in proportion to 1/f. In the past, the term flicker noise was sometimes used to refer to pink noise, but it is more appropriate to apply that term strictly to electronic circuits. Pink noise is also used in the analysis of meteorological data and of the radiated power output of some astronomical bodies.

Brown noise

According to the precise definition, the term brown noise refers to a noise whose power density decreases in proportion to 1/f^2.

The signal can be generated by integrating white noise or via an algorithm that simulates Brownian motion. Brown noise is named after Brownian motion rather than the color brown, which distinguishes it from the other noises. It can be used in climatology to describe climate shifts, although scientists have long argued about its value for such purposes.

Blue noise

The power density of blue noise is proportional to frequency: it increases over a finite frequency range. Blue noise is similar to pink noise, but with an increasing rather than a decreasing spectrum. It is sometimes mixed up with violet noise in informal discussion.

Violet noise

Violet noise is also known as purple noise. Its power density is proportional to f^2, meaning it increases quadratically with frequency, making it a counterpart of brown noise. Moreover, because violet noise results from differentiating a white noise signal, it is also called "differentiated white noise".

Grey noise

Grey noise is a special kind of noise shaped by a psychoacoustic equal-loudness curve. It has a higher power density at both ends of the frequency spectrum and very little power near the center, which is clearly different from standard white noise, whose power density is flat. Yet grey noise sounds equally loud at all frequencies, because the shaping compensates for how the sensitivity of human hearing varies with frequency.

2.2 Randomness in Finance

2.2.1 Brief Introduction to Efficient Market Hypothesis

This part was summarized from an online informal introductory article: "The Efficient Markets Hypothesis", authors: Jonathan Clarke, Tomas Jandik, Gershon Mandelker, website: www.e-m-h.org

In finance, the efficient-market hypothesis (EMH) asserts that stock market prices evolve according to a random walk: successive price changes have the same probability distribution and are independent of each other. The random walk model states that stocks take a random and unpredictable path, with a stock's future price equally likely to go up or down. Therefore, the past movement (or trend) of a specific stock price, or of the overall market, cannot be used as a basis to predict future movements. In addition, it is impossible to outperform the market as a whole without taking on additional risk. Proponents of the EMH therefore argue that a long-term buy-and-hold strategy is the most efficient: over the long term, prices approximately reflect the performance of the company, whereas short-term movements in prices can only be described as a random walk.
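The random-walk claim is easy to simulate, as in the sketch below (illustrative only; the volatility figure is an arbitrary assumption). It generates a price path from independent daily returns and checks that successive returns are essentially uncorrelated, which is exactly what makes past prices useless for prediction.

```python
# A minimal sketch of the random-walk model of stock prices.
import numpy as np

rng = np.random.default_rng(42)
n_days = 252                                   # one trading year
log_returns = rng.normal(0.0, 0.01, n_days)    # i.i.d., ~1% daily volatility
prices = 100 * np.exp(np.cumsum(log_returns))  # price path starting at 100

# Under the model, the lag-1 autocorrelation of returns is near zero.
lag1 = np.corrcoef(log_returns[:-1], log_returns[1:])[0, 1]
print(f"final price: {prices[-1]:.2f}, lag-1 autocorrelation: {lag1:.3f}")
```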

2.2.2 Historical Backgrounds of Efficient Market Hypothesis

This part was summarized from an online nonprofit educational website, www.e-m-h.org, and a research paper: History of the Efficient Market Hypothesis, Nov. 2004, author: Martin Sewell, publisher: University College London.

Historically, the randomness of stock market prices was first modelled by a French broker, Jules Regnault, in 1863. Shortly after, a French mathematician, Louis Bachelier, developed the mathematics of Brownian motion in 1900. In 1923, the famous economist Keynes clearly stated that investors in financial markets are rewarded not for knowing better than other participants in the market, but for bearing risk.

After World War II, the efficient-market hypothesis emerged as a prominent theory in the mid-1960s. In the 1960s, Mandelbrot proposed a randomness model for stock pricing. Fama discussed Mandelbrot's hypothesis and concluded that market data confirmed his model. In addition, he defined the so-called "efficient market" for the first time in his paper "Random Walks in Stock Market Prices", explaining how random walks in the stock market significantly influence individual stock prices. Later, he introduced definitions for three forms of financial market efficiency: weak, semi-strong and strong.

The term was eventually popularized when Burton Malkiel, a Professor of Economics at Princeton University, published his classic and prominent book: “A Random Walk Down Wall Street.”

2.2.3 Three Major Forms of Market Efficiency: Weak, Semi-Strong and Strong

The three types of EMH were summarized based on an online technical blog: “The Efficient Markets Hypothesis”, Author: Jodi Beggs, Website: About.com

Weak Form of Efficiency

Under weak-form efficiency, we cannot predict future prices by analyzing prices from the past, and we cannot earn excess returns using strategies based on historical data. At this level, technical analysis cannot be consistently profitable, because share prices exhibit no dependence on their past; future prices respond only to new information, such as the performance of companies.

Semi-Strong Form of Efficiency

Under semi-strong-form efficiency, information other than market data, such as instant news, companies' management, financial accounting reports, and companies' latest products, is also reflected in prices. Share prices adjust to such new public information very rapidly, so investors cannot gain any excess returns by trading on it. Semi-strong-form efficiency implies that neither technical analysis nor fundamental analysis can produce excess returns.

Strong Form of Efficiency

Under strong-form efficiency, even information typically held by corporate insiders is reflected: share prices incorporate not only all public information but all private information as well. Theoretically, no one can earn excess returns. In practice, however, before major changes are disclosed to the public, corporate insiders could trade their company's stock for abnormal profits; such insider trading is banned by regulators such as the Securities and Exchange Commission.

2.2.4 Arguments and Criticisms of the Efficient Market Hypothesis

Critics charge that applying the theory in markets contributed to financial crises. In response, proponents of the hypothesis state that the theory is only a simplified model of the world, which means it may not hold true under all conditions. On this view, the market is practically efficient for investment purposes in the real world, rather than efficient in every other sense.

2.2.5 Interesting Counter-intuitive Stories on Monkeys

The story was adapted from the Forbes magazine personal finance column, author: Rick Ferri, date: Dec. 20, 2012

In order to test the Efficient Market Hypothesis and illustrate the theory vividly to the public, a group of researchers conducted a monkey experiment. They randomly picked thirty stocks from a pool of one thousand stocks and then let a hundred monkeys throw darts at the stock listings printed in a newspaper. They kept repeating this experiment over five decades and tracked the results.

In the end, to their surprise, the monkeys' performance beat the index by 1.7% per year, which indicates that there are situations in which traditional technical analysis cannot even beat randomly selected portfolios. The results startled many observers by showing how greatly randomness affects stock market prices.
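The flavour of that experiment can be reproduced with synthetic data, as in the sketch below. The return model, and in particular the small-cap tilt sometimes offered as an explanation for the monkeys' edge over a cap-weighted index, are assumptions made purely for illustration, not the study's actual data.

```python
# A minimal sketch: an equal-weighted random 30-stock "monkey" portfolio
# versus a cap-weighted index of 1,000 synthetic stocks over 50 years.
import numpy as np

rng = np.random.default_rng(3)
n_stocks, n_years = 1000, 50
caps = rng.lognormal(0.0, 1.0, n_stocks)       # hypothetical market caps
# Assumed size effect: smaller caps get a slightly higher mean return.
mean_ret = 0.10 - 0.02 * (caps - caps.mean()) / caps.std()
returns = rng.normal(mean_ret, 0.20, (n_years, n_stocks))

index = (returns * caps / caps.sum()).sum(axis=1)   # cap-weighted index
picks = rng.choice(n_stocks, 30, replace=False)     # the monkey's darts
monkey = returns[:, picks].mean(axis=1)             # equal-weighted picks

print(f"index {index.mean():.3f}/yr vs monkey {monkey.mean():.3f}/yr")
```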

2.3 Randomness in Physics and Biology

2.3.1 Application of Randomness in Modern Physics

In the 19th century, physicists used the idea of randomness to study the motions and behaviors of molecules, building models in thermodynamics to explain phenomena observed in gas experiments.

In the 20th century, with the arrival of quantum mechanics, microscopic phenomena came to be considered genuinely random. The randomness of things like radioactive decay, photons passing through polarizers, and other strange quantum effects cannot be explained or predicted by classical theories in the usual way (Scott, 2009). Physicists therefore proposed a new picture, in which some outcomes in the microscopic world are irreducibly random. For example, when we describe a radioactive atom, we cannot predict when the atom will decay; all that is left to us is the probability of decay during a given period. To resolve this mystery, Einstein postulated hidden-variable theories, which hold that nature does not contain irreducible randomness: properties and variables beyond our observation actually determine the outcomes that appear in our world.
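The radioactive-decay example can be made concrete with a short simulation: each atom's decay time is individually unpredictable, yet the fraction decaying within a fixed interval matches the stated probability. The half-life below is a made-up number used only for illustration.

```python
# A minimal sketch of quantum randomness in radioactive decay.
import numpy as np

half_life = 10.0                        # hypothetical half-life, in years
lam = np.log(2) / half_life             # decay constant
rng = np.random.default_rng(7)

# One exponentially distributed decay time per atom.
decay_times = rng.exponential(1 / lam, 100_000)

# No single decay time is predictable, but the fraction that decays
# within one half-life should be very close to 0.5.
print((decay_times <= half_life).mean())
```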

2.3.2 Application of Randomness in Biology

The modern evolutionary synthesis states that the diversity of life is due to natural selection. Randomness, an essential component of biological diversity, is associated with the growth of biological organization during evolution (Longo & Montevil, 2012). It plays an important role in genetic mutation, and the significance of random effects appears at different scales, from microorganisms to large mammals (Bonner, 2013). In this process, random genetic mutations enter the gene pool under both internal and external influences. Although the mutations themselves are purely random, selection systematically gives individuals who possess advantageous mutations a higher chance of survival and reproduction than those without them. This mechanism plays a crucial role in the survival of animals.

Surprisingly, randomness in biology has remarkable connections to quantum physics. Schrödinger proposed his notion of negative entropy, a form of Gibbs free energy, which behaves in ways analogous to randomness in the abstract quantum world (Schrödinger, 1944).

Part III References

Beggs, J. (2014). The Efficient Markets Hypothesis. About. Retrieved Mar 30, 2014 from http://economics.about.com/od/Financial-Markets-Category/a/The-Efficient-Markets-Hypothesis.htm

Bonner, J. (2013). Randomness in Evolution. Princeton University Press. Retrieved Mar 30, 2014 from http://press.princeton.edu/titles/9958.html

Clarke, J. & Jandik, T. (2012). The Efficient Markets Hypothesis. Retrieved Mar 30, 2014 from http://ww.e-m-h.org/ClJM.pdf

Ferri, R. (2012). Any Monkey Can Beat The Market. Forbes. Retrieved Mar 30, 2014 from http://www.forbes.com/sites/rickferri/2012/12/20/any-monkey-can-beat-the-market/

Geere, D. (2011). White, pink, blue and violet: The colors of noise. Wired. Retrieved Mar 30, 2014 from http://www.wired.co.uk/news/archive/2011-04/7/colours-of-noise/viewall

Harrison, A. G., & Treagust, D. F. (1994). Science analogies. The Science Teacher, 61, 40-43.

Longo, G. & Montevil, M. (2012). Randomness Increases Order in Biological Evolution. Retrieved Mar 30, 2014 from http://www.researchgate.net/profile/Giuseppe_Longo2/publication/221350338_Randomness_Increases_Order_in_Biological_Evolution/file/60b7d51544f17cb8d8.pdf

Schrödinger, E. (1944). What Is Life? Cambridge University Press.

Scott, J. (2009). Do physicists really believe in true randomness? Ask a Mathematician. Retrieved Mar 30, 2014 from http://www.askamathematician.com/2009/12/q-do-physicists-really-believe-in-true-randomness/

Sewell, M. (2004). History of the efficient market hypothesis. Retrieved Mar 30, 2014 from http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/images/Research_Student_Information/RN_11_04.pdf

Factors That Restrict Success Within Youth Sport

Part 1 – With reference to the theory undertaken in this module, evaluate the key factors that restrict success within youth sport.

The theory already undertaken in this module covers the factors that restrict success in youth sport. The factors researched prior to this piece of work are participation rates, support structures, maturation rates, talent identification and school sport competition. What follows is a detailed report examining three of these factors and stressing why each restricts success in youth sport. "Youth" is a common term for a young person or young people (Konopka, 1973). People have different views on how to define sport: some suggest that sport is an activity governed by rules or customs and often engaged in competitively, whilst others suggest differently. Sporting people also bring different attitudes to playing sport. Sportsmanship is an attitude that strives for fair play, courtesy toward teammates and opponents, ethical behaviour and integrity, and grace in victory or defeat (Fish and Magee 2003). Sports are most often played just for fun or for the simple fact that people need exercise to stay in good physical condition. Although they do not always succeed, sports participants are expected to display good sportsmanship and standards of conduct such as respect for opponents and officials. The three factors detailed here are relative age effects, talent identification and significant others. These appear to be the most contrasting of the factors that restrict success, so examining them offers an understanding from different views and aspects of the sporting world for youths.

A child's date of birth determines which school year they enter. In sport, physical appearance comes into play when players are selected on an appearance basis, and measuring biological maturity is one way of finding the best players (Vaeyens et al. 2005). Youths involved in sport must be adequately prepared for a life in sport; Long Term Athlete Development (LTAD) provides a model they can work from, set out in the phases below.

Phase 1 – FUNdamentals (FUN)

Objective – TO LEARN FUNDAMENTAL MOVEMENT SKILLS
Content – Overall development, focusing on the ABC’s (Agility, Balance, Coordination, Speed) to underpin the generic skills used in many sports: Running, jumping and throwing.
Frequency – Perform physical activity 5-6 times per week.

Phase 2 – Learning to Train (L2T)

Objective – TO LEARN FUNDAMENTAL SPORTS SKILLS

Content –

Concentration on the range of FUNdamental sports skills, such as throwing, catching, jumping and running.
Introduction to readiness: being mentally and physically prepared.
Basic FUNdamental tactics, e.g. in fielding, net/wall and invasion games, can be introduced.
Cognitive and emotional developments are central
Skills are practised in challenging formats

Frequency – As above. If there is a favoured sport, it is suggested that at least 50% of the time is allocated to other sports/activities that develop a range of skills.

Phase 3 – Training to Train (T2T)

Objective – TO BUILD FITNESS & SPECIFIC SPORTS SKILLS

Content –

This phase ideally occurs post-puberty and attention switches to:

Fitness Training
Detailed mental preparation
A focus on sport-specific skill development, including perceptual skills (reading the game/tactical understanding).

Decision making

Detailed and extensive evaluation

Frequency – For the aspiring performer, sport specific practice will now be 6-9 times per week.

Phase 4 – Training to Compete (T2C)

Objective – TO REFINE SKILLS FOR A SPECIFIC EVENT OR POSITION

Content –

Event and position specific training
Physical conditioning
Technical and tactical preparation
Advanced mental practice

All of the above come together and are developed under competition conditions.

Frequency – Training could be up to 12 times per week.

Phase 5 – Training to Win (T2W)

Objective – TO MAXIMISE PERFORMANCE IN COMPETITION

Content – Development and refinement of the aspects above, but with more use in competition modelling and more attention to rest periods and prevention of injury due to heavier load.

Frequency – Training could be up to 15 times per week.

Phase 6 – Retainment

For athletes/players retiring from competitive sport, many sports are developing Masters Programmes. An additional phase, retainment, keeps the players/athletes involved in physical activity. Experience gained as a competitor can be invaluable should they move into administration, coaching or officiating.

A move to another sport, perhaps at a more recreational level, may better suit some.

There are consequences that the sports person can face, as well as advantages and disadvantages. Research has found that approximately 70% of successful hockey and football players had a relative age advantage because they were born in the first half of the defined age group for their respective sports. By comparison, only 30% of these top-level players were born in the last six months of the respective "sport year". One documented consequence is an increased drop-out rate among young hockey players who had been disadvantaged by age in the past (Barnsley & Thompson 1988), suggesting that, given the choice, younger children will seek to leave or avoid an activity in which their competitive position is hampered by their relative age. Interestingly and predictably, the relative age effect has also been found in other competitive sports such as baseball (Thompson et al 1991).

Steven Gerrard was affected by this as a youth when it came to playing football and furthering his career. What follows is a prime example of how he overcame the relative age effect.

Steven Gerrard, one of England’s most talented footballers, was born in May 1980 and was also a late developer. He describes in his autobiography his huge disappointment at not getting into the FA school at Lilleshall and subsequently not playing for England under-16s. Michael Owen, born some six months earlier in December and more physically developed made both squads easily. Steve Gerrard wrote in his autobiography: “The one nagging doubt in the back of my mind was that my rivals were bigger: I was really small and facing some tall, strong units in my position. “ Steven resented his rejection but had coaches and mentors at Liverpool who knew he needed more time.

Significant others can be described as the people around the sports performer. There are four sectors: technical, peers, family and supporters. These are also the key interpersonal support factors that affect a young person's participation and progress in sport. The technical others are the people who see the person as a sports performer: their coach, teacher, club official, sports scientist or medical health scientist. Peers are people such as friends, classmates and teammates. Family is made up of parents, grandparents and siblings. Finally, supporters are people such as fans and neighbours. All four sectors have some impact on the sports player, whether large or small. Parents are a major part of the youth's life: it is the parent who has brought the child up, and it is their duty, by nature, to mould the child into a promising young adolescent. Parents have empathy for their children and are perceived as sharing their children's on-court emotions. They are perceived to possess knowledge and expertise of the sport, so they feel entitled to comment, and they show a continuum of reactions, good and bad, throughout their child's sport. In-game negative comments typically make up about 10% of parental comments (Holt et al 2008). Significant others can have a negative impact on the sports performer, to the point of the athlete dropping out of sport altogether. This can be caused by pressuring parents, lack of peers during adolescence and sibling rivalries. Parents can be very demanding of the child, either expecting too much of them in their sport or being too strict towards them. Eccles and Harold (1991) proposed that parents' expectations influence the child's decision to engage in sport and activities, including the intensity of effort expended and the child's actual performance level. Next is an extract from a study examining parental influence on children's participation in sport, giving an idea of why parents might restrict success in youth sport.

X. Yang et al. (1997) state: 'The purpose of this study was to examine parental influences on children's participation in sport and their later physical activity. The population for the study consisted of a random sample of 1881 9- to 15-year-old boys and girls who were exposed to the extensive research program called "Cardiovascular Risk in Young Finns" in 1980. They and their parents have been followed up for twelve years at three-year intervals by means of a short questionnaire concerning physical activity and other factors. The results indicated that the fathers' physical activity in 1980 was related to their children's habitual physical activity in the same year, and gave in boys and girls a significant prediction of PAI values twelve years later when the starting point was the age of 9, and also among boys from 15 years of age to 27. During the three-year follow-up periods, the extent of participation in sport was higher in families with active parents than in families with passive parents and single parents. The relationship of physical activity and sports participation with fathers' socioeconomic status and education was not as strong as with fathers' physical activity.'

Talent identification is the process by which talent is found in a person; in this case, when a talent is found in a youth. Different processes can be used to identify potential sports persons. Scouts are trained talent evaluators who travel extensively to watch athletes play their chosen sports and determine whether their skills and talents match what the scout's organization needs. Many scouts are former coaches or retired players, while others have made a career purely of scouting. Skilled scouts who help determine which players will fit in well with an organization can be the major difference between success and failure for a team in terms of wins and losses. Talent cannot only be identified by an official spectator; it can be identified by ordinary spectators, e.g. teammates, a coach or teachers, and by parents and grandparents. If the non-official spectators recognise a talent in a youth playing sport, they could provide the boost the player needs in order to progress in their chosen sport. If the non-official spectator fails to say anything to the sports person, it could stop them from being successful in sport. UK Sport holds a number of talent identification programmes for youths and, generally, people aged 17-25. These are 'Pitch 2 Podium', 'Sporting Giants' and 'Girls4Gold'. UK Sport (2008) and the English Institute of Sport (EIS) began a search for highly competitive sportswomen with the potential to become Olympic champions in cycling and other targeted Olympic sports (bob skeleton, canoeing, modern pentathlon, rowing and sailing). Girls4Gold is the single most extensive female sporting talent recruitment drive ever undertaken in Great Britain.

Girls4Gold, like any other programme, can only take a certain number of people. Following the tremendous success of Team GB at the 2008 Beijing Olympic Games, the Girls4Gold team received over 1,300 applications and cannot take on any more applicants until the next opening. This could be a restriction in itself, because the programme could be missing out on exceptional sports people, including male participants. It is available only to females and may therefore offer something a sporting male needs for success but cannot find anywhere else.

After summarising the three factors discussed in this essay, it is clear that all of them hold possible restrictions on success in youth sport. Relative age effects bear mostly on the physical side of the sports person's body: the older the sports person, the more mature and developed their body, and the younger, the less developed. This may be more of a restriction in team games than in individual sport, meaning the player could have a better chance of success in an individual sport than in a team sport; the restriction depends on the player's sport and which 'school year' the player is born into. Significant others can be a restriction in themselves, depending on who the 'significant others' around the player are. Provided the right people are in place, technically as well as emotionally and mentally, the player can be stable and successful. However, if those people are not in place to give the young sports person the interpersonal support they need, the player may be reluctant to take up the opportunities needed for success. Talent identification is the main key to success in sport: if the player is not recognised, the player never gets the opportunity to make their sport official or even turn it into a career. It could be argued that the sports person should not take chances and wait to be noticed, but should help themselves be acknowledged by talent identifiers, for example by applying for programmes such as 'Girls4Gold'. This can be 'make or break' for the sports player, as people on the search for talent are not always present.

Work-related stress amongst employees

In the main, business managers are failing to deal with the problem of work-related stress amongst employees.

Work-related stress is a common problem of modern lifestyles which has spread all over the world and touched almost all vocations (Life, n.d., p.1). "Job stress is a chronic disease caused by conditions in the workplace that negatively affect an individual's performance and/or overall well-being of his body and mind" (Life, n.d., p.1). Sources of work-related stress include high performance demands, family pressure, poor interpersonal relationships and career concerns. The consequences of stress include loss of self-confidence, worse performance and even suicide. As stress detrimental to people's health has become more and more severe, how to cope with it is attracting more and more attention. This essay focuses on the problems of athletes' stress and offers sports managers some solutions to those problems.

The symptoms of stress can be split into two kinds: physical and behavioural (speaking book, 2008). The physical symptoms include "tiredness, nausea, headaches, muscle tension, nervous twitches and altered sleep patterns", while "aggression, anxiety, poor decision-making, inability to prioritize, mood changes, difficulty in concentrating, feelings of failure and isolation" belong to the behavioural symptoms (speaking book, 2008, p.95). All of these symptoms can signal an athlete's work-related stress.

The causes of the athletes’ job stress are intricate and complex. They can be mostly divided into 4 parts-environmental issues, personal issues, leadership issues and team issues respectively. Firstly, environmental issues, which include selection, finance, and training environment, is a factor that contributes to the stress (Tim & Lew, 2001). Selection is consisted of late selection, a lengthy selection process and unfair selection system. Some athletes illustrate that they feel nervous and tense if they do not know whether they will be chosen for competition. They fear that they will not have enough time to prepare the competition which lead to the stress. And some unfair selection also causes the stress of athletes as they can not obtain the chance of equal competition (Tim & Lew, 2001). Finances play an important role in stress. It includes not having enough funding money and differential financial support. Athletes spend most of time on training so that they do not have extra time for earning money. Therefore, they have to obtain the funding from sport organization, sponsorship or family. If the financial support is not enough or is poorly managed, athletes will feel depressive and anxiety (Tim & Lew, 2001). Training environment may be being able to lead to the athletes’ stress if athletes exist in the two opposite environments at the same time. The incompatible environment will make athletes feel uncomfortable.

The second part, and the most important one, is personal issues. Personal issues comprise nutrition, career concerns, interpersonal relationships, injury and external distractions (Tom et al. 2000; Tim & Lew, 2001). Poor provision of food and disordered eating habits lead to malnutrition or obesity, which will influence athletes' performance (Tim & Lew, 2001); one female athlete says that diet is her worst puzzle and a source of stress (Tim & Lew, 2001). A study shows that external distractions (23%) and career concerns (19%) are the two major causes of stress (Pensgaard, 1998). Robertson & Cooper (1983) believe that career stagnation, high expectations from other people and unrealistic goals, the main components of career concern, may give rise to stress if athletes fail to meet those expectations and goals (Tom et al. 2000). At the same time, external influences also bring stress to athletes. The press, media, spectators and family can distract athletes from their work, which consequently influences their performance (Pensgaard, 1998). For example, David Beckham, a talented football player, fell out with his coach because the coach thought David paid more attention to entertainment than to training, which had impeded the development of his football skills. At the time, David also faced great stress from his wife, who managed the planning of his commercial activities.

Poor interpersonal relationships within a team are another factor in stress. There are three important sets of relationships: with sports managers, with coaches and with teammates. Low interpersonal support from sports managers, coaches and teammates is linked with high anxiety, tension and low performance satisfaction, which increases the risk of stress (Tom et al. 2000). In addition, injury, the worst thing for athletes, often results in pressure. Most athletes who get hurt worry about their career, fearing that they will miss opportunities to compete or will fall behind because of reduced training (Tim & Lew, 2001).

The third part of the cause is leadership issues, which centre on the coach. A coach's differential treatment of athletes, an overbearing or very demanding coach, and coach-athlete tension are all reasons for athletes' pressure (Tim & Lew, 2001). The coach's attitude influences athletes deeply, because the coach plays a vital role in the team and has the right to decide which athletes are chosen for competition. Many athletes fear being ignored by their coach, and some feel stress because they cannot bear the workload (Tim & Lew, 2001). Moreover, coaching style is another cause of athletes' stress: some athletes cannot adapt to different coaching styles, which may hinder their development, and an athlete's poor performance then results in a further rise in pressure (Tim & Lew, 2001).

Team issues, the last part of the cause, cannot be ignored. They mainly include team atmosphere, communication and support (Tim & Lew, 2001). Team atmosphere is a main issue relating to tension between athletes: a new team member, injured athletes or separate groups within the team may produce a poor team atmosphere, which engenders a tense situation in the team. Support from teammates, coaches and sports managers is the mental underpinning that helps athletes shake off negative moods; without support, athletes may feel helpless and even stressed (Tim & Lew, 2001).

Persistent stress may result in long-term consequences which alter the way athletes feel, think and behave, and may also change their physiological functioning (Stansfeld et al. 1999; Santer & Murphy, 1995; Cincirpini et al. 1984; Stainbrak & Green, 1983, cited in Tom et al. 2000). The effects of athletes' stress work on individuals and teams respectively. For individuals, effects of stress may include "sleep disturbances, headaches, gastrointestinal upset, cardiovascular disease, anxiety and depression, labile emotions, loss of concentration, lack of motivation, substance misuse and poor performance" (University of Cambridge, 2008). For a team, the consequences of stress may be low morale, increased athlete complaints, increased accidents, high absenteeism and poor performance, all of which hamper the development of the team (University of Cambridge, 2008).

Because of the serious dangers of athletes' stress, tackling the problem has become a focus for sports managers, and several solutions can help them cope with athletes' stress. First of all, sports managers have to take responsibility for athletes' diets and ensure athletes maintain good nutrition (Dean, 2007). Secondly, sports managers should prevent athletes from being overloaded by giving them manageable training schedules so that they do not become too tired. Keeping a good relationship with athletes and managing the relationships between athletes are both important for sports managers: support from teammates, coaches and sports managers is the mental underpinning that can help athletes find relief from stress (Tim & Lew, 2001).

For athletes, stress is a persistent problem which often influences their performance and life. Although sports managers are trying to deal with the situation, and some have adopted solutions, athletes continue to be affected by stress (Pensgaard, 1998). The solutions taken by sports managers, such as effective time management, a healthy diet and good relationships, are useful to a certain extent. However, accidents that sports managers cannot predict can also cause stress. So, while dealing with existing stress is important, detecting the possible sources of athletes' stress in advance may be a more effective way to prevent pressure from arising (Pensgaard, 1998).

Bibliography

Pensgaard, A.M. & Ursin, H. (1998). 'Stress, control, and coping in elite athletes' in Scandinavian Journal of Medicine & Science in Sports Vol. 8 pp. 183-189

Tim, W. & Lew, H. (2001). 'A case study of organizational stress in elite sport' in Journal of Applied Sport Psychology Vol. 13 pp. 207-238

Tom, C. et al. (2000) Research on Work-related Stress. Bilbao: European Agency for Safety and Health at Work

Life (n.d.) 'Stress at work'. <http://www.lifepositive.com/mind/psychology/stress/stress-at-work.asp>

Joan, M. & Sebastian W. (2008) English for Academic Study: Speaking. England: Garnet Publishing Ltd

Human Resources Division of Cambridge University (2008) 'Effects of Work-Related Stress'. <http://www.admin.cam.ac.uk/offices/hr/policy/stress/effects.html> [Accessed on 27/5/2008]

Dean, H. (2007) 'Stress and the Athlete'. <http://coachdeanhebert.wordpress.com/2007/12/16/stress-and-the-athlete/> [Accessed on 16/12/2007]

Working With Special Populations

Spirduso et al. (2005) give the definition of ageing as 'A process or group of processes occurring in living organisms that begins with birth and, with the passage of time, leads to a loss of adaptability, functional impairment and eventually death'. Swain and Leutholtz (2002), meanwhile, attribute much of aging to years of physical inactivity, arguing that many of the supposed biological consequences of age stem from the sedentary lifestyles most aging people lead.

Those who remain physically active throughout life demonstrate much slower rates of physical decline than do the sedentary, and a growing body of research indicates that those who have been sedentary for many years can experience significant improvements by beginning an exercise programme even at very advanced ages (Fiatrone et al. 1990).

The World Health Organization (WHO) estimates that over 20 percent of the population of the United Kingdom is over the age of 65, and projects that by the year 2025 that figure will rise to almost 30 percent (McArdle, Katch and Katch 2010).

Research shows that, when exercise is properly prescribed, elderly people can significantly improve their aerobic power (Eshani 1987), muscular strength and size (Fiatrone et al. 1990; Frontera et al. 1988), and bone density (Dalsky 1989). Improvements in functional movements such as walking speed and stair-climbing power have also been reported (Fiatrone et al. 1990). These results can reverse the effects of many years of physical decline and lead to greater independence and a much higher quality of life.

More than half of elderly people have at least one disability or chronic condition; participation in a regular physical activity/exercise programme has many physiological health benefits, including reducing the risk and lessening the impact of many chronic diseases (DiPietro, Caspersen and Ostfield 1995).

Aging has numerous effects on the body's organ systems, affecting skeletal muscle, body composition, the cardiovascular system, the metabolic system, the respiratory system, the nervous system, energy expenditure and energy intake, and thermoregulation. These can all seem to be contraindications for exercise in the elderly. Thermoregulation, for example, is affected in several ways: a decreased ability to regulate body temperature when homeostasis is challenged; a decreased amount of sweat per active sweat gland; a reduced skin blood flow response during exercise, attributable to the structure and responsiveness of cutaneous blood vessels; and an inadequate ability to reduce splanchnic blood flow during exercise (Kenney 1997; King and Martin 1998).

In general, an active lifestyle preserves and enhances skeletal muscle strength and endurance, flexibility, cardiorespiratory fitness and body composition for later life.

Main Content

Physiological Factors

Cardiovascular Fitness + Training

Since many elderly individuals have a low initial fitness level, it is prudent to begin exercise programmes at a low intensity and to progress gradually (Swain and Leutholtz 2002). Low cardiorespiratory fitness is a risk factor for cardiovascular disease and all-cause mortality (Blazer 1982). A low VO2 peak is associated with a reduced ability to perform ADLs (activities of daily living), including climbing stairs and brisk walking (Birdt 1998). Even a small improvement in cardiovascular fitness is associated with a lower risk of death. Healthy, sedentary older men and women can increase their cardiorespiratory fitness by performing aerobic exercise training (Engels et al. 1998; Kuczmarski et al. 1994). Physical activities that the elderly population should engage in are walking (indoors, outdoors or on a treadmill), gardening, swimming (water aerobics), golf and cycling (White 1995).

Combining strength with endurance training is also beneficial for the elderly individual. One study showed that after 6 months of combined resistance and endurance training, healthy older individuals increased their VO2 peak by 11% as well as their upper and lower body strength (Blazer 1982). Improvements in the ability to carry out normal daily tasks such as carrying laundry, vacuuming and climbing stairs translated into carrying 14% more weight and moving 10% faster.

Resistance Training

Elderly individuals, including the oldest old and the very frail, demonstrate physiological adaptations to strength training (Kuczmarski et al. 1994). The extent of adaptation depends on the frequency, volume, mode and type of training, as well as the initial training state (Ferketich, Kirby and Alway 1998). Strength training has the potential to improve the functional capacity and quality of life of the elderly person (Fiatarone et al. 1990). Most elderly individuals can participate in an individually designed resistance training programme. Those with hypertension or arthritis, or at risk of osteoporotic fracture, should be assessed and evaluated by a physician prior to initiating a resistance training programme (White 1995).

An ACSM recommendation for the elderly that bears some scrutiny is the recommendation to use machines as opposed to free weights. Swain and Leutholtz (2002) argue that although machines require less skill, free weights have the advantage of training balance and greater neuromuscular control, which may transfer to real-world activities. They further note that free weights allow the user to add weight in small amounts (e.g. 1 kg on a dumbbell), whereas resistance machines typically progress in increments of 4.5 kg or more, a large leap for a frail user. On the other hand, the ACSM recognises that machines place fewer demands on balance and carry a lower risk of injury.
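
To put those increments in perspective, here is an illustrative calculation; the 10 kg starting load is an assumed figure, not one taken from the literature cited here:

1 kg added to a 10 kg dumbbell = a 10% increase in load
4.5 kg added to a 10 kg machine stack = a 45% increase in load

A frail exerciser asked to progress by 45% in a single step is far more likely to fail the lift or sustain an injury than one progressing by 10%.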

Resistance training programmes lasting from 8 weeks to 1 year can increase muscle strength and mass in the elderly, regardless of age and sex (Fiatarone et al. 1990).

Psychological + Sociological Factors

International Society of Sport Psychology (1992) states that “Individual psychological benefits of physical activity include: positive changes in self perceptions and well-being, improvement in self-confidence and awareness, positive changes in mood, relief of tension, relief of feelings such as depression and anxiety, influence on premenstrual tension, increased mental well-being, increased alertness and clear thinking, increased energy and ability to cope with daily activity, increased enjoyment of exercise and social contacts, and development of positive coping strategies.”

Many older individuals do not have a spouse, close children or friends to rely on for socialization, assistance and support (Evans 1999). Although social relationships may shift with age from family to more formalized organizations or non-family members, many elderly people live in social isolation and are very lonely. This is important because epidemiological studies have demonstrated a relationship between social support and physical health (Evans 1999). Moreover, several studies have shown that a lack of social support is a major risk factor for depression, morbidity and mortality (Engels et al. 1998).

Participation in an organized training session provides an excellent opportunity for interaction with other elderly people, and when organizing a session it has been found that running all activities as one whole group generates greater interaction between participants (Evans 1999). Another method of improving social interaction for elderly people participating in an exercise programme is a 'buddy' system, in which individuals of similar ability are matched up to perform their exercises together.

Exercise Recommendations

Physical activity recommendations for the elderly are updated regularly by the American College of Sports Medicine (ACSM 2000).

High intensity activities such as running, rowing, aerobic/gravity riders and stair steppers may not be appropriate unless the individual has an unusually high fitness level. Low to moderate intensity exercise programmes can be performed daily. Higher intensity exercise sessions (>70% heart rate reserve) should be performed only 3 to 5 days per week (ACSM 2000). This allows for recovery days, which are more important for the older adult than for the younger person, as older adults recover more slowly. Older individuals with a low exercise capacity may benefit from multiple short daily sessions, whereas the more capable individual can benefit from three sessions per week with one exercise bout per day (ACSM 2000).
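
For reference, heart rate reserve (HRR) is conventionally calculated with the Karvonen method. The following worked example is an illustrative sketch only: the resting heart rate of 70 beats/min is an assumed figure, and 220 − age is a common approximation of maximum heart rate rather than a measured value. For a 70-year-old:

HRmax = 220 − 70 = 150 beats/min
HRR = HRmax − HRrest = 150 − 70 = 80 beats/min
Target HR at 70% of HRR = HRrest + (0.70 × HRR) = 70 + 56 = 126 beats/min

On these assumptions, a 'higher intensity' session for this individual would mean exercising at roughly 126 beats/min or above.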

Elderly individuals who are unable to perform ambulatory activities may be candidates for seated chair activities, stationary cycling and water-based activities. T'ai chi is regarded as one of the best activities for elderly individuals to undertake, as it improves both strength and balance, according to Dalsky (1989).

For the healthy older individual, it is recommended that exercise be performed for a minimum of 30 minutes but not beyond an hour in duration. If an individual beginning an exercise programme is predominantly sedentary, has severe chronic disease or has a very low fitness level, a minimum of 30 minutes of continuous activity may not be possible. In this situation, sessions of as little as 10 minutes, two or three times a day, are appropriate; health benefits are still obtained this way (ACSM 2000).

Summary

Physical activity of light to moderate intensity helps to improve health, whereas moderate to high intensity physical activity with an emphasis on aerobic endurance improves cardiorespiratory fitness (VO2) as well as health in older individuals. Elderly individuals respond to resistance training by increasing muscle mass and strength; this improves gait, balance, overall functional capacity and bone health, thus staving off chronic diseases such as osteoporosis and improving overall quality of life. There are also psychological benefits associated with regular physical activity and exercise. Dr Robert Butler, former director of the National Institute on Aging, stated: 'If exercise could be put in a bottle, it would be the strongest medicine money could buy'.

In general, the elderly person can improve both physical and mental health by performing regular physical activity, and this should be encouraged by all medical and exercise professionals.

Ultimately, regardless of age or level of frailty, nearly all elderly people can derive some physiological, functional or quality-of-life benefit from initiating an exercise programme.

Training Sessions

[Table placeholder: session prescriptions by Mode, Frequency, Intensity, Duration and Special Considerations, with Aerobic Training detailed by Intensity and Load.]

Reference Page

American College of Sports Medicine (2000) ACSM's Guidelines for Exercise Testing and Prescription. 6th Edition. Baltimore: Lippincott Williams and Wilkins.

Birdt, T.A. (1998) Alzheimer's disease and other primary dementias. In Harrison's Principles of Internal Medicine. New York: McGraw-Hill; pp. 2348-2356.

Blazer, D.G. (1982) Social support and mortality in an elderly community population. American Journal of Epidemiology; 115:684-694.

Dalsky, G.P. (1989) The role of exercise in the prevention of osteoporosis. Comprehensive Therapy; 15(9):30-37.

DiPietro, L., Caspersen, C.J. and Ostfeld, A.M. (1995) A survey for assessing physical activity among older adults. Medicine and Science in Sports and Exercise; 25:628-642.

Ehsani, A.A. (1987) Cardiovascular adaptations to exercise training in the elderly. Journal of Applied Physiology; 46:1840-1843.

Engels, H.J., Drouin, J., Zhu, W. and Kazmierski, J.F. (1998) Effects of low impact, moderate intensity exercise training with and without wrist weights on functional capacities and mood status in older adults. Gerontology; 44:239-244.

Evans, W.J. (1999) Exercise training guidelines for the elderly. Medicine and Science in Sports and Exercise; 31:12-17.

Ferketich, A.M., Kirby, T.E. and Alway, S.E. (1998) Cardiovascular and muscular adaptations to combined endurance and strength training in elderly women. Acta Physiologica Scandinavica; 259-267.

Fiatarone, M.A., Marks, E.C., Ryan, N.D., Meredith, C.N., Lipsitz, L.A. and Evans, W.J. (1990) High-intensity strength training in nonagenarians. Journal of the American Medical Association; 263:3029-3034.

Frontera, W.R., Meredith, C.N., O'Reilly, K.P., Knuttgen, H.G. and Evans, W.J. (1988) Strength conditioning in older men: skeletal muscle hypertrophy and improved function. Journal of Applied Physiology; 64:1038-1044.

International Society of Sport Psychology (1992) Physical activity and psychological benefits: International Society of Sport Psychology position statement. The Physician and Sportsmedicine; 20(10):179-184.

Kenney, W.L. (1993) The older athlete: exercise in hot environments. Sports Science Exchange; 6(44).

King, A.C. and Martin, J.E. (1998) Physical activity promotion: adoption and maintenance. In ACSM's Resource Manual for Guidelines for Exercise Testing and Prescription. Baltimore: Williams and Wilkins; pp. 564-569.

Knutzen, K.M., Brilla, L.R. and Caine, D. (1999) Validity of 1RM prediction equations for older adults. Journal of Strength and Conditioning Research; 13:242-246.

Kuczmarski, R.J., Flegal, K.M., Campbell, S.M. and Johnson, C.L. (1994) Increasing prevalence of overweight among US adults. Journal of the American Medical Association; 272:205-211.

McArdle, W.D., Katch, F.I. and Katch, V.I. (2010) Exercise Physiology: Nutrition, Energy and Human Performance. 7th Edition. Baltimore: Lippincott Williams and Wilkins.

Seguin, R. and Nelson, M.E. (2003) The benefits of strength training for older adults. American Journal of Preventive Medicine; 25(Suppl. 2):141-149.

Spirduso, W.W., Francis, K.L. and MacRae, P.G. (2005) Physical Dimensions of Aging. 2nd Edition. Champaign: Human Kinetics; pp. 131-155.

Swain, D.P. and Leutholtz, B.C. (2002) Exercise Prescription: A Case Study Approach to the ACSM Guidelines. Champaign: Human Kinetics.

White, T.P. (1995) Skeletal muscle structure and function in older mammals. In Perspectives in Exercise Science and Sports Medicine. Carmel: Cooper; pp. 115-174.