A neural network is a model built from interconnected information processing units, each of which applies an activation function to transform its input into output. Neural networks are often compared to the human nervous system: information is passed through interconnected units, much as it passes between neurons in humans. The first layer of the network receives the raw input, processes it, and passes the processed information to the hidden layers. The hidden layers pass the information on to the last layer, which produces the output.

The advantage of a neural network is that it is adaptive. It learns from the information provided: it trains itself on data with known outcomes and optimizes its weights to predict better in situations where the outcome is unknown. A perceptron, i.e. a single-layer neural network, is the most basic form of a neural network.

A perceptron receives multidimensional input and processes it using a weighted summation and an activation function. It is trained on labeled data with a learning algorithm that optimizes the weights in the summation processor. A major limitation of the perceptron model is its inability to deal with nonlinearity. A multilayered neural network overcomes this limitation and helps solve nonlinear problems. The input layer connects to the hidden layer, which in turn connects to the output layer. The connections are weighted, and the weights are optimized using a learning rule.
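The weighted summation, activation and weight update described above can be illustrated with a small sketch in base R. It assumes a step activation and the classic perceptron learning rule, and it uses logical AND as toy labeled data; the function names and the learning rate are hypothetical choices made for illustration.

```r
# Hypothetical helper names; a step activation is assumed for illustration.
step_activation <- function(z) ifelse(z >= 0, 1, 0)

# Forward pass: weighted sum of the inputs plus a bias, then the activation.
perceptron_predict <- function(x, weights, bias) {
  step_activation(as.vector(x %*% weights) + bias)
}

# Perceptron learning rule: adjust the weights by the prediction error.
perceptron_train <- function(x, y, learning_rate = 1, epochs = 20) {
  weights <- rep(0, ncol(x))
  bias <- 0
  for (epoch in seq_len(epochs)) {
    for (i in seq_len(nrow(x))) {
      prediction <- perceptron_predict(x[i, , drop = FALSE], weights, bias)
      error <- y[i] - prediction
      weights <- weights + learning_rate * error * x[i, ]
      bias <- bias + learning_rate * error
    }
  }
  list(weights = weights, bias = bias)
}

# Toy labeled data: logical AND, which is linearly separable.
x <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
y <- c(0, 0, 0, 1)

model <- perceptron_train(x, y)
predictions <- perceptron_predict(x, model$weights, model$bias)
print(predictions)
```

Replacing the labels with those of XOR, c(0, 1, 1, 0), gives a problem that is not linearly separable, so a single perceptron cannot fit it. This is the limitation that a hidden layer addresses.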

Set the working directory in R using the setwd function, and keep cereal.csv in the working directory. We use rating as the dependent variable and calories, protein, fat, sodium and fiber as the independent variables. We divide the data into a training set and a test set. The training set is used to find the relationship between the dependent and independent variables, while the test set assesses the performance of the model. We use 60% of the dataset as the training set.

The data is assigned to the training and test sets by random sampling, which we perform in R using the sample function. We use set.seed so that the same random sample is generated every time, which keeps the results consistent. The sampled row indices are stored in an index variable, which we will use while fitting the neural network to create the training and test data sets.

Now we fit a neural network on our data.
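Before the fit, the random split described above might be sketched as follows. This is a minimal sketch rather than the article's original script: the placeholder data frame stands in for cereal.csv so that the code runs without the file, and the seed values, the row count and the names samplesize, datatrain and datatest are assumptions.

```r
# Stand-in for read.csv("cereal.csv"): random placeholder values, so the
# sketch runs without the file. Column names follow the variables in the text.
set.seed(1)    # arbitrary seed for the placeholder data
n_rows <- 75   # arbitrary size, not the real dataset's row count
data <- data.frame(
  rating   = runif(n_rows, 20, 90),
  calories = runif(n_rows, 50, 160),
  protein  = runif(n_rows, 1, 6),
  fat      = runif(n_rows, 0, 5),
  sodium   = runif(n_rows, 0, 300),
  fiber    = runif(n_rows, 0, 14)
)

# 60% of the rows go to the training set.
samplesize <- floor(0.60 * nrow(data))

# Fixing the seed makes sample() return the same rows on every run.
set.seed(80)   # arbitrary seed value
index <- sample(seq_len(nrow(data)), size = samplesize)

# Rows in index form the training set; all remaining rows form the test set.
datatrain <- data[index, ]
datatest  <- data[-index, ]
```

With the real file, the placeholder data frame would be replaced by a call such as read.csv("cereal.csv"), assuming the file has a header row with these column names.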

We use the neuralnet library for the analysis. The first step is to scale the cereal dataset. Scaling is essential because otherwise a variable may have a large impact on the predicted variable only because of its scale. Using unscaled data may lead to meaningless results. Common techniques for scaling data are min-max normalization, z-score normalization, median and MAD, and tanh estimators. Min-max normalization transforms the data into a common range, thus removing the scaling effect from all the variables.
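Min-max normalization maps each value x to (x - min) / (max - min), so every variable ends up in the range 0 to 1. A minimal sketch in base R follows; the helper name min_max_scale and the toy values are made up for illustration.

```r
# Hypothetical helper: rescale every column of a numeric data frame to [0, 1].
# A constant column would divide by zero, so this sketch assumes that each
# column has at least two distinct values.
min_max_scale <- function(df) {
  col_min <- apply(df, 2, min)
  col_max <- apply(df, 2, max)
  as.data.frame(scale(df, center = col_min, scale = col_max - col_min))
}

# Toy values on very different scales.
toy <- data.frame(calories = c(50, 100, 150), sodium = c(0, 140, 280))
scaled <- min_max_scale(toy)
print(scaled)
```

After scaling, both toy columns contain 0, 0.5 and 1, so neither variable dominates merely because of its units.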

Unlike z-score normalization and the median and MAD method, the min-max method retains the original distribution of the variables. We use min-max normalization to scale the data.

We have evaluated our neural network using RMSE, which is a residual method of evaluation. The major problem with residual evaluation methods is that they do not tell us how the model behaves when new data is introduced.
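For reference, RMSE is the square root of the mean squared difference between actual and predicted values. A minimal sketch follows; the helper name rmse and the toy numbers are illustrative assumptions, not results from the cereal data.

```r
# Root mean squared error between actual and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy example: every prediction is off by exactly 2 units.
actual    <- c(40, 50, 60, 70)
predicted <- c(42, 48, 62, 68)
print(rmse(actual, predicted))  # 2
```

If the network is trained on min-max scaled data, its predictions would normally be mapped back to the original scale of rating before the RMSE is computed, so that the error is expressed in rating units.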

We tried to deal with the “new data” problem by splitting our data into training and test sets, constructing the model on the training set, and evaluating it by calculating RMSE on the test set. This training-test split is the simplest form of cross validation, known as the holdout method. A limitation of the holdout method is that the variance of the performance metric, in our case RMSE, can be high, depending on which elements are assigned to the training and test sets. The second commonly used cross validation technique is k-fold cross validation. This method can be viewed as a recurring holdout method.

The complete data is partitioned into k equal subsets, and each time one subset is assigned as the test set while the others are used for training the model. Every data point gets a chance to be in both the test set and the training set, so this method reduces the dependence of performance on the training-test split and reduces the variance of the performance metrics. The extreme case of k-fold cross validation occurs when k equals the number of data points. The predictive model is then trained on all the data points except one, which takes the role of the test set. This method of leaving one data point out as the test set is known as leave-one-out cross validation. Now we will perform k-fold cross validation on the neural network model we built in the previous section.
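The partitioning described above can be sketched in base R. To keep the example self-contained, a linear model (lm) is used as a stand-in for the neural network, and the data is synthetic; the helper names make_folds, rmse and k_fold_rmse are hypothetical.

```r
# Assign each of n rows to one of k folds of (nearly) equal size, at random.
make_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Train on k - 1 folds, test on the held-out fold, and repeat for every fold.
k_fold_rmse <- function(data, k, fit_fun, predict_fun, target) {
  folds <- make_folds(nrow(data), k)
  sapply(seq_len(k), function(fold) {
    train <- data[folds != fold, ]
    test  <- data[folds == fold, ]
    model <- fit_fun(train)
    rmse(test[[target]], predict_fun(model, test))
  })
}

# Synthetic data; lm is a stand-in model, not the article's neural network.
set.seed(42)  # arbitrary seed
toy <- data.frame(x = runif(40))
toy$y <- 2 * toy$x + rnorm(40, sd = 0.1)

fit_lm     <- function(train) lm(y ~ x, data = train)
predict_lm <- function(model, test) predict(model, newdata = test)

fold_rmse <- k_fold_rmse(toy, k = 5, fit_lm, predict_lm, target = "y")
print(fold_rmse)        # one RMSE per fold
print(mean(fold_rmse))  # average across the folds

# Setting k to the number of rows gives leave-one-out cross validation.
loo_rmse <- k_fold_rmse(toy, k = nrow(toy), fit_lm, predict_lm, target = "y")
```

Passing the fitting and prediction steps as functions keeps the fold logic independent of the model, so a different learner can be swapped in without changing k_fold_rmse.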

The number of elements in the training set, j, is varied from 10 to 65, and for each j, 100 samples are drawn from the dataset. In each case the remaining elements are assigned to the test set. The model is trained on each of the 5600 training datasets (56 values of j times 100 samples) and then tested on the corresponding test set. We compute the RMSE for each test set and store the values in a matrix. Because the training sets are drawn at random rather than taken from k fixed folds, this procedure is a repeated random sub-sampling form of cross validation. It reduces the dependence of our results on any single sample and checks the robustness of our model.
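The procedure above maps naturally onto two nested loops, one over the training-set size j and one over the random samples drawn for each j. The sketch below is an illustration under stated assumptions, not the article's original script: it uses synthetic data and a linear model (lm) as a stand-in for the neural network, the helper name subsample_rmse is hypothetical, and the demonstration draws 10 samples per size instead of 100 so that it runs quickly.

```r
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Returns a matrix with one row per random sample and one column per
# training-set size; each cell holds the RMSE on the corresponding test set.
subsample_rmse <- function(data, fit_fun, predict_fun, target,
                           sizes = 10:65, n_samples = 100) {
  results <- matrix(NA_real_, nrow = n_samples, ncol = length(sizes),
                    dimnames = list(NULL, paste0("j", sizes)))
  for (s in seq_along(sizes)) {        # outer loop: training-set size j
    for (i in seq_len(n_samples)) {    # inner loop: random samples of size j
      index <- sample(seq_len(nrow(data)), size = sizes[s])
      train <- data[index, ]
      test  <- data[-index, ]
      model <- fit_fun(train)
      results[i, s] <- rmse(test[[target]], predict_fun(model, test))
    }
  }
  results
}

# Synthetic data with an arbitrary row count; lm is a stand-in model.
set.seed(450)  # arbitrary seed
toy <- data.frame(x1 = runif(75), x2 = runif(75))
toy$y <- 3 * toy$x1 - toy$x2 + rnorm(75, sd = 0.2)

rmse_matrix <- subsample_rmse(
  toy,
  fit_fun     = function(train) lm(y ~ x1 + x2, data = train),
  predict_fun = function(model, test) predict(model, newdata = test),
  target      = "y",
  sizes       = 10:65,
  n_samples   = 10   # reduced from 100 to keep the sketch fast
)

# Median RMSE for each training-set size.
median_rmse <- apply(rmse_matrix, 2, median)
print(round(median_rmse[c("j10", "j65")], 3))
```

To apply the same loop to a neural network, fit_fun could wrap a call to neuralnet::neuralnet and predict_fun could read net.result from neuralnet::compute. Fitting thousands of networks takes considerably longer than fitting linear models.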

We employ a nested for loop.

The article discusses the theoretical aspects of a neural network, its implementation in R, and post-training evaluation. Neural networks are inspired by the biological nervous system. As in the nervous system, information is passed through layers of processors. The significance of variables is represented by the weights of each connection.

The article provides a basic understanding of the back propagation algorithm, which is used to assign these weights. In this article we also implement a neural network in R. We use a publicly available dataset shared by CMU. The aim is to predict the rating of cereals using information such as calories, fat, protein, etc. After constructing the neural network, we evaluate the model for accuracy and robustness.

We compute RMSE and perform cross validation analysis. In cross validation, we check how model accuracy varies as the size of the training set is changed. We consider training sets of size 10 to 65. For each size, 100 samples are picked at random and the median RMSE is calculated. We show that model accuracy increases when the training set is large.

Before using the model for prediction, it is important to check the robustness of performance through cross validation.