IEEE

Learning Technology

 

publication of

 

IEEE Computer Society

 

Technical Committee on Learning Technology (TCLT)

 

Volume 10 Issue 2

ISSN 1438-0625

April 2008

 

 

From the editor…

Adaptive Fuzzy Regression Model for the Prediction of Dichotomous Response Variables using Cancer data: A Case Study

The development of a web-based platform for fathers of children with Autism

Does the availability of audio podcasts enhance the classroom experience for first year dental students? Data on use and perceived benefits

 


From the editor…

 

 

 

Welcome to the April 2008 issue of Learning Technology.

 

This newsletter focuses on bringing emerging technologies in education to its readers. New developments and practices in learning technologies are the core of this newsletter. This issue covers topics ranging from an adaptive technique for predicting dichotomous responses to educating parents of autistic children through the Internet.

 

Nagar and Srivastava propose an adaptive technique for predicting a dichotomous response variable by combining fuzzy concepts with statistical logistic regression; the technique was tested on a cancer dataset to predict cancer susceptibility. Ferdig et al. look at the introduction of an Internet-based system to help fathers of autistic children acquire the requisite skills. They then identify the impact of this technology-based intervention on the fathers and the resulting effects on their children with autism. Whitney and Pessina recorded their lectures, provided them to their students, and investigated the effects of these podcasts on student learning. They looked at the possibility of increased active learning with the availability of podcasts and at how podcasts supplement the learning content.

 

This newsletter focuses on publishing new and emerging technologies in education, with an emphasis on advanced learning technologies and their usage in different contexts. Please feel free to bring forward your ideas and views.

 

In addition, if you are involved in research and/or implementation of any aspect of advanced learning technologies, I invite you to contribute your own work in progress, project reports, case studies, and event announcements to this newsletter. For more details, please refer to the author guidelines at http://lttf.ieee.org/learn_tech/authors.html.

 

 

 

 

 

Ali Fawaz Shareef, PhD

Director General

Centre for Open Learning

Maldives

a.f.shareef@ieee.org

 


Adaptive Fuzzy Regression Model for the Prediction of Dichotomous Response Variables using Cancer data: A Case Study

 

Abstract

This paper proposes an adaptive technique for predicting a dichotomous response variable by combining fuzzy concepts with statistical logistic regression. The model was tested on a cancer dataset for predicting cancer susceptibility. We present the development, evaluation, and validation of the proposed model based on the experiment carried out. The explanatory power of the adaptive model was calculated and compared with that of fuzzy neural network and statistical logistic regression models using calibration and discrimination techniques. The calculated area under the ROC curve values indicate that the proposed model has predictive ability comparable to that of both the fuzzy neural network and statistical logistic regression models.

 

Keywords: Artificial Neural Network, Fuzzy Regression, Logistic Regression

 

 

 

Introduction

 

Predictive models and cancer screening

 

Precise and accurate predictive models are very important in screening initiatives. The need for new approaches and philosophies in modeling cancer prediction and susceptibility is driven by recent advances in soft computing, as well as by the questionable accuracy of previously favored statistical analysis techniques and their inapplicability to individual prediction. Establishing precise predictive models also becomes increasingly difficult in the multivariable setting. Traditionally, such regression problems with binary dependent variables have been addressed by statistical logistic regression techniques.

 

 

Machine learning technique and interval prediction

 

A machine learning technique is an algorithm that estimates an unknown dependency between a set of given input variables and their response variable. Once such a dependency is discovered, it can be used to predict or deduce the future output associated with a different set of input values. This is done by identifying the target function that best describes the behavior governing the input-output pattern. Learning in this context refers to the process of minimizing the difference between the observed data and the model output [7].
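As a minimal sketch of learning as error minimization, the following fits a straight line by gradient descent on the mean squared difference between observed values and model output. The data, the linear model, the learning rate, and the function names are illustrative assumptions and are not taken from the paper.

```python
# Sketch only: "learning" as minimizing the gap between observed data
# and model output. Model, data, and hyperparameters are assumptions.

def mean_squared_error(w, b, xs, ys):
    """Average squared difference between model output w*x + b and y."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, lr=0.05, steps=2000):
    """Estimate slope w and intercept b by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy observations generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
prediction = w * 5.0 + b  # output for an input not seen during fitting
```

Once the dependency has been estimated, the same fitted parameters are reused to predict the output for a new input, which mirrors the two stages described above.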

 

An interval prediction usually comprises the upper and lower limits between which a future unknown value is expected to lie with a prescribed probability. The prediction interval reflects the accuracy of the estimates with respect to the observed target values [7]. The use of prediction intervals in machine learning is appropriate when dealing with multivariate functions where the available data are very imprecise and limited and when the explanatory variables interact in uncertain, vague ways [1]. In other words, a fuzzy phenomenon is best modeled by a fuzzy functional relationship. The use of prediction intervals in machine learning is referred to as the fuzzy linear regression technique.
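To make the idea of upper and lower limits concrete, the sketch below builds an interval around a point forecast from the quantiles of past residuals and checks how often observed values fall inside. This empirical-quantile approach is an assumption chosen for illustration; it is not the fuzzy interval construction used in the paper, and all names and numbers are hypothetical.

```python
# Sketch only: an empirical prediction interval from past residuals.
# This is an illustrative stand-in, not the paper's fuzzy method.

def prediction_interval(point_forecast, residuals, coverage=0.8):
    """Return (lower, upper) limits around a point forecast."""
    ordered = sorted(residuals)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0
    # round() avoids floating-point truncation when picking the index.
    lo_idx = round(tail * (n - 1))
    hi_idx = (n - 1) - lo_idx
    return point_forecast + ordered[lo_idx], point_forecast + ordered[hi_idx]

def empirical_coverage(intervals, observed):
    """Fraction of observed values lying inside their intervals."""
    hits = sum(1 for (lo, hi), y in zip(intervals, observed) if lo <= y <= hi)
    return hits / len(observed)

# Hypothetical residuals (observed minus predicted) from earlier data.
residuals = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
lower, upper = prediction_interval(10.0, residuals, coverage=0.8)

# Check the interval against a few hypothetical later observations.
coverage_rate = empirical_coverage([(lower, upper)] * 4, [9.0, 12.5, 8.5, 11.0])
```

A wider interval raises the chance of containing the true value but says less about where it lies, which is the trade-off any interval-valued predictor has to balance.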

 

 


 

Motivation of study

 

Existing prediction techniques include statistical techniques and artificial intelligence techniques such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Fuzzy Logic, k-nearest neighbors (k-NN), Fuzzy Neural Network (FNN), and Fuzzy Regression [9, 10]. However, these prediction techniques have limitations and drawbacks.

 

Problems normally arise in statistical prediction when there is an inadequate number of observations and when distributional assumptions are not satisfied [1]. As for the artificial intelligence prediction techniques, common limitations include low interpretability due to the “black box” nature of the model (ANN and SVM), limited ability to explicitly identify possible causal relationships between variables, overfitting (ANN and SVM), difficulty of construction (k-NN), lack of flexibility to incorporate new knowledge (SVM), the risk of eroding old but valid information when new knowledge is introduced into the system (SVM), and unsuitability for high-dimensional data (SVM) [6, 9, 10].

Thus the main question that sparked this study was whether there exist new measures, particularly among the artificial intelligence techniques, that can be used in predicting a binary outcome. The proposed model was intended to provide answers to the following research questions:

 

·         How do we improve prediction accuracy using artificial intelligence techniques?

·         What can be used to handle the ambiguous relationship between the independent (explanatory) and dependent (response) variables?

·         What can be introduced in the prediction of a dichotomous outcome?

·         How do we analyze the non-linear relationship between the independent and dependent variables in a multivariate environment?

 

As a result, an adaptive model was developed by combining fuzzy concepts with statistical logistic regression. A new algorithm was formulated for intrinsically linear functions, which involve linear transformation processes. The adapted fuzzy logistic regression model can then be used to deduce a prediction interval output for a binary response variable.

 

This paper is organized as follows: Section I introduces the proposed fuzzy regression model, and Section II describes the theory that underlies the fuzzy linear regression and fuzzy logistic regression predictive models. The adapted algorithm is shown in Section III. Section IV discusses the experiment conducted and the model validation. Finally, Section V draws conclusions from the presented work.

 

 

Underlying theories for the adaptive fuzzy logistic regression model

 

Fuzzy linear regression theory

 

Regression analysis is an estimation method used to find a crisp relationship between the dependent and independent variables and to estimate the variance of the measurement error. Fuzzy regression analysis is an extension of classical regression analysis in which some elements of the model are represented by fuzzy numbers [3]. Fuzzy regression methods have been successfully applied to modeling problems in financial forecasting and engineering [2, 8, 11].

 

There are two categories of fuzzy regression analysis. The first is possibilistic regression analysis, which is based on possibility concepts. It uses a fuzzy linear system as the regression model, whereby the total vagueness of the estimated values for the dependent variables is minimized. It was first proposed by Tanaka et al. [1, 3].

 

The second category of fuzzy regression analysis adopts the fuzzy least squares method (FLSM) to minimize errors between the given outputs and the estimated outputs. The advantage of Tanaka’s possibilistic model lies in its simplicity in programming and computation, while that of FLSM lies in its minimum degree of fuzziness between the observed and estimated values [3].
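For orientation, possibilistic regression in the style of Tanaka is commonly written as a linear program over symmetric triangular fuzzy coefficients. The formulation below is the widely cited textbook form and may differ in detail from the variant adopted in this paper. The symbols are assumptions introduced here: each coefficient has a center α_j and a spread c_j, x_{i0} = 1 for the intercept, and h in [0, 1) is a chosen fit threshold.

```latex
\begin{aligned}
\tilde{Y}_i &= \tilde{A}_0 x_{i0} + \tilde{A}_1 x_{i1} + \dots + \tilde{A}_p x_{ip},
  \qquad \tilde{A}_j = (\alpha_j, c_j) \\[4pt]
\min_{\alpha,\, c} \quad & J = \sum_{i=1}^{n} \sum_{j=0}^{p} c_j \,\lvert x_{ij} \rvert \\[4pt]
\text{subject to} \quad
  & \sum_{j=0}^{p} \alpha_j x_{ij} + (1 - h) \sum_{j=0}^{p} c_j \,\lvert x_{ij} \rvert \;\ge\; y_i, \\
  & \sum_{j=0}^{p} \alpha_j x_{ij} - (1 - h) \sum_{j=0}^{p} c_j \,\lvert x_{ij} \rvert \;\le\; y_i, \\
  & c_j \ge 0, \qquad i = 1, \dots, n, \quad j = 0, \dots, p .
\end{aligned}
```

The objective J is the total spread (vagueness) of the estimated outputs, and the two constraint families require every observed y_i to lie inside the estimated fuzzy interval at level h. Because both the objective and the constraints are linear in α_j and c_j, a standard linear programming solver suffices, which is consistent with the simplicity noted above.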

 

Statistical logistic regression theory

 

Logistic regression is a mathematical modeling approach used to describe the relationship between several explanatory variables X and a dichotomous dependent variable Y [5]. Logistic regression can be used to predict the outcome from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. That is, logistic regression makes no assumption about the distribution of the independent variables: they do not have to be normally distributed, linearly related, or of equal variance within each group. The dichotomous dependent variable can take the value 1 with a probability of success P, or the value 0 with a probability of failure 1-P. This type of variable is called a Bernoulli (or binary) variable. The relationship between the predictor and response variables is not a linear function in logistic regression; instead, the logistic regression function is used, which is derived from the logit transformation of P [5]:

\[ P = \frac{e^{a + bX}}{1 + e^{a + bX}} \]

where P is the probability of a 1, e is the base of the natural logarithm (about 2.718), and a and b are the parameters of the model.
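As a small illustration, the sketch below evaluates the standard single-predictor logistic function P = e^(a + bX) / (1 + e^(a + bX)) and its inverse, the logit, which maps P back to the linear predictor a + bX. The parameter values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def logistic(a, b, x):
    """Probability of a 1 under a single-predictor logistic model."""
    z = a + b * x  # linear predictor
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """Log-odds of p; the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

# Hypothetical parameters, not estimated from any data in the paper.
a, b = -3.0, 1.5

p_at_2 = logistic(a, b, 2.0)  # linear predictor is 0 here
p_at_3 = logistic(a, b, 3.0)  # linear predictor is 1.5 here

# Applying the logit recovers the linear predictor a + b*x.
recovered = logit(p_at_3)
```

The logit step is what makes the model intrinsically linear: on the log-odds scale the relationship with X is a straight line, even though P itself follows an S-shaped curve bounded between 0 and 1.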

 

 

The adaptive fuzzy logistic regression model

 

The adaptive fuzzy logistic regression model is based on Tanaka’s possibilistic regression analysis described above in which the response variable Y is writ