Friday, February 19, 2010

Guest Lecture

I was posted to PulseMetrics Pte Ltd during my internship and I learned a lot from Mr Chin Yen. As my supervisor, he had a lot of patience in explaining BI skills and how to apply them in our projects, and he was the first person to actually teach me about BI.

He was invited as a guest lecturer to share his views on BI as a BI implementer. He helped me understand BI better and never hesitated to share his thoughts and experience with us. I truly gained a lot from the sharing session with Mr Chin and the other two guest lecturers, Mr Erwin Lim and Ms Carolyn Khiu, whom I had already met before the lecture. Hearing their points of view on how BI benefitted the business greatly inspired me and showed me how interesting BI can be.

This will be my last post for my e-portfolio. Overall, I enjoyed this subject a lot; although there were so many, many theories to memorise, the subject itself was really fun. I have actually enjoyed building dashboards very much since my internship, so if I were given another chance to choose my elective, BI would still be one of my choices.

Thank you to all the teachers for teaching me and sharing their knowledge with us. I'm signing off now. Bye!

-Audrina =) -

Week 10

Data Visualization: Modern Approaches
http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/

By Vitaly Friedman
August 2nd, 2007

Data presentation can be beautiful, elegant and descriptive. There is a variety of conventional ways to visualize data – tables, histograms, pie charts and bar graphs are being used every day, in every project and on every possible occasion. However, to convey a message to your readers effectively, sometimes you need more than just a simple pie chart of your results. In fact, there are much better, profound, creative and absolutely fascinating ways to visualize data. Many of them might become ubiquitous in the next few years.

So what can we expect? Which innovative ideas are already being used? And what are the most creative approaches to present data in ways we’ve never thought before?
Let’s take a look at the most interesting modern approaches to data visualization as well as related articles, resources and tools.





Below are some of the images that I took from the article:


Mindmaps: Trendmap 2007








Displaying news: Newsmap is an application that visually reflects the constantly changing landscape of the Google News news aggregator. The size of data blocks is defined by their popularity at the moment.






Digg Stack: Digg stories arrange themselves into stacks as users digg them. The more diggs a story gets, the larger the stack.




Displaying Data: Amaztype, a typographic book search, collects information from Amazon and presents it in the form of the keyword you've provided. To get more information about a given book, simply click on it.





Time Magazine uses visual hills (spikes) to emphasize the density of the American population in its map.








Comments

Personally, I feel that all these nice graphics make the data very attractive for viewers to look at, but of course the visualization has to bring the message across clearly, so these graphics should not be too complicated or difficult for readers to read and understand.

Week 9

TOWARDS A SUCCESSFUL BI IMPLEMENTATION
http://www.infosysblogs.com/eim/2009/04/towards_a_successful_bi_implem.html

When companies grow, they need systems that will streamline and optimize operational performance, help them make better decisions based on data and trends, and come up with strategies that are in line with the business goals of the organization. This is possible through Business Intelligence.

In an effort to implement a BI system, companies either plan to (1) have a team that will be dedicated to writing the software, (2) outsource the job of developing the system to a vendor, or (3) implement a product that will meet the BI requirements. Option (3) is, in most cases, better than options (1) and (2) because it can be implemented faster and will usually cost less.

The most important factors of a BI implementation are knowing and understanding why BI is required, the goals that should be met, the strategies that will help in meeting those goals, and the product that will fit the immediate and long-term needs of the organization. Reports say that companies, over a period of time, have purchased and implemented different BI products under different leaderships. CIOs come and go, and they leave their mark on the company in terms of a product implementation or changes to the software environment based on what they feel is best.

This drains the company's resources and never gets them what they really want. In this article, I will try to focus on how a company can take the right steps to successfully implement a BI system that will meet the strategic business goals of the company.


Most companies meet all their reporting needs by having someone write SQL queries against the database and export the reports to Excel. When their business and database grow, these systems fail to meet all their reporting needs. Some reports become too complex to generate; some might take a very long time to generate. In such cases, companies typically give the responsibility of implementing a BI system to one of the IT managers, who would then consult a BI product company or a system integrator for advice and implementation.
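As a side note, the "SQL query exported to Excel" style of reporting described above might look something like the sketch below. This is purely my own illustration, not something from the article: the orders table, its columns and the report query are all invented, with a tiny in-memory SQLite database standing in for a real operational system.

```python
# Hypothetical example of "someone writes a SQL query and exports it to Excel".
# The orders table is an invented stand-in for a real operational database.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('North', '2010-01-05', 1200.0),
        ('North', '2010-02-11', 950.0),
        ('South', '2010-01-20', 780.0),
        ('South', '2010-02-02', 1640.0);
""")

report = pd.read_sql_query(
    """
    SELECT region,
           strftime('%Y-%m', order_date) AS month,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY region, month
    ORDER BY month, region
    """,
    conn,
)
report.to_excel("monthly_revenue_report.xlsx", index=False)  # requires openpyxl
```

Once the reports outgrow this pattern, the article's point is that a dedicated BI layer becomes necessary.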

During the engagement, a BI product is identified (not using scientific means, but in most cases driven by budget availability) and implemented to take care of their operational and other reporting needs. The advantage is that this approach will meet the short term goals and possibly improve report generation speed and also make the generation of a few complex reports possible.

But the disadvantage is that this approach (1) will not meet the long-term needs, (2) will not, in most cases, have full support from management and the users because they don't understand the benefits or were never consulted, (3) will be used mostly as a reporting system without taking full advantage of BI features, and (4) because the IT managers have minimal business knowledge, the final system fails to deliver the expected business benefits.


Summary

This article is basically saying that to help a business make better decisions based on data and trends, it is better to implement BI in the system. The article also mentions that in order to do so, a company can either have a team write the software, outsource the job to a vendor, or implement a product that will meet its BI requirements. It also mentions how CIOs may drain the company's resources, so it is important for the company to take the right steps when implementing a BI system that will fulfill its strategic business goals.


Comments

I think this topic teaches me a very important lesson: even if you have all the resources and the latest technology, a team that can work together and support from the executives are crucial in making the implementation successful. I think it is important for the team to have a balance of skills, with both business and IT people on the team. As mentioned in the article, the final system will fail if the IT manager has minimal business knowledge, because the system will not deliver the expected business benefits. So this topic is very useful and important to any business that wants to implement a BI system.

Week 8

Article: Text Mining Improves Business Intelligence and Predictive Modeling in Insurance

Information Management Magazine, July 2003

Business intelligence and statistical analysis techniques are running out of steam. Or at least that appeared to be the case.

Fireman's Fund Insurance Company, for example, tried a wide range of analytic techniques to understand rising homeowner claims and suspicious auto claims, but could not find predictive patterns in the data. The insurance company's team of analysts, led by one of the authors (Ellingsworth), realized the problem was not with their techniques, but with their data. The analysts were dealing with new types of claims that were not fully described by the structured data collected by the company. Fortunately, the additional information was available in adjuster notes and other free-form texts.

To satisfy the accuracy needs of the modeling programs, the company used basic text mining techniques to isolate new attributes from the text and then combined those with previously available structured data to expand the total amount of relevant usable information. The thinking was that if business intelligence techniques seem inadequate, one should just build a better mousetrap. Fireman's Fund subsequently discovered that success might just mean paying closer attention to the supply chain of information where basic data features originate.

In this article, we will describe a basic text mining technique, term extraction, and discuss how it was successfully used at Fireman's Fund to gain insights into urgent business problems. We will also provide some tips that may be of value when introducing text mining to your own organization.

Term Extraction

Term extraction is the most basic form of text mining. Like all text mining techniques, this one maps information from unstructured data into a structured format. The simplest data structure in text mining is the feature vector, or weighted list of words. The most important words in a text are listed along with a measure of their relative importance. For example, consider the following hypothetical claims adjuster notes:

"The claimant is anxious to settle; mentioned his attorney is willing to negotiate. Also willing to work with us on loss adjustment expenses (LAE) and calculating actual cash value. Unusually familiar with insurance industry terms. Claimant provided unusual level of details about accident, road conditions, weather, etc. Need more detail to calculate the LAE."

This text reduces to a list of terms and weights as shown in Figure 1. This list of terms does not capture the full meaning of the text, but it does identify the key concepts mentioned. To identify key terms, text mining systems perform several operations. First, commonly used words (e.g., the, and, other) are removed. Second, words are stemmed or replaced by their roots. For example, phoned and phoning are mapped to phone. This provides the means to measure how often a particular concept appears in a text without having to worry about minor variations such as plural versus singular versions of words.
Figure 1: Example List of Terms and Weights
The final step calculates the weight for each remaining term in a document. There are many methods for calculating these weights, but the most common algorithms use the number of times a word appears in a document (the term frequency, or tf factor) and the number of times the word appears in all of the documents in a collection (the inverse document frequency, or idf factor). In any event, large term frequency factors increase the weight of a term, while large idf factors lower the weight. The general assumption behind this calculation is that terms that appear frequently in a document describe distinguishing concepts unless those terms appear frequently across all texts in the collection.
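To make the term-extraction idea more concrete, here is a rough Python sketch of how the hypothetical adjuster note above could be reduced to a weighted list of terms. This is my own illustration, not code from the article: the stop-word list, the absence of real stemming, the small other_notes collection and the exact tf-idf formula are all simplifying assumptions.

```python
# Illustrative term extraction with a simple tf-idf weighting (assumed formula,
# not the article's exact method). Stop-word removal only; no real stemming.
import math
import re
from collections import Counter

STOP_WORDS = {"the", "is", "to", "his", "with", "us", "on", "and",
              "about", "of", "more", "also", "an", "need", "etc"}

def terms(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def term_weights(document, collection):
    """Weight = term frequency in the document, damped when the term
    also appears in many other documents in the collection."""
    counts = Counter(terms(document))
    weights = {}
    for term, tf in counts.items():
        df = sum(1 for doc in collection if term in terms(doc))
        weights[term] = tf * math.log((1 + len(collection)) / (1 + df))
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

note = ("The claimant is anxious to settle; mentioned his attorney is willing "
        "to negotiate. Also willing to work with us on loss adjustment expenses "
        "(LAE) and calculating actual cash value.")
other_notes = ["Claimant phoned about road conditions and weather.",
               "Attorney requested detail on actual cash value."]

for term, weight in term_weights(note, [note] + other_notes)[:8]:
    print(f"{term:12s} {weight:.2f}")
```

The output is a small feature vector like the one Figure 1 describes: the most distinctive words in the note, each with a relative weight.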

For another example, consider a workers' compensation claims system. As with other insurance applications, this would track demographics about claimants, location of the accident, type of accident, etc. It may also include Boolean indicators for common conditions involved in past claims, such as slippery floor; but there are practical limitations to the number of such indicators – therefore, free-form text is used for additional details.

Narratives could be used to describe activity prior to the accident, unusual environmental conditions, distracting factors, etc. Term extraction could identify key terms in each narrative (e.g., turning, bending, twisting prior to the accident; leaks, ambient temperature, wind conditions in the environment conditions notes; and noise, foot traffic and other distracting factors in the final narrative). By mapping the free-form text to a feature vector, the text is modeled in the same attribute/value model used by structured data and thus lends itself to analysis using traditional business intelligence tools such as ad hoc reports, OLAP analysis, data mining and predictive modeling.
Applications of text mining are not limited to claims processing. Many business transaction applications, such as customer relationship management, e-mail responses, clinical records and enterprise resource planning (ERP), include both structured data (such as numeric measures and coded attributes) and free-form annotations. CRM systems may track detailed descriptions of customer complaints, doctors may note variations in symptoms or special instructions in a patient's chart and ERP systems might track notes on problems in production runs. Free-form notes are used frequently because we cannot always determine all the attributes relevant to a business process.

In some cases, relevancy changes with time. When suits were brought against Firestone for faulty SUV tires, Fireman's Fund turned to free-form text analysis to determine if any of their claims related to the litigation. Unpredictable cases such as this are candidates for text mining-based analysis.
Fireman's Fund Matches Techniques to Problems

Mastering information is a critical competency for success in the insurance industry. As part of an internal consulting group, Ellingsworth is often faced with making new headway on old problems. These problems typically take the form of making predictions about expected claims and understanding why outcomes vary from those predictions. Only in understanding why the outcomes are unmatched can they craft a set of alternative management solutions.
Text mining helps the Fireman's Fund in at least these three ways: extracting entities and objects for frequency analysis; identifying files with particular attributes for further statistical analysis; and creating entirely new data features for predictive modeling. The first method was used in the Firestone case.

The second method was used when the insurer saw the cost of homeowners' claims soaring in a single state. When the traditional reports failed to provide clarity, the frontline staff was polled to provide suggestions. They indicated that a new type of claim was emerging which involved mold. The effect trailed the occurrence, meaning that by the time it became a serious issue, many cases were already on the books.

Once the company realized the potential liability, it began to examine past claims in an effort to identify claims that required central tracking. Unfortunately, no structured code existed for categorizing and tracking mold risk. The level of effort required to manually examine cases from the prior two years to tag them for this risk was unreasonable. However, by using a handful of known examples, analysts identified patterns in claims using text mining techniques and were able to search for additional files with those patterns. This first-pass filtering was not perfect, but it did yield a much smaller list of files that could be manually coded. While pattern matching based on unstructured data works in some cases, other business problems require more integration of structured and unstructured data.
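Before moving on to that, here is a rough sketch of the first-pass filtering idea. It is my own illustration rather than the insurer's actual method: the claim notes are invented, and ranking unreviewed files by cosine similarity to a handful of known mold examples is just one simple way to implement the idea.

```python
# Hypothetical first-pass filter: rank unreviewed claim notes by similarity to
# a few known mold-related examples, then send only the top matches for
# manual coding. All text below is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

known_mold_claims = [
    "black mold found behind bathroom drywall after slow pipe leak",
    "musty odor and mold growth in basement following water damage",
]
unreviewed_claims = [
    "hail damage to roof shingles, no interior issues",
    "water stain on ceiling, dark spotting and musty smell in closet",
    "kitchen fire contained to stove area",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(known_mold_claims + unreviewed_claims)
known_vectors = matrix[:len(known_mold_claims)]
candidate_vectors = matrix[len(known_mold_claims):]

# Score each candidate by its best match against the known examples.
scores = cosine_similarity(candidate_vectors, known_vectors).max(axis=1)
for text, score in sorted(zip(unreviewed_claims, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {text}")
```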
Analysts with Fireman's Fund ran into a wall when trying to build a model to predict suspicious claims in third-party automobile accidents. After modeling with all the available structured data, the models were only marginally useful, and the team was desperate to try new approaches. During a test and validation iteration, analysts observed an interesting phenomenon. Investigators were reading the claim file in order to further categorize cases identified by the model. Then the investigators assessed the behaviors of the claimants and the facts of the claim scenario. This led to the notion that specific recurring themes in the story of the claim were their triggers for further research. That behavioral set prompted the analysts to realize that those features had to be exposed and added to the modeling process. The result was a model that could identify useful referrals that would be kept up to date as new information was added to the files in unstructured form over the life of the claim.
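What "exposing those features and adding them to the modeling process" could look like in code is sketched below. Again, this is only my illustration, not Fireman's Fund's system: the claim records, column names and the choice of a logistic regression classifier are all assumptions; the point is simply that structured attributes and text-derived features can feed one model.

```python
# Hypothetical model that combines structured claim attributes with
# text-derived features from adjuster notes. Data and columns are invented.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

claims = pd.DataFrame({
    "claim_amount": [4200.0, 18750.0, 950.0, 22400.0],
    "accident_type": ["rear_end", "single_vehicle", "parking_lot", "rear_end"],
    "adjuster_notes": [
        "claimant eager to settle, attorney willing to negotiate",
        "unusually familiar with LAE and actual cash value terms",
        "minor scrape, no injuries reported",
        "detailed account of road conditions, pressed for quick payout",
    ],
    "referred_for_investigation": [0, 1, 0, 1],
})

features = ColumnTransformer([
    ("numeric", "passthrough", ["claim_amount"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["accident_type"]),
    ("notes", TfidfVectorizer(), "adjuster_notes"),   # the term-extraction step
])

model = Pipeline([("features", features),
                  ("classifier", LogisticRegression(max_iter=1000))])

X = claims.drop(columns="referred_for_investigation")
y = claims["referred_for_investigation"]
model.fit(X, y)
print(model.predict(X))   # flags which claims look like referral candidates
```

As new notes are added to a claim file, re-vectorising the text and re-scoring the claim is what would keep the referrals up to date over the life of the claim.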

Lessons Learned

Text mining has succeeded at Fireman's Fund because they focused on business fundamentals. If you are hitting the wall with structured data analysis, consider these tips.

First, focus on enhancing the gains of high economic value projects that are already in place. Marginal improvements through the intelligent use of unstructured data can improve ROI. With these near-term identifiable wins, you can fund further research.

Second, consider which projects failed due to lack of detailed data. Can text mining and term extraction in particular create useful data features that allow you to discover heretofore unknown analytical insights?
Third, remember the keys to success in any information technology project: people, process, technology, philosophy and environment. This is a specialized area, and few organizations are equipped with the right talent to succeed without investing in the ongoing education of their business intelligence analysts (assuming they have them). The processes of information extraction and text categorization are supported by many software vendors. However, the creation of company-specific resources, such as a robust predictive taxonomy, requires at least several iterations with subject-matter experts and automated tools.

Fourth, look for approaches that embed ongoing feedback. Such feedback provides a chance for continued improvement and also permits monitoring for drift in vocabulary and for detecting new topics of interest.

Finally, watch for key indicators of projects to avoid. These include:

- Lack of an executive sponsor.
- Lack of a method to show the value to the sponsor.
- Lack of in-house resources.
- A determination to "do it all yourself."
- Fear of finding a qualified consultant.
Text mining is a powerful technique for expanding the range of data we can analyze. Often, the information we need to understand a business process is available to us; we just are not looking in the right spot for it. As Fireman's Fund has shown, text mining complements existing techniques. Solutions to apparently impenetrable problems are found when both structured and unstructured data are used. Sometimes you need more than just a better mousetrap – you need better mice.
Summary
This article is about how text mining helps an insurance company minimise liability in suspicious claims from customers. What the company did was use a basic text mining technique, term extraction, to isolate new attributes from the text and then combine them with the previously available structured data to expand the total amount of relevant usable information. Thus, the insurance company, Fireman's Fund, benefitted from text mining in three ways: extracting entities and objects for frequency analysis; identifying files with particular attributes for further statistical analysis; and creating entirely new data features for predictive modeling.
Comments
The things we learned from the BI lectures and tutorials are useful because what we learned is the theory, while out in the world companies apply BI in their business to help increase their sales and decrease their liability. By looking at this article, I realised that what we learned during lessons, such as text mining algorithms, is so helpful to businesses; it helps increase the efficiency of their operations. So BI is a technology that a business should actually be using. Also, the application of text mining is not limited to one area; it extends to business transactions such as Customer Relationship Management (CRM) and clinical records. So text mining is one of the most useful BI techniques for business.

Wednesday, February 10, 2010

Week 7

The article I found is "Neural Networks in Anaemia Classification".

Neural Networks in Anaemia Classification
1.0 Introduction

The pioneering work on neural networks in the modern era began in 1943 with McCulloch and Pitts. To date, there has been explosive growth in research in this field, and since the early 1980s it has attracted many investigators, including academicians, physicians, psychologists, and neurobiologists. An approach to the pattern recognition problem was introduced by Rosenblatt (1958) in his work on the perceptron, and there are now many successful and ongoing projects that utilise the ability of neural networks in their applications.

The applications of neural networks are almost limitless, but they fall into several main categories such as classification, modeling, forecasting and novelty detection. Some examples of successful applications include credit card fraud detection (Alaskerov et al., 1997), pattern recognition (Ramli et al., 1996; Rietveld et al., 1999), handwritten character recognition (Le Chun et al., 1990; Tay & Khalid, 1997; Karim et al., 1998), colour recognition (Yaakob et al., 1999), share price prediction systems (Sanugi et al., 1996; Lim et al., 1996) and others. Many researchers have compared Artificial Neural Networks (ANNs) and Logistic Regression (LR) models and have shown that neural networks are able to generalize better than traditional statistical methods such as regression techniques (Lapuerta et al., 1995; Shanker, 1996; Lapuerta et al., 1997; Armoni, 1998).

2.0 Neural Networks in Medicine

One of the major goals of observational studies in medicine is to identify patterns in complex data sets. The literature has shown that medicine has benefited much from this technology. It has been successfully applied to various areas of medicine to solve non-linear problems. The applications include prediction of diagnoses such as cancers (Astion et al., 1992; Wilding et al., 1994), the onset of diabetes mellitus (Shanker, 1996), survival prediction in AIDS (Ohno-Machado, 1996), eating disorders (Buscema et al., 1998) and others. Applications in signal processing and interpretation involve EEG or electroencephalogram analysis (Makeigh et al., 1996), ECGs or electrocardiograms (Bortolan et al., 1991), EMGs or electromyograms (Chiou et al., 1994), and EGGs or electrogastrogram classification (Lin et al., 1997).

The neural network strategy has shown higher performance than Cox regression models in predicting clinical outcomes for the risk of coronary artery disease (Lapuerta et al., 1995). In a further study, Lapuerta et al. compared the survival predictions of neural networks and logistic regression models on alcoholic patients with severe liver disease. The study reveals that neural networks were more successful in classifying patients into low- and high-risk groups.

A similar study carried out by Armoni (1998) shows that neural network prediction was more accurate than linear regression for predicting the diagnostic probabilities of insulin-dependent diabetes mellitus. The results suggest that the use of a neural network should be considered whenever prediction of a diagnosis is required. In the area of medical image processing, Doffner et al. (1996) demonstrated that neural networks can be effectively used as a tool in medical decision-making. They applied a neural network to the interpretation of planar thallium-201 scintigrams for the assessment of coronary artery disease.

3.0 Application in Haematology

There is an attempt to create an expert system to diagnose classes of anaemia and report presumptive diagnoses directly on the haematology form (Birndort et al., 1996). The purpose is to simulate the processes of human experts that can reliably achieve diagnostic separability by pattern analysis. In doing this, they constructed a hybrid expert system combining rule-based and artificial neural network (ANN) models to evaluate microcytic anaemia in a 3-layered program using haematocrit (HCT), mean corpuscular volume (MCV), and coefficient of variation of cell distribution width (RDWcv) as inputs. These measurements are available as standard output on most haematology analyzers. Three categories of microcytic anaemia were considered, iron deficiency (IDA), haemoglobinopathy (HEM), and anaemia of chronic disease (ACD). The performance of the model was evaluated with actual case data. The results show that the model was successful in correctly classifying 96.5% of 473 documented cases of microcytic anaemia and anaemia of chronic disease. This result exhibits sufficient accuracy to be considered for use in reporting microcytic anaemia diagnoses on haematology forms.

The leukocyte-vessel wall interactions are studied in post capillary vessels by intravital video microscopy during in vivo animal experiments (Egmont-Petersen et al., 2000). Sequences of video images are obtained and digitized with a frame grabber. A method for automatic detection and characterization of leukocytes in the video images is developed. Individual leukocytes are detected using a neural network that is trained with synthetic leukocyte images generated using a novel stochastic model. This model makes it feasible to generate images of leukocytes with different shapes and sizes under various lighting conditions. Experiments indicate that neural networks trained with the synthetic leukocyte images perform better than networks trained with images of manually detected leukocytes. The best performing neural network trained with synthetic leukocyte images resulted in an 18% larger area under the ROC curve than the best performing neural network trained with manually detected leukocytes.

4.0 Anaemia Classification

"Anaemia" is a common medical problem. The word anaemia is composed of two Greek roots that together mean "without blood" (Ed-Uthman, 1998). Signs and symptoms of anaemia include weakness, fatigue, palpitations, light-headedness, difficulty in swallowing, loss of appetite, nausea, constipation, diarrhoea, stomatitis and others. The patient looks pale, the nails may be dry and brittle, and the tongue may be inflamed. In severe anaemia, heart failure and swelling of both limbs can occur. In mild anaemia, none of the above signs and symptoms may appear (Orkin, 1992). According to him, patients with anaemia have a significant reduction in red cell mass and a corresponding decrease in the oxygen-carrying capacity of the blood. In the General Medical Officer's manual (Luiken et al., 1999), anaemia is defined as a level of haemoglobin more than two standard deviations below the expected mean for age and sex. Anaemia itself is not a disease but a sign of disease (Rapaport, 1987; DeLoughery, 1999). This means an underlying disease is present that demands an explanation.

This paper presents an empirical evaluation in medicine, specifically the classification of anaemia patients using a neural network approach. The number of hidden units, the learning rate and the momentum are varied so that a more appropriate classification model is obtained. The information regarding anaemia cases was collected from the haematology forms used in a government hospital. Seventeen attributes were identified and used in the training model. A total of seven hundred raw records of anaemia patients were collected and preprocessed, which included data cleansing and data selection. The data was partitioned into three sets, namely a training set (80%), a testing set (10%) and a validation set (10%). The training data is used to train the model, while the validation data is used to monitor neural network performance during training. The test data is used to measure the performance of the trained model. The motivation here is to validate the model on a data set that is different from the one used for parameter estimation. Figure 1 shows the schematic diagram of the multilayer perceptron with 17 input units, 15 hidden units and 8 output units.



Figure 1: Schematic Representation of the model

5.0 Discussion and Conclusion

The highest performance was obtained when the number of hidden units is 15, the learning rate is 0.7 and the momentum is 0.1. The testing and generalization correctness are 71.56 and 72.78 respectively. This result demonstrates the ability of the multilayer perceptron to predict classes of anaemia, and the model can be used by haematologists and other medical staff.
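Out of curiosity, here is a minimal Python sketch of a multilayer perceptron with the architecture in Figure 1 (17 inputs, 15 hidden units, 8 output classes) and the learning rate and momentum reported above. It is only my illustration: the paper's haematology records are not available, so synthetic data stands in, and scikit-learn's MLPClassifier is an assumed substitute for whatever software the authors actually used.

```python
# Sketch of a 17-15-8 multilayer perceptron for anaemia classification.
# Synthetic data replaces the paper's 700 haematology records.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 17))       # 700 records, 17 attributes (synthetic)
y = rng.integers(0, 8, size=700)     # 8 anaemia classes (synthetic labels)

# 80% training, 10% validation, 10% testing, as described in the paper.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

scaler = StandardScaler().fit(X_train)
model = MLPClassifier(hidden_layer_sizes=(15,),   # 15 hidden units
                      solver="sgd",
                      learning_rate_init=0.7,     # values reported in section 5.0
                      momentum=0.1,
                      max_iter=1000,
                      random_state=0)
model.fit(scaler.transform(X_train), y_train)
print("validation accuracy:", model.score(scaler.transform(X_val), y_val))
print("test accuracy:", model.score(scaler.transform(X_test), y_test))
```

With real haematology attributes instead of random numbers, the validation set plays the monitoring role described in section 4.0 and the held-out test set gives the generalization figure.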

For future work, improvements can be made in the following aspects to increase the usefulness and generalization of the model. In the pre-processing phase, for example, a proper method was not used to correct the unequal distribution of the data set: data was randomly removed using ordinary statistical software, and the removed data may have contained important information. A better method, such as hierarchical neural networks (HNNs) as suggested by Ohno-Machado (1996a), can provide a way of enhancing sensitivity to rare categories without decreasing specificity.



Summary

This article is about how neural networks help in medicine by predicting and successfully diagnosing illnesses such as cancers, AIDS, eating disorders, etc. When neural networks were compared with logistic regression, neural networks were more successful in classifying alcoholic patients with severe liver disease into low- and high-risk groups. The use of a neural network should be considered whenever prediction of a diagnosis is required.

Comments

After reading this article, I think that using BI tools and techniques can help society by predicting and diagnosing diseases, preventing many people from suffering and dying from illnesses that could actually have been prevented. In the past, I always thought that BI was something to help companies increase their sales by predicting who the potential buyers are. Now I am truly fascinated that the technology can also help keep people from dying early. Learning regression can help us predict, but after reading this article I see that neural networks can be even more accurate. So I learned a lot from finding articles too.


http://www.generation5.org/content/2004/NNinAnaemia.asp

Thursday, December 3, 2009

Week 6

I learned that not every graph is suitable for displaying certain information. There is a need to choose the appropriate graph to put on a dashboard, to display the suitable kind of information on the dashboard, whether it is text, graphics or both, and to organise the dashboard well.

Below are some useful links:

This link shows effective widgets that can be displayed on a dashboard, as well as the common types of dashboard design.
http://www.tdwi.org/research/display.aspx?ID=7487


This link shows an example of a dashboard and comments on its good and bad points.
http://www.dashboardzone.com/bad-dashboard-design


This link is to a book that discusses bad dashboard designs; the graphs shown were also used in our lab exercise.
http://books.google.com.sg/books?id=XqvYS6KGFKEC&pg=PT59&lpg=PT59&dq=bad+dashboard+design&source=bl&ots=vnviJu0h-i&sig=PpHdi8Yxjq2CB6EF6CwWQalt6bo&hl=en&ei=x04qS5_kGMqLkAW-w4T2CA&sa=X&oi=book_result&ct=result&resnum=8&ved=0CCgQ6AEwBw#v=onepage&q=bad%20dashboard%20design&f=false

Tuesday, December 1, 2009

Week 5

This week I learned about the different types of visualisation that are useful for developing graphs and dashboards, and about the different visual attributes that can help us differentiate or highlight the important information in a dashboard at a glance.

Here are some useful links that guide us in building an appropriate dashboard:

http://www.enterprise-dashboard.com/2008/07/17/4-bi-worst-practices/

http://www.enterprise-dashboard.com/

http://dashboardsbyexample.com/

http://searchdatamanagement.techtarget.com/generic/0,295582,sid91_gci1362732,00.html