Dec/09
Data Mining & its cousin – Inferential Statistics
Data mining as a field of academic research is growing rapidly. Though its theoretical underpinnings have been around for a while, its practical use has quickly reached a level where researchers and businesses are finding great value in it. The information explosion driven by the success of the World Wide Web, the rising purchasing power of the common man, a growing population, and the globalization of commerce are only a few of the reasons this field has gained such popularity in such a short time.
Initially there was a lot of confusion about the difference between Inferential Statistics and Data Mining, and researchers around the world produced a large body of literature to address it. Though there is no single universally accepted definition of either field, many researchers have defined them from the perspective of their own applications. From what I’ve gathered and learned about these fields, this is how I draw the line between them.
Difference between Data mining & Inferential Statistics:
- Data Mining is a field where you discover hidden patterns in large, already existing data sets. These patterns are later used for analysis and decision making in the area of concern. The process of discovering them is also called ‘Knowledge Discovery’. Inferential Statistics is the field where you test a pre-conceived hypothesis (against a null hypothesis) by applying classical statistical methods to a sample drawn from a population.
- Data mining starts from an already existing database (usually a large dataset), whereas Inferential Statistics builds its own dataset by applying sampling methods to the population of interest.
- Data mining methods such as classification and clustering scan the entire dataset in search of hidden patterns, while classical statistical methods are run over only a small section of the dataset (the sample).
The above point also implies that data mining methods are more computationally intensive, as they have to run through large data sets, and hence should be used only when really needed.
So when do you use data mining over inferential statistics? The answer is simple. If you don’t know what you are looking for but want to make the best sense of the data you have, use data mining. If you know what you are looking for and want to back it up by checking the data you have, use inferential statistics.
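To make the contrast concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the purchase amounts are synthetic, the two customer groups are hypothetical, and the helper names `welch_t` and `kmeans_1d` are my own. The first half states a hypothesis up front (the two groups spend the same on average) and checks it on a small sample. The second half assumes nothing and scans every record looking for structure.

```python
import random
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = (var_a / len(sample_a) + var_b / len(sample_b)) ** 0.5
    return (mean_a - mean_b) / standard_error


def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means: visits every value on each pass, returns sorted centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            statistics.mean(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


rng = random.Random(42)

# Synthetic purchase amounts for two hypothetical customer groups.
group_a = [rng.gauss(50, 5) for _ in range(5000)]
group_b = [rng.gauss(80, 5) for _ in range(5000)]

# Inferential statistics: the hypothesis comes first, and only a small
# sample (30 records per group) is examined.
t_stat = welch_t(rng.sample(group_a, 30), rng.sample(group_b, 30))

# Data mining: no hypothesis, the whole dataset (10,000 records) is scanned.
centroids = kmeans_1d(group_a + group_b, k=2)

print("t statistic from the samples:", round(t_stat, 2))
print("cluster centres found in the full data:", [round(c, 1) for c in centroids])
```

A t statistic far from zero counts as evidence against the hypothesis of equal means, and it took only 60 records to get there. The clustering step had to touch all 10,000 records on every pass, which is the extra computing cost mentioned above, but it found the two spending groups without being told they exist.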
Sep/09
Prime Minister curious about Twitter
Apparently our beloved Prime Minister, Dr. Manmohan Singh, wants to know more about the Twitter phenomenon and has asked the Minister of State for External Affairs, Dr. Shashi Tharoor, about it. Dr. Tharoor, by the way, is one of the celebrity power users of Twitter in India. Though it's interesting how Twitter is attracting celebrities to sign up, what is more important is that scores of people are getting to talk to these celebrities, and likewise the celebrities get to connect with people in a more one-to-one setting. This is truly a powerful medium for communication, and I hope more celebrities, especially politicians, take a leaf out of Dr. Singh's book and start thinking about adopting this powerful tool. I've compiled a small list of some of our Indian celebrities on Twitter; feel free to add to the list if I missed someone.
Indian celebrities on twitter: