
SigmaWay Blog

SigmaWay Blog aggregates original and third-party content for site users. It features articles on Process Improvement, Lean Six Sigma, Analytics, Market Intelligence, Training, IT Services, and the industries SigmaWay serves.

Good Statistical Practice

You can’t be a good data scientist without a solid grounding in statistics and a good feel for data. Here are some simple tips for becoming an effective data scientist:
Statistical Methods Should Enable Data to Answer Scientific Questions - Inexperienced data scientists tend to take the link between data and scientific questions for granted, and hence often jump directly to a technique based on the structure of the data rather than the scientific goal.
Signals Always Come with Noise - Before working with data, analyse it and separate the usable signal from the noise.
Data Quality Matters - Many novice data scientists ignore this and use whatever data is available to them; it is always good practice to set quality norms for the data you use.
Check Your Assumptions - The assumptions you make affect your output as much as your data does, so take special care with every assumption: a bad one propagates through the whole model and its results. A quick way to test one common assumption is sketched below.
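
As an illustration of the last tip, here is a minimal Python sketch that checks one assumption behind ordinary linear regression, namely normally distributed residuals. The data is synthetic and this check is only one example of what the article has in mind:

```python
import numpy as np
from scipy import stats

# Synthetic example: fit a straight line, then check the
# normal-residuals assumption behind the usual t- and F-tests.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 5.0 + rng.normal(0, 2.0, 200)  # noisy linear signal

slope, intercept, r, p, se = stats.linregress(x, y)
residuals = y - (slope * x + intercept)

# Shapiro-Wilk test: a small p-value suggests the residuals are
# not normal, so normality-based inferences would be suspect.
stat, p_value = stats.shapiro(residuals)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"Shapiro-Wilk p-value for residuals: {p_value:.3f}")
```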
These are some of the things to keep in mind when working with data. To know more, you can read the full article by Vincent Granville at http://www.datasciencecentral.com/profiles/blogs/ten-simple-rules-for-effective-statistical-practice

 


A Guide to Choosing Machine Learning Algorithms

Machine Learning, which learns from the data fed to its algorithms, is the backbone of today’s insights on customers, products, costs and revenues. After data itself, algorithms are the most important thing in data science.
Hence the question: which algorithm to use? Some of the most widely used algorithms and their use cases are as follows, with a quick comparison sketch after the list:

1) Decision Trees - Their output is easy to understand, and they can be used for investment decisions, customer churn, bank loan defaulters, etc.

2) Logistic Regression - A powerful way of modeling a binomial outcome with one or more explanatory variables; it can be used for predicting customer churn, credit scoring and fraud detection, measuring the effectiveness of marketing campaigns, etc.

3) Support Vector Machines - A supervised machine learning technique widely used in pattern recognition and classification problems; it can be used for detecting people with common diseases such as diabetes, hand-written character recognition, text categorization, etc.

4) Random Forest - An ensemble of decision trees that can solve both regression and classification problems on large data sets; it is used in applications such as predicting high-risk patients, predicting parts failures in manufacturing, predicting loan defaulters, etc.
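
As a quick illustration of how such a comparison might look in practice, here is a minimal scikit-learn sketch that cross-validates all four algorithms on a built-in dataset; the dataset, settings and scoring are illustrative assumptions, not taken from the article:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Built-in binary classification data, used here purely for illustration.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# 5-fold cross-validated accuracy as a first-pass comparison.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:20s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

In practice the best choice also depends on interpretability, training time and dataset size, not just cross-validated accuracy.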


Hence, based on your need and the size of your dataset, you can use the algorithm that is best for your application or problem.
You can read the full article by Sandeep Raut at http://www.datasciencecentral.com/profiles/blogs/want-to-know-how-to-choose-machine-learning-algorithm

 


Importance of Data Preparation

Data is the backbone of analytics and machine learning, and hence one of the most important tasks in analytics is getting the right kind of data in the required format. The importance of data preparation can be gauged from the fact that an analyst spends around 60 to 80 percent of their time preparing data.
What exactly is data preparation? In a nutshell, it is the process of collecting, cleaning, processing and consolidating data for use in analysis. It enriches the data, transforms it and improves the accuracy of the outcome.
How is it done? It is mostly done through analytics tools or traditional extract, transform and load (ETL) tools; these include self-service data preparation tools, data cleansing and manipulation tools, etc. A minimal code example of these steps follows.
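
Here is a minimal pandas sketch of the collect, clean and consolidate steps described above; the file names and columns (orders.csv, customers.csv, region, amount, customer_id) are hypothetical, chosen purely for illustration:

```python
import pandas as pd

# Collect: load the raw sources (hypothetical files and columns).
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
customers = pd.read_csv("customers.csv")

# Clean: drop exact duplicates, normalise text, handle missing values.
orders = orders.drop_duplicates()
orders["region"] = orders["region"].str.strip().str.title()
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Consolidate: join the sources into one analysis-ready table.
df = orders.merge(customers, on="customer_id", how="left")
print(df.head())
```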
Since data is the foundation of analytics, the right data helps in analysing the situation better and helps organizations react positively to market shifts.
To know more, read the full article by Ashish Sukhadeve (business analytics professional) at http://www.datasciencecentral.com/profiles/blogs/why-data-preparation-should-not-be-overlooked

 


Are You Careful Enough?

Analytics is one of the hottest topics of the 21st century, and it is starting to become a second currency for many organisations. Yet despite having so much knowledge, we are prone to blunders, which are broadly categorised as Data Visualization Errors (Erroneous Graphs) and Statistical Blunders.
Data Visualization Errors (Erroneous Graphs): This is one area that can give nightmares to both the presenter and the audience. Incorrect data presentation can distort intuition, lead the audience to misinterpret the data, and leave the organisation with results that are practically useless.
Statistical Blunders Galore: This should be a “no blunders zone” where one would not want to make false assumptions or erroneous selections, yet it is easily one of the most error-prone areas. Statistical errors can be costly for both the organisation and the audience if they are not checked carefully, so they must be. One classic example is sketched below.
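
The article does not walk through individual blunders, so here is an assumed example: a minimal sketch of the multiple-comparisons trap, where enough tests on pure noise will look significant unless you correct for the number of tests:

```python
import numpy as np
from scipy import stats

# Classic blunder: testing many hypotheses on pure noise and
# reporting every p < 0.05 as a "finding" with no correction.
rng = np.random.default_rng(0)
n_tests = 100
p_values = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(n_tests)
])

print("uncorrected 'significant' results:", (p_values < 0.05).sum())
# Bonferroni correction keeps the family-wise error rate near 5%.
print("Bonferroni-corrected results:", (p_values < 0.05 / n_tests).sum())
```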
To know more, read the full article by Sunil Kappal (author) at http://www.datasciencecentral.com/profiles/blogs/the-most-common-analytical-and-statistical-mistakes

 
