By Nusrat Rabbee (Statistician & Analytics Consultant, Rabbee Analytics)
Big data consists of datasets that have grown so large that it is not feasible to use regular database management tools or desktop packages to make any sense out of them.
Since the beginning of the computer era, the fields of astronomy, physical sciences, biological sciences, genomics, classified research and many other areas have been using sophisticated techniques to analyze large amounts of data.
However, the explosion of data generated by today’s population and transmitted by mobile devices is changing the very way we do business and interact with each other. The health sector, social networks, e-commerce, city and transportation authorities and ordinary businesses are entering the new era of big data. Big data is a relative term that depends on the size of your business, but it usually means gigabytes, terabytes or petabytes of data arriving at your servers in real time.
The usual data mining tools and pre-programmed static reports of the previous decades are no longer sufficient to make sense of what is happening in the customer space in real time.
At the FutureMed2012 conference in Silicon Valley, attendees talked about the tsunami of personal health information. "How does a doctor treat a patient who walks into the office with their genome?" asked Stephen Quake, professor of bioengineering at Stanford University.
Businesses and professionals need to harness the information generated by these technologies to deliver the best-quality products and services to their clients. By the same token, these technologies are completely changing how we do business with our clients. Companies need to keep up with their competition and are understandably keen to retain their customers. Some businesses focus on pinpointing potential failures or risks in their big data well ahead of time.
Data analytics professionals can help in all these areas.
Data analytics is a rapidly emerging, interdisciplinary field enabled by modern science and technology. The science and art of data analytics is booming. Advances in science and technology have transformed businesses, and there is a growing appetite for statistically sophisticated algorithms to inform decision-making. You will see positions advertised for information designers, data scientists, data analytics professionals, human factors engineers and similar roles.
Behind many of these job titles are practitioners of statistics, mathematics, operations research, computer science or physics. Data analytics has many applications, such as text analytics, web analytics, business analytics and more. You may recall the movie Moneyball, which features sports analytics principles, a topic championed by annual conferences at MIT.
Data analytics is really the business of studying the structure of historical data, understanding its business context, developing predictive models of future behavior and then using those models to inform decision-making for the enterprise.
The three power drills of data analytics are:
- Advanced Analytics - New data types, such as location tags from your phone, Facebook or Google, or the “Like” button on Facebook, generate a massive amount of information within an organization. Conventional reporting tools can no longer retrieve this information and make sense of it. New computational and statistical methods are needed to model this type of large data in real time. Advanced analytics is used to answer questions like:
  - Who will buy more? Customer acquisition, development and retention are important.
  - When do short- or long-term cycles begin and end?
  - Where should we expand as a business?
- Predictive Modeling - Predictive modeling can build the profile of a “normal” pattern and flag anomalies that depart from it. Models can also study what is already known about your “genome” (biological, musical, spending or social) and match you to other things you might like. The point of this exercise is not simply to build sophisticated models and identify patterns, but to deliver actionable intelligence for the business or the customer.
- Data Visualization - This is a huge area of growth in the era of big data. Journalist David McCandless has contributed much to the visualization of big data networks (such as the Billion Dollar-o-Gram and campaign donation maps) through his creative work. Visualization is an area where science and art come together, for example in social network analysis. Text art such as Wordle uses simple frequency analysis of text, but the genius lies in the artistic rendering of the chart.
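The idea of profiling a "normal" pattern and flagging anomalies can be illustrated with one of the simplest possible techniques, a z-score rule. This is a minimal Python sketch under stated assumptions, not the author's method: the history is invented daily order counts, and the function names `fit_normal_profile` and `flag_anomalies` and the cut-off of three standard deviations are hypothetical choices. Production anomaly detection on big data usually needs models that also account for trends, seasonality and many variables at once.

```python
from statistics import mean, stdev


def fit_normal_profile(history):
    """Summarize 'normal' behavior as the mean and sample standard deviation."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    return mean(history), stdev(history)


def flag_anomalies(new_values, profile, threshold=3.0):
    """Return (index, value, z-score) for values whose |z| exceeds threshold.

    The default threshold of 3 standard deviations is a hypothetical choice.
    """
    mu, sigma = profile
    if sigma == 0:
        # A constant history: any deviation at all counts as unusual.
        return [(i, v, float("inf")) for i, v in enumerate(new_values) if v != mu]
    flagged = []
    for i, value in enumerate(new_values):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, value, z))
    return flagged


if __name__ == "__main__":
    # Illustrative (hypothetical) daily order counts, not real data.
    history = [102, 98, 105, 99, 101, 97, 103, 100, 104, 96]
    profile = fit_normal_profile(history)
    today = [101, 99, 160, 95]
    for index, value, z in flag_anomalies(today, profile):
        print(f"observation {index}: value={value}, z={z:.1f}")
```

With this history the mean is 100.5 and the sample standard deviation is about 3.0, so only the value 160 is flagged; 95 lies less than two standard deviations below the mean and passes.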
Some key applications of data analytics today are marketing, strategic business planning, risk analysis, finance, audit and fraud detection, health care, city transportation and product design. In product development, the psychological, artistic and scientific aspects all require statistical and mathematical modeling of data.
The basic steps of a data analytics project are (a) business requirements or problem definition, (b) data exploration and preparation, (c) deriving new variables (e.g., customer compliance rate), (d) predictive model development and deployment, and (e) analysis of results, which should produce an actionable outcome. A large number of statistical and machine learning methods are available, such as linear and non-linear regression, support vector machines, random forests and neural networks. There is no one-size-fits-all method; choosing among them is where data science is also an art, and where you need to consult an expert.
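Steps (b) through (e) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a recipe from the author: the `Customer` fields, the definition of compliance rate as on-time payments divided by payments due, the goal of predicting next year's spend (standing in for step (a)) and the single held-out record are all hypothetical. It fits a simple least-squares linear regression, the most basic method in the list above, and leaves deployment out.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields, chosen only for illustration.
    payments_on_time: int
    payments_due: int
    next_year_spend: float  # the outcome we want to predict


def compliance_rate(c: Customer) -> float:
    """Step (c): derive a new variable (hypothetical definition: on-time / due)."""
    return c.payments_on_time / c.payments_due


def fit_line(xs, ys):
    """Step (d): ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_absolute_error(model, xs, ys):
    """Step (e): compare predictions with outcomes the model has not seen."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Illustrative (hypothetical) records, not real data.
    raw = [
        Customer(9, 10, 910.0), Customer(5, 10, 520.0), Customer(10, 10, 1005.0),
        Customer(7, 10, 690.0), Customer(0, 0, 300.0), Customer(8, 10, 800.0),
    ]
    # Step (b): data preparation, dropping records where the rate is undefined.
    clean = [c for c in raw if c.payments_due > 0]
    xs = [compliance_rate(c) for c in clean]
    ys = [c.next_year_spend for c in clean]
    # Train on all but the last record; hold that one out for evaluation.
    model = fit_line(xs[:-1], ys[:-1])
    print("intercept, slope:", model)
    print("holdout MAE:", mean_absolute_error(model, xs[-1:], ys[-1:]))
```

In practice you would hold out far more data (or use cross-validation) and compare several of the methods listed above before choosing one; a single held-out record gives a very noisy error estimate.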
Consider hiring a qualified data analytics professional or assembling an in-house team of analytics professionals to develop and use the power drills needed to frame your big data. Many organizations may find it sufficient to hire an analytics consultant to help develop and deploy predictive analytics models as needed. Be sure to provide context for your business and data, and ask for actionable results that support decision-making at the end of the project.
If you are considering entering this field as a practitioner, you have made a good choice. Data analytics is not rocket science.
The US Bureau of Labor Statistics has forecast an increase in demand of 13% for statisticians, 22% for operations research analysts and 24% for management analysts. You will need to:
- Learn methods and tools for working with big data databases, and take a few courses in applied statistics and probability.
- Become fluent in flexible analysis frameworks for big data, such as R, SAS and MATLAB.
- Keep up with new tools, which are continually emerging in this field.
- Finally, develop your business communication, human interaction and project management skills, which are essential for success here.
Many business schools and statistics departments offer specialized programs, such as the master's in applied statistics at the University of California, the Master of Science in Analytics at North Carolina State University and the Master of Science in Predictive Analytics at Northwestern University, to name a few.
About the guest blogger: Dr. Nusrat Rabbee is an expert statistician and analytics consultant. Nusrat specializes in developing sophisticated algorithms for data in higher dimensions for the purposes of predictive modeling. Nusrat holds a doctorate degree in Biostatistics from Harvard University and completed an NSF VIGRE postdoctoral fellowship at the University of California at Berkeley in the department of Statistics. Follow her on Twitter at @nrabbee.