Data Science Is Much More Than Statistics

Until recently, statisticians were the de facto data doctors. Data were not as widespread as they are now; they arrived clean and in an orderly format, and required only standard analysis and visualization. That is no longer the case. Data availability, generation, format and usage are bringing new challenges and requiring a new superset of skills that traditional statisticians are unable to handle. The standard set of tools and formulas used by typical statisticians cannot cope with the sheer volume of incoming data at the speed at which it is generated. Moreover, analysis, visualization and knowledge discovery are not possible with the old-fashioned set of tools and skills. Data scientists, the new and hot profession on job-hunting lists, are trying to tackle the data tsunami and turn piles of raw data into actionable knowledge before it is too late, or just in time to beat the competition or to support money-saving decisions.

Data science does include, among other skills, statistical and mathematical knowledge, but it does not stop there. There is more science to it, as it tries to extract meaningful data from various sources and formats and to generate a knowledge product. In addition to domain knowledge, computing, visualization, modeling, data processing and management, analytics and machine learning are essential skills for anyone who wants to enter this new profession. Some universities are already offering courses or even full degrees in data science.

Data science applications are not limited to security and safety. They span all disciplines, from biology and medicine to sports and entertainment. Analyses and predictions are only as good as the data scientists who produce them, so a skilled and experienced professional with a sense of data dynamics may see more in relevant data sets, partly by choosing meaningful visualizations over pretty displays. Companies that live on data, such as Google, use machine learning and crowdsourcing to improve translation and natural language processing by inferring a great deal of information from human input and interaction. Even by typing in CAPTCHA challenges, you may be contributing to text processing!

For a data scientist, data discovery, acquisition and cleaning are just the first few steps in a long and resource-demanding journey. You have to write your own computational scripts and rely on well-founded algorithms to scrape and clean data, as well as to detect erroneous data and outliers. Imagine a table of human heights and weights collected on a national scale for school health: you may not have the luxury of having all data points in centimeters and kilograms. You have to automate the cleanup process to account for most expected scenarios, and use statistics to identify outliers so you can either isolate them or treat them manually.
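The cleanup described for the heights-and-weights table can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes records arrive as dictionaries with hypothetical `height`, `height_unit`, `weight` and `weight_unit` fields, and the conversion tables, the `normalize`, `iqr_outliers` and `clean` helpers and the sample data are all made up for the example. It converts everything to centimeters and kilograms, sets aside records with unrecognized units for manual review, and flags outliers using Tukey's interquartile-range fences, which is one common rule among several (z-scores or domain-specific limits would also work).

```python
import statistics

# Hypothetical conversion tables to the canonical units (centimeters, kilograms).
HEIGHT_TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54}
WEIGHT_TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalize(record):
    """Return (height_cm, weight_kg), or None if either unit is not recognized."""
    h_factor = HEIGHT_TO_CM.get(record["height_unit"].strip().lower())
    w_factor = WEIGHT_TO_KG.get(record["weight_unit"].strip().lower())
    if h_factor is None or w_factor is None:
        return None  # unexpected unit: leave it for manual review
    return record["height"] * h_factor, record["weight"] * w_factor


def iqr_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def clean(raw_records):
    """Split raw records into kept rows, flagged outliers and unit rejects."""
    rows, rejects = [], []
    for rec in raw_records:
        converted = normalize(rec)
        if converted is None:
            rejects.append(rec)
        else:
            rows.append(converted)
    # A row is flagged if its height OR its weight falls outside the fences.
    flagged = set(iqr_outliers([h for h, _ in rows]))
    flagged |= set(iqr_outliers([w for _, w in rows]))
    kept = [r for i, r in enumerate(rows) if i not in flagged]
    outliers = [r for i, r in enumerate(rows) if i in flagged]
    return kept, outliers, rejects


# Made-up sample: mixed units, one likely typo (1530 cm) and one unhandled unit (ft).
raw = [
    {"height": 150, "height_unit": "cm", "weight": 45, "weight_unit": "kg"},
    {"height": 60, "height_unit": "in", "weight": 100, "weight_unit": "LB"},
    {"height": 1.55, "height_unit": "m", "weight": 48, "weight_unit": "kg"},
    {"height": 148, "height_unit": "cm", "weight": 42, "weight_unit": "kg"},
    {"height": 1530, "height_unit": "cm", "weight": 47, "weight_unit": "kg"},
    {"height": 158, "height_unit": "cm", "weight": 50, "weight_unit": "kg"},
    {"height": 5, "height_unit": "ft", "weight": 44, "weight_unit": "kg"},
    {"height": 153, "height_unit": "cm", "weight": 46, "weight_unit": "kg"},
]

kept, outliers, rejects = clean(raw)
print(f"{len(kept)} kept, {len(outliers)} flagged as outliers, {len(rejects)} for manual review")
```

With this sample, the 1530 cm entry (probably a typo for 153 cm) is flagged as an outlier and the record measured in feet lands in the manual-review pile. Flagging rather than silently deleting keeps the final decision with a person, which matches the idea of treating outliers manually when automation runs out of expected scenarios.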

Python is the language of choice for many data scientists, not only because of its features and ease of use but probably also for trendier reasons, for example the Google effect and job requirements. Python has many data and numeric features built in, and its community contributes libraries that enrich the experience further. IPython notebooks, with their online rendering, make it a great choice over others (try the Anaconda distribution). R is also a good choice, as is any comparable language. You need some scripting and data-wrangling tools and skills as well. Usually, the source and size of the data influence your preference for one tool over another.

For data visualization, several academic and commercial packages are available. Python itself is capable of analyzing and presenting your data, but you should not ignore standard tools like MS Excel. With Power Query (Excel 2010 or newer) and Power View (Excel 2013), you can acquire data from various sources and present it in various ways. Of course, data science is not necessarily the same as Big Data (how big is big?); a good rule of thumb is to work with data that is adequate for the case at hand and that you can actually handle.

To be a data scientist, you need to be a data doctor with a special feel for sound versus ill-formatted data, for what is likely to affect what, and for how to slice and dice the data. Data presentation and visualization, with an interactive touch, are equally important. Sometimes, data interpretation and knowledge derivation depend on how you look at the data and what hints you may have as to what is hidden in the data terrain.

It is ironic how we choose to participate in this datafication effort (providing data and allowing data collection about our activity online and offline) and end up paying for data products resulting from raw data we offer at no cost. There are moral and ethical issues around such activities that may surface and haunt us in the near future. What are your thoughts?

* Illustration from Berkeley Science Review. Check page for symbol explanations.

Posted in Big Data, Business Intelligence, Data Science, Power Query

Is Couponing Coming To The MENA Region?

It has been almost 13 decades since Coca-Cola issued the first known coupon back in 1887: a free glass of genuine Coca-Cola. It took over 20 years for the next coupon to appear (one penny off Grape Nuts cereal). Now, with over 90% of Americans using coupons and benefiting from around $4 billion in savings, there is no doubt we are looking at a significant business that has infiltrated the culture so deeply that even movies and shows are dedicated to the practice, not to mention a full month devoted to celebrating couponing and spreading awareness.

Market dynamics, whether the Great Depression or chain supermarkets draining customers from local stores, promoted the couponing industry. With the arrival of the Internet in the 1990s, both the distribution and the format of coupons changed. A few major sites now handle coupon administration and distribution, digital coupons are accepted, and mobile couponing is on the rise.

However, couponing also brought a few major concerns: extreme couponing and/or fraud, and low redemption rates. Extreme couponing is usually fraudulent activity, even when it may be possible on technical grounds. Digital distribution, along with the many players (including bloggers) who promote and market coupons for a living, makes fake coupons easy to produce. Several outlets charge merchants per coupon download rather than per actual transaction, so low redemption rates (the normal trend) result in losses for coupon issuers.

Since coupons are the means and not the end, one should ask the more fundamental question: what do merchants and consumers actually want? The coupon or voucher is just the evidence; the ultimate goal is to save the customer some money while promoting the merchant and increasing its revenue. Is there a better and easier way to do just that?

How did other cultures handle this? Many copied the American model at different stages, but many others preserved their own ways of bargaining, promotion and saving money. We will not explore the reasons here; suffice it to say that they usually had to do with market dynamics and cultural or social structure.

In the MENA region (Middle East and North Africa), a market that is broadly similar yet fragmented, old business models prevailed until very recently. Small family businesses, personal relations, loyalty, peer recommendation and price haggling were, and largely still are, the norm. The most you would see were sale signs on special occasions or when inventory was low or out of date (or out of fashion).

However, things started to change in the last decade as Internet penetration increased, satellite TV became widespread and branches of Western chain stores opened in the region. You can now easily spot offers in fast food outlets and supermarkets. Sales and promotions are distributed through all available media, including social platforms. A few of the major coupon distributors are also expanding into the region with modest success, and some local ventures are trying to take advantage of this marketing and savings tool.

Among the recent players with a special twist is Ezy Discount (http://www.ezydiscount.com/), which tries to capture the essence of discounts and offers in a way that appeals to both regional and global cultures. By addressing consumer privacy, adding location awareness and introducing simplicity and centralized control (by merchants), it eliminated many, if not all, of the concerns such as extreme couponing, fraud and low redemption. It also gave merchants the flexibility to advertise their own offers and discounts.

Couponing as a concept is still new (and unfamiliar) in this region. Much remains to be done to shift society from the traditional business of floating prices and bargaining to fixed prices and discounts. The education and awareness effort this requires is well worth it.

Disclaimer: the author is the founder of Ezy Discount.

Posted in Couponing, Local Offers, Nearby Deals

The Internet of Things – Smart Objects and Dumb Users

The Internet of Things (IoT) is the thing that will change everything, including humans. Interconnecting uniquely identifiable embedded computing devices within the existing Internet infrastructure, and allowing them to interact and exchange data, is said to be the recipe for making life easier, improving business and raising efficiency.

Data is already a gold mine, and the competition is among those who try to derive actionable knowledge from it in near real time. It is not only about how you collect data but also when and where you do it. The practical answer is: around the clock and everywhere. That takes a lot of things, and to have the data at your fingertips immediately, or sent to a particular thing, there is no better medium than the Internet.

At face value this looks great, and there are obviously endless peaceful and innocent uses for IoT. Imagine receiving early warnings of natural disasters, getting timely health advice based on your condition and location, tracking a lost car (or tank), owning an expensive machine that can fix itself, watching your kids or pets from work, protecting your home while on vacation, or having your fridge order only healthy food suited to your taste, to name a few. The question worth asking is this: is a thingful life the way to go?

The data flow remains at the center of this discussion because this is what IoT is all about. There are technical and legal aspects to this issue. Who has the right to access, process and use your data in the first place? And who can actually do so, not only by law but through technical capability, security holes or other leaks in the systems?

Suppose you wear a medical smart bracelet, and soon you find out that not only does your medical insurer retrieve your entire medical history and related habits, but so do your local store, your employer, the person competing with you in the next round and a handful of major advertisers who bombard all your devices and sensors with highly targeted (personalized) commercials.

Although the main driving force behind IoT is financial (commercial), the fear of it falling into the wrong hands may soon outweigh the financial gains. It may not take long before its beneficiaries try to make it part of everyday life everywhere: through reduced cost of ownership, through rules and regulations, and, forcefully, by channeling major services through channels that depend on it.

Major catastrophes may occur if IoT becomes the next generation of warfare. We have witnessed the development of unconventional weapons in the past, including biological and genetic ones. After laser-guided missiles that target a specific coordinate, death merchants may have already found ways not only to target specific individuals but probably also to choose how to neutralize them, change their way of thinking or reverse engineer their plans. The same could even apply to certain races en masse, given that many conflicts are racial and that various genetic and biological experiments have gotten out of control. This assumption does not rule out economic or even financial conflicts; after all, many nations are practically bankrupt, and traditional enemies hold the keys to their financial respirators.

Before we are fully thingified, we should stop and think carefully before jumping on this bandwagon. It is not only our privacy that is at stake, but also our health, possessions, values and personal lives. It is not just a question of receiving uninvited solicitations or having your envious neighbor spoil your smart dish. Before even thinking about the future of our thingfully raised children, we should notice that the drums of the Third World War are already beating, and very loudly. You can be tracked by the tweet you send, the photo you upload to your timeline, the smart watch around your wrist, a tiny chip in your belt, the chats you have online, the car you ride in or even the dirty wall you walk past.

Of course, technology itself is not evil, but those who abuse it are. You should outsmart any smart device you use or come close to; if not, you are just a victim waiting for your turn. The new sensor is the next-generation drone!

Posted in Internet of Things