Tuesday, 16 April 2019

Leveling the E-Commerce Pitch

Author: Abhishek Sharma, Business Analyst at Quadratyx



Are you accustomed to huge discounts, exclusive deals, cashback and flash sales? Then this might not be good news for you. In its new policy on FDI in E-commerce, the government has made announcements that may bring relief to competitors but may not be as beneficial for consumers.

In its announcement dated 26.12.2018, the Ministry of Commerce and Industry stated that E-commerce entities would engage only in Business to Business (B2B) e-commerce and not in Business to Consumer (B2C) e-commerce. This means that, going forward, an E-commerce platform will serve only as an intermediary between the buyer and the seller.


An E-commerce site works on two main models:
  1. Inventory-based model of e-commerce: the e-commerce entity owns the inventory of goods and services and sells it to consumers directly.
  2. Marketplace-based model of e-commerce: the e-commerce entity provides a platform on a digital and electronic network and acts as a facilitator between buyer and seller.
Leaders such as Amazon and Flipkart operate on the inventory-based model, holding major stakes in their in-house vendors Cloudtail and W.S. Retail respectively. It will be interesting to see how these sites operate when they can no longer exercise ownership or control over the inventory. The policy does, however, set a 25% threshold for what counts as such control.

Now, some of you may be wondering why this is being done. Is the government discouraging purchases from E-commerce sites? The answer is certainly ‘No’. The government is trying to level the field for all players and to keep the big players from gaining a monopoly. Those who look only at the huge discounts and the low prices compared with the physical market are taking a myopic view. A look at the profitability analysis shows that all these big players are yet to turn a profit and are running at a loss.


So, how do these players survive despite continuous losses? The answer lies in FDI, which acts as a backbone and keeps the players from being forced out of the market. I have deliberately used a picture of the 2013-2014 profitability analysis to highlight their past profitability and to show that they are still unprofitable. Now, two questions arise: (a) Why are these E-commerce companies running continuous losses and still surviving? and (b) Why do investors keep up FDI inflows, knowing this?

 The answer to the first question is predatory pricing, which hinders the entry of niche players into the market. These companies operate from a long-term perspective, trying to achieve a monopolistic structure. Although their losses are increasing, their revenue from operations is increasing too. There are no free lunches in economics; everything has a price. We don't even realize how accustomed we have become to ordering things online, and how difficult it would be to go back to offline purchases: visiting a physical store, standing in long queues and spending a good amount of time.

 To answer the second question, let us look at what these companies have achieved through all the promotional cashback, flash sales, exclusive deals and so on. They have a huge customer base, segmented by preference, which is growing day by day. In the modern data-generating era, that is like discovering a gold mine. Targeted advertisements alone can fetch millions of rupees or even more. What is now seen as losses should be considered the cost of customer acquisition. Amazon spends nearly Rs 1.90 on every rupee spent on its customer, whereas for Flipkart the figure stands at Rs 2.23. Also, in a market like India, where the population is comparatively young and more than 13 crore people (approx 10.1%) have an Internet connection, investing is quite lucrative.

 The implementation of this policy will allow small retailers to exist and operate alongside these huge businesses, but the question of their profitability remains. The policy may give them a level pitch, but it cannot give them equal conditions: early entry, customer base, customer loyalty, customer service and the like all require initial investments that pay off only over time.

It will be interesting to see what changes these E-commerce companies make to their revenue models now that a maximum limit of 25% is set on the inventory. Are they going to rely on advertising for revenue, or are they going to cut costs by laying off employees?

As early entrants, they definitely have an edge over their competitors. We can also expect them to come up with exciting features that give them an advantage over others, much as Flipkart introduced cash on delivery in India after recognizing that India is a cash economy. All this is much anticipated, and whether these measures achieve their intended objectives will become clear only in the long run.


Thursday, 17 December 2015

Data Scientist VS Data Analyst

"We live in a data-driven world. Increasingly, the efficient operation of organizations across sectors relies on the effective use of vast amounts of data. Making sense of big data is a combination of organizations having the tools, skills and more importantly, the mindset to see data as the new "oil" fueling a company. Unfortunately, the technology has evolved faster than the workforce skills to make sense of it and organizations across sectors must adapt to this new reality or perish."

     --Andreas Weigend, Ph.D Stanford, Head of the Social Data Lab at Stanford, former Chief Scientist Amazon.com



Data Analysts:

Data analysts translate numbers into plain English. Every business collects data, whether it's sales figures, market research, logistics, or transportation costs.
A data analyst's job is to take that data and use it to help companies make better business decisions.

Tasks:
  • Analyze the data provided.
  • Build dashboards and generate reports.
  • Use descriptive statistics to summarize the data.
  • Use inferential statistics for sample-to-population inference when the data comes from a survey.
  • Use traditional statistical techniques, such as logistic regression for classification and ARIMA for time series forecasting.
  • Work predominantly with commercial enterprise software (Tableau, QlikView, SAS).
  • In most cases, deal with clean data.
  • The role is restricted to a particular task or a particular domain.
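
As a rough illustration of the analyst tasks listed above, here is a minimal R sketch using only base R and the `mtcars` and `AirPassengers` datasets that ship with it. The choice of datasets, predictors and ARIMA order is an assumption made for the example, not something prescribed by the role.

```r
# Descriptive statistics: summarise fuel efficiency in the built-in mtcars data.
summary(mtcars$mpg)
mpg_mean <- mean(mtcars$mpg)
mpg_sd   <- sd(mtcars$mpg)

# Logistic regression for classification: predict transmission type
# (am: 0 = automatic, 1 = manual) from car weight.
logit_fit  <- glm(am ~ wt, data = mtcars, family = binomial)
pred_prob  <- predict(logit_fit, type = "response")
pred_class <- as.integer(pred_prob > 0.5)   # 0.5 cut-off is an arbitrary choice
accuracy   <- mean(pred_class == mtcars$am) # in-sample accuracy only

# ARIMA for time series forecasting: the orders below are chosen for
# illustration, not tuned.
arima_fit <- arima(log(AirPassengers),
                   order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
forecast_12 <- predict(arima_fit, n.ahead = 12)
forecast_passengers <- exp(forecast_12$pred)  # back-transform from the log scale
```

The accuracy here is measured on the same rows the model was fitted on, so it says little about how the model would behave on new data.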

Data Scientist:

A data scientist represents an evolution from the business or data analyst role.

Data science is, in general terms, the extraction of knowledge from data. The key word in this job title is "science," with the main goals being to extract meaning from data and to produce data products.

Tasks:
  • Since they work across various domains, the first step in a project is understanding the domain.
  • Visualize the data using tools like R, Python, Tableau and QlikView.
  • Understand the data by plotting against outcome variables and using descriptive statistics.
  • Pre-process the data, cleaning and transforming it in a meaningful way.
  • Engineer features to extract them from the data.
  • Select the important features using machine learning and statistics.
  • Create train, test and validation data sets.
  • Apply machine learning algorithms to build a prediction model.
  • Design data products that help customers make the right decisions.
  • Work more with open source software, such as R for machine learning and Python for machine learning and text analytics.
  • Data scientists also work with Big Data.
  • They are familiar with Hadoop and the related stack.
  • The data scientist role has been described as “part analyst, part artist.”
  • A data scientist will most likely explore and examine data from multiple disparate sources.
  • A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, and then recommends actions based on the insights it yields.
  • Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.
  • Creativity combined with common sense is the strength of a data scientist.
  • Out-of-the-box thinking.
  • In short, data scientists are the unicorns of this data world.
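
A few of the steps above (creating train, validation and test sets, then fitting and checking a prediction model) can be sketched in base R as follows. This is only an illustration: the `iris` dataset, the 60/20/20 split, the seed and the two predictors are all assumptions made for the example.

```r
set.seed(42)  # arbitrary seed so the shuffle is reproducible

# Turn the built-in iris data into a binary classification problem.
data <- iris
data$is_virginica <- as.integer(data$Species == "virginica")

# Shuffle the row indices, then cut them into 60% / 20% / 20%.
n       <- nrow(data)
idx     <- sample(seq_len(n))
n_train <- floor(0.6 * n)
n_valid <- floor(0.2 * n)

train_idx <- idx[seq_len(n_train)]
valid_idx <- idx[n_train + seq_len(n_valid)]
test_idx  <- idx[(n_train + n_valid + 1):n]

train <- data[train_idx, ]
valid <- data[valid_idx, ]
test  <- data[test_idx, ]

# Fit on the training set only.
fit <- glm(is_virginica ~ Sepal.Length + Sepal.Width,
           data = train, family = binomial)

# Hypothetical helper: share of rows classified correctly at a 0.5 cut-off.
accuracy <- function(model, df) {
  prob <- predict(model, newdata = df, type = "response")
  mean(as.integer(prob > 0.5) == df$is_virginica)
}

valid_acc <- accuracy(fit, valid)  # used while choosing features or models
test_acc  <- accuracy(fit, test)   # looked at once, at the very end
```

Keeping the test set untouched until the end is what makes its accuracy a fair estimate; the validation set absorbs the back-and-forth of feature selection.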





Author: Saikumar Allaka

Time Series Interactive Plots with R dygraphs

R is a leading statistical analysis tool with a steadily growing collection of packages. I recently came across “dygraphs”, which creates beautiful time series visualizations.
Plots can be zoomed and highlighted, and much more.
These interactive plots let users play with the data and understand it better.
I find these plots just as dazzling as those from Tableau and QlikView.
I chose Walmart sales data, which you can download from:

Data Processing

#### load the needed libraries####
library("plyr")
library("reshape")
## 
## Attaching package: 'reshape'
## 
## The following objects are masked from 'package:plyr':
## 
##     rename, round_any
library("dygraphs")
library("xts")
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
setwd("D:\\Personal\\Blogs\\Jan_05_2015")

#### load the datasets####
walmartSales <- read.csv("WalmartSales.csv")
features <- read.csv("features.csv")


#### look of the data#####
head(walmartSales)
##   Store Dept       Date Weekly_Sales IsHoliday
## 1     1    1 2010-02-05     24924.50     FALSE
## 2     1    1 2010-02-12     46039.49      TRUE
## 3     1    1 2010-02-19     41595.55     FALSE
## 4     1    1 2010-02-26     19403.54     FALSE
## 5     1    1 2010-03-05     21827.90     FALSE
## 6     1    1 2010-03-12     21043.39     FALSE
#### number of stores and departments#####
unique(walmartSales$Store)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
unique(walmartSales$Dept)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 16 17 18 19 20 21 22 23 24
## [24] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 40 41 42 44 45 46 47 48 49
## [47] 51 52 54 55 56 58 59 60 67 71 72 74 77 78 79 80 81 82 83 85 87 90 91
## [70] 92 93 94 95 96 97 98 99 39 50 43 65
#### Total weekly sales in each store across all departments ###
## (despite its name, AvgSalesStore holds the sum of Weekly_Sales per store and date)

AvgSalesStore<-ddply(walmartSales,.(Store,Date),summarise,sum(Weekly_Sales))

## Rename the aggregated column
names(AvgSalesStore)[3]<-"sales"

## Merge the data sets on store and date

mergedSales <- merge(AvgSalesStore,features,by = c("Store","Date"))

## Glance at mergedSales
head(mergedSales)
##   Store       Date   sales Temperature Fuel_Price MarkDown1 MarkDown2
## 1     1 2010-02-05 1643691       42.31      2.572        NA        NA
## 2     1 2010-02-12 1641957       38.51      2.548        NA        NA
## 3     1 2010-02-19 1611968       39.93      2.514        NA        NA
## 4     1 2010-02-26 1409728       46.63      2.561        NA        NA
## 5     1 2010-03-05 1554807       46.50      2.625        NA        NA
## 6     1 2010-03-12 1439542       57.79      2.667        NA        NA
##   MarkDown3 MarkDown4 MarkDown5      CPI Unemployment IsHoliday
## 1        NA        NA        NA 211.0964        8.106     FALSE
## 2        NA        NA        NA 211.2422        8.106      TRUE
## 3        NA        NA        NA 211.2891        8.106     FALSE
## 4        NA        NA        NA 211.3196        8.106     FALSE
## 5        NA        NA        NA 211.3501        8.106     FALSE
## 6        NA        NA        NA 211.3806        8.106     FALSE
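
The aggregate-and-merge step above can also be written with base R alone. The sketch below is a minimal illustration, assuming a toy data frame with the same column names as `walmartSales` and `features`; the values in `toy_sales` and `toy_features` are made up, not taken from the real files.

```r
# Toy stand-in for walmartSales: same column names, made-up values.
toy_sales <- data.frame(
  Store = c(1, 1, 1, 2, 2, 2),
  Dept  = c(1, 2, 1, 1, 2, 1),
  Date  = c("2010-02-05", "2010-02-05", "2010-02-12",
            "2010-02-05", "2010-02-05", "2010-02-12"),
  Weekly_Sales = c(100, 50, 120, 80, 20, 90),
  stringsAsFactors = FALSE
)

# Toy stand-in for features: one row per store and date, made-up temperatures.
toy_features <- data.frame(
  Store = c(1, 1, 2, 2),
  Date  = c("2010-02-05", "2010-02-12", "2010-02-05", "2010-02-12"),
  Temperature = c(42.3, 38.5, 40.2, 38.4),
  stringsAsFactors = FALSE
)

# Sum Weekly_Sales over departments for each store and date.
store_totals <- aggregate(Weekly_Sales ~ Store + Date, data = toy_sales, FUN = sum)
names(store_totals)[3] <- "sales"
store_totals <- store_totals[order(store_totals$Store, store_totals$Date), ]

# Join on the same two keys used in the post.
toy_merged <- merge(store_totals, toy_features, by = c("Store", "Date"))

# A merge silently drops rows whose keys do not match, so compare row counts.
rows_lost <- nrow(store_totals) - nrow(toy_merged)
```

If `rows_lost` is non-zero on real data, some store and date combinations are missing from one of the two files and are worth inspecting before plotting.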

Store 3 time series plot

## Select store 3
    store3 <- mergedSales[mergedSales$Store==3,]
    names(store3)[3] <-"store3"
    
    head(store3[100:106,])
##      Store       Date   store3 Temperature Fuel_Price MarkDown1 MarkDown2
## 3246     3 2011-12-30 410553.9       48.29      3.129   1449.16  14356.07
## 3247     3 2012-01-06 398178.2       52.42      3.157   3824.22   9389.98
## 3248     3 2012-01-13 367438.6       51.86      3.261   3927.76   3323.64
## 3249     3 2012-01-20 365818.6       56.20      3.268    720.82   1629.26
## 3250     3 2012-01-27 349518.1       58.06      3.290    774.55    349.99
## 3251     3 2012-02-03 424960.7       59.33      3.360   9667.50    268.29
##      MarkDown3 MarkDown4 MarkDown5      CPI Unemployment IsHoliday
## 3246     66.66     81.43    777.35 223.0091        7.197      TRUE
## 3247     15.00    415.36   1476.85 223.1923        6.833     FALSE
## 3248        NA    226.97   1725.19 223.3755        6.833     FALSE
## 3249      1.46     16.05    799.50 223.4701        6.833     FALSE
## 3250        NA      4.00    328.72 223.5646        6.833     FALSE
## 3251      0.60   8368.15    804.24 223.6591        6.833     FALSE
    ## Create xts object
    store3_xts<-xts(store3[,-2],order.by=as.POSIXct(store3$Date))
    
    class(store3_xts)
## [1] "xts" "zoo"
## Time Series plot
    dygraph(store3_xts[,2])

Store 16 time series plot

## Select store 16
    store16 <- mergedSales[mergedSales$Store==16,]
    names(store16)[3] <-"store16"

    head(store16[100:106,])
##      Store       Date  store16 Temperature Fuel_Price MarkDown1 MarkDown2
## 1101    16 2011-12-30 665861.1       23.91      3.119    921.67  11778.68
## 1102    16 2012-01-06 564538.1       26.49      3.095    443.59   5524.49
## 1103    16 2012-01-13 508520.1       19.55      3.077   1353.58   2551.73
## 1104    16 2012-01-20 474389.8       29.30      3.055   2087.19   1399.47
## 1105    16 2012-01-27 453979.2       28.17      3.038    612.01   1011.32
## 1106    16 2012-02-03 475905.1       25.53      3.031  32702.46    747.82
##      MarkDown3 MarkDown4 MarkDown5      CPI Unemployment IsHoliday
## 1101    101.87    272.51   1673.25 196.3586        6.232      TRUE
## 1102     15.70     35.25   3296.96 196.5458        6.162     FALSE
## 1103      0.24    289.67   1899.49 196.7331        6.162     FALSE
## 1104      5.58     75.83   4742.74 196.7797        6.162     FALSE
## 1105        NA      8.41   2482.91 196.8263        6.162     FALSE
## 1106      2.39   8063.07   2940.68 196.8729        6.162     FALSE
    ## Create xts object
    store16_xts<-xts(store16[,-2],order.by=as.POSIXct(store16$Date))
    
    class(store16_xts)
## [1] "xts" "zoo"
## Time Series plot
    dygraph(store16_xts[,2])

Store 20 time series plot

    ## Select store 20
    store20 <- mergedSales[mergedSales$Store==20,]
    names(store20)[3] <-"store20"

    head(store20[100:106,])
##      Store       Date store20 Temperature Fuel_Price MarkDown1 MarkDown2
## 1816    20 2011-12-30 2043245       36.35      3.389   6918.65  97740.99
## 1817    20 2012-01-06 1964702       33.42      3.422   9273.46  45971.43
## 1818    20 2012-01-13 1911511       37.79      3.513   3627.81  18632.14
## 1819    20 2012-01-20 1892776       27.65      3.533   2465.98   5030.40
## 1820    20 2012-01-27 1761017       37.19      3.567   1023.60   3044.15
## 1821    20 2012-02-03 2203523       39.93      3.617  58928.52   8766.77
##      MarkDown3 MarkDown4 MarkDown5      CPI Unemployment IsHoliday
## 1816    270.38   1648.03   2852.19 212.4036        7.082      TRUE
## 1817    207.60   1365.38   5460.86 212.5711        6.961     FALSE
## 1818    108.48    847.70   3614.04 212.7386        6.961     FALSE
## 1819    106.54    999.54   3380.92 212.8336        6.961     FALSE
## 1820    106.08     52.87   2447.73 212.9286        6.961     FALSE
## 1821    226.27  53603.99   8301.25 213.0236        6.961     FALSE
    ## Create xts object
    store20_xts<-xts(store20[,-2],order.by=as.POSIXct(store20$Date))
    
    class(store20_xts)
## [1] "xts" "zoo"
## Time Series plot
    dygraph(store20_xts[,2])

Store 8 time series plot

## Select store 8
    store8 <- mergedSales[mergedSales$Store==8,]
    names(store8)[3] <-"store8"
    
    head(store8[100:106,])
##      Store       Date   store8 Temperature Fuel_Price MarkDown1 MarkDown2
## 6249     8 2011-12-30 858572.2       36.33      3.129   3701.08  36381.83
## 6250     8 2012-01-06 872113.2       43.47      3.157   3093.39  26001.46
## 6251     8 2012-01-13 817661.8       36.46      3.261   6725.29  12764.99
## 6252     8 2012-01-20 813954.8       46.81      3.268   1643.08   1055.47
## 6253     8 2012-01-27 778178.5       45.52      3.290   6065.01    730.16
## 6254     8 2012-02-03 927610.7       45.56      3.360  41524.03   1412.41
##      MarkDown3 MarkDown4 MarkDown5      CPI Unemployment IsHoliday
## 6249    164.34    178.40   1508.56 223.0577        6.123      TRUE
## 6250     96.35    500.91   4468.63 223.2410        5.825     FALSE
## 6251     15.98    299.73   3851.69 223.4243        5.825     FALSE
## 6252      9.02    411.88   3161.92 223.5188        5.825     FALSE
## 6253     15.18     30.28  24241.12 223.6134        5.825     FALSE
## 6254     83.15  18723.56   3886.31 223.7079        5.825     FALSE
    ## Create xts object
    store8_xts<-xts(store8[,-2],order.by=as.POSIXct(store8$Date))
    
    class(store8_xts)
## [1] "xts" "zoo"
## Time Series plot
    dygraph(store8_xts[,2])

Store 6 time series plot

## Select store 6
    store6 <- mergedSales[mergedSales$Store==6,]
    names(store6)[3] <-"store6"

    head(store6[100:106,])
##      Store       Date  store6 Temperature Fuel_Price MarkDown1 MarkDown2
## 5963     6 2011-12-30 1598081       46.80      3.129   6098.54  82881.16
## 5964     6 2012-01-06 1395340       50.82      3.157   3220.17  32056.43
## 5965     6 2012-01-13 1344243       48.33      3.261   4779.48  11967.81
## 5966     6 2012-01-20 1326256       55.37      3.268   3413.91   1540.99
## 5967     6 2012-01-27 1315611       53.95      3.290   1510.59    630.92
## 5968     6 2012-02-03 1496306       57.45      3.360  36848.38   1772.68
##      MarkDown3 MarkDown4 MarkDown5      CPI Unemployment IsHoliday
## 5963    326.68    814.58   2728.99 221.1283        6.551      TRUE
## 5964    108.24    985.76   6650.63 221.3088        6.132     FALSE
## 5965     65.72    575.13   4512.18 221.4893        6.132     FALSE
## 5966     33.02    449.06   6587.63 221.5831        6.132     FALSE
## 5967     37.10     17.00   2641.90 221.6769        6.132     FALSE
## 5968    161.55  28199.34   7151.33 221.7707        6.132     FALSE
    ## Create xts object
    store6_xts<-xts(store6[,-2],order.by=as.POSIXct(store6$Date))
    
    class(store6_xts)
## [1] "xts" "zoo"
## Time Series plot
    dygraph(store6_xts[,2])

Comparison of sales across 5 stores

## let us compare sales of 5 Walmart stores


AllStores <- cbind(store3_xts[,2],store16_xts[,2],store20_xts[,2],store8_xts[,2],store6_xts[,2])

dygraph(AllStores)  %>%
  dyLegend(width = 200)%>%
  dyHighlight(highlightCircleSize = 3, 
             highlightSeriesBackgroundAlpha = 0.2,
             hideOnMouseOut = FALSE)
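
The per-store blocks above repeat the same steps, so they could be folded into one helper. The sketch below is one possible way to do that, not the approach used in the post: `stores_wide` is a hypothetical function, `toy_merged` is a made-up stand-in for `mergedSales`, and the plotting part runs only if the `xts` and `dygraphs` packages are installed.

```r
# Toy stand-in for mergedSales: same key columns, made-up sales figures.
toy_merged <- data.frame(
  Store = rep(c(3, 16), each = 3),
  Date  = rep(c("2012-01-06", "2012-01-13", "2012-01-20"), times = 2),
  sales = c(398, 367, 366, 565, 509, 474),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per date, one column per requested store.
stores_wide <- function(sales_df, stores) {
  keep <- sales_df[sales_df$Store %in% stores, c("Store", "Date", "sales")]
  wide <- tapply(keep$sales, list(keep$Date, keep$Store), sum)
  colnames(wide) <- paste0("store", colnames(wide))
  wide
}

wide <- stores_wide(toy_merged, c(3, 16))

# Build the interactive plot only when the plotting packages are available.
if (requireNamespace("xts", quietly = TRUE) &&
    requireNamespace("dygraphs", quietly = TRUE)) {
  wide_xts <- xts::xts(wide, order.by = as.Date(rownames(wide)))
  plot_widget <- dygraphs::dyRangeSelector(
    dygraphs::dygraph(wide_xts, main = "Weekly sales by store")
  )
  # Typing plot_widget at an interactive console displays the chart.
}
```

With the real data, `stores_wide(mergedSales, c(3, 16, 20, 8, 6))` would give the same five series as the `cbind` call, with column names already set to the store labels.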

The data set contains many stores and many departments. You can explore further with the basic code and data provided.

Author: Saikumar Allaka