Data Analytics Architectures for E-Commerce Platforms in Cloud

Today, organizations not only need to manage larger volumes of data but also need to generate insights from the data they already hold. These insights help them better understand their customers and predict market trends. To achieve this goal, they can take advantage of cloud platforms, which can handle data of greater volume, velocity, and variety. Cloud platforms offer elastic and efficient computing and storage resources, and they provide many ready-to-use tools for building each stage of a data analytics solution. In addition, an on-demand pricing model allows organizations to pay only for what they consume. This shifts the organizational consumption model from capital expenditure to operational expenditure and greatly reduces the initial capital investment needed to build data analytics solutions and implement other innovative ideas. This paper highlights the main reasons for organizations to build data analytics in the cloud. It also shows how to design data analytics frameworks for e-commerce platforms in the cloud and how to integrate machine learning models into data analytics processes to create more sophisticated analyses. Amazon Web Services (AWS), a leading public cloud platform, is adopted to demonstrate these concepts and practices with real-life business cases.


Introduction
Over the past two decades, consumer shopping habits have shifted from in-store to online. This shift creates major business changes and opportunities: retailers can establish an online presence, and businesses can drive sales and growth, especially as they expand globally [1]. However, as their online businesses grow and handle more concurrent transactions, they face distinctive technical challenges in enabling platform scalability, changing cost models, and addressing data analytics needs [2][3].
In this paper, we first analyze the common challenges of E-Commerce platforms from a technical perspective. Second, we explain how cloud computing technologies can be leveraged to address some of these challenges and create new business value [4]. Furthermore, analyzing data on the E-Commerce platform to understand customer behavior is a practical approach to forecasting market trends [5], and it is considered a key success factor for gaining a competitive advantage over other market players. Some cloud platforms provide advanced technologies such as Artificial Intelligence (AI) and Machine Learning (ML) that customers can integrate with their existing business systems or processes. We describe a methodology for integrating these AI and ML services with the data analytics process to build a more comprehensive analytics platform in the cloud [6][7][8]. A practical real-life case is presented to support this model.

Common Challenges of E-Commerce Platforms
As online businesses grow and handle more simultaneous transactions, e-commerce players face the following typical technical challenges:
Challenge 1: Enabling platform scalability and agility
Challenge 2: Changing the cost model
Challenge 3: Dealing with the rapidly growing volume and variety of data for data analytics
Challenge 4: Making use of more complex analytical models to get accurate results

Platform Scalability and Agility
Cloud computing enables E-Commerce platforms to handle the dynamic demands and scenarios of the market. It gives these platforms the elasticity to scale services, i.e. compute and storage, up or down to meet actual demand, including seasonal spikes [9][10][11]. In a traditional environment, i.e. on-premises data centers, organizations usually over-provision hardware and software capacity to avoid running short of resources during these seasonal spikes. As a result, much of that capacity sits idle during normal operation, which is most of the time.
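The idle capacity described above can be made concrete with a small calculation. The sketch below is a hypothetical illustration, not any cloud provider's autoscaling algorithm: it assumes hourly demand is known in advance and that each instance serves a fixed number of requests per hour, then compares fixed peak provisioning with elastic provisioning in instance-hours.

```python
import math


def instances_needed(requests_per_hour: int, capacity_per_instance: int) -> int:
    """Smallest number of instances that can serve the load (at least one)."""
    return max(1, math.ceil(requests_per_hour / capacity_per_instance))


def compare_provisioning(hourly_demand: list[int], capacity_per_instance: int) -> dict:
    """Compare fixed peak provisioning with elastic provisioning, in instance-hours."""
    per_hour = [instances_needed(d, capacity_per_instance) for d in hourly_demand]
    # On-premises style: size the fleet for the busiest hour and keep it running.
    fixed = max(per_hour) * len(hourly_demand)
    # Elastic style: scale the fleet up or down every hour to follow demand.
    elastic = sum(per_hour)
    return {
        "fixed_instance_hours": fixed,
        "elastic_instance_hours": elastic,
        "idle_instance_hours": fixed - elastic,
    }


# Hypothetical six-hour demand profile with one seasonal spike.
print(compare_provisioning([200, 250, 300, 1200, 300, 200], capacity_per_instance=100))
```

With these placeholder numbers, peak provisioning consumes 72 instance-hours while elastic provisioning consumes 25, so 47 instance-hours would sit idle.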

Changing the Cost Model
E-Commerce platform owners need to make an upfront investment in hardware and software infrastructure before they can earn any income from the business. Cloud computing changes this cost model from Capital Expenditure (CAPEX) to Operating Expenditure (OPEX) [12][16]. Thus, the cost of ownership [1] can flexibly follow actual business needs. As the business grows, operating costs rise gradually with usage rather than requiring large, sudden investments. This frees the business from focusing on platform infrastructure and shifts its attention to more valuable areas such as understanding customer behavior and market trends [13].
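A minimal sketch of the CAPEX versus OPEX difference, assuming a hypothetical upfront hardware cost plus fixed monthly upkeep on one side and a pure pay-per-use unit price on the other. All figures are placeholders, not real cloud prices.

```python
def cumulative_costs(monthly_usage, upfront_capex, monthly_upkeep, unit_price):
    """Month-by-month cumulative cost under a CAPEX model and an OPEX model.

    All arguments are hypothetical placeholders, not real cloud prices.
    """
    capex_total = upfront_capex  # CAPEX: pay for capacity before earning income
    opex_total = 0               # OPEX: start from zero and pay as you go
    rows = []
    for month, units in enumerate(monthly_usage, start=1):
        capex_total += monthly_upkeep     # fixed upkeep regardless of usage
        opex_total += units * unit_price  # cost follows actual consumption
        rows.append((month, capex_total, opex_total))
    return rows


for month, capex, opex in cumulative_costs([10, 20, 40], 10_000, 500, 20):
    print(f"month {month}: CAPEX {capex}, OPEX {opex}")
```

Under these assumptions the OPEX total grows with usage (200, 600, 1400), while the CAPEX model has already committed 10,000 before the first month begins.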

Challenges of Handling Big Data
E-Commerce players need better, faster, and more relevant insights from their data to stay ahead of the competition. As data volume, velocity, and variety increase, they need more sophisticated tools to collect, categorize, and turn data into valuable insights [14]. Advanced analytics products have become key to generating this value. To meet this need, the data lake concept was introduced to the market; it serves as a solid foundation for storing large amounts of data in a central repository within an organization [15]. Data lakes need to be cost-effective, scalable, and secure. Done right, they open the door to advanced analytics, data science, and machine learning.

Demand for More Complex Analysis Models
The adoption of data science is creating significant business value for E-Commerce players [16]. Business decisions can become more data-driven, and new product lines can be created from data insights unveiled by Business Intelligence (BI) and Machine Learning (ML) [17].

Data Analytics Architecture on Cloud
As shown in Figure 1, the overall data analytics architecture consists of four stages: collecting, storing, processing, and consuming. Each stage uses a different cloud service dedicated to its specific tasks. Coordinating these services so that they work together as one streamlined process is known as orchestration.
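To illustrate orchestration, the sketch below chains the four stages as plain Python functions run in order by a small orchestrator. The stage functions and sample records are hypothetical; in a cloud deployment each stage would be a managed service coordinated by a workflow service, which this sketch does not model.

```python
from typing import Callable

Context = dict
Stage = Callable[[Context], Context]


def collect(ctx: Context) -> Context:
    # Hypothetical source records; a real pipeline would read databases or streams.
    ctx["raw"] = [{"order_id": 1, "amount": "19.90"}, {"order_id": 2, "amount": "5.10"}]
    return ctx


def store(ctx: Context) -> Context:
    # Stand-in for writing the raw data to a data lake.
    ctx["lake"] = list(ctx["raw"])
    return ctx


def process(ctx: Context) -> Context:
    ctx["total_revenue"] = round(sum(float(r["amount"]) for r in ctx["lake"]), 2)
    return ctx


def consume(ctx: Context) -> Context:
    ctx["report"] = f"Total revenue: {ctx['total_revenue']:.2f}"
    return ctx


def run_pipeline(stages: list[Stage]) -> Context:
    """Orchestrator: run each stage in order and record which ones completed."""
    ctx: Context = {"completed": []}
    for stage in stages:
        ctx = stage(ctx)
        ctx["completed"].append(stage.__name__)
    return ctx


result = run_pipeline([collect, store, process, consume])
print(result["report"])  # Total revenue: 25.00
```

Passing the stages as a list keeps the orchestration order separate from each stage's logic, so a stage can be swapped or rerun without changing the others.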

Data Collecting Stage
Data can be classified as structured, semi-structured, or unstructured. Structured data is highly normalized according to a common schema and stored in relational databases, which support transactional line-of-business applications [17]. This data is easily accessible via SQL or a data extraction tool. Semi-structured data contains identifiers such as keys or tags but does not follow a predefined schema; it is often stored in NoSQL databases or in formats such as JSON and XML. This data is easily accessible but requires some preparation before it is ready for analysis. Unstructured data does not fit into a data model and is usually stored as individual files, such as text documents, images, audio, and video.
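The preparation that semi-structured data needs can be sketched as flattening nested JSON records into tabular rows with a shared set of columns. This is a minimal illustration with hypothetical clickstream events; nested lists are left as-is, and a real pipeline would also handle type conversion.

```python
import json


def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested JSON objects into dotted column names, e.g. customer.id."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value  # scalars and lists are kept unchanged
    return flat


# Hypothetical events: the second one carries fields the first one lacks.
raw_events = [
    '{"event": "view", "customer": {"id": 42}, "product": {"sku": "A-1"}}',
    '{"event": "purchase", "customer": {"id": 42, "tier": "gold"}, '
    '"product": {"sku": "A-1"}, "qty": 2}',
]

rows = [flatten(json.loads(line)) for line in raw_events]
columns = sorted({col for row in rows for col in row})
table = [[row.get(col) for col in columns] for row in rows]  # missing fields -> None

print(columns)
for values in table:
    print(values)
```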
After identifying the nature of the data, we decide on the data collection method: Batch Load or Streaming. Batch Load periodically extracts data from various data sources and moves it to the Data Lake. This process usually involves querying the source databases and applying several transformation steps, namely Extract, Transform, and Load (ETL). Streaming, in contrast, ingests data continuously as it is generated, for example clickstream events, so that it can be analyzed in near real time.
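A minimal batch-load sketch, assuming a hypothetical `orders` table in an in-memory SQLite database as the source and a Python list of JSON strings standing in for the Data Lake. Each run extracts only rows newer than a stored watermark, keeps completed orders, and loads them as raw JSON lines.

```python
import json
import sqlite3


def extract(conn: sqlite3.Connection, since_id: int) -> list[tuple]:
    """Extract: query only rows added after the last batch (incremental watermark)."""
    cur = conn.execute(
        "SELECT id, customer_id, amount_cents, status FROM orders WHERE id > ? ORDER BY id",
        (since_id,),
    )
    return cur.fetchall()


def transform(rows: list[tuple]) -> list[dict]:
    """Transform: keep completed orders and convert cents to currency units."""
    return [
        {"order_id": oid, "customer_id": cid, "amount": cents / 100}
        for oid, cid, cents, status in rows
        if status == "completed"
    ]


def load(records: list[dict], lake: list[str]) -> None:
    """Load: append newline-delimited JSON records to the simulated data lake."""
    lake.extend(json.dumps(r) for r in records)


def run_batch(conn: sqlite3.Connection, lake: list[str], watermark: int) -> int:
    rows = extract(conn, watermark)
    load(transform(rows), lake)
    # The highest extracted id becomes the watermark for the next scheduled run.
    return rows[-1][0] if rows else watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,"
    " amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 7, 1990, "completed"), (2, 8, 500, "cancelled"), (3, 7, 1250, "completed")],
)
lake: list[str] = []
watermark = run_batch(conn, lake, watermark=0)
print(watermark, lake)
```

Keeping a watermark means each periodic run moves only new rows, rather than re-reading the whole source table.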

Data Storing Stage
A Data Lake is a centralized repository for storing all data types. By contrast, a Data Warehouse is a database optimized for analyzing relational data sourced from transactional applications. The data structure and schema in a Data Warehouse are defined in advance to optimize SQL query performance, and the data is cleaned and transformed before it is stored in database tables. The results are typically used for operational reporting and analysis via business intelligence tools.
On the other hand, a Data Lake stores both relational data from transactional applications and non-relational data from mobile applications, Internet of Things (IoT) devices, or social media platforms. The data structure or schema is not defined when the data is collected; it is applied when the data is read. Different types of analysis, such as SQL queries, big data analytics, full-text search, and machine learning, can then be used to find insights in the data lake.
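The contrast between the two stores can be sketched as schema-on-write versus schema-on-read. In this minimal illustration, a hypothetical warehouse table enforces a fixed schema when records are written, while the simulated data lake accepts any raw JSON line and applies a schema only when it is queried.

```python
import json

# Hypothetical warehouse schema, defined before any data is loaded.
WAREHOUSE_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}


def write_to_warehouse(table: list[dict], record: dict) -> None:
    """Schema-on-write: reject records whose columns differ from the schema."""
    mismatch = set(record) ^ set(WAREHOUSE_SCHEMA)
    if mismatch:
        raise ValueError(f"columns do not match schema: {sorted(mismatch)}")
    table.append({col: cast(record[col]) for col, cast in WAREHOUSE_SCHEMA.items()})


def write_to_lake(lake: list[str], raw_line: str) -> None:
    """Data lake: store the raw record as-is; nothing is validated at ingestion."""
    lake.append(raw_line)


def read_from_lake(lake: list[str], columns: list[str]) -> list[dict]:
    """Schema-on-read: the query decides which fields to project."""
    rows = []
    for line in lake:
        record = json.loads(line)
        rows.append({col: record.get(col) for col in columns})
    return rows


warehouse: list[dict] = []
lake: list[str] = []
write_to_warehouse(warehouse, {"order_id": 1, "customer_id": 7, "amount": 19.9})
write_to_lake(lake, '{"order_id": 1, "customer_id": 7, "amount": 19.9}')
write_to_lake(lake, '{"device": "sensor-3", "temp_c": 21.5}')  # e.g. an IoT reading
print(read_from_lake(lake, ["order_id", "device"]))
# [{'order_id': 1, 'device': None}, {'order_id': None, 'device': 'sensor-3'}]
```

The IoT reading would be rejected by `write_to_warehouse`, but the lake keeps it and lets a later query decide how to interpret it.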

Data Processing Stage
Data processing includes cleansing, transforming, sorting, and aggregating data; a typical example is the ETL process. The overall processing may require several iterations between the data storing stage and the data processing stage.
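The four processing operations named above can be sketched on a few hypothetical order records. This is an illustrative pure-Python version; a production pipeline would typically run equivalent logic on a distributed processing engine.

```python
from collections import defaultdict

# Hypothetical raw orders with a duplicate, an invalid amount and messy labels.
raw_orders = [
    {"order_id": 1, "category": " Books ", "amount": "12.50"},
    {"order_id": 1, "category": " Books ", "amount": "12.50"},  # duplicate
    {"order_id": 2, "category": "toys", "amount": "n/a"},       # invalid amount
    {"order_id": 3, "category": "TOYS", "amount": "30.00"},
    {"order_id": 4, "category": "books", "amount": "7.50"},
]


def process_orders(orders):
    seen = set()
    revenue = defaultdict(float)
    for order in orders:
        # Cleansing: skip duplicate order IDs and rows with non-numeric amounts.
        if order["order_id"] in seen:
            continue
        try:
            amount = float(order["amount"])
        except ValueError:
            continue
        seen.add(order["order_id"])
        # Transforming: normalize category labels.
        category = order["category"].strip().lower()
        # Aggregating: total revenue per category.
        revenue[category] += amount
    # Sorting: highest-revenue categories first.
    return sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)


print(process_orders(raw_orders))  # [('toys', 30.0), ('books', 20.0)]
```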
In addition, more enterprises are exploring ways to deploy machine learning algorithms that detect patterns and extract deeper insights from the collected data. One common application in the E-Commerce industry is the recommendation engine. Customer behaviors such as purchase history and browsing preferences are recorded on the platform, and this data is then analyzed by processes that use deployed ML models. This approach makes the whole mechanism more interactive and sophisticated, enabling more in-depth analysis.
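As an illustration of how recorded purchase behavior can feed a recommendation engine, the sketch below uses simple item-to-item co-occurrence counting. This technique is swapped in for clarity; it is not the ML model used in this paper, and the purchase histories are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of products appears in the same customer history."""
    co = defaultdict(Counter)
    for history in histories:
        for a, b in combinations(sorted(set(history)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, history, k=2):
    """Score products the customer has not bought by co-occurrence with their history."""
    owned = set(history)
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, Counter()).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


histories = [
    ["laptop", "mouse", "laptop_bag"],
    ["laptop", "mouse"],
    ["mouse", "mousepad"],
    ["laptop", "laptop_bag", "usb_hub"],
]
co = build_cooccurrence(histories)
print(recommend(co, ["laptop"]))  # ['laptop_bag', 'mouse']
```

Co-occurrence counting is only a baseline; a deployed ML model would replace `recommend` while the surrounding data flow stays the same.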

Data Consuming Stage
This is the final stage of the data pipeline. It concerns how insights are obtained from the data, either through visualization tools such as Business Intelligence software or by providing data to other entities or applications through API calls.
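For the API route, a minimal sketch is a small HTTP endpoint that returns pre-computed insights as JSON. It uses Python's standard WSGI interface; the path `/insights/top-categories` and the insight values are hypothetical.

```python
import json

# Hypothetical pre-computed insight produced by the processing stage.
INSIGHTS = {
    "top_categories": [
        {"category": "toys", "revenue": 30.0},
        {"category": "books", "revenue": 20.0},
    ]
}


def insights_api(environ, start_response):
    """Minimal WSGI app exposing analytics results to other applications."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "")
    if method == "GET" and path == "/insights/top-categories":
        status, payload = "200 OK", INSIGHTS["top_categories"]
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

For local testing, the app can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, insights_api).serve_forever()`.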

Integrating ML into Data Analytics in the Cloud
Machine learning, a branch of Artificial Intelligence, grew out of pattern recognition and the theory that computers can learn without being explicitly programmed to perform specific tasks [2]. The iterative aspect of machine learning is important because models are exposed to new training data and can adapt to it. The expectation is that models become more "mature", producing more reliable predictions as they keep learning from previous computations.
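The iterative idea can be illustrated with online stochastic gradient descent on a one-variable linear model: each new batch of data nudges the existing parameters rather than retraining from scratch. This is a swapped-in textbook technique using hypothetical data on the line y = 2x + 1; it is not the training method of any particular cloud service.

```python
def sgd_update(w: float, b: float, batch, lr: float = 0.1) -> tuple[float, float]:
    """One pass of stochastic gradient descent on squared error for y ~ w * x + b."""
    for x, y in batch:
        error = (w * x + b) - y
        w -= lr * error * x  # gradient of 0.5 * error**2 with respect to w
        b -= lr * error      # gradient of 0.5 * error**2 with respect to b
    return w, b


def mse(w: float, b: float, data) -> float:
    """Mean squared error of the current model on a batch."""
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


# Hypothetical stream: 500 batches, each holding points on the line y = 2x + 1.
stream = [[(i / 10, 2 * (i / 10) + 1) for i in range(10)] for _ in range(500)]

w, b = 0.0, 0.0
history = []
for batch in stream:
    w, b = sgd_update(w, b, batch)  # the model keeps learning from each new batch
    history.append(mse(w, b, batch))

print(f"w={w:.3f}, b={b:.3f}, first error={history[0]:.4f}, last error={history[-1]:.2e}")
```

Tracking the error per batch shows the model maturing as it sees more data, which is the behavior the paragraph above describes.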
In our research, we aim to integrate ML technologies into the data analytics process and recommend a more comprehensive pattern for the E-Commerce industry. In our example, we use Amazon SageMaker as the platform to build, train, and deploy ML models in the cloud. The overall ML process typically involves the steps shown in Figure 2.
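Because Figure 2 is not reproduced here, the sketch below assumes a generic prepare, train, evaluate, and deploy sequence and runs it locally in plain Python. It does not use the SageMaker API; in SageMaker, the training and deployment steps would correspond to managed training jobs and hosted endpoints. The feature (number of past orders), the label (whether the customer bought again), the model name, and the accuracy threshold are all hypothetical.

```python
def prepare(records, holdout_every=4):
    """Prepare: hold out every n-th record for testing and train on the rest."""
    test = [r for i, r in enumerate(records) if i % holdout_every == 0]
    train = [r for i, r in enumerate(records) if i % holdout_every != 0]
    return train, test


def accuracy(threshold, rows):
    """Share of rows where 'past_orders >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


def train(rows):
    """Train: fit a one-feature decision stump by trying each observed value."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))


def deploy(threshold, registry, name="repeat-buyer-v1"):
    """Deploy: publish the model as a callable 'endpoint' in a simple registry."""
    registry[name] = lambda past_orders: past_orders >= threshold
    return name


def run_workflow(records, registry, min_accuracy=0.9):
    train_rows, test_rows = prepare(records)
    model = train(train_rows)
    score = accuracy(model, test_rows)  # evaluate on held-out data
    if score < min_accuracy:
        return None, score              # do not deploy an underperforming model
    return deploy(model, registry), score


# Hypothetical records: (past_orders, bought_again).
records = [(n % 6, n % 6 >= 3) for n in range(24)]
endpoints = {}
name, score = run_workflow(records, endpoints)
print(name, score, endpoints[name](5))
```

Gating deployment on the evaluation score mirrors the common practice of only promoting a trained model once it meets an agreed quality bar.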

Concluding Remarks and Future Work
In this study, the researchers highlighted the common challenges of E-Commerce platforms, explained how the cloud addresses them, and described the additional value it brings for future developments in data analytics and machine learning. For future work, the researchers propose:
• gathering reference cases to validate the models described in this paper
• setting up demonstrations to illustrate the whole architecture.