How are Big Data and Machine Learning connected? How do you create a Machine Learning model?

AI technologies in simple words
Dmitry Drigo
CEO of SDH Digital Solutions, LLC
In the previous article, we discussed the differences between Artificial Intelligence (AI), Machine Learning (ML), and Data Science (DS): what each term means, what a neural network is, and how to interpret the results of ML models.

In this article, we will discuss the interconnections between Big Data and ML and the process of creating an ML model.
Differentiation of the concepts of Artificial Intelligence and Data Analysis (diagram):
  • Artificial Intelligence: a technique that enables machines to mimic human behaviour.
  • Machine Learning: a subset of AI that uses statistical methods to enable machines to improve with experience.
  • Deep Learning: a subset of ML that makes multi-layer neural networks feasible.
  • Data Science: an umbrella term that covers a wide range of domains, including Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning.

Machine Learning and Big Data — how are they related?

Artificial Intelligence bases its decisions on models in order to obtain quality results, and such a model must first be created.

To build a specific machine learning model, various types of data may be required: numbers, texts, photos, videos, sound, and audio recordings.

All of the collected data must be stored somewhere and properly analyzed. This is the realm of Big Data, a separate area of technology from which the concept of the Data Lake emerged.
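As a rough illustration of what working with such storage can look like in practice, here is a minimal Python sketch that pulls a few raw files from a data lake into one analysis table. The folder layout, file names, and column names are assumptions made up for this example, not part of any specific platform.

```python
from pathlib import Path

import pandas as pd

# Hypothetical data-lake layout: raw files of different origins land in one place.
LAKE_ROOT = Path("/data/lake/raw")                 # assumed path
SALES_FILE = LAKE_ROOT / "sales_2023.parquet"      # assumed file
RATES_FILE = LAKE_ROOT / "exchange_rates.csv"      # assumed file


def load_raw_tables() -> pd.DataFrame:
    """Read heterogeneous raw data and join it into a single analysis table."""
    sales = pd.read_parquet(SALES_FILE)                      # assumed columns: date, units, revenue
    rates = pd.read_csv(RATES_FILE, parse_dates=["date"])    # assumed columns: date, usd_rate

    # One properly typed table is what later analysis and modelling work from.
    return sales.merge(rates, on="date", how="left")


# df = load_raw_tables()   # would produce the combined analysis-ready table
```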

Digital twin

To model the behavior of real processes or objects for machine learning, one can use a virtual copy of the process, also known as a digital twin.

This approach makes it possible to see in advance how, for example, the chemical composition of the final product or a sales schedule will change after a new method is introduced.

The digital twin builds all of its forecasts on the information it has previously accumulated and then models possible situations using machine learning.
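Here is a deliberately simplified sketch of that idea in Python: a model is fitted on previously accumulated process history and then used to play through hypothetical situations. The process parameters and numbers are invented for illustration and do not describe any real digital-twin product.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical accumulated process history: temperature and pressure -> product quality.
rng = np.random.default_rng(0)
temperature = rng.uniform(60, 90, size=500)
pressure = rng.uniform(1.0, 3.0, size=500)
quality = 0.4 * temperature - 5.0 * (pressure - 2.0) ** 2 + rng.normal(0, 1.0, 500)

X_history = np.column_stack([temperature, pressure])
twin = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_history, quality)

# "What if" scenarios: play through settings that were never tried on the real line.
scenarios = np.array([[70.0, 1.8], [80.0, 2.2], [85.0, 2.8]])
for (t, p), predicted in zip(scenarios, twin.predict(scenarios)):
    print(f"temperature={t:.0f}, pressure={p:.1f} -> predicted quality {predicted:.1f}")
```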

The forecast algorithm is created, and the underlying data is studied, with the help of Data Science, which builds hypotheses and models based on the data set.

Data Science is a true scientific discipline whose goal is to find the truth and use the obtained data to generate new ideas or achieve four specific results:
  • evaluation of a product or a business;
  • sale of an optimized (improved) product or service;
  • forecast of the result or effectiveness of the production system;
  • development of a strategy or roadmap for a product.

For Data Science, three main skill groups are important:
  • IT proficiency;
  • expertise in math and statistics;
  • experience.

What does Machine Learning consist of?

Machine Learning includes three important steps:

1. Data acquisition (sales plans or work schedules, exchange rates, calendars, etc.)

2. Specification of the model parameters using groups of experts in each specific area.

3. Creating the Machine Learning (ML) model. Without a model, AI will not be able to find the right solution to the problem, because Artificial Intelligence requires not only information and experience but also sample models of the desired results.

Each task requires the creation of a specific model or algorithm.

How to create Machine Learning models?

The process, or algorithm, of model creation mirrors the Machine Learning process itself:
1. Collection and analysis of the data.
2. Specification of the model parameters.
3. Creation of a model.
4. Assessment of the result.

If the obtained results are satisfactory, one can proceed to testing the model on a real process. If they are not, one should go back and start again from the data collection step.
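As a rough sketch of these four steps and the loop back to data collection, here is a minimal Python example on a synthetic dataset. The candidate parameters and the acceptance threshold are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1. Collection and analysis of the data (a synthetic stand-in here).
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Specification of the model parameters (candidate settings from the "experts").
candidate_alphas = [100.0, 10.0, 1.0, 0.1]
GOOD_ENOUGH = 0.90  # arbitrary acceptance threshold for this illustration

for alpha in candidate_alphas:
    # 3. Creation of a model.
    model = Ridge(alpha=alpha).fit(X_train, y_train)

    # 4. Assessment of the result.
    score = r2_score(y_test, model.predict(X_test))
    print(f"alpha={alpha}: R^2={score:.3f}")
    if score >= GOOD_ENOUGH:
        print("Result is satisfactory -> proceed to testing on the real process.")
        break
else:
    print("No candidate was good enough -> go back to data collection.")
```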

Where does AI machine learning begin?

1. The origin of the hypothesis
By analyzing problematic processes, workers' experience, or the operation of a production line, a hypothesis is formed about whether the process can be improved or changed to increase key outcome indicators (income, production output, sales, etc.).

Such a hypothesis rests on the fact that people cannot physically take many factors and nuances into account at the same time, since they are naturally inclined to cut corners, make assumptions, and work in their usual rhythm.

A machine learning hypothesis relies on a much larger amount of data when decisions are made, which directly leads to a better-quality result in the end.

The advantage of ML with a well-developed hypothesis lies in minimizing human factors such as injuries, stress, loss of concentration, etc.

2. Assessment of the hypothesis
Based on the hypothesis, the collected and initial data sets are selected and their suitability is assessed, together with a definition of the end user and of the gains expected from subsequently integrating the model into the production process.

In other words, one should determine who the results are created for, who will use them and how, and what the probability is of achieving the desired indicators based on the collected information.
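A basic suitability check can be surprisingly simple. The sketch below is only an illustration: the column names, the minimum row count, and the missing-data threshold are assumptions, and a real assessment would be tailored to the hypothesis.

```python
import pandas as pd


def assess_data_suitability(df: pd.DataFrame, target: str, max_missing: float = 0.2) -> bool:
    """Rough check that the collected data can support the hypothesis at all."""
    if target not in df.columns:
        print(f"Target column '{target}' is missing -> hypothesis cannot be evaluated.")
        return False

    missing_share = df.isna().mean()
    too_sparse = missing_share[missing_share > max_missing]
    if not too_sparse.empty:
        print("Columns with too many gaps:", list(too_sparse.index))

    print(f"Rows available: {len(df)}, usable target values: {df[target].notna().sum()}")
    return too_sparse.empty and df[target].notna().sum() > 100  # arbitrary minimum


# Hypothetical example data set.
example = pd.DataFrame({"output": [10, 12, None, 11], "shift": ["A", "B", "A", None]})
assess_data_suitability(example, target="output")
```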


3. ROI - economic effect and return on investment calculations
Specialists from the relevant departments (finance, efficiency, etc.) jointly evaluate the economic effect.

This is what the hypothesis behind the newly implemented solution was conceived and created for. Does it make sense to start new developments or projects? Will they be in demand, and when will it be possible to count on profit and a return on investment? At this stage, the metrics are identified (the number of potential customers, growth in production output, expenses, cost-effectiveness of consumables, etc.).

All parameters listed above form the goal that must be achieved.
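The arithmetic behind such an estimate can be as simple as the sketch below; every number in it is invented purely for illustration.

```python
# Hypothetical figures for a proposed ML solution.
development_cost = 120_000.0      # one-off cost of building and integrating the model
monthly_running_cost = 3_000.0    # infrastructure, support, retraining
monthly_uplift = 15_000.0         # expected extra income from improved output / fewer losses

monthly_net_gain = monthly_uplift - monthly_running_cost
payback_months = development_cost / monthly_net_gain
roi_after_two_years = (24 * monthly_net_gain - development_cost) / development_cost

print(f"Net gain per month: {monthly_net_gain:,.0f}")
print(f"Payback period: {payback_months:.1f} months")
print(f"ROI after 24 months: {roi_after_two_years:.0%}")
```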

4. The mathematical formulation of the problem
It is not enough to define and understand the required business results. They must be converted into mathematical objects (graphs, tables, intersections) that set the boundaries and dimensions beyond which the model must not go.

This stage is completed in collaboration with the customer, who sets the limits or thresholds (budget, upper and lower sales limits, product volumes, etc.).
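In practice, such limits often end up as explicit numbers that the model's output is checked or clipped against. The sketch below is only an illustration; the bounds and field names are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BusinessLimits:
    """Thresholds agreed with the customer, expressed as plain numbers."""
    min_monthly_sales: float = 500.0      # lower sales limit
    max_monthly_sales: float = 20_000.0   # upper sales limit (capacity)
    budget: float = 50_000.0              # spending ceiling


limits = BusinessLimits()

raw_forecast = np.array([420.0, 7_800.0, 25_300.0])  # hypothetical model output

# The model may not propose values outside the agreed boundaries.
bounded_forecast = np.clip(raw_forecast, limits.min_monthly_sales, limits.max_monthly_sales)
print(bounded_forecast)   # [  500.  7800. 20000.]
```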

5. Data collection and analysis
All information is gathered in one place and then analyzed with various statistical methods. Significant time is spent at this stage, but as a result one gains an understanding of the structure of the data and the hidden relationships between its parts, which is needed to form accurate model features.
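A first look at the structure and hidden relationships often starts with plain summary statistics and a correlation matrix, as in this small sketch on made-up data.

```python
import numpy as np
import pandas as pd

# Hypothetical collected data: production parameters and the resulting output.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "line_speed": rng.normal(100, 10, 300),
    "operator_shift": rng.integers(1, 4, 300),
})
df["output"] = 2.5 * df["line_speed"] + rng.normal(0, 15, 300)

print(df.describe())   # ranges, means, and spread of every column
print(df.corr())       # hidden linear relationships between columns
```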

6. Prototype development
The essence of this step is to test the hypothesis and see whether it works.
The model is built from the primary data and the results of the hypothesis testing. This is a simple and affordable way to find out whether the problem can be solved.
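One cheap way to answer the question "can this be solved at all?" is to compare a trivial baseline against a first simple model. The sketch below does exactly that on synthetic data; the dataset and the choice of models are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the primary data gathered for the hypothesis.
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)

baseline = DummyRegressor(strategy="mean")
prototype = LinearRegression()

baseline_score = cross_val_score(baseline, X, y, cv=5, scoring="r2").mean()
prototype_score = cross_val_score(prototype, X, y, cv=5, scoring="r2").mean()

print(f"Baseline R^2:  {baseline_score:.3f}")   # near zero for the mean predictor
print(f"Prototype R^2: {prototype_score:.3f}")  # clearly higher -> the problem looks solvable
```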

The prototype clarifies the scope of the project and the options for implementing the solution, and thereby provides its economic justification.
While the hypothesis is being refined and the prototype is being built, the initial data may change: for example, a new product appears on the production line or a new instrument is added to the production equipment. In this case, the model should be additionally trained.
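One possible way to handle such additional training, sketched here on invented data, is to use a model that supports incremental updates and feed it only the new observations instead of rebuilding everything from scratch.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(2)
true_weights = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# Original training data used for the prototype.
X_old = rng.normal(size=(400, 5))
y_old = X_old @ true_weights + rng.normal(0, 0.1, 400)

model = SGDRegressor(random_state=0)
model.partial_fit(X_old, y_old)

# New observations appear (a new product, a new instrument on the line).
X_new = rng.normal(size=(50, 5))
y_new = X_new @ true_weights + rng.normal(0, 0.1, 50)

# Additional training on the new data only, without discarding what was learned.
model.partial_fit(X_new, y_new)
print(model.coef_.round(2))
```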

End-to-end processes and collaboration between the different services and the Data Science team are handled through DataOps and DevOps. This is very convenient, since corrections and additions can be made at any stage without interrupting the process or losing results.

7. Creating a solution
Conclusions are drawn from the results of the prototype, and if they demonstrate good performance, this is the first step toward creating a full solution.

The turnkey solution is then integrated into production. Before launch, however, employees will need to be trained or retrained, equipment prepared, and so on.

8. Pilot and industrial operation
During the first few launches, the system should operate for an established test period in tandem with a specialist acting as a teacher.

In this mode, feedback between the human and the system helps to improve the accuracy of the forecasts and to make the necessary improvements to the system, while normal operations are still performed by the specialist.
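A very simplified sketch of that feedback loop is shown below (all numbers invented): the specialist's confirmed values are stored next to the system's forecasts, the accuracy is tracked, and the confirmed values become new training examples for the next retraining run.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical pilot log: what the system forecast vs. what the specialist confirmed.
pilot_log = pd.DataFrame({
    "forecast":   [102.0, 98.0, 110.0, 95.0],
    "specialist": [100.0, 99.0, 104.0, 96.0],
})

mae = mean_absolute_error(pilot_log["specialist"], pilot_log["forecast"])
print(f"Pilot MAE: {mae:.1f}")

# The confirmed values become new labelled examples for the next retraining run.
new_training_labels = pilot_log["specialist"]
```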

If everything goes well, the test runs smoothly transition into normal operation. The next, and final, step is the transition to automatic maintenance.

The possibilities of machine learning and Artificial Intelligence are endless, but without the creation of models they cannot be realized.

And any model must be tested and verified before it is put into operation.