SDH Digital solutions blog

How are Big Data and Machine Learning connected? How do you create a Machine Learning model?

AI technologies in simple words
Dmitry Drigo
CEO of SDH Digital Solutions, LLC
In the previous article, we discussed the differences between Artificial Intelligence (AI), Machine Learning (ML), and Data Science (DS). We examined the terms AI, ML, and DS, what a neural network is, and how to interpret the results of ML models.

In this article, we will discuss the interconnections between Big Data and ML and the process of creating an ML model.
[Figure: Differentiation of the concepts of Artificial Intelligence and Data Analysis. The diagram defines four terms:]
  • Artificial Intelligence: a technique that enables machines to mimic human behaviour;
  • Machine Learning: a subset of AI techniques that use statistical methods to enable machines to improve with experience;
  • Deep Learning: a subset of ML that makes the computation of multi-layer neural networks feasible;
  • Data Science: an umbrella term that covers a wide range of domains, including Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning.

Machine Learning and Big Data — how are they related?

Artificial Intelligence bases its decisions on models, and a model must be created before it can deliver quality results.

Building a specific machine learning model may require various types of data, including numbers, text, photos, video, and audio recordings.

All collected data must be stored somewhere and properly analyzed. Big Data is the area of technology that deals with storing and processing such volumes of data, and the Data Lake concept grew out of it.

Digital twin

To model the behavior of real processes or objects for machine learning, one can use a digital twin: a virtual copy of the process or object.

This approach makes it possible to see in advance how introducing a new method will change, for example, the chemical composition of the final product or a sales schedule.

The digital twin builds its forecasts from the information it has previously accumulated and then models possible situations using machine learning.

The forecasting algorithm is created, and the underlying data is studied, with the help of Data Science, which builds hypotheses and models from the data set.

Data Science is a scientific discipline whose goal is to find the truth in data and to use what is found to generate new ideas or one of four specific results:
  • evaluation of a product or a business;
  • sale of an optimized (improved) product or service;
  • forecast of the result or effectiveness of the production system;
  • development of a strategy or roadmap for a product.

For Data Science, three main skill groups are important:
  • IT proficiency;
  • expertise in math and statistics;
  • experience.

What does Machine Learning consist of?

Machine Learning includes three important steps:

1. Data acquisition (sales plans or work schedules, exchange rates, calendars, etc.)

2. Specification of the model parameters with the help of groups of experts in each specific area.

3. Creation of the Machine Learning model. Without models, AI cannot find the right solution to a problem, because Artificial Intelligence requires not only information and experience but also sample models of the desired results.

Each task requires the creation of a specific model or algorithm.

How to create Machine Learning models?

The process of creating a model is similar to the Machine Learning process:
1. Collection and analysis of the data.
2. Specification of the model parameters.
3. Creation of a model.
4. Assessment of the result.

If the results are satisfactory, one can proceed to testing the model on a real process. If they are not, one should go back and start again with the data collection step.
The possibilities of machine learning for Artificial Intelligence are endless, but they cannot be realized without creating models.
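
The four steps above can be sketched as a small loop. This is only an illustration under stated assumptions: the data is synthetic, the model is a straight line fitted by least squares, and the names `collect_data`, `fit_line`, `build_model`, and the `max_error` threshold are hypothetical, not taken from any real project.

```python
import random

def collect_data(n=200, seed=0):
    """Step 1: stand-in for data collection (synthetic, made-up data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 5.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Step 3: create the model, here y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mean_abs_error(model, xs, ys):
    """Step 4: assess the result on data the model has not seen."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)

def build_model(max_error=1.5, train_share=0.8, max_rounds=3):
    """Repeat steps 1-4 until the held-out error is acceptable."""
    for round_no in range(max_rounds):
        xs, ys = collect_data(seed=round_no)            # step 1
        split = int(train_share * len(xs))              # step 2: chosen parameter
        model = fit_line(xs[:split], ys[:split])        # step 3
        error = mean_abs_error(model, xs[split:], ys[split:])  # step 4
        if error <= max_error:
            return model, error                         # satisfactory result
    raise RuntimeError("no satisfactory model; revisit the data collection step")

if __name__ == "__main__":
    (slope, intercept), err = build_model()
    print(f"slope={slope:.2f} intercept={intercept:.2f} held-out error={err:.2f}")
```

In a real project the acceptance threshold would come from the experts and the customer rather than from a default argument.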

Where does machine learning for AI begin?

1. The origin of the hypothesis
By analyzing problematic processes, the experience of the workers, or the operation of the production line, one forms a hypothesis that the process can be improved or changed to raise outcome indicators (income, production output, sales, etc.).

The hypothesis starts from the observation that people cannot physically take many factors and nuances into account at the same time, since they are naturally inclined to cut corners, make assumptions, and work in their usual rhythm.

A machine learning hypothesis relies on a larger amount of data when decisions are made, which is expected to lead to a better-quality result.

With a well-developed hypothesis, ML helps minimize human factors such as injuries, stress, and loss of concentration.

2. Assessment of the hypothesis
The initial and collected data sets are selected according to the hypothesis and assessed for suitability. At the same time, the end user and the achievable results are defined for the subsequent integration of the model into the production process.

So, one should determine for whom the results are created, who will use them and how, and how likely it is that the desired indicators will be achieved, given the information collected.


3. ROI: economic effect and return on investment calculations
Specialists from the relevant departments (finance, efficiency, etc.) jointly evaluate the economic effect.

This is what the hypothesis behind the new solution was conceived for. Does it make sense to start new developments or projects? Will they be in demand, and when will it be possible to count on profit and a return on investment? At this stage, the metrics are identified (the number of potential customers, growth in production output, expenses, cost-effectiveness of consumables, etc.).

Together, the parameters listed above form the goal that must be achieved.
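
As a rough illustration of the calculations at this stage, the sketch below computes ROI as (gain - cost) / cost and a simple payback period. The function names and all figures are hypothetical; a real evaluation would use the metrics agreed with the finance specialists.

```python
import math

def roi(total_gain, total_cost):
    """Return on investment as a fraction: (gain - cost) / cost."""
    return (total_gain - total_cost) / total_cost

def payback_months(initial_cost, monthly_net_gain):
    """Whole months until the initial cost is recovered, or None if never."""
    if monthly_net_gain <= 0:
        return None
    return math.ceil(initial_cost / monthly_net_gain)

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(roi(total_gain=150_000, total_cost=100_000))
    print(payback_months(initial_cost=100_000, monthly_net_gain=12_500))
```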

4. The mathematical formulation of the problem
It is not enough to define and understand the necessary business results. The results must be converted into mathematical objects (graphs, tables, intersections) that define the boundaries and dimensions beyond which the model cannot extend.

This stage is completed in collaboration with the customer, who sets the limits or thresholds (budget, upper and lower sales limits, product volumes, etc.).
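
One possible way to write such limits down is as explicit bounds that every forecast is checked against. The sketch below assumes just three customer-defined thresholds; the `Limits` class, its fields, and the helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    """Customer-defined thresholds (hypothetical fields)."""
    min_sales: float
    max_sales: float
    max_budget: float

def within_limits(forecast_sales, planned_cost, limits):
    """True if the forecast and its cost stay inside the agreed bounds."""
    return (limits.min_sales <= forecast_sales <= limits.max_sales
            and planned_cost <= limits.max_budget)

def clamp_forecast(forecast_sales, limits):
    """Pull a forecast back to the nearest allowed sales value."""
    return max(limits.min_sales, min(forecast_sales, limits.max_sales))

if __name__ == "__main__":
    limits = Limits(min_sales=100, max_sales=500, max_budget=10_000)
    print(within_limits(300, 9_000, limits))
    print(clamp_forecast(600, limits))
```

Keeping the bounds in one immutable object makes it clear which numbers came from the customer and which came from the model.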

5. Data collection and analysis
All information is collected in one place and then analyzed with various statistical methods. This stage takes significant time, but it yields an understanding of the structure of the data and the hidden relationships between its parts, which is needed to form the model features accurately.
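
One simple statistical method for spotting relationships is the Pearson correlation between each candidate feature and the target. The sketch below is a minimal example with made-up column names and values; it is not the only, or necessarily the best, way to analyze real data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)

def rank_features(columns, target):
    """Sort feature names by the strength of their correlation with the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    columns = {"temperature": [20, 22, 24, 26, 28], "shift": [1, 2, 1, 2, 1]}
    output = [100, 104, 108, 112, 116]
    for name, score in rank_features(columns, output):
        print(name, round(score, 3))
```

Correlation only captures linear relationships, so a low score does not prove that a feature is useless.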

6. Prototype development
The essence of this step is testing the hypothesis to see whether it works.
The model is built from the primary data gathered during hypothesis testing. This is a simple and affordable way to find out whether the problem can be solved.

The prototype clarifies the scope of the project and the options for implementing the solution, and so provides its economic justification.
While the hypothesis is being formed and the prototype built, the initial data may change, for example when a new product appears on the production line or a new instrument is added to the production equipment. In this case, the model should be retrained on the updated data.
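
One common way to check whether a prototype "works" is to compare its error with a naive baseline, such as always predicting the historical average. The sketch below assumes this approach; the function names and the 10% improvement threshold are hypothetical choices, not something the process above prescribes.

```python
def mean_abs_error(predictions, actuals):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

def baseline_predictions(train_targets, count):
    """Naive baseline: always predict the mean of the training targets."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * count

def prototype_is_worthwhile(model_preds, baseline_preds, actuals,
                            min_improvement=0.10):
    """True if the prototype cuts the baseline error by the required share."""
    model_error = mean_abs_error(model_preds, actuals)
    baseline_error = mean_abs_error(baseline_preds, actuals)
    if baseline_error == 0:
        return False  # the baseline is already perfect; nothing to gain
    return (baseline_error - model_error) / baseline_error >= min_improvement

if __name__ == "__main__":
    # Made-up values, for illustration only.
    actuals = [10, 12, 14, 16]
    baseline = baseline_predictions([11, 13, 15], len(actuals))
    prototype = [10.5, 11.5, 14.5, 15.5]
    print(prototype_is_worthwhile(prototype, baseline, actuals))
```

After the input data changes, the same comparison can be rerun on the retrained model to confirm that it still beats the baseline.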

End-to-end processes and collaboration between the Data Science team and other services are handled through DataOps and DevOps. This is convenient because corrections and additions can be made at any stage without interruption or loss of results.

7. Creating a solution
Conclusions are drawn from the prototype's results, and if they demonstrate good performance, this is the first step toward creating a solution.

The turnkey solution is then integrated into production. Before the start, employees need to be trained or retrained, equipment prepared, and so on.

8. Pilot and industrial operation
During the first few launches, the system should operate for an established test period alongside a specialist who acts as its teacher.

In this mode, feedback between the human and the system helps improve the accuracy of forecasts and make the necessary improvements to the system, while normal operations are still performed by the specialist.

If everything goes well, the test runs transfer smoothly into normal operation. The final step is the transition to automatic maintenance.
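
The pilot phase can be pictured as a log of system forecasts next to the specialist's own values. The sketch below assumes a simple rule, invented for illustration: the system moves to normal operation once its forecasts agree with the specialist within a tolerance often enough. The names, the tolerance, and the 95% requirement are all hypothetical.

```python
def agreement_rate(system_forecasts, specialist_values, tolerance):
    """Share of cases where the system is within `tolerance` of the specialist."""
    matches = sum(
        1 for forecast, reference in zip(system_forecasts, specialist_values)
        if abs(forecast - reference) <= tolerance
    )
    return matches / len(system_forecasts)

def ready_for_operation(rate, required_rate=0.95):
    """True if the pilot agreement rate meets the required level."""
    return rate >= required_rate

if __name__ == "__main__":
    # Made-up pilot log, for illustration only.
    system = [100, 105, 98, 120]
    specialist = [101, 104, 99, 100]
    rate = agreement_rate(system, specialist, tolerance=2)
    print(rate, ready_for_operation(rate))
```

The cases where the two disagree are the ones worth feeding back into the model during the pilot.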

Any model must be tested and verified before it is put into operation.