
 Machine Learning (ML) for Beginners (Part 1)



  Why Machine Learning?  

Trending theme:

  - Artificial Intelligence, "deep learning", "big data"...

Epistemological reason:

  - We don't know how to model complex problems, but many examples are available that represent the variety of situations.
  - "Data-driven" vs. "model-based".

Scientific reason:

  - Learning is an essential faculty of life.

Economic reason:

  - Collecting data is easier than developing expertise.


  Technical areas using ML  

ML as a design tool:

  - Vision & Pattern Recognition
  - Language processing
  - Speech processing
  - Robotics
  - "Data Mining"
  - Database search
  - Recommendations
  - Marketing...

ML as an explanatory tool:

  - Neuroscience
  - Psychology
  - Cognitive Science


  Data = ML fuel (BIG DATA)  


The Large Hadron Collider | CERN   ~70 PB/year




NOTE: A petabyte is equal to 1000 terabytes, or 1 million gigabytes. So that's a very large amount of data.

To give you an idea, 70 petabytes per year roughly corresponds to (assuming ~5 GB per HD movie, ~250 KB per photo, and ~100 MB per hour of music):

-          Approximately 14 million movies in high definition (1080p)

-          Approximately 280 billion photos

-          Approximately 700 million hours of music

This amount of data is enormous and represents a challenge for the storage and processing of information.
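These equivalences are easy to reproduce. The sketch below redoes the arithmetic in Python; the per-item sizes (5 GB per HD movie, 250 KB per photo, 100 MB per hour of music) are illustrative assumptions, not official figures.

```python
# Back-of-the-envelope conversion of 70 PB/year into everyday units.
PB = 10**15   # bytes in a petabyte (decimal definition)
GB = 10**9
MB = 10**6
KB = 10**3

total = 70 * PB

movies = total / (5 * GB)         # assume ~5 GB per 1080p movie
photos = total / (250 * KB)       # assume ~250 KB per compressed photo
music_hours = total / (100 * MB)  # assume ~100 MB per hour of music

print(f"{movies:,.0f} HD movies")            # ~14 million
print(f"{photos:,.0f} photos")               # ~280 billion
print(f"{music_hours:,.0f} hours of music")  # ~700 million
```

Changing any of the assumed per-item sizes scales the results proportionally, so treat these as orders of magnitude rather than exact counts.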


Google   24 PB/day



NOTE: 24 petabytes is 24 million gigabytes or 24 billion megabytes.

Even at modern high-speed Internet speeds, it would take years to download this amount of data.
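As a quick sanity check, here is the arithmetic behind that claim, assuming a sustained 1 Gbit/s connection (an optimistic figure for a home link):

```python
# How long would 24 PB take to download over a 1 Gbit/s link?
PB = 10**15                  # bytes in a petabyte

total_bits = 24 * PB * 8     # 24 petabytes expressed in bits
link_bps = 10**9             # assumed sustained link speed: 1 Gbit/s

seconds = total_bits / link_bps
years = seconds / (365 * 24 * 3600)
print(f"{years:.1f} years")  # roughly 6 years
```

And that is one day's worth of data; downloading it continuously would take about six years.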

Google’s ability to process such volumes of data is critical to its services, e.g.

-          Search Engine: Google processes billions of search queries every day and relies on its vast data centers to deliver relevant results.

-          YouTube: YouTube is the world’s largest video sharing service and handles billions of hours of video uploads and views per day.

-          Gmail: Gmail is one of the most popular email services, processing millions of emails every day.

-          Google Maps: Google Maps is a widely used mapping service that relies on a wealth of data to provide accurate directions and information.


Copernicus   > 1 PB/year


NOTE: Copernicus, the European Union's Earth observation programme, is estimated to generate more than 1 petabyte of data per year.

This data volume comes from several sources:

-          Satellite data: Copernicus operates a fleet of Earth observation satellites that collect a wealth of information about the planet's weather, oceans, land and ice.

-          In-situ data: Copernicus also collects data from ground-based sensors, buoys, and other sources.

-          Complex analysis: Copernicus uses sophisticated systems and models to analyze its data and extract valuable insights.


German Climate Computing Center (DKRZ)   500 PB


NOTE: The storage capacity of the German Climate Computing Center (DKRZ) is approximately 500 petabytes. This large repository is needed to accommodate the huge datasets generated by climate models and simulations.

The DKRZ's supercomputers process these datasets to understand the climate system, predict future climate conditions and develop climate adaptation strategies.


Square Kilometre Array   1376 PB/year (in 2024)


NOTE: The Square Kilometre Array (SKA) is expected to generate approximately 1376 petabytes of data per year (estimate for 2024). The SKA's large network of radio telescopes will produce this vast amount of data, making it the largest and most sensitive radio telescope array ever built.

SKA will be used to study a wide range of astronomical phenomena, including the origin of the universe, the nature of dark matter and dark energy, and the search for extraterrestrial life.
