Have you ever considered how much data about you there is on the Internet? Most likely, more than you think.
The Internet lets us search for information on all kinds of topics. However, while consuming information from the web, we also leave behind information about ourselves. Global datafication is becoming a subject of ethical controversy: many people don’t want corporations to know that much about them.
Today we will talk about the data that companies collect and discuss the ethical aspects of datafication. We will also talk about professions associated with data management.
What does datafication refer to?
Before we start discussing worldwide data collection, let’s define what datafication is.
The term ‘datafication’ was popularized in 2013. Datafication is the process of transforming aspects of our daily lives into useful data, which then needs to be stored, monitored, and continuously optimized.
Datafication touches our lives directly. Various services monitor our activity and convert it into data that has tangible value.
‘Whoever owns the information owns the world,’ Nathan Rothschild reportedly said back in 1815. Even then, the wealthy were hinting to their descendants at the recipe for becoming rich and powerful.
Data about people’s private lives has always been in demand. One of the clearest examples is the population censuses conducted by governments. However, with the advent of the Internet, data collection has reached previously unthinkable proportions.
In 2013, a new philosophical movement called dataism appeared. David Brooks first mentioned dataism in a New York Times column, and Yuval Noah Harari later developed the term in Homo Deus: A Brief History of Tomorrow. Harari presented dataism as a new religion in which the flow of information is humanity’s highest value. The culmination of dataism is a person handing over their life and decisions to algorithms.
In modern society, the debate about the ethics of data collection never stops. Some users understand and accept that algorithms aim to improve their lives. Other users want to keep sensitive information to themselves.
But what kind of information do companies collect about us?
The major types of data that are stored and monitored
What kind of data do companies collect about us on the Internet? Practically any kind you can imagine. The most commonly collected types of user data are a person's first and last name and phone number. The Internet most likely knows what videos users like, which purchases they make most often, where they live, and the best route to their workplace. Almost every interaction with content and products is recorded, every search is counted, and all the information you enter about yourself is saved.
Let’s take Google as an example. The company doesn’t even try to hide that it collects user data.
- Profile. When registering an account, the user enters their first and last name, phone number, and email address. If the user has filled in more of their profile, the company also knows their hobbies, university, and marital status.
- Search history. Google is aware of the questions users enter into the search bar, the sites they visit after that, and the time they spend on the pages. The company also records which advertisements interest users.
- YouTube history. Google knows which videos users like and tracks views and interactions.
- Data synchronization. If the user has set up sync, the company saves their browser activity, bookmarks, and passwords.
- Device information. Google knows what phone model users have and what devices they use with it.
- Movement history. By turning on the geolocation function on the phone, the user gives Google access to information about their movements, home addresses, places of work, and favorite places.
If the user has granted other applications access to their Google account data, those companies will obtain this information as well.
Google collects the largest amount of user data (39 types). Twitter comes second (24 data types), and Amazon third (23 types of data).
The ‘good guy’ in this situation, if you can call it that, is Apple. The company claims to collect only the information needed to maintain user accounts (12 types of data). ‘Privacy is a fundamental human right,’ Apple says. But still, who knows? We have witnessed too many cases where big money led to big lies.
That said, many of the company’s recent features help users choose for themselves what data to share with companies:
- Browser. The built-in Safari browser intelligently blocks trackers that try to read your data. You can get a report on which sites have tried to collect your information for advertising.
- Maps. Apple doesn’t keep a history of your movements and doesn’t associate location data with your account.
- Messages. iMessage uses end-to-end encryption, so the company cannot read your correspondence. The conversation stays between you and the person you are talking to.
The ethics of data collection have long been a subject of dispute between corporations and the media. The issue has even been explored in TV shows such as Parks and Recreation, Silicon Valley, South Park, and many others.
How do we produce data?
Humanity creates 2.5 quintillion bytes of data every day, and growth has accelerated sharply with the advent of the Internet of Things. According to a widely cited estimate, 90% of all existing data was created in just the preceding two years. How exactly do we produce data?
- Social media
Every minute in 2017, users watched 4,146,600 YouTube videos, posted 46,740 photos on Instagram, and sent 456,000 tweets. And these are only the most popular of thousands of social networks. Every day, 1.5 billion users log into Facebook and upload more than 300 million photos, and every minute they post about 510,000 comments.
Every minute in 2017, Spotify added 13 new songs, people made 600 edits to Wikipedia, and Uber customers took 45,788 rides. All these actions generate information that specialists later analyze.
- Internet of Things
The Internet of Things covers all your smart devices, from smartwatches to smart refrigerators and from voice-assistant speakers to electronic locks. These devices constantly generate information, and any device with built-in Wi-Fi can collect and transmit it.
Okay, companies have collected information about users. But what will they do with it?
What do companies do with the data produced by users?
Many conspiracy theorists portray corporations only as greedy and evil data collectors. But what do companies really do with user data? Let’s figure it out.
- Study the client’s interests
Companies learn about users’ interests in different ways: opinion polls on social media pages, click analysis, and measuring the time spent on a page. Based on this data, a site will offer you more relevant content. Soon, you won’t have to dig through thousands of articles on a news site to find an interesting one. The most relevant material will try to catch your eye.
- Set up ads
Advertising is unlikely ever to vanish, and fighting it is useless. Even if you hide your data from companies, you will still receive promotional offers. But will they interest you personally? Companies want to advertise products to an interested target audience, and users are usually pleased to see an ad for an item they want, especially at a good price.
- Satisfy user needs
By collecting data, companies learn which directions to develop in, so datafication is also a way to meet users’ needs. Through their actions, customers show which features and tips they like, and products improve accordingly.
- Fight against fraud
By examining user data, companies can detect and stop fraudsters, which makes customers’ time online safer.
The collection of user data can play a positive role in our lives. But to act ethically, companies must meet two main conditions: confidentiality and security. Ensuring that data doesn’t fall into the wrong hands resolves much of the ethical issue. However, not all companies can yet guarantee users full security.
The problem is especially acute in countries with authoritarian regimes. There, corporations working with the government can hand over data that is then used for political persecution. This is one of the main problems of data collection.
In addition, there have been repeated scandals in which massive databases were leaked to the public, and companies should do their best to prevent such leaks. No user wants their address or phone number to be known to everyone.
Let’s talk a little about how large amounts of data are stored.
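The first item above, studying the client’s interests, can be made more concrete with a small Python sketch. It turns click and time-on-page data into a ranked list of topics. The `PageView` event format, the `rank_topics` function, and the 30-second click bonus are hypothetical assumptions chosen for illustration. They do not describe how any particular company scores engagement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PageView:
    """One hypothetical analytics event: a topic the user viewed and for how long."""
    topic: str
    clicked: bool           # did the user click through to the article?
    seconds_on_page: float  # measured time spent on the page


def rank_topics(views: list[PageView], click_weight: float = 30.0) -> list[str]:
    """Rank topics by engagement: total time on page plus a bonus per click.

    click_weight is an assumed tuning parameter (seconds credited per click).
    """
    scores: dict[str, float] = defaultdict(float)
    for view in views:
        scores[view.topic] += view.seconds_on_page
        if view.clicked:
            scores[view.topic] += click_weight
    # Highest engagement first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda topic: (-scores[topic], topic))


if __name__ == "__main__":
    history = [
        PageView("sports", clicked=True, seconds_on_page=40),
        PageView("politics", clicked=False, seconds_on_page=5),
        PageView("tech", clicked=True, seconds_on_page=120),
        PageView("sports", clicked=True, seconds_on_page=20),
    ]
    print(rank_topics(history))  # prints ['tech', 'sports', 'politics']
```

A site could then show articles from the top-ranked topics first. Real recommender systems combine many more signals, but the basic idea of turning behavior into a ranking is the same.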
What is Big Data?
Massive, complex datasets are called Big Data. They can be structured, semi-structured, or unstructured, and they come from many sources, such as the social media and Internet of Things devices described above. Companies that collect this information must store, structure, and analyze it. But most importantly, they must protect it.
Let’s look at the three V’s of Big Data, which are also its main challenges: volume, velocity, and variety.
- Volume
Big Data is a massive amount of information: we are talking about petabytes and zettabytes. According to one forecast, humanity will produce 180 zettabytes of information by 2025. Big Data requires an architecture capable of processing such volumes.
- Velocity
Information arrives very quickly, and companies have to process and analyze these large amounts of data rapidly. Some data can sit in an archive for a while, but some must be analyzed immediately, for example, signals from the sensors of medical devices.
- Variety
User data is very diverse and has variable attributes. An estimated 80 to 90% of it is unstructured, which makes it extremely difficult to use.
All these problems are solvable, but solving them requires the involvement of many IT specialists.
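The variety problem can be illustrated with Python’s standard library. The sketch below reads the same kind of fact, a user’s city, from three sources: a structured CSV row, a semi-structured JSON document, and an unstructured free-text review. The sample records and the `extract_city_*` helpers are hypothetical. The first two formats can be read reliably by field name, while free text needs guesswork.

```python
import csv
import io
import json
import re
from typing import Optional

# Structured: fixed columns, every row follows the same schema.
CSV_DATA = "name,city\nAlice,Berlin\n"

# Semi-structured: self-describing keys, but fields may be nested or missing.
JSON_DATA = '{"user": {"name": "Bob", "address": {"city": "Lisbon"}}}'

# Unstructured: free text with no schema at all.
TEXT_DATA = "Great delivery! I live in Kyiv and the parcel arrived in two days."


def extract_city_structured(raw: str) -> str:
    """Read the 'city' column from the first CSV row."""
    rows = list(csv.DictReader(io.StringIO(raw)))
    return rows[0]["city"]


def extract_city_semi_structured(raw: str) -> Optional[str]:
    """Walk nested JSON keys, tolerating missing fields."""
    doc = json.loads(raw)
    return doc.get("user", {}).get("address", {}).get("city")


def extract_city_unstructured(raw: str) -> Optional[str]:
    """Naive heuristic: the capitalized word right after 'live in'.

    Real systems would need natural-language processing; this is only a sketch.
    """
    match = re.search(r"live in ([A-Z][a-z]+)", raw)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_city_structured(CSV_DATA))
    print(extract_city_semi_structured(JSON_DATA))
    print(extract_city_unstructured(TEXT_DATA))
```

The regular expression fails as soon as a customer writes "I’m based in Kyiv" instead. This is why unstructured data is so much harder to use at scale.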
Who works with data: 5 key roles in a data management team
- Data Analyst
Data analysts perform statistical analysis on large datasets to extract insights, translating big data into meaningful business information. They may also design systems and databases that support this work.
- Data Processing Engineer
Data engineers build data pipeline ecosystems for businesses by integrating, consolidating, and cleaning data for future use. The engineer must test the ecosystem thoroughly and make it scalable, and may supplement existing systems with new technologies to make data management more efficient. Their responsibilities include designing and implementing data pipelines and looking for patterns and trends.
- Database administrator
Database administrators keep enterprise databases running reliably. They are also responsible for backing up and restoring databases.
- Machine learning engineer
These engineers develop machine learning systems, research algorithms, perform testing, and extend existing frameworks.
- Data architect
Data architects create data management blueprints and define the company’s policies for data collection, organization, storage, and access. They are also responsible for information security measures. This means a talented data architect can save the company from a large-scale scandal and litigation over a data leak.
This is not a complete list of the specialists needed to keep databases running smoothly. If you need help from such specialists to improve your business with Big Data, contact Vilmate. We have qualified engineers well-versed in Big Data and data management.
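The data engineer’s job of integrating, consolidating, and cleaning data can be pictured as a small pipeline. Below is a minimal Python sketch under stated assumptions. Two hypothetical sources, a CRM export and a web sign-up form, describe overlapping customers in slightly different formats. The pipeline normalizes emails and phone numbers, drops unusable records, merges the sources, and removes duplicates. The field names, sample data, and cleaning rules are invented for illustration, and production pipelines typically use dedicated tooling and much stricter validation.

```python
import re
from typing import Iterable, Optional

Record = dict[str, str]

# Two hypothetical sources describing overlapping customers in different formats.
crm_export: list[Record] = [
    {"email": "Alice@Example.com ", "phone": "+1 (555) 010-0001"},
    {"email": "bob@example.com", "phone": "555.010.0002"},
]
signup_form: list[Record] = [
    {"email": "alice@example.com", "phone": "15550100001"},
    {"email": "", "phone": "555-010-0003"},  # incomplete record: no email
]


def clean(record: Record) -> Optional[Record]:
    """Normalize one record; return None if it lacks a usable email."""
    email = record.get("email", "").strip().lower()
    if not email:
        return None
    digits = re.sub(r"\D", "", record.get("phone", ""))  # keep digits only
    return {"email": email, "phone": digits}


def pipeline(*sources: Iterable[Record]) -> list[Record]:
    """Integrate several sources, clean each record, and deduplicate by email."""
    merged: dict[str, Record] = {}
    for source in sources:
        for raw in source:
            record = clean(raw)
            if record is not None:
                # The first occurrence wins; later duplicates are dropped.
                merged.setdefault(record["email"], record)
    return list(merged.values())


if __name__ == "__main__":
    for customer in pipeline(crm_export, signup_form):
        print(customer)
```

Stripping non-digits does not add a missing country code, so Bob’s number stays shorter than Alice’s. Catching gaps like this, and deciding which source to trust when records conflict, is part of what makes data engineering a specialist job.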