Defining and explaining some of the key terms used by data scientists.
Nexis® Data as a Service
Algorithms, fuzzy logic, zettabytes: understanding the language involved in artificial intelligence (AI) and big data can be daunting when you aren’t a data scientist. But the potential to leverage data-driven insights to inform strategies, support growth and manage risk makes it worth getting familiar with the lingo.
This glossary aims to demystify that language by defining and explaining some of the key terms used by data scientists. Along the way, we show how Nexis® Data as a Service helps companies take advantage of the opportunities offered by AI and big data.
Big Data & AI Terms
Intelligent machines and software that can perceive their environment and act on it, often learning from those actions. AI can be applied in a wide range of fields, including risk and fraud detection, purchase and investment predictions, logistics and supplier management, news and entertainment creation, and online customer support interactions using chatbots.
The discovery of insights based on data. There are three types:
- Descriptive analytics summarises data to create an overall narrative.
- Predictive analytics analyses historical and current data to predict future behaviour based on probabilities. For example, it can use trends in consumer preferences or in the stock market to inform buy-sell decisions.
- Prescriptive analytics builds on predictive analytics by analysing its outcomes to recommend the best action to take. It moves analytics a step further towards supporting decisions with little or no human intervention.
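The three types of analytics can be contrasted in a few lines of Python. This is a minimal sketch, assuming a hypothetical series of monthly sales figures and a made-up stock-capacity threshold: the descriptive step summarises the history, the predictive step fits a least-squares trend line and extrapolates one month ahead, and the prescriptive step applies a simple decision rule to that forecast. Real predictive models are usually far richer than a straight line.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for illustration
sales = [100.0, 110.0, 120.0, 130.0]

def describe(values):
    """Descriptive analytics: summarise what has already happened."""
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}

def predict_next(values):
    """Predictive analytics: fit a least-squares straight line to the
    history (needs at least two values) and extrapolate one period ahead."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(values)

def prescribe(forecast, capacity):
    """Prescriptive analytics: turn the forecast into a recommended action
    using a simple, hypothetical decision rule."""
    return "increase stock" if forecast > capacity else "hold stock"

forecast = predict_next(sales)
print(describe(sales))
print(forecast, prescribe(forecast, capacity=125.0))
```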
A set of step-by-step rules, often expressed mathematically, that a computer follows to perform an analysis on a set of data or to solve a problem. Algorithms are often embedded in technology.
A software program that performs a specific function for the person (or application) using it.
An application programming interface (API) exposes the features of a specific application or service so that two applications can interact with each other. For example, an API may specify how to request and retrieve data from an application.
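- A bulk API is used to process large amounts of data in batches.
- A RESTful API follows the REST architectural style: resources are identified by URLs and accessed with standard HTTP methods such as GET, and the same data can often be returned in different formats, such as JSON or XML.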
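To make the REST idea concrete, the sketch below builds (without sending) an HTTP GET request using only Python's standard library. The base URL `https://api.example.com/v1`, the `articles` resource and the `q`/`limit` parameters are hypothetical placeholders, not part of any real Nexis API; the point is that a resource is addressed by its URL and the `Accept` header asks the server for a particular response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; this is NOT a real Nexis endpoint.
BASE_URL = "https://api.example.com/v1"

# Formats this sketch can ask for, mapped to their HTTP media types
MEDIA_TYPES = {"json": "application/json", "xml": "application/xml"}

def build_request(resource, params, fmt="json"):
    """Build (but do not send) a REST-style GET request.

    The resource is identified by the URL path, query parameters narrow
    the result, and the Accept header requests a response format.
    """
    url = f"{BASE_URL}/{resource}?{urlencode(params)}"
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]}, method="GET")

def parse_json_response(body):
    """Decode a JSON response body into Python objects."""
    return json.loads(body)

req = build_request("articles", {"q": "supply chain", "limit": 10})
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` against a real endpoint, with whatever authentication that service requires.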
Nexis Data as a Service lets companies use flexible and easy-to-integrate APIs to tap into our unrivaled content universe to support predictive analytics, risk screening and other data-driven use cases.
Usually a store of historical data that is no longer actively used. Data archives should be indexed for easy location and retrieval of files.
Processing high volumes of data in groups (batches) rather than one record at a time, often on a schedule, so that computers can work through the data efficiently.
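A minimal sketch of batch processing in Python, assuming the records arrive as an iterable and that the per-batch work (here, a hypothetical `process_batch` that simply sums values) is done once per group rather than once per record:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of `size` records; the last batch may be shorter."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, partial batch
        yield batch

def process_batch(batch: List[int]) -> int:
    """Hypothetical per-batch work: here, simply summing the values."""
    return sum(batch)

results = [process_batch(b) for b in batched(range(1, 11), size=4)]
print(results)  # [10, 26, 19]
```

Python 3.12 and later ship a similar helper, `itertools.batched`, which yields tuples instead of lists.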
Very large data sets that can be analysed by computing technologies to reveal patterns and trends. Big data is the fuel for a wide range of AI applications.
Categorising a data point based on traits it has in common with other data points. This allows the user to extract important and relevant information from a big dataset more quickly and easily.
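One simple way to classify data points by shared traits is a nearest-centroid classifier, sketched below. It is an illustrative technique chosen for brevity and is not meant to represent how SmartIndexing or any Nexis product works; the two-feature training points and the "low risk"/"high risk" labels are hypothetical.

```python
from math import dist
from statistics import mean

# Hypothetical labelled training points: (feature vector, label)
training = [
    ((1.0, 1.0), "low risk"), ((1.5, 2.0), "low risk"),
    ((8.0, 8.0), "high risk"), ((9.0, 7.5), "high risk"),
]

def centroids(data):
    """Average the feature vectors of each class, giving one 'centre' per label."""
    groups = {}
    for features, label in data:
        groups.setdefault(label, []).append(features)
    return {label: tuple(mean(axis) for axis in zip(*points))
            for label, points in groups.items()}

def classify(point, centres):
    """Assign the label whose centroid is closest (Euclidean distance) to the point."""
    return min(centres, key=lambda label: dist(point, centres[label]))

centres = centroids(training)
print(classify((2.0, 1.0), centres), classify((7.0, 9.0), centres))
```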
Our award-winning automated classification system, SmartIndexing Technology™, analyses our news data and applies metadata related to more than 7,000 subjects and industries. This enables users to cut through the noise and discover the data needed for predictive analytics and other big data applications.
An open-source web application that allows researchers to combine live software code with its output, explanatory text and visualisations in a single document, and that can be used with a number of different programming languages such as Python. The Jupyter Notebook is particularly popular and was dubbed “data scientists’ computational notebook of choice” by Nature.
Leveraging the best-in-class Jupyter Notebook environment, Nexis® Data Lab for Academic enables users to search, refine and analyze our extensive collection of enriched licensed data.
Analysing data to determine whether different variables have a positive or negative relationship, and how strong that relationship is.
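The Pearson correlation coefficient is one common way to quantify such a relationship: values near +1 indicate a strong positive relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship. A minimal sketch with made-up numbers (Python 3.10 and later also provide `statistics.correlation`):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)

# Hypothetical data: weekly negative-news mentions vs. a reputation score
mentions = [1, 3, 4, 6, 8]
reputation = [9.0, 7.5, 7.0, 5.0, 3.5]
print(round(pearson(mentions, reputation), 3))
```

Correlation only shows that two variables move together; it does not by itself show that one causes the other.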
Our comprehensive news data allows users to identify correlations between, for example, a company’s actions and its reputation.
A system or strategy used by a company to manage its interactions with current and potential customers, including its sales processes, which can be informed by big data.
Integrate negative news, company, or legal data into a CRM system to provide additional context that empowers Sales.
Providing data to users over a network on demand. This allows users to acquire and use external datasets, often in combination with their own data. Data as a Service that uses big data is growing rapidly: Gartner predicted that its market value would nearly quadruple between 2019 and 2025.
Nexis Data as a Service (DaaS) offers APIs and applications for delivering highly relevant, archival and current datasets to power an organisation’s big data projects.
A specialist with the statistical and data-handling skills to analyse and interpret data for insights. Companies’ demand for this role is high and growing.
Reviewing data to see if it is still valid, as well as correcting errors, eliminating duplicates, and standardizing data formats for greater consistency.
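The cleansing steps described above (validating, correcting, de-duplicating and standardising) can be sketched on a handful of hypothetical company records. The two accepted date formats (with day-first slashes), the title-case rule for names and the choice to drop rather than repair invalid rows are all illustrative assumptions; real pipelines usually need more careful, domain-specific rules.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent formats, a duplicate and a blank name
raw = [
    {"company": " Acme Ltd ", "date": "2021-03-01"},
    {"company": "ACME LTD", "date": "01/03/2021"},
    {"company": "Globex", "date": "2021-04-15"},
    {"company": "", "date": "2021-05-01"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats

def parse_date(text):
    """Standardise a date string to ISO 8601, or return None if it is invalid."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def cleanse(records):
    """Validate, standardise and de-duplicate the records."""
    seen, clean = set(), []
    for rec in records:
        name = " ".join(rec["company"].split()).title()  # trim spaces, standardise case
        date = parse_date(rec["date"])
        if not name or date is None:  # drop records that fail validation
            continue
        key = (name, date)
        if key in seen:  # drop duplicates
            continue
        seen.add(key)
        clean.append({"company": name, "date": date})
    return clean

print(cleanse(raw))
```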
The behind-the-scenes work to build systems that allow data scientists to do their analysis more quickly and efficiently.
We have decades of experience in managing and engineering data for optimal use by data scientists and executives within companies.
A stream of data, for example an RSS feed or a social media feed.
The set of rules and processes for how data is organised, aggregated and managed.
Using data to tell stories and identify patterns and trends. Data journalists have gained prominence with analysis of topics ranging from the impact of political ads in the media to the spread of the global COVID-19 pandemic and the effectiveness of various responses.
Media companies can and do analyse our extensive news data to find trends and stories.
A repository for storing vast amounts of raw data in its native format, whether structured, semi-structured or unstructured. This data can be stored within an organisation’s data centre or using cloud services.
Communicating data visually, often using infographics, colour-coded graphs, or data dashboards.
Taking raw data and formatting and restructuring it to make it useful. Data scientists often spend more than half of their time on data wrangling.
Nexis Data as a Service lets users move more easily from data wrangling to data analysis and interpretation. This frees up highly skilled data analysts from doing the more mundane work of cleaning and tagging data to focus on providing new insights.
Using neural networks with many layers to solve complex problems, such as facial recognition.
The process of making a raw dataset more useful and insightful by normalizing its format and applying tags that make it easier to search and use.
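A toy version of enrichment: normalise a document's whitespace and attach subject tags whenever a keyword from a small taxonomy appears. The three-tag taxonomy and its keyword lists are hypothetical, and plain keyword matching is a deliberately crude stand-in that is not meant to represent the Nexis enrichment process.

```python
import re

# Hypothetical subject taxonomy: tag -> trigger keywords
TAXONOMY = {
    "mergers": {"merger", "acquisition", "takeover"},
    "litigation": {"lawsuit", "court", "settlement"},
    "esg": {"emissions", "sustainability", "diversity"},
}

def enrich(doc):
    """Return a copy of doc with normalised text and sorted subject tags added."""
    text = " ".join(doc["text"].split())  # normalise whitespace
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags = sorted(tag for tag, keywords in TAXONOMY.items() if words & keywords)
    return {**doc, "text": text, "tags": tags}

article = {"id": 1, "text": "Court approves  settlement after takeover bid"}
print(enrich(article)["tags"])  # ['litigation', 'mergers']
```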
Nexis Data as a Service complements its comprehensive content coverage with a data fabrication, classification, and enrichment process unmatched in the industry.
An approach to logic that is widely used in artificial intelligence. Rather than judging whether a statement is simply true or false, it judges how close to the truth it is, expressed as a degree of truth between 0 and 1.
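A minimal sketch of fuzzy logic, assuming a hypothetical definition of "warm" that rises linearly from 15 °C to 25 °C. The membership function returns a degree of truth between 0 and 1, and the AND/OR/NOT operators shown are the standard min/max/complement operators proposed by Lotfi Zadeh (other operator families exist).

```python
def warm_membership(temp_c, cold=15.0, hot=25.0):
    """Degree (0.0 to 1.0) to which a temperature counts as 'warm'.

    At or below `cold` it is not warm at all, at or above `hot` it is
    fully warm, and in between the degree rises linearly.
    """
    if temp_c <= cold:
        return 0.0
    if temp_c >= hot:
        return 1.0
    return (temp_c - cold) / (hot - cold)

def fuzzy_and(a, b):
    """Zadeh AND: a statement is only as true as its weakest part."""
    return min(a, b)

def fuzzy_or(a, b):
    """Zadeh OR: as true as its strongest part."""
    return max(a, b)

def fuzzy_not(a):
    """Standard complement."""
    return 1.0 - a

print(warm_membership(20))  # 0.5
```

A 20 °C day is "warm" to degree 0.5, rather than simply warm or not warm.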
Using supercomputers, or clusters of computers working in parallel, to solve very complex, advanced computing problems.
Interrelated computing devices, machines and physical objects that exchange or transfer data with each other over the internet. The term is commonly used to describe ‘smart homes’ in which thermostats, lighting and security cameras can be controlled by connected devices like smartphones.
An application of AI in which computer systems learn, adapt and improve through experience, without following explicit instructions. These systems use algorithms and statistical models to analyse patterns in data and draw insights.
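To illustrate learning from examples rather than from explicit instructions, the sketch below learns the slope and intercept of a straight line by gradient descent on the mean squared error. The data points, learning rate and number of passes are illustrative assumptions; the rule y = 2x + 1 that generated the data is never written into the code, yet the fitted parameters converge towards it.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Learn slope w and intercept b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # example data, generated by y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

The learning rate matters: if it is too large the updates overshoot and diverge, and if it is too small learning becomes very slow.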
Data that describes and gives information about other data, often called “data about data”. By summarising basic information, it makes the underlying data easier to find and use.
Nexis Data as a Service’s enrichments comprise 125 descriptive metadata fields applied to our news content, such as headline, topic, index time, publisher, country, language, editorial source rank, source topic and news category.
A type of AI concerned with the interactions between computers and human language, particularly how to program computers to process and analyse large volumes of natural (ordinary) language data. The technology can ‘understand’ text documents, including nuances in the language, and accurately extract information and insights from them. Modern NLP relies heavily on machine learning.
We have been fine-tuning our NLP for many years to improve search relevance across our platforms and Data as a Service APIs.
A system of connected nodes, loosely modelled on the connections between neurons in the brain, used as a method of machine learning. The nodes are arranged in layers, and signals passed along the connections between layers produce an output, such as a prediction.
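A forward pass through a tiny neural network can be written in plain Python. The layer sizes and weights below are hypothetical and hand-picked; in practice they would be learned from data (for example by backpropagation), which this sketch leaves out.

```python
from math import exp

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each node takes a weighted sum of all
    inputs plus its bias, then applies the sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, network):
    """Pass inputs through each layer in turn; the last layer's output is the prediction."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical, hand-picked weights (a real network would learn these)
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer: 2 inputs -> 2 nodes
    ([[1.2, -0.7]], [0.05]),                   # output layer: 2 nodes -> 1 output
]
prediction = forward([1.0, 0.5], network)
print(prediction)
```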
The process of reorganising data in different databases to make comparisons between the data easier and more meaningful.
Identifying patterns in data, usually via algorithms, which allows predictions to be made when similar data is encountered.
Using algorithms to find insights from large amounts of quantitative data. This is particularly useful in the financial sector, where trading decisions are often made by quantitative analysis of high volumes of numerical, financial data.
Software that is programmed to do repetitive and often mundane tasks. RPA deploys software robots (‘bots’) to improve efficiency and free people for higher-value tasks. It can have a dramatic impact on productivity, efficiency and accuracy within business processes such as fraud detection and risk mitigation.
Our targeted datasets are designed to fuel Robotic Process Automation which can optimize the efficiency and effectiveness of supplier and risk management processes.
A classification technology that helps researchers find relevant information in large volumes of data by tagging documents.
Nexis Solutions uses SmartIndexing Technology on our content universe, which allows searches based on topics to surface the most relevant results.
Data that has not been organised in a pre-defined manner. It is often full of text, dates, numbers, and facts, and requires a lot of effort to make it useful.
Data that does not conform to a rigid structure and cannot be contained in a database of rows and columns, but in which a hierarchy has been established using tags or other markers.
The extent to which data is (or isn’t) accurate and correct, which in turn determines how reliable any analysis performed on the data can be.
A way of tagging data to describe it.
A unit of measurement for an enormous amount of data: one zettabyte is 10²¹ bytes (a trillion gigabytes), bigger than an exabyte and far bigger than a terabyte, but smaller than a yottabyte. It was estimated that the digital universe comprised 44 zettabytes by the end of 2020. Data is being created at an exponential rate: 90% of the world’s data has been generated in the last two years alone.
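The arithmetic behind these units is straightforward with decimal (SI) prefixes, where each step up multiplies by 1,000. A short sketch using Python's exact integer arithmetic to put the 44-zettabyte estimate into more familiar units:

```python
# Decimal (SI) units, each 1,000 times larger than the one before
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

def to_bytes(amount: int, unit: str) -> int:
    """Convert a whole number of a decimal unit into bytes."""
    return amount * 1000 ** UNITS.index(unit)

digital_universe = to_bytes(44, "zettabyte")  # the end-of-2020 estimate above
in_terabytes = digital_universe // to_bytes(1, "terabyte")
print(f"{digital_universe:.2e} bytes")  # 4.40e+22 bytes
print(f"{in_terabytes:,} terabytes")    # 44,000,000,000 terabytes
```

Binary units (kibibyte, mebibyte and so on, up to the zebibyte of 2⁷⁰ bytes) step up by 1,024 instead, which is why storage figures quoted in different units can differ slightly.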