Term Archives - Page 14 of 16 - Data Ideology

Apache Pig

Pig is a platform for creating query execution routines on large, distributed data sets. The scripting language used is called Pig Latin (no, I didn’t make it up, believe me). Pig is supposedly easy to understand and learn, but my question is: how many of these can one learn?

Connection Analytics

Connection analytics helps discover the interrelated connections and influences between people, products, and systems within a network, or across data combined from multiple networks.
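As a rough illustration of the idea, the sketch below builds a tiny network from interaction pairs and finds the best-connected member. The names, the `edges` data, and the helper functions are all made up for the example; real connection analytics tools use far richer measures than a simple connection count.

```python
from collections import defaultdict

# Hypothetical interaction records: each pair means "these two are connected".
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]

def build_graph(pairs):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def most_connected(graph):
    """Return the node with the most direct connections (ties broken by name)."""
    return max(sorted(graph), key=lambda node: len(graph[node]))

graph = build_graph(edges)
hub = most_connected(graph)
print(hub, sorted(graph[hub]))
```

Counting direct connections is the simplest possible influence measure; it is used here only because it fits in a few lines.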

Apache Sqoop

A tool for moving data between Hadoop and non-Hadoop data stores such as data warehouses and relational databases.

Apache Storm

A free and open source real-time distributed computing system. It makes it easier to process unbounded streams of data continuously, doing for real-time processing what Hadoop does for batch processing.


DaaS

You have SaaS, PaaS, and now DaaS, which stands for Data-as-a-Service. DaaS providers can help customers get high-quality data quickly by giving them on-demand access to cloud-hosted data.

Data virtualization

It is an approach to data management that allows an application to retrieve and manipulate data without needing technical details such as where the data is stored or how it is formatted. For example, this is the approach used by social networks to store our photos on their networks.
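A minimal sketch of the idea, assuming two made-up sources (one CSV, one JSON) that hold the same kind of record. The `VirtualTable` class and the reader functions are hypothetical names invented for this example, not part of any real data virtualization product. The point is that the calling code asks for rows and never learns where or how they are stored.

```python
import csv
import io
import json

# Two hypothetical sources holding the same kind of record in different formats.
CSV_SOURCE = "id,name\n1,Ada\n2,Grace\n"
JSON_SOURCE = '[{"id": 3, "name": "Edsger"}]'

def read_csv_source(text):
    """Yield records from a CSV source as plain dicts."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"]}

def read_json_source(text):
    """Yield records from a JSON source as plain dicts."""
    for item in json.loads(text):
        yield {"id": int(item["id"]), "name": item["name"]}

class VirtualTable:
    """A single logical view over several physical sources."""

    def __init__(self, readers):
        self._readers = readers  # list of zero-argument callables

    def rows(self):
        """Yield rows from every source in a uniform shape."""
        for reader in self._readers:
            yield from reader()

people = VirtualTable([
    lambda: read_csv_source(CSV_SOURCE),
    lambda: read_json_source(JSON_SOURCE),
])
names = [row["name"] for row in people.rows()]
print(names)
```

Swapping a source for a database or a remote service would only change a reader function; the code that consumes `people.rows()` would stay the same.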


Brontobytes

1 followed by 27 zeroes; this is the size of the digital universe of tomorrow.
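To put that figure in perspective, here is a quick bit of arithmetic. It assumes the number is a count of bytes and uses decimal terabytes (10^12 bytes); the variable names are just for illustration.

```python
BYTES = 10 ** 27        # 1 followed by 27 zeroes (assumed to be bytes)
TERABYTE = 10 ** 12     # decimal terabyte, in bytes

# How many one-terabyte drives would it take to hold that much data?
terabyte_drives = BYTES // TERABYTE
digits = len(str(BYTES))

print(f"{BYTES:,} bytes")
print(f"{terabyte_drives:,} one-terabyte drives")
```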

Dirty Data

Now that Big Data has become sexy, people just start adding adjectives to Data to come up with new terms like dark data, dirty data, small data, and now smart data. Come on guys, give me a break. Dirty data is data that is not clean; in other words, inaccurate, duplicated, and inconsistent data. […]
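For a concrete picture, the sketch below cleans a handful of made-up customer records that show all three problems: inconsistent formatting, a duplicate, and an inaccurate value. The records, field names, and the rule "age must not be negative" are assumptions chosen for the example, not a general-purpose cleaning recipe.

```python
# Hypothetical customer records: inconsistent casing and whitespace,
# a duplicate entry, and an impossible age.
raw = [
    {"email": "Ann@Example.com ", "age": "34"},
    {"email": "ann@example.com", "age": "34"},
    {"email": "bob@example.com", "age": "-5"},
    {"email": "cy@example.com", "age": "41"},
]

def clean(records):
    """Normalize emails, drop invalid ages, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        email = rec["email"].strip().lower()  # fix inconsistent formatting
        age = int(rec["age"])
        if age < 0:                           # drop an inaccurate value
            continue
        if email in seen:                     # drop a duplicate
            continue
        seen.add(email)
        cleaned.append({"email": email, "age": age})
    return cleaned

cleaned = clean(raw)
print(cleaned)
```

Note that the duplicate only becomes visible after normalization, which is why the formatting fix runs before the duplicate check.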

Business Intelligence (BI)

Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.