Data Overload – A Beginner’s Guide to Handling Big Data in Construction and Engineering | Jobs Vox


Construction projects are rich in data. It’s common to find schedules, spreadsheets, letters, reports and photos even on the smallest of projects. This is set to increase with the widespread adoption of mobile and cloud-based technologies on construction projects. For organizations, this presents a challenge to effectively manage the growing amount of data and provides an opportunity to derive valuable insights from it.

What is Big Data?

Big data has become a prominent topic of discussion in recent years. This is likely driven by the explosive growth in data generated globally. The World Economic Forum[1] reports that every day we generate 294 billion emails, 65 billion WhatsApp[2] messages and 720,000 hours of new content on YouTube[3].[4]

As the global ‘datasphere’ continues to grow, we will see a corresponding increase in the number and size of datasets we encounter on construction projects. But is it “big data”, and if so, how do we analyze it?

Data scientists do not define big data in terms of a fixed number of bytes or records, but in terms of the technology used to analyze it. Big datasets are often too large to analyze using a traditional database on a single computer, so the data is distributed across multiple networked computers. Using the collective resources of these computers allows computations to be performed more efficiently. Big data is therefore not synonymous with large databases or Excel sheets; it is a field of its own, with specialized tools that depend on advanced programming skills.
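The idea of spreading one computation across several machines can be illustrated in miniature on a single computer. The sketch below is a simplified stand-in and not a real big data tool: worker threads play the role of the networked computers, and the data and function names (`split`, `partial_total`, `distributed_total`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Divide the records into roughly equal partitions, one per worker."""
    size = max(1, -(-len(records) // parts))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def partial_total(partition):
    """Work done independently on each partition (the 'map' step)."""
    return sum(partition)


def distributed_total(records, workers=4):
    """Combine the partial results into one answer (the 'reduce' step)."""
    partitions = split(records, workers)
    # Threads stand in for separate networked computers (an assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_total, partitions))
    return sum(partials)


# Hypothetical data: daily concrete pours in cubic metres.
pours = [12, 30, 18, 25, 40, 22, 15, 33, 27, 19]
total = distributed_total(pours)
```

Each worker only ever sees its own partition, which is what lets real distributed systems handle datasets that would not fit on one machine.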

While we may not all deal with big data in the strict technical sense, the frequency with which we interact with large and complex datasets is likely to increase. Effective management and analysis of this data will determine the extent to which we can benefit from it.

Data = information

The value of data lies in its ability to reveal patterns, trends and risks, which can be used to inform strategic and operational decisions. An article in The Economist[5] said in 2017 that “the world’s most valuable resource is no longer oil, but data”.[6] Oil is an apt analogy for data, given that both go through a process of extraction and refinement before reaching their maximum value. For data, this process transforms it from its raw form into something useful and actionable: information. The steps involved in this transformation are described in more detail in the following paragraphs.

Data gathering.

Data exists in various formats and the method of collection may be different for each.

Structured data is information that has been organized in a prescribed format or hierarchy. Examples include relational databases and Excel spreadsheets. These are usually easy to sort, filter and analyze. Collection of these datasets may involve exporting from planning or accounting software, or compiling a set of Excel files.
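Compiling several exports into a single table can be sketched as follows. This is a minimal illustration that assumes the exports are CSV text sharing the same column headings; the file names, columns and the `combine_exports` helper are hypothetical, and reading native Excel workbooks would need a third-party library not shown here.

```python
import csv
import io


def combine_exports(named_exports):
    """Stack rows from several CSV exports, tagging each row with its source."""
    combined = []
    for source, text in named_exports.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["source"] = source  # keep track of where each row came from
            combined.append(row)
    return combined


# Hypothetical exports; in practice these would be read from files on disk.
exports = {
    "cost_jan.csv": "activity,cost\nExcavation,12000\nPiling,30500\n",
    "cost_feb.csv": "activity,cost\nPiling,8000\nFormwork,14250\n",
}
rows = combine_exports(exports)
```

Recording the source file against every row makes it possible to trace a suspect value back to the export it came from.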

Unstructured data is the opposite; it has no clearly defined format and can be difficult to filter and organize. Examples include visual records such as photographs and videos, or text documents such as monthly reports. If a record is captured on a GPS-enabled device (such as a mobile phone or tablet), precise location and time information may be embedded in it. These files are often called ‘semi-structured’ because the metadata is structured but the recording itself is not.

While it is often difficult to analyze unstructured files, tools such as Aconex[7] can simplify this process by structuring the files and allowing efficient content searching of a large corpus of documents. Image recognition tools that rely on sophisticated machine learning algorithms are also becoming increasingly powerful, although they often require a ‘training set’ of images to teach them how to recognize and label certain objects.

Data cleaning.

Most people are familiar with the saying “garbage in, garbage out”. We can use the most sophisticated tools and data-mining techniques, but if errors and inconsistencies are not identified and corrected, the analysis will produce spurious results. This is why data cleaning can be the most important, and also the most time-consuming, step.

Typical errors include incorrectly formatted data (for example, dates stored as text), duplicate or missing values, or multiple variants of the same value. Consolidating the data into a common set of categories will allow the analyst to summarize it effectively, which will ultimately lead to more robust conclusions.
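The typical errors listed above can each be handled with a small, explicit rule. The following is a minimal sketch under stated assumptions: the records, the accepted date formats and the `CANONICAL` lookup of spelling variants are all hypothetical and would need to reflect the actual dataset.

```python
from datetime import datetime

# Assumed lookup mapping known spelling variants to one agreed label.
CANONICAL = {"sub-contractor": "subcontractor", "sub contractor": "subcontractor"}
# Assumed date layouts found in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")


def parse_date(text):
    """Convert a date stored as text into a date object, or None if unreadable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(records):
    """Return (cleaned rows, rejected rows) from a list of (date, party) pairs."""
    cleaned, rejected, seen = [], [], set()
    for raw_date, raw_party in records:
        date = parse_date(raw_date or "")
        party = (raw_party or "").strip().lower()
        party = CANONICAL.get(party, party)
        if date is None or not party:
            rejected.append((raw_date, raw_party))  # missing or malformed value
            continue
        if (date, party) in seen:
            continue  # duplicate once formatting differences are removed
        seen.add((date, party))
        cleaned.append((date, party))
    return cleaned, rejected


raw = [
    ("2021-03-01", "Sub-Contractor"),
    ("01/03/2021", "subcontractor "),  # same record, different formatting
    ("5 Mar 2021", "Sub contractor"),
    ("", "Client"),                    # missing date
]
cleaned, rejected = clean(raw)
```

Rejected rows are kept and not silently dropped, so that someone can review them and correct the source records.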

Checking for anomalies and errors is almost impossible to do by eye, or even with simple filters. Using an Access[8] database or a pivot table in Excel[9] will allow a more thorough examination of the data.

Data exploration and analysis.

The data analysis step should be relatively straightforward if the data is properly prepared. Ironically, the bulk of the work is done before the data is even analyzed.

Exploratory Data Analysis (EDA) is a technique for interrogating data and will vary according to the desired outcome of the analysis. EDA may involve a numerical analysis of the data, for example computing distributions or averages, or a graphical approach to reveal trends or interrelationships in the data.
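A numerical first pass of this kind can be sketched with nothing more than averages and a frequency count. The example below is illustrative only: the durations are hypothetical, and the one-week band width used by `bucket` is an arbitrary assumption.

```python
import statistics
from collections import Counter

# Hypothetical durations, in days, for closing out site queries.
durations = [3, 5, 4, 21, 6, 5, 4, 5, 7, 30]

summary = {
    "count": len(durations),
    "mean": statistics.mean(durations),
    "median": statistics.median(durations),
}


def bucket(days, width=7):
    """Label the week-long band a duration falls into, e.g. '0-6'."""
    low = (days // width) * width
    return f"{low}-{low + width - 1}"


# Frequency distribution: how many durations fall into each band.
distribution = Counter(bucket(d) for d in durations)

# A crude text histogram stands in for a chart.
for band in sorted(distribution, key=lambda b: int(b.split("-")[0])):
    print(f"{band:>6} days | {'#' * distribution[band]}")
```

Here the mean (9 days) sits well above the median (5 days), which points to a small number of long-running items that are worth investigating separately.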

Power BI[10] is free data visualization software that can be used to explore data. It was developed by Microsoft[11], so it integrates easily with Office[12] applications, and it allows visual inspection of the relationships between variables in a dataset.

Data presentation.

The findings of the analysis should be communicated as clearly and efficiently as possible. This may require the use of a summary table, chart, or other data visualization.

Information design is both an art and a science, but too little consideration is given to either aspect, often to the detriment of the audience. As a rule of thumb, less is often more when presenting information. In other words, graphics that are cluttered and heavily annotated require more time to understand and interpret. This reduces the impact of that graphic and the clarity of any insights derived from it.

A better practice is to minimize graphical furniture (such as lines, dashes and axes) that can distract the human eye. This will ensure that the key data points are the focus of the audience’s attention.


Tradesmen are trained to use their tools and teachers are taught how to teach, but there is little training in how to manage and analyze large datasets, despite the frequency with which we use them. Let’s face it: construction organizations have historically been slow to adopt new technologies, often resulting in overloaded spreadsheets and poorly structured data.

By training employees to manage and analyze data efficiently, companies can proactively adapt to growth in data volumes – and benefit from it, too. It can also help in the delivery of projects and in maintaining key records in case of any dispute.

Organizations may also consider hiring professional data analysts. A LinkedIn[13] survey recently found that data scientist was the fastest-growing job in Singapore.[14] This reflects a wider trend of companies recognizing that they are data rich but information poor.

Either way, doing things the way they’ve always been done is no longer a viable option for construction firms.


1: The World Economic Forum is an international non-governmental organization that brings together political, business and cultural leaders to shape the global, regional and industry agenda. It is headquartered in Geneva, Switzerland.

2: WhatsApp LLC

3: YouTube is an online video sharing and social media platform headquartered in San Bruno, California.

4: Melvin M. Vopson, “The world’s data explained: how much we’re producing and where it’s all stored,” World Economic Forum (May 7, 2021), 05 /world-data-produced-stored-global-gb-tb-zdb/.

5: The Economist is a weekly newspaper that focuses on current affairs, international business, politics, technology and culture.

6: “The world’s most valuable resource is no longer oil, but data,” The Economist (May 6, 2017).

7: Aconex is a mobile and web-based collaboration platform for project information and process management, delivered as Software as a Service (SaaS), for customers in the construction, infrastructure, power, mining and oil and gas sectors.

8: Microsoft Access

9: Microsoft Excel

10: Microsoft Power BI

11: Microsoft Corporation

12: Microsoft Office suite of applications including Word, Excel and Access.

13: LinkedIn Corporation

14: A commentary on the survey’s key findings is available at: Claudia Chong, “Here are the 5 fastest-growing jobs in Singapore, says a LinkedIn survey,” The Straits Times (September 7, 2018), https://www. . It is based on LinkedIn’s Emerging Jobs Report: Singapore (2018) Emerging-Jobs-Report.pdf)

