Why is Data Engineering a big deal?


Over the past five years, many management meetings have revolved around big data implementations and the gains an enterprise could make by managing complex sets of information to improve profitability. (In the context of this article, “big data” refers to the process of applying serious computing power, including the latest in machine learning and artificial intelligence, to highly complex, detailed data stored in legacy and new database systems. The results are then presented back to decision makers, for example a consumer at a retail department or an executive wanting to review a sales report.)

What is Big Data?

Big data can be comparing utility costs with meteorological data to spot trends and inefficiencies. Big data can be comparing ambulance GPS information with hospital records on patient outcomes to determine the correlation between response time and survival. Big data can also come from the tiny device you wear to track your movement, calories, and sleep in order to monitor your own health and fitness.

“Big data absolutely has the potential to change the way governments, organizations, and academic institutions conduct business and make discoveries, and it’s likely to change how everyone lives their day-to-day lives,” said Susan Hauser, corporate vice president of Microsoft’s Enterprise and Partner Group.

“Our daily lives generate an enormous collection of data,” said Dan Vesset, program vice president of IDC’s Business Analytics research. “Whether you’re surfing the Web, shopping at the store, driving your smart car around town, boarding an airplane, visiting a doctor, attending class at university, each day you are generating a variety of data,” he continues.

“The benefit of the data depends on where and to whom you’re talking to. A lot of the ultimate potential is in the ability to discover potential connections, and to predict potential outcomes in a way that wasn’t really possible before. Before, you only looked at these things in hindsight.” (Quote dated Feb 2012)

Fast Forward to Today:

So what is the current vibe for all things “Big Data”? There has been an awakening to the fact that all this talk of enterprise-wide implementations didn’t take into account the need for seriously smart “techies” who can manage these projects. Often the brains behind the implementations do not speak the language of the executive team, which understands the business objectives and processes that maximize profitability. So how does an organization exploit this opportunity with limited resources and bandwidth? It’s a really big deal, and it requires skilled professionals who not only understand the business model and the challenges of their industry but also have a deep grasp of the technology and code needed to bring disparate sets of data together so they can play nicely with one another.

Bring in the Data Engineer.

The data engineer is someone who understands the complexities of an organization’s disparate systems and legacy arms and can map out the relationships that will have a positive impact on the delivery of needed information. Data engineers are specialists in their field, with thousands of hours of experience with data tables, database administration, reports, management, and front-end and back-end developer teams, along with an awareness of the infrastructure that must be in place to do the work well.

Eron Kelly, General Manager of product marketing for Microsoft SQL Server, said “Big data is important, yet the real gap is going to be in skills and ability. In the next few years millions of big data-related IT jobs will be created worldwide. In the years to come, businesses that successfully harness the power of big data will outperform and outcompete competitors.”

However, according to the McKinsey Global Institute, there is a major shortage of the “analytical and managerial talent necessary to make the most of big data.” The United States alone faces a shortage of more than 140,000 workers with big data skills as well as up to 1.5 million managers and analysts needed to analyze and make decisions based on big data findings.

Technical Knowledge Needed.

There are obvious cost constraints attached to big data implementations, and many organizations were not prepared to make that investment. All things considered, they were still able to succeed without a powerful new data-driven system and its huge price tag.

Introducing Hadoop:

Apache Hadoop is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems for storage and processing, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it scales out by adding more servers to the cluster. In today’s hyper-connected world, where more and more data is created every day, Hadoop’s advantages mean that businesses and organizations can now find value in data that was until recently considered useless.
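To make the idea of distributed parallel processing a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce model is built on. It is plain Python running in a single process, not Hadoop's own API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. On a real cluster, each phase would be spread across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files spread across a cluster.
    sample = ["big data is big", "data engineers manage data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across machines, which is what lets this style of processing grow with the size of the cluster.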

Hadoop can handle all types of data from disparate systems: structured, unstructured, log files, pictures, audio files, communications records, email – just about anything you can think of, regardless of its native format. Even when different types of data have been stored in unrelated systems, you can dump it all into your Hadoop cluster with no prior need for a schema. In other words, you don’t need to know how you intend to query your data before you store it; Hadoop lets you decide later and over time can reveal questions you never even thought to ask.
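The "decide later" idea is often called schema-on-read. As a rough illustration, the sketch below stores raw lines exactly as they arrive and only applies a structure when a question is asked. It uses Python's standard json module rather than any Hadoop tooling, and the sample records, field names, and helper functions (read_with_schema, total_sales) are all hypothetical.

```python
import json

# Hypothetical raw records from unrelated systems, stored exactly as received.
RAW_RECORDS = [
    '{"type": "sale", "store": "A", "amount": 19.99}',
    '{"type": "log", "level": "ERROR", "message": "disk full"}',
    '{"type": "sale", "store": "B", "amount": 5.00}',
    "free-form note with no structure at all",
]


def read_with_schema(raw_lines, required_fields):
    """Apply a schema at read time: yield only records that have every required field."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Not JSON; it stays in storage but does not match this query.
        if isinstance(record, dict) and all(f in record for f in required_fields):
            yield {f: record[f] for f in required_fields}


def total_sales(raw_lines):
    """One possible question, decided long after the data was stored."""
    rows = read_with_schema(raw_lines, ("store", "amount"))
    return sum(row["amount"] for row in rows)


if __name__ == "__main__":
    print(round(total_sales(RAW_RECORDS), 2))
    print(list(read_with_schema(RAW_RECORDS, ("level", "message"))))
```

The same stored lines answer both a sales question and a logging question, and nothing had to be decided about either when the data was written.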

By making all of your data useable, not just what’s in your databases, Hadoop lets you see relationships that were hidden before and reveal answers that have always been just out of reach. You can start making more decisions based on hard data instead of hunches and look at complete data sets, not just samples.


The hype around big data implementations may have burst like a bubble, but the benefits for an enterprise that embraces a system built on big data are not going away anytime soon. Today more than ever, if you are a developer with a “Big Data” mindset and want this to become part of your long-term career path, you would do well to look into the Data Engineer developers class here at Atlanta Code.

Contact us to see how learning Hadoop could be the next best decision you make in 2015.