Wim Stoop is the Senior Director of Product Marketing at Cloudera where he leads the strategic vision for the company's mission of helping organizations turn data into business value at scale. He has over 20 years of experience in helping companies like IBM, BP, and HSBC solve their data-intensive challenges. Wim also attends various industry events where he regularly talks about big data strategy and direction.
From presentations and design documents to videos and photos, businesses today more than ever have content-rich assets that they need to save and store.
And with the growing amount of enterprise files and documents, managing data has become a challenge in companies’ own data centers, especially after the rise of remote work and online collaboration.
This is why almost all (92%) IT decision-makers in the EMEA region are planning to move more data to the cloud, according to a recent report.
Data accessibility (48%), optimized data storage and backup (44%), and reduced costs (38%) are cited as the main reasons for organizations moving more of their data to the cloud.
For this interview, we spoke with Wim Stoop, Senior Director of Product Marketing at Cloudera and an expert in big data strategy. Cloudera is a leading enterprise data management platform focused on accelerating digital transformation for the world’s largest enterprises.
Wim delves into the major drivers of the shift to cloud environments and discusses some of the biggest data management concerns that organizations face today.
Spotlight: Research shows that 60% of global enterprise data was stored in the cloud as of 2022 — surpassing the enterprise data stored on-premises for the first time. What drives this transition to cloud environments?
Wim Stoop: The main benefits of the cloud are flexibility, scalability, and agility. Companies that shifted data to the cloud to save money may have found a drawback: it can become increasingly expensive as they scale their usage.
While almost all businesses (92%) in the EMEA region plan to move more data to the cloud going forward, according to a recent Cloudera study, 76% of companies also plan to repatriate some data back to on-premises environments. However, getting governance and data sovereignty wrong can be costly, and organizations are right to be concerned about its financial implications.
Additionally, the cost of the cloud itself is another consideration; careless cloud usage can quickly lead to costs for data and processing far in excess of what the data center equivalent costs. Companies require greater visibility into their data and workloads ahead of reaching decisions on the cloud. It’s all too easy to forget the original motivation. The answer is frequently "cloud-first", but what was the question again? We have moved beyond "cloud-first" to a "workload-first" era.
Resource consumption characteristics can help determine whether a workload is more suitable for an on-prem environment or for a cloud-native deployment in the shared public cloud. Workload analytics allows organizations to monitor how a workload performs, enabling them to make a more informed decision. Workloads that are more predictable and consume a relatively stable level of resources are often cheaper to run on-prem, whereas a customer-facing service that’s more variable may be better in the cloud because of its elasticity.
But businesses need the capability to securely move data from cloud to cloud or from on-prem to any cloud, and vice versa. For example, 76% of companies in EMEA plan to repatriate some data to on-premises environments, for reasons such as privacy regulations or costs.
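As a rough illustration of the workload-first idea, the sketch below scores how variable a workload's resource usage is and suggests a placement. It is not Cloudera's method: the function names, the use of the coefficient of variation, the 0.5 cutoff, and the sample figures are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical cutoff: a coefficient of variation above this counts as "variable".
VARIABILITY_THRESHOLD = 0.5


def coefficient_of_variation(samples):
    """Return standard deviation divided by mean for resource-usage samples."""
    avg = mean(samples)
    if avg == 0:
        return 0.0
    return pstdev(samples) / avg


def suggest_placement(samples, threshold=VARIABILITY_THRESHOLD):
    """Suggest 'on-prem' for steady workloads and 'cloud' for variable ones."""
    return "cloud" if coefficient_of_variation(samples) > threshold else "on-prem"


# Illustrative hourly CPU-core usage, not real measurements.
nightly_batch = [40, 42, 41, 39, 40, 43]
customer_portal = [5, 8, 60, 12, 95, 7]

print(suggest_placement(nightly_batch))    # steady usage
print(suggest_placement(customer_portal))  # spiky usage
```

In practice a placement decision would also weigh cost, data gravity, and regulatory constraints; variability is only one of the signals mentioned above.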
Until now this has been a challenge. However, thanks to emerging modern data architectures, companies can gain more value from their data and simultaneously optimize their cloud costs. Analyzing operational metadata enables decisions about where workloads are optimally placed. This is a win-win for organizations looking to drive efficiencies in an ever-changing business climate.
Hybrid and multi-cloud data management is one of the hottest topics in the world of big data right now. Can you tell us more about these?
Managing data in a single infrastructure is already difficult, especially when the analytics lifecycle to drive insight and value from data is made up of point solutions or siloed systems. At the same time, data is sitting across a variety of hybrid and multi-cloud environments, making it increasingly difficult for organizations to extract value from their data assets.
More than two-thirds (68%) of organizations currently store data in a hybrid environment, meaning they utilize both on-premises/private cloud and the public cloud. Additionally, seven out of ten organizations (72%) currently have a multi-cloud model, and are working with two or more hyperscalers.
Each infrastructure is its own silo, yet organizations must operate as an enterprise. When data is siloed, it stops organizations from making quicker decisions. Data outside of a certain silo might just be the missing piece to get a complete view to make the right decision. Therefore, it’s essential that organizations have the capability to securely derive value from their data, regardless of where it is located.
And there is no denying that data compliance and governance are front of mind for many organizations, especially those operating in highly regulated sectors. The governance landscape is becoming more complex by the day. For instance, rulings like Schrems II altered the requirements around citizen data and privacy. The ruling introduced more controls with significant financial consequences for non-compliance. With this in mind, many organizations are choosing to play it safe and move their data back on-prem to have control over where it exists and ensure it doesn’t leave their control. As major cloud providers are based in the US, data sovereignty is less of an issue there. However, it is an increasing worry for those based in EMEA and APAC. Also, while managing data in the US for the US business, multinational organizations must ensure compliance with the local and regional directives elsewhere. This means that one company has several data privacy and sovereignty issues to consider.
To amplify control over data, it’s imperative that organizations adhere to cohesive security policies and implement these across all infrastructures and environments. They must ensure governance is consistently applied 'always and everywhere'. By doing this, it is easier for a company to follow regulations and establish compliance. Good governance and a proper understanding of data alone do not make a company compliant. But without such a foundation in place, compliance is nigh on impossible to retrofit. Every regional compliance effort becomes a reinvention of the wheel for just that region. Good governance and a proper understanding of data have to be baked in from the start, in strategy but also in technology. With a single set of globally defined policies, enterprises can repeat security standards across all cloud and on-prem environments, thereby reducing risk, saving time, and mitigating human error.
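The idea of a single set of globally defined policies can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cloudera's implementation: the `Policy` structure, the role and tag names, and the environment names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A single, globally defined access rule (hypothetical structure)."""
    data_tag: str             # classification of the data, e.g. "pii"
    allowed_roles: frozenset  # roles permitted to read data with this tag


# One global policy set, defined once.
GLOBAL_POLICIES = (
    Policy("pii", frozenset({"compliance_officer"})),
    Policy("sales", frozenset({"analyst", "compliance_officer"})),
)

# Illustrative environment names; the same policies apply to each of them.
ENVIRONMENTS = ("on-prem", "public-cloud-a", "public-cloud-b")


def is_allowed(role, data_tag, policies=GLOBAL_POLICIES):
    """Return True if any policy grants the role access to the tag; deny by default."""
    return any(p.data_tag == data_tag and role in p.allowed_roles for p in policies)


def access_matrix(role, data_tag):
    """Evaluate the same decision for every environment."""
    return {env: is_allowed(role, data_tag) for env in ENVIRONMENTS}


print(access_matrix("analyst", "pii"))
print(access_matrix("analyst", "sales"))
```

Because the decision function never looks at the environment, the answer is identical everywhere, which is the property the "always and everywhere" principle asks for.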
Cloudera, a leading enterprise data management platform, aims to make hybrid and multi-cloud easier. Can you give us an overview of the platform and how it achieves this?
Cloudera Data Platform (CDP) is a hybrid platform with portable, interoperable data analytics for the full data lifecycle. It supports distributed data management across public clouds, on-premises, and at the edge.
- We deliver cloud-native data analytics across the full data lifecycle – data distribution, data engineering, data warehousing, transactional data, streaming data, data science, and machine learning
- CDP embodies a hybrid data platform's write-once, deploy-anywhere capabilities, making data application development faster, easier, and more cost-effective
- Cloudera’s common security, governance, metadata, replication, and automation enable CDP to operate as an integrated system
- With CDP, organizations can accelerate their insight and value by implementing modern data architectures and paradigms: data mesh, data fabric, and data lakehouse
Managing data protection in a cloud environment has become complex due to the shift to a hybrid work model and accelerated cloud transformation. How does Cloudera help organizations navigate data security challenges?
Shared Data Experience (SDX) is a fundamental part of Cloudera Data Platform architecture and delivers comprehensive security right out of the box for both data lakes as they are deployed and data as it is used.
Covering both compute and storage layers, SDX delivers an integrated set of security and governance technologies built on metadata and delivers persistent context across all analytics as well as public and private clouds. Consistent data context simplifies the delivery of data and analytics and reduces risk and operational costs. IT can deploy secured and governed data lakes more quickly, which in turn gives more users access to more data.
In addition, at Cloudera, we’re big on certification: we continually strive to apply industry best practices, validated through third-party audits and certifications. Our ISO 27001 and SOC 2 Type II certifications help ensure that CDP is developed, reviewed, tested, and released in adherence to ISO standards and the AICPA Trust Services Principles.
Cloudera's mission is to empower organizations to transform complex data into actionable insights faster and easier. How valuable is this process in making informed business decisions and gaining a competitive advantage in the market?
Generated data is growing faster than ever. Our customers’ data doubles every 12 to 18 months, but a lot of data is never used in decision-making. At the same time, the world is changing faster than ever, as seen during the past pandemic years, and organizations have to react ever more quickly to innovate and differentiate.
A critical metric for this is the time to value in deploying new use cases. For a large part, this is achieved by unlocking and understanding the available data faster and also enabling more users to use both data and analytics in a self-service manner. Just think of low-code and no-code approaches: they accelerate driving insight and value from data by making the tools to do so available to more, less technical users. Low-code and no-code unlock data and analytics for more users; Large Language Models now unlock them for anyone who can formulate a question. However, the foundation, regardless of whether it is low-code, no-code, or LLMs, remains the same: data. Trusted data in the right context.
All organizations today want to move faster, as they need to stay ahead of the curve. Those that can harness their data in a swift, cost-effective manner – irrespective of where it’s located – will have a competitive advantage.
How does Cloudera differentiate itself from rivals such as Amazon Redshift, Azure Synapse Analytics, and Microsoft SQL Server?
Those rivals are only part of the solution. If all you needed to get to value was a better data warehouse, the challenges of going from data to value would have been solved. Data can no longer be tamed just by throwing a data warehouse at it and calling it good. Businesses simply need analytics for the whole analytics lifecycle that work anywhere, with data at any scale, and with consistent security and governance. That’s what we do, and that’s how we’re different.
Based on customer experience, what are the biggest data management concerns that customers approaching Cloudera face?
By leveraging all available data across enterprise systems and multiple online touchpoints, companies can generate a comprehensive view of customers that can strategically drive business growth. This integrated method allows companies to better understand customers and strengthen personalized marketing, reduce customer churn, and improve marketing automation. It can also improve the overall customer experience.
It takes time to become a truly data-driven organization as the process of creating a hybrid multi-cloud infrastructure to control all data is complex – even for highly competent IT teams. However, organizations can ill afford to stand still if they want to remain competitive in today’s hyper-competitive environment.
To become truly data-driven, companies must realize that an ongoing commitment is required.
The biggest problem is mostly not technology itself, but data strategy and organizational change. Organizational change is necessary in most circumstances to launch a data-driven culture and essentially a new paradigm for working. But this can create problems. Therefore, it’s vital to implement a change management process that encourages data-driven decision-making. To be successful, it also needs to appeal to employees at all levels to work towards this goal.
Why do you believe data architecture is one of the biggest organizational challenges?
Sometimes different parts of the organization make different choices. Sometimes the cart is being put before the horse, meaning a cloud strategy before a data strategy.
Essentially, the cloud is a delivery model – and even though it is adaptable, agile, and scalable, it is still a method of delivery. Data, as a strategic asset, demands its own strategy, such as an Enterprise Data Strategy. Early cloud adopters have realized that without an Enterprise Data Strategy a cloud strategy alone can hamper the management, access, security, and governance of their data. In fact, early cloud adopters found that the move to the public cloud created data and analytics silos that were difficult to manage and far more expensive to run.
With modern data architectures, organizations can derive more value from their data and simultaneously optimize cloud costs. As with the previous question, it’s a big change process.
With the deployment of 5G, the volume of data being generated and processed is expected to increase significantly. How is Cloudera assisting companies with these efforts?
5G reduces latency and increases bandwidth, and the Internet Protocol standard IPv6 allows more things to be connected. Ergo, a bigger opportunity to instrument and connect EVERYTHING, especially in the manufacturing and logistics sectors but also in other industries.
But now companies need to deal with the data, which is now more and more real-time and streaming, and that’s a different kettle of fish.
Companies will be required to merge data at rest with data in motion in new and flexible ways, while simultaneously keeping their data in hybrid data environments. This can be either on-premises or across multiple public clouds. This calls for analytical tools and platforms that streamline operations and adequately sustain secure hybrid deployments.
The elevated need for real-time decision-making requires organizations to embrace modern data architectures that allow agility, security, and enhanced governance of enterprise data. A platform is only truly able to harness the potential of data in motion when it can integrate data of different types and sources and cover every stage of the data life cycle, from the edge to AI.
Cloudera offers a complete Data in Motion platform as part of our stack. It is designed not only to capture and handle streaming data but also to make decisions on streaming data with easy-to-use tools such as SQL, a standardized query language.
The question also touches on another interesting part of our organization. Yes, we do provide the only hybrid data platform for modern data architectures with data anywhere. Yet in addition to our technology, we also provide the assistance to make the most efficient use of it. This assistance comes in the form of professional services, customer support, and training.
Lastly, what do you think are some of the biggest trends in the hybrid cloud space in 2023?
Hybrid cloud is a trend in itself, with many companies planning to repatriate data from the public cloud back to on-premises or to the private cloud. As mentioned, hybrid is the new de facto standard, with 68% of organizations currently storing data both on-premises/private cloud and in the public cloud. And companies are finding it difficult to fully extract value from their data assets sitting across a mosaic of hybrid and multi-cloud environments. Almost three-quarters (72%) of respondents agree that having data sitting across different cloud and on-premises environments makes it complex to extract value from all the data in their organization.
We are likely to witness more interest and focus on federated secure data platforms. An example of this is Gaia-X – a federated project being worked on by multiple countries in Europe – that is aiming for modernization through data sovereignty. Instead of functioning as a single cloud, a federated system connects multiple cloud services and users in a much more transparent way.
A unified data fabric approach to delivering disparate data sources intelligently and securely in a self-service manner across multiple clouds and on-premises is in essence a federated data management system at scale, helping businesses navigate the data challenges they face.
Thank you for your time, Wim Stoop. Best of luck to you and Cloudera!