To put this question into context, I want to frame my view of big data with one use case:
“Understanding our customers with the purpose of connecting them with relevant products, content, advertising by combining known and predictive data from a multitude of information sources”
Firstly, there is NO binary answer to the “build or buy” question, just a number of things that should be taken into account when making the decision. To further confuse things, I am going to divide “buy” into two options: buy technology that can collect, enrich and predict targetable attributes across the existing customer base (1st party data), or buy data from other sources (3rd party data).
Before moving on to the question posed by this blog, let us consider the 1st party and 3rd party options. The first question we must ask ourselves is: “Is it possible to collect sufficient information, either directly or indirectly, with a high enough degree of accuracy to rely on a 1st party approach?”
Build or buy data?
I have spent most of my career working with publishers (newspapers, magazines, trade, academic). Publishers have a wealth of direct (or known) customer information (subscriptions, classified adverts, product purchases, site registrations, competition entries …). They also have a wealth of indirect (predicted or derived) information, which is available because of the level of customer interaction within their digital solutions (from content sites to careers). From these interactions, sociodemographic factors (age, language, gender) and status (income, education), as well as interests such as sports & travel, can be predicted for each visitor.
If you have a high level of interaction with your customers, then capturing and generating 1st party data is likely to be the right choice. Otherwise, I would suggest you look at purchasing 3rd party data (which has inherent limitations that will be covered in a future blog). The options are not mutually exclusive, as 1st party data can be enhanced with 3rd party data.
Build or buy technology to collect and generate 1st party data
So that leaves us with the question “build or buy technology?”. Again, this is not a simple question. Some of the main questions that need to be answered include:
- What level of accuracy do you need when launching your solution?
This question is at the top of the list because it is the most important. If you accept that it will take years to achieve results that deliver 80+% accuracy, you must accept that you will have to live with lower accuracy for a long time.
A simple view is “the technology, or rather the data that it produces, will continually improve the connection between buyer & seller, by personalising the offer”. In reality, the solution will not initially know when it is getting the predictions wrong. Personalising the offer (or making it more specific) by definition means you are making it less relevant to customers who do not have the predicted attributes (such as age and income). Inaccuracy therefore means you could have a negative impact on sales, ad targeting etc.
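To make the point about inaccuracy concrete, here is a rough sketch of the arithmetic. Every number and name in it is a hypothetical assumption chosen for illustration (the base conversion rate, the uplift when a prediction is right, the penalty when it is wrong); none of them comes from a real solution.

```python
def expected_conversion(accuracy, base_rate, lift, penalty):
    """Expected conversion rate of a personalised offer.

    accuracy  -- share of customers whose predicted attribute is correct (0..1)
    base_rate -- conversion rate of the generic, non-personalised offer
    lift      -- multiplier on base_rate when the prediction is right (> 1)
    penalty   -- multiplier on base_rate when the prediction is wrong (< 1)
    All four inputs are illustrative assumptions, not measured values.
    """
    hit = accuracy * base_rate * lift
    miss = (1 - accuracy) * base_rate * penalty
    return hit + miss


def break_even_accuracy(lift, penalty):
    """Accuracy at which personalising performs the same as the generic offer.

    Solves accuracy * lift + (1 - accuracy) * penalty = 1 for accuracy.
    """
    return (1 - penalty) / (lift - penalty)


if __name__ == "__main__":
    base_rate, lift, penalty = 0.02, 1.5, 0.4  # hypothetical figures
    for accuracy in (0.5, 0.8):
        rate = expected_conversion(accuracy, base_rate, lift, penalty)
        print(f"accuracy {accuracy:.0%}: expected conversion {rate:.2%}")
    print(f"break-even accuracy: {break_even_accuracy(lift, penalty):.1%}")
```

Under these assumed figures, personalisation only pays off once accuracy passes roughly 55%. At 50% accuracy the personalised offer converts slightly worse than the generic one, which is the negative impact described above.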
- Do you have access to the skills required to build the solution?
Many skills are required in addition to the traditional developers, testers, architects, analysts, product managers… You will also need statisticians, mathematicians, data analysts & other highly skilled technicians who understand cognitive computing.
- Will you be able to retain the expertise indefinitely to maintain the solution?
As mentioned, these are highly specialised skills; therefore, if you expect investment to reduce, it is likely to be difficult to retain sufficient knowledge & capabilities for the times when you really need them.
- Are you prepared to wait to obtain optimum accuracy?
Every vendor has improved the quality of their solution over time, typically over many years. While it may be possible to get acceptable results within a relatively short time period, getting good results is likely to take significant time, effort, research & development.
- Are you prepared to continue to enhance the technology to work with emerging web standards?
Things change on the internet. 3rd party cookies (a tracking mechanism employed by some solutions) are now blocked by default in some browsers. Vendors who want to support publishers across multiple domains have needed to work around this, or accept that they would only be able to understand a subset of the customers. These solutions are not a “one off” development.
- Are you prepared to maintain metadata (librarian services) to allow the technology to perform optimally?
Some solutions obtain a good/high level of accuracy but require extensive and up-to-date metadata (data about data), rather than using “data science” (machine learning, artificial intelligence, cognitive computing). These solutions, while quicker to develop and to reach reasonable accuracy, have a significant cost of ownership.
- Are you prepared to invest in a solution that will continue to learn, or are you going to solve today’s problem?
Frequently, in-house teams focus on the immediate problem they are trying to solve, without designing for the future and the unknown. Software vendors must deal with the unknown in every implementation, which forces them to produce flexible solutions.
If your answer to any of these questions is no, then I suggest the answer is to buy, not build.
If you have answered yes to these questions & others, there are still ways to engage with a vendor to increase the likelihood of success. The solution is likely to have many parts (collection, capturing the click streams, ETL, surveys/panels, entity extraction & semantic enrichment, data science, machine learning, data management platforms …).
Some vendors can provide some of these parts as well as the complete solution.
If you are going to embark on this type of project, I hope this blog highlights some of the considerations.