To put this question into context, I want to frame my view of big data with one use case:

“Understanding our customers with the purpose of connecting them with relevant products, content and advertising by combining known and predictive data from a multitude of information sources”

Firstly, there is NO binary answer to the “build or buy” question, just a number of things that should be taken into account when making the decision.  To further confuse things, I am going to divide “buy” into two options: buy technology that can collect, enrich and predict targetable attributes across the existing customer base (1st party data), or buy data from other sources (3rd party data).

Before moving onto the question posed by this blog, let us consider the 1st party and 3rd party options.  The first question we must ask ourselves is “is it possible to collect sufficient information, either directly or indirectly, with a high degree of accuracy to rely on a 1st party approach?”.

Build or buy data?

I have spent most of my career working with publishers (newspapers, magazines, trade, academic).  Publishers have a wealth of direct (or known) customer information (subscriptions, classified adverts, product purchases, site registrations, competition entries …).  They also have a wealth of indirect (predicted or derived) information.  This is available due to the level of customer interaction within their digital solutions (from content sites to careers).  From these interactions sociodemographic factors (age, language, gender) and status (income, education) as well as interests, such as sports & travel can be predicted for each visitor.

If you have a high level of interaction with your customers, then capturing and generating 1st party data is likely to be the right choice; otherwise, I would suggest you look at purchasing 3rd party data (which has inherent limitations – these will be covered in a future blog).  The options are not mutually exclusive, as 1st party data can be enhanced with 3rd party data.
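As a rough illustration of enhancing 1st party data with 3rd party data, the sketch below fills gaps in a customer profile from a purchased source while never overwriting a value you already hold.  The function name, the attribute names and the sample records are all hypothetical; a real data management platform would also have to handle identity matching, consent and data freshness.

```python
def enrich(first_party, third_party):
    """Fill gaps in 1st party profiles with 3rd party attributes.

    Both arguments map a customer id to a dict of attribute -> value.
    1st party values always win; 3rd party values only fill what is missing.
    Each stored value is tagged with its source ("1st" or "3rd").
    """
    enriched = {}
    for customer_id, profile in first_party.items():
        # Start from the values we already hold, ignoring empty ones.
        merged = {
            name: (value, "1st")
            for name, value in profile.items()
            if value is not None
        }
        # Only add 3rd party values for attributes we do not have.
        for name, value in third_party.get(customer_id, {}).items():
            if name not in merged and value is not None:
                merged[name] = (value, "3rd")
        enriched[customer_id] = merged
    return enriched


# Hypothetical sample data.
first_party = {
    "c1": {"age_band": "25-34", "income_band": None},
    "c2": {"age_band": None, "interest": "travel"},
}
third_party = {
    "c1": {"age_band": "35-44", "income_band": "medium"},
    "c3": {"age_band": "55-64"},
}

profiles = enrich(first_party, third_party)
```

Keeping the source alongside each value is a design choice rather than a requirement: it lets downstream systems treat purchased attributes with more caution than the ones collected directly.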

Build or buy technology to collect and generate 1st party data

So that leaves us with the question “Build or Buy Technology”.  Again, this is not a simple question.  Some of the main questions that need to be answered include:

  • What level of accuracy do you need when launching your solution?

This question is at the top of the list because it is the most important.  If you accept that it will take years to achieve results that deliver 80+% accuracy, you must also accept that you will have to live with lower accuracy for a long time.

A simple view is “the technology, or rather the data that it produces, will continually improve the connection between buyer & seller, by personalising the offer”.  The fact is, the solution will not know (initially) when it is getting the predictions wrong.  Personalising the offer (or making it more specific) by definition means you are making it less relevant to customers who do not have the predicted attributes (such as age and income).  Inaccuracy therefore means you could have a negative impact on sales, ad targeting, etc.
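One way to find out where the predictions are wrong is to compare them against the customers whose attributes you already know (from subscriptions or registrations, for example).  The sketch below is a minimal version of that check; the function names and sample records are hypothetical, and the 80% threshold is simply the figure used above, not a recommendation.

```python
from collections import defaultdict


def accuracy_by_attribute(predicted, known):
    """Share of correct predictions per attribute, measured only on
    customers for whom the true value is known."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for customer_id, known_attrs in known.items():
        predicted_attrs = predicted.get(customer_id, {})
        for name, true_value in known_attrs.items():
            if name not in predicted_attrs:
                continue  # no prediction was made, so nothing to score
            totals[name] += 1
            if predicted_attrs[name] == true_value:
                hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


def safe_to_target(accuracy, threshold=0.8):
    """Attributes whose measured accuracy meets the chosen threshold."""
    return sorted(name for name, score in accuracy.items() if score >= threshold)


# Hypothetical sample data: known values vs. what the solution predicted.
known = {
    "c1": {"age_band": "25-34", "gender": "f"},
    "c2": {"age_band": "35-44", "gender": "m"},
    "c3": {"age_band": "25-34"},
    "c4": {"age_band": "45-54"},
    "c5": {"age_band": "25-34"},
}
predicted = {
    "c1": {"age_band": "25-34", "gender": "f"},
    "c2": {"age_band": "35-44", "gender": "f"},
    "c3": {"age_band": "25-34"},
    "c4": {"age_band": "35-44"},
    "c5": {"age_band": "25-34"},
    "c6": {"age_band": "18-24"},  # no known value, so not scored
}

accuracy = accuracy_by_attribute(predicted, known)
targetable = safe_to_target(accuracy)
```

With this sample data only the age band would clear the threshold, so gender would be held back from targeting until its predictions improve.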

  • Do you have access to the skills required to build the solution?

There are many skills required in addition to the traditional developers, testers, architects, analysts, product managers…  You will also need statisticians, mathematicians, data analysts & other highly skilled technicians who understand cognitive computing.

  • Will you be able to retain the expertise indefinitely to maintain the solution?

As mentioned, these are highly specialised skills; therefore, if you expect investment to reduce, it is likely to be difficult to retain sufficient knowledge & capabilities for the times when you really need them.

  • Are you prepared to wait to obtain optimum accuracy?

Every vendor has improved the quality of their solution over time.  Many years.  While it may be possible to get acceptable results within a relatively short time period, getting good results is likely to take significant time, effort, research & development.

  • Are you prepared to continue to enhance the technology to work with emerging web standards?

Things change on the internet.  3rd party cookies (a tracking mechanism employed by some solutions) are now blocked by default in some browsers.  Vendors who want to support publishers across multiple domains have had to work around this, or accept that they will only be able to understand a subset of the customers.  These solutions are not a “one off” development.

  • Are you prepared to maintain metadata (librarian services) to allow the technology to perform optimally?

Some solutions obtain a good/high level of accuracy but require extensive and up-to-date metadata (data about data), rather than using “data science” (machine learning, artificial intelligence, cognitive computing).  These solutions, while quicker to develop and quicker to reach reasonable accuracy, have a significant cost of ownership.

  • Are you prepared to invest in a solution that will continue to learn, or are you going to solve today’s problem?

Frequently in-house teams focus on the immediate problem that you are trying to solve, without designing for the future, the unknown.  Software vendors must deal with the unknown with every implementation, which forces them to produce flexible solutions.

If your answer to any of these questions is no, then I suggest the answer is to buy, not build.

If you have answered yes to these questions & others, there are still ways to engage with a vendor to increase the likelihood of success.  The solution is likely to have many parts (collection, capturing the click streams, ETL, surveys/panels, entity extraction & semantic enrichment, data science, machine learning, data management platforms …).

Some vendors can provide some of these parts as well as the complete solution.

If you are going to embark on this type of project I hope this blog highlights some of the considerations.

6 thoughts

  1. Interesting, but I think you have missed a big point. The reason you have to build is that you cannot rely on a supplier to stay up to date with your data requirements. These continue to change – pretty much every time we launch a new service, or the company brings in a new system that has customer information. Also, it is not only a case of creating the actionable data but of tracking outcomes. For these reasons you have to build.

    1. Mark,

      You are indeed correct in what you say. You need to be able to stay up to date with data requirements; that is both storage and modelling. Not all vendor solutions create a “vendor lock”. It is possible to find a supplier that allows you to provide additional data, add it into the DMP and change, add to, enhance the predictive capabilities.

      When doing so, it is also important to “retest” the predictions, so selecting or building a solution that can check its predictions against known data is important; and should be a regular part of the refinement process.

      Also an area that is rarely considered is how new predictions are used by the systems (websites, marketing tools, CRM systems …). If you are going to build, make sure this area is extremely flexible and allows new data to be surfaced without rework.

      A great point!

  2. My view is from a slightly different perspective – having operated as a product developer for many years, I see in-house teams continually and profoundly underestimating the complexity of delivering any given solution, resulting in a massively sub-optimal delivery that sets the program behind the curve on a permanent basis.

    Assuming that the product developer brings proven code and experience to the table then any alternative can only ever be inferior and will waste time/effort/money simply trying to reach the start point that the product vendor set off from.

    The features of most off-the-shelf solutions have been forged in the heat of the commercial marketplace, undergone real world testing and improvement over different deployments and many iterations.

    The in-house team simply cannot provide a substitute for this – it’s so much more than just backsides on seats.

    However, I would say that the in-house team brings much to the table in terms of long-term, instance-specific knowledge and the ability to manipulate, augment and utilise the data that is generated by whatever local systems are in place.

    1. Jim, there are a number of excellent points here. When I wrote the article I didn’t comment too much on things from a software vendor’s perspective (as I am one), though you clearly are one too, or have been. Yes, we need to be the “best” amongst our peers so that we can compete. Good enough isn’t good enough.

      We need to be able to attract talent, so respect amongst our peers is really important. The team continually strives to improve & not just stay up to date. You also make an interesting point about buying in the knowledge & experience. As you say, that is at a “point in time” and not necessarily the evangelists who work with the engineers to steer the future direction. However, as you say, they may be able to help you to replicate what the vendors are doing, or will they? In many areas of software I would say yes, though in this area, the skills required to even replicate are diverse. Being a good engineer and “understanding” how the solution in your last engagement worked is not enough. There are so many moving parts that there is not one or even two people who fully grasp it all.

      The next problem is, when you have managed to replicate the old solution, you are out of date. The vendors have moved on & you are now detached from their pool of experience. It may have taken you a number of years to get to a similar point.

      In my opinion, there are a few instances when you should build instead of buy (very few). The development investment a company makes around big data should be “using it”, not “building it”. When you have reliable data you are in a position to exploit limitless use-cases and drive really meaningful connections with buyers/customers. I wouldn’t waste time on trying to solve what has already been solved.

  3. I work for a large online retailer. We have been reviewing our use of data and our analytics capability, and we are currently reviewing the business case to buy vs build. We have the capability and capacity to build, but my question is why? What benefits would a purchased solution offer against the outlay of building your own?

    1. Michael, thanks for responding. I would ask a question first, as profiling the audience/buyers based upon behaviour/buying patterns is of course possible. My specific knowledge is based upon the behaviour around reading content. That said, retailers like most industries are now publishing content – so, I am going to assume that this is the case.

      You say that you have the capability & capacity to build. It is hard for me to question that, though I point to the diversity of skills required to make this work. It is not just engineering. You will need high end statisticians as well. The risky area in the development is deriving accurate data to base your customer interactions on. This problem has been solved by a number of vendors, so I would question why try and solve the same problem, probably paying more, with a longer time to market, with the risk of failure. If you can afford to fail then build might be an option.

      Don’t forget a significant point. If you are going to use the data to refine/narrow the communication with the customer, bad or inaccurate data is worse than no data.

      Going back to your capability and capacity statement. I would suggest you use it on developing the customer engagement piece, not deriving the data. Rely on well founded data & use it to drive relevant offers, improved navigation, simplified buying, relevant targeting of content/advertising …

      In this blog I have assumed that the owner of the 1st party data wants to use it for their own purposes. That is not always the case. Some owners want to sell/commercialise the data; turning it into 3rd party data others can buy. That may not be the case with your company, though if it is – do not risk generating low quality data. If buyers of 3rd party data do not get immediate positive results, they will move on to another supplier.
