CRM Software

How complex is CRM data?

Excel is probably the first thing that comes to mind for data collection, but enterprise data needs more than that. Excel is excellent for managing personal data, yet it falls short once many teams have to share the same data. So how complex is CRM data?
Published on June 28, 2024

Every business is at a different stage of maturity in its data strategy. For now, let's divide that maturity into five levels. Level 1 is the lowest: roughly speaking, the business has only Excel, and customer data is managed by individuals. Level 5 is the highest: data is centralized, organized, and managed methodically, and it serves every operational and business need.

Excel is probably the first thing that comes to mind when it comes to data collection: it stores about a million rows per worksheet, offers very flexible data processing with hundreds of functions and formulas, supports varied chart design along with the magic of macros, and above all costs next to nothing compared to other solutions.

But enterprise data needs more than that. It requires interaction between multiple departments and teams, with permissions delegated both by function (create, edit, delete) and by data access (who can see or change which data, and when). So although Excel is excellent for managing personal data, it has to bow out here.

It is better to break up with Excel early, without regrets and without wasting time going back and forth over whether to keep using it, and go straight to building a real strategy. Any business can divide its contacts into two groups: those who are not yet customers and those who already are. The "not yet customers" group usually lives in the CRM, spread across three modules: Marketing, Sales, and Service.

The "already customers" group is more important, so when a business invests in software, this is where it invests first: core banking for banks, core insurance for insurers, electronic medical records for hospitals, eCommerce platforms for online retailers, POS, warranty systems, loyalty programs, and so on.

The need to connect data between these two groups is natural and brings a lot of value, so the connection must be planned for. Technically, there are two connectivity trends: Sustainability and Flexibility.

Sustainability Trends

The goal is to build a "central repository", also called the golden record or System of Record: a single place that brings together data from all sources, then cleans and polishes it until it is exemplary and ready to serve other purposes. Related technologies include:

  • Data lake: data is centralized but has not yet been processed or grouped.
  • Data warehouse: data is centralized and partially processed, divided into data groups, but with no connection between these groups.
  • MDM (Master Data Management): the most complete form, where data approaches perfection. The data is connected, but it is aggregated from many different domains, not just customers, so its view of customer data is still basic.
  • CDM (Customer Data Management) or CDP (Customer Data Platform): the most refined level for customer data. Here customer data is divided into small, accurate, tightly connected subsets.

[Figure: the relationship between the groups of data solutions]

The process of building shared data includes these steps: collection, filtering, standardization, unification, and use. The names differ, but the steps have one thing in common: they are all complicated.

Aggregation: this process has three basic steps, known as ETL. Extract: export from the old system. Transform: convert formats. Load: write into the new system. Because of the "Sustainable" approach, real-time delivery is usually not emphasized. Data is poured into the central warehouse at a defined frequency, typically daily, late at night or when the system is not busy.
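The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the legacy rows, the field names, the `customers` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the central warehouse.

```python
import sqlite3

# Hypothetical rows exported from a legacy system (sample data, not real).
legacy_rows = [
    {"FullName": "  Nam Nguyen ", "City": "ho chi minh city"},
    {"FullName": "Lan Tran", "City": "HA NOI"},
]

def extract():
    """Extract: read rows out of the old system (here, an in-memory list)."""
    return list(legacy_rows)

def transform(rows):
    """Transform: trim whitespace and normalize casing into the target format."""
    return [
        (row["FullName"].strip(), row["City"].strip().title())
        for row in rows
    ]

def load(conn, records):
    """Load: write the converted records into the central store."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", records)
    conn.commit()

def run_nightly_batch(conn):
    """One scheduled run: extract, then transform, then load."""
    load(conn, transform(extract()))

conn = sqlite3.connect(":memory:")  # stand-in for the central warehouse
run_nightly_batch(conn)
loaded = conn.execute("SELECT name, city FROM customers ORDER BY name").fetchall()
```

In practice a scheduler would call something like `run_nightly_batch` once a day during off-peak hours, which matches the batch rhythm described above.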

Filtering: this process looks easy but is difficult. There are two stages: removing duplicates within the incoming data before it enters the system, and filtering incoming data that matches data already in the system. The difficulty lies in defining what counts as a "duplicate". A record has many fields: a match on all fields is definitely a match, but a partial match may also be a match.


(1) Nam Nguyen - Ho Chi Minh City

(2) Nam Nguyen - Ho Chi Minh City

(3) Nguyen Nam - Ho Chi Minh City

Of the three records above, which ones are the same?

At a basic level, the system can only define a match by fixed criteria: the same name and the same phone number, that is, a 100% match. Under this mechanism, the three records above are not treated as identical.

But at a higher level, the system can encode field values as sequences of numbers and compare them against a fuzzy threshold, for example 80%. Under this mechanism, all three records are treated as identical.
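The difference between exact and fuzzy matching can be sketched as follows. This is a minimal illustration, not the method of any particular CRM product: it swaps the numeric encoding mentioned above for plain string similarity from Python's standard `difflib` module, and the `normalize` step (lowercasing and sorting words) is an assumption made so that word order does not matter.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase, replace punctuation with spaces, and sort the words."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return " ".join(sorted(cleaned.split()))

def similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 for two records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_exact_match(a, b):
    """Basic level: the two records must be character-for-character equal."""
    return a == b

def is_fuzzy_match(a, b, threshold=0.8):
    """Higher level: the records match if the score reaches the threshold."""
    return similarity(a, b) >= threshold

record_1 = "Nam Nguyen - Ho Chi Minh City"
record_3 = "Nguyen Nam - Ho Chi Minh City"

exact = is_exact_match(record_1, record_3)   # False: the name order differs
fuzzy = is_fuzzy_match(record_1, record_3)   # True: same words after sorting
```

The threshold is the knob that matters: raising it reduces false merges but lets more true duplicates through, and lowering it does the opposite.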

So what should be done once a duplicate is detected? This is the hardest part, and the solution has never been easy.

  • At a basic level, if two records are duplicates, keep one and delete the other. That's it!
  • At an advanced level, the system merges the records into a composite record based on predefined rules: for example, preferring newer (or older) data, preferring source A over source B, or preferring the longer value.
  • At the "universe" level, the system observes how users resolve duplicates and proposes flexible handling for each case.
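The advanced level, merging by predefined rules, might look like the sketch below. The rules chosen here (newer record wins, then source priority, then fill empty fields from the other record) are only one possible combination, and the source names, field names, and sample values, including the phone number, are hypothetical.

```python
from datetime import date

# Hypothetical source ranking: a lower number wins when dates are tied.
SOURCE_PRIORITY = {"crm": 0, "ecommerce": 1}

def merge_records(a, b):
    """Build one composite record from two duplicates, field by field."""
    # Rule 1: the more recently updated record is preferred.
    # Rule 2: on a tie, the higher-priority source is preferred.
    preferred, other = sorted(
        (a, b),
        key=lambda r: (-r["updated"].toordinal(), SOURCE_PRIORITY[r["source"]]),
    )
    merged = {}
    for field in ("name", "phone", "city"):
        # Rule 3: fall back to the other record when the preferred value is empty.
        merged[field] = preferred.get(field) or other.get(field)
    return merged

rec_a = {"source": "crm", "updated": date(2024, 1, 5),
         "name": "Nam Nguyen", "phone": "", "city": "Ho Chi Minh City"}
rec_b = {"source": "ecommerce", "updated": date(2023, 11, 20),
         "name": "Nguyen Nam", "phone": "0900000000", "city": ""}

composite = merge_records(rec_a, rec_b)
```

Here the newer CRM record supplies the name and city, while the phone number is recovered from the older eCommerce record instead of being lost, which is the advantage of merging over simply deleting one record.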

Of course, every option produces some errors, but with a very large data set you have to make trade-offs and accept a certain error rate. These rates are often referred to as risk thresholds.

Good systems allow this rate to be set flexibly for both duplicate detection and merge handling. For example, if the match rate is above 80%, the system filters and merges on its own; otherwise, it emails the admin for manual processing. After cleaning, the data is placed into data marts, either classified or left undivided, where it waits for other systems to query it.
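The threshold rule in the example can be written as a small routing function. This is a sketch under assumptions: the names are hypothetical, and a score of exactly 80% is sent to manual review, which is the cautious reading of "above 80%".

```python
AUTO_MERGE_THRESHOLD = 0.80  # the 80% example from the text; tune per data set

def route_duplicate(score, threshold=AUTO_MERGE_THRESHOLD):
    """Decide what happens to a candidate pair given its match score."""
    if score > threshold:
        return "auto_merge"      # the system filters and merges on its own
    return "manual_review"       # e.g. notify the admin by email

decisions = [route_duplicate(s) for s in (0.95, 0.80, 0.42)]
```

Keeping the threshold as a parameter lets duplicate detection and merge handling use different values, as the paragraph above suggests.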

So far we have looked at the main features of the "Sustainability" trend. Now let's take a step towards the "Flexibility" trend, which is becoming more and more popular and has many advantages.

Flexibility Trends

This trend is the basis of microservice deployment and rests on three main pillars: mapping, decentralization (delegating access permissions), and information exchange.

Mapping: Unlike the "Sustainable" approach, the "Flexible" approach does not gather all data in one place. The data stays where it is; only its address is declared. A data mapping mechanism uses these addresses to retrieve the data when needed.

For example: the customer's name, email, and home address are defined in the eCommerce application, the phone number comes from the Delivery application, and the Facebook account comes from the customer care application. When any system needs customer information, it automatically queries the corresponding applications and aggregates the results.
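The mapping idea can be sketched as a table that records which application owns each field. Everything here is hypothetical: the three `fetch_from_*` functions stand in for real calls to the eCommerce, Delivery, and customer care applications, and the values they return are made-up sample data.

```python
# Hypothetical stand-ins for calls to the three applications named above.
def fetch_from_ecommerce(customer_id):
    return {"name": "Nam Nguyen", "email": "nam@example.com",
            "home_address": "Ho Chi Minh City"}

def fetch_from_delivery(customer_id):
    return {"phone": "0900000000"}

def fetch_from_customer_care(customer_id):
    return {"facebook": "nam.nguyen.example"}

# The mapping: each field is declared once, with the application that owns it.
FIELD_SOURCES = {
    "name": fetch_from_ecommerce,
    "email": fetch_from_ecommerce,
    "home_address": fetch_from_ecommerce,
    "phone": fetch_from_delivery,
    "facebook": fetch_from_customer_care,
}

def get_customer(customer_id, fields):
    """Assemble a profile by asking each owning application at most once."""
    responses = {}  # one cached response per application
    profile = {}
    for field in fields:
        fetch = FIELD_SOURCES[field]
        if fetch not in responses:
            responses[fetch] = fetch(customer_id)
        profile[field] = responses[fetch][field]
    return profile

profile = get_customer("C001", ["name", "phone", "facebook"])
```

No data is copied into a central store: the profile is assembled on demand and each field remains the responsibility of the application that owns it.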

Decentralization: For a business with many applications, regulating which accounts can access which data, applications, or systems, and with what permissions (read/write/change/delete), is very complicated and changes frequently. An orchestration solution that coordinates resources and access takes care of this. A widely used choice is Kubernetes, an open-source platform for deploying and managing containerized applications and services; its role-based access control governs access to the cluster's own resources.
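At its simplest, the permission question "which account can do what, and where" reduces to a lookup. The sketch below is a generic role-based check in plain Python, not Kubernetes configuration or any product's API; the roles, application names, and the `is_allowed` helper are hypothetical.

```python
# Hypothetical roles and the actions each may perform in each application.
PERMISSIONS = {
    "sales_rep": {"crm": {"read", "write"}},
    "support_agent": {"crm": {"read"}, "warranty": {"read", "write"}},
    "admin": {"crm": {"read", "write", "change", "delete"}},
}

def is_allowed(role, application, action):
    """Return True only if the role was explicitly granted the action."""
    return action in PERMISSIONS.get(role, {}).get(application, set())

can_rep_delete = is_allowed("sales_rep", "crm", "delete")        # False
can_agent_write = is_allowed("support_agent", "warranty", "write")  # True
```

Anything not explicitly granted is denied, and because the rules live in one table they can be changed without touching each application.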

Information exchange: Once permissions and mapping are in place, applications begin to retrieve data and need a coordination mechanism that maintains performance when data volumes are large and information must be delivered in real time or near real time. A widely used solution is Apache Kafka, an open-source event streaming platform. The pairing of Kubernetes and Kafka is one of the top choices for the infrastructure behind the Flexibility trend.
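The publish/subscribe pattern behind event streaming can be illustrated with a toy in-process bus. This is not Kafka's API and has none of its durability, partitioning, or scaling; the `EventBus` class, the topic name, and the event fields are hypothetical and serve only to show how one change reaches several applications.

```python
from collections import defaultdict

class EventBus:
    """A toy in-memory stand-in for an event streaming platform."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> handler functions
        self._log = defaultdict(list)          # topic -> events in arrival order

    def subscribe(self, topic, handler):
        """Register a function to be called for every event on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Record the event, then deliver it to every subscriber of the topic."""
        self._log[topic].append(event)
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
received_by_crm = []
received_by_loyalty = []
bus.subscribe("customer.updated", received_by_crm.append)
bus.subscribe("customer.updated", received_by_loyalty.append)

# The publisher does not need to know which applications are listening.
bus.publish("customer.updated", {"customer_id": "C001", "phone": "0900000000"})
```

Both subscribers receive the same event from a single publish call, which is what keeps applications loosely coupled as their number grows.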

This section goes fairly deep into technology, especially for non-technical readers, so the article stops here. If much of the above was unclear, just remember that "data on CRM is also complicated", and that is enough.
