Clean up data upstream of a CRM


How can data from different sources be aggregated and prepared for import into a CRM?

An internationally active company that has so far used several CRMs wants to unify its customer information so that it uses only a single Salesforce CRM from now on. The differing data formats must be cleaned and standardized to match the new CRM.

The mission is to clean and standardize the information in the new format. Where applicable, a status indicates whether a record requires human verification. All the data is returned as a CSV file.
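
As an illustration of the deliverable, here is a minimal sketch of a CSV export with a verification status. The column names, the status values ("OK", "TO_VERIFY") and the rule that an empty field triggers a review are all assumptions made for the example; the actual import format of the CRM is not described here.

```python
import csv
import io

# Hypothetical target columns; the real CRM import format is not given in the text.
FIELDS = ["first_name", "last_name", "email", "phone", "status"]


def write_crm_csv(records, stream):
    """Write cleaned records as CSV, flagging rows that need a human check."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = {field: (record.get(field) or "") for field in FIELDS}
        # Assumption: a row is flagged as soon as one data field is empty.
        missing = [field for field in FIELDS[:-1] if not row[field]]
        row["status"] = "TO_VERIFY" if missing else "OK"
        writer.writerow(row)


# Usage with fictitious records, written to an in-memory buffer.
buffer = io.StringIO()
write_crm_csv(
    [
        {"first_name": "Ada", "last_name": "Martin",
         "email": "ada@example.com", "phone": "+33123456789"},
        {"first_name": "Lee", "last_name": "",
         "email": "lee@example.com", "phone": ""},
    ],
    buffer,
)
csv_text = buffer.getvalue()
```

Keeping the status in its own column lets the incomplete rows be filtered for review without blocking the import of the complete ones.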


Raw data input
There are many difficulties:
- the way in which the information was entered differs according to the sources;
- the fields used are not identical between the sources;
- international activity complicates the standardization of certain values (e.g. phone numbers).
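
One common way to handle fields that differ between sources is a per-source mapping onto a single target schema. The sketch below assumes two hypothetical sources ("crm_a", "crm_b") and invented column names; it is not the mapping used in the project.

```python
# Hypothetical per-source column names mapped onto one target schema.
FIELD_MAPS = {
    "crm_a": {"Nom": "last_name", "Prénom": "first_name", "Tél": "phone"},
    "crm_b": {"surname": "last_name", "given_name": "first_name", "mobile": "phone"},
}
TARGET_FIELDS = ["first_name", "last_name", "phone"]


def to_target_schema(source, row):
    """Rename the fields of one source row and collapse stray whitespace."""
    mapping = FIELD_MAPS[source]
    unified = {field: "" for field in TARGET_FIELDS}
    for source_field, value in row.items():
        target = mapping.get(source_field)
        if target and value is not None:
            unified[target] = " ".join(str(value).split())
    # Keep the origin of the record so that anomalies can be traced back.
    unified["source"] = source
    return unified


record = to_target_schema("crm_a", {"Nom": " Durand ", "Prénom": "Zoé", "Tél": "01 23"})
```

Fields that a source does not provide stay empty in the unified record, which makes the gaps visible to the later steps.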

(the names and contact details are fictitious for confidentiality purposes)

Structured data output
After several processing steps, more than 90% of the customer base was recovered and the delivered file was ready to be imported directly into the CRM.

(the names and contact details are fictitious for confidentiality purposes)



Classification of surnames and first names

Separation of first and last names and gender recognition via first name
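
A minimal sketch of this step, assuming a lookup table of known first names. The four-entry table below is purely illustrative (a real run would need a much larger reference list), and the function name and return shape are hypothetical.

```python
# Tiny illustrative lookup; a real reference list would be far larger.
FIRST_NAME_GENDER = {"marie": "F", "sophie": "F", "jean": "M", "paul": "M"}


def split_name(full_name):
    """Return (first_name, last_name, gender, needs_review)."""
    parts = full_name.replace(",", " ").split()
    if len(parts) < 2:
        # A single token cannot be split reliably.
        return ("", full_name.strip(), "", True)
    if parts[0].lower() in FIRST_NAME_GENDER:
        # "First Last" order.
        first, last = parts[0], " ".join(parts[1:])
    elif parts[-1].lower() in FIRST_NAME_GENDER:
        # "LAST First" order.
        first, last = parts[-1], " ".join(parts[:-1])
    else:
        # Unknown first name: guess the order and ask for a human check.
        return (parts[0].title(), " ".join(parts[1:]).title(), "", True)
    gender = FIRST_NAME_GENDER[first.lower()]
    # str.title() is a crude casing rule; particles such as "de" would need care.
    return (first.title(), last.title(), gender, False)
```

For example, `split_name("DUPONT Marie")` yields `("Marie", "Dupont", "F", False)`, while a name with no recognized first name comes back flagged for review.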


Address standardization and control

Filling in of missing data and output formatted according to the S42 standard
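
The sketch below shows the general idea of completing and checking an address. It assumes that a missing country can be filled from a per-source default, and its three-line layout is a simplification: the S42 templates themselves are not reproduced here. The field names are hypothetical.

```python
def standardize_address(address, default_country=""):
    """Clean address components, fill a missing country, report gaps."""
    cleaned = {key: " ".join(str(value or "").split()) for key, value in address.items()}
    street = cleaned.get("street", "")
    postcode = cleaned.get("postcode", "").upper()
    city = cleaned.get("city", "").upper()
    # Assumption: the source CRM implies a default country when none is given.
    country = (cleaned.get("country", "") or default_country).upper()
    components = [("street", street), ("postcode", postcode),
                  ("city", city), ("country", country)]
    missing = [name for name, value in components if not value]
    lines = [street, f"{postcode} {city}".strip(), country]
    return {
        "lines": [line for line in lines if line],
        "missing": missing,
        "needs_review": bool(missing),
    }


result = standardize_address(
    {"street": "12  rue de la Paix", "postcode": "75002", "city": "Paris"},
    default_country="France",
)
```

Returning the list of missing components, and not only a flag, tells the reviewer what to look for.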


Standardization and control of phone numbers

Assignment of calling codes by country and standardization of values
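
A minimal sketch of such a normalization, using only the standard library. It assumes that the country of each record is known, that "00" is the international dialing prefix, and that a leading national "0" must be dropped; these rules hold for the four countries listed but not universally, so a dedicated phone-number library would be a safer choice in production. The function name and the length check are illustrative.

```python
import re

# Country calling codes for a few countries; extend as needed.
CALLING_CODES = {"FR": "33", "GB": "44", "DE": "49", "US": "1"}


def normalize_phone(raw, country):
    """Return (number, needs_review) with the number in +<digits> form."""
    # "+33 (0)1 ..." style: the bracketed trunk zero is not dialed internationally.
    text = raw.strip().replace("(0)", "")
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        number = "+" + digits
    elif digits.startswith("00"):
        # Assumption: "00" is the international prefix (not true everywhere).
        number = "+" + digits[2:]
    elif country in CALLING_CODES:
        # Assumption: a leading "0" is a national trunk prefix to drop.
        national = digits[1:] if digits.startswith("0") else digits
        number = "+" + CALLING_CODES[country] + national
    else:
        # Unknown country and no international prefix: leave for a human.
        return (raw, True)
    # International numbers have at most 15 digits; very short ones are suspicious.
    plausible = 8 <= len(number) - 1 <= 15
    return (number, not plausible)
```

With these rules, `normalize_phone("01 23 45 67 89", "FR")` gives `("+33123456789", False)`, and a number whose country is unknown is returned unchanged with the review flag set.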


Checking the validity of emails

Checking the validity of email addresses and updating the newsletter database
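
A syntactic check is the simplest form of this control. The sketch below uses a deliberately simple regular expression (it does not cover every address allowed by the email standards and cannot tell whether a mailbox exists), lowercases addresses as a practical convention, and removes duplicates; the function names are hypothetical.

```python
import re

# Simplified pattern: local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def check_email(raw):
    """Return (normalized_email, is_valid) based on syntax only."""
    email = raw.strip().lower()
    return (email, bool(EMAIL_RE.match(email)))


def clean_newsletter_list(emails):
    """Split a list into unique valid addresses and rejected raw values."""
    seen = set()
    valid, rejected = [], []
    for raw in emails:
        email, ok = check_email(raw)
        if not ok:
            rejected.append(raw)
        elif email not in seen:
            seen.add(email)
            valid.append(email)
    return valid, rejected


valid, rejected = clean_newsletter_list(
    [" Ada@Example.com ", "ada@example.com", "bad@", "no-at.example.com"]
)
```

The rejected values are kept as entered so that they can be corrected by hand and not silently lost.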

Return on investment

Successful sequence of complex operations

Very short lead times

Preservation of an essential asset: the customer database