Question : What does ETL stand for ?
Wikipedia has an entry on ETL which is worth a look.
ETL stands for Extract, Transform and Load.
• Extract means Extracting data from Data Sources.
• Transform covers many tasks, including –
o Selection of the data of interest
o Validation and clean-up of the selected data
o Changing the format and content of the data
o Loading into the designated Target.
In practice, there are three options for implementing ETL:-
• Develop bespoke SQL
• Use a commercial package, such as Informatica or Microsoft’s Integration Services
• Some combination of these two.
For example, developing basic SQL to clarify the Requirements and
then looking for a commercial product to meet the Requirements.
Question : How do I establish a Strategy for Data Quality as an Enterprise Issue ?
A successful Strategy must include both organization and technical aspects.
Typical Organization aspects are :-
• Commitment from senior management
• Establishing the slogan “Data Quality is an Enterprise Issue” as a top-down edict.
• Identification of the ‘Top 20’ Applications and Data Owners across the Enterprise
• Agree sign-off procedures with Data Owners and Users
• Establish Key Quality Indicators (‘KQIs), for example Duplicate Customers records
• Agree target Data Quality percentage
• Define KQI Reports and dashboards
• Develop SQL to measure KQIs
• Define procedures to improve KQIs
Question : How do I handle multiple types of Database, such as SQL Server and Oracle ?
The key to handling multiple types of Database is to thing of them in terms of an Integrated Data Platforms,
where all types of data are presented in a common fashion.
This then defines the logical requirement.
There is a then a number of options to physically meet this logical requirement.
The Enterprise-level option is to use an appropriate commercial product, such as Informatica
Question : How do I obtain a Single View of the Customer ?
This requires a method of matching Customers, de-duplication and the maintenance of a Customer Master Index,
(‘CMI’) supports a Single View of a Customer.
When there are many sources of similar data, such as Customers, there are frequently duplicate records.
For example, in the US, John Doe could be also called Jon Doe, Johnny Doe, Mr.J..Doe and so on.
In the UK, Joe Bloggs could also be called Joseph Bloggs, Joey Bloggs, Mr.J.Bloggs and so on.
The rules for recognizing and resolving this kind of problem has led to the development of software for De-duplicating records.
This process is informally referred to as ‘de-duping’, especially by people who do a great deal of it.
Best Practice is to look for a commercial product, rather than to write your own bespoke software because it usually takes longer
than expected and commercial products can be quite cheap.
This page on the Database Answers Web Site is an excellent starting-point for reviewing "De-duping"
Question : What is Data Lineage ?
Data Lineage can be defined as being able to the trace the derivation of all items of data that
appear in any important Performance Reports and Management Information.
That includes :-
• Who owns the original source data
• What validation and transformations are applied to the data in its life cycle
Question : How do I verify the derivation of all data (the Data Lineage’) in the Reports ?
This requires the use of an Information Catalogue that will record the source, processing steps and final delivery in a Report.
The transformations in the processing Steps must be specified in both the processing language, for example SQL, and in plain,
unambiguous English so that the Data Owner can sign-off on the Data Lineage.
Question : What is Master Data Management (MDM) ?
One of the major components in Master Data Management (‘MDM’) is Customers.
MDM can be defined a ‘Providing a Single View of the Things of Importance within an organisation’
Master Data Management applies the same principles to all the ‘Things of Interest’ in an organisation.
This can typically include Employees, Products and Suppliers.
We have discussed ‘A Single View of the Customer’ and MDM involves the same kind of operations as a CMI.
That is, identification and removal of duplicates, and putting in place to eliminate duplicates in any new data loaded into
There is a wide choice of software vendors offering MDM products.
De-duplication and Address validation is a niche market in this area.
On the Database Answers Web Site, there is a Tutorial on Getting Started in MDM
There is a sister Web Site devoted to the topic of MDM-As-a-Service
Click to go back.