Data is the representation of facts as text, numbers, graphics, images, sound or video. Information is data in context. Without context, data is meaningless. We create meaningful information by interpreting the context around data. Context includes the business meaning of data elements, the format in which data is presented, the timeframe represented by the data and the relevance of the data to a given usage.
Knowledge
Knowledge is information in perspective, integrated into a viewpoint based on the recognition and interpretation of patterns, such as trends, formed with other information and experience. It may include assumptions and theories about causes. Knowledge may be explicit – what an enterprise or community accepts as true – or tacit, inside the heads of individuals. We gain in knowledge when we understand the significance of information.
Data -> Information -> Knowledge
Meta-data
Meta-data is data that provides information about data. It is to data what data is to real life. Data reflects real-life transactions, events, objects, relationships, etc. Meta-data reflects data transactions, events, objects, relationships, etc.
A good analogy for meta-data is the card catalog in a library. The card catalog identifies what books are stored in the library and where they are located within the building. Users can search for books by subject area, author or title. Without the card catalog resource, finding books in the library would be difficult, time consuming and frustrating.
Is data truth?
Not necessarily! Data can be inaccurate, incomplete, out of date, and misunderstood. On a practical level, truth is information of the highest quality – data that is available, relevant, complete, accurate, consistent, timely, usable, meaningful and understood. Organizations that recognize the value of data can take concrete, proactive steps to increase the quality of data and information.
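Some of these quality dimensions can be checked mechanically. The sketch below is illustrative only: the field names, the `quality_issues` helper and the one-year freshness threshold are assumptions made for the example, not part of any standard. It flags records that are incomplete or out of date.

```python
from datetime import date

# Hypothetical set of fields a record must carry to count as "complete".
REQUIRED_FIELDS = ("name", "email", "updated")


def quality_issues(record, today, max_age_days=365):
    """Return a list of quality problems found in one record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"incomplete: missing {field}")
    # Timeliness: the record must have been updated recently enough.
    updated = record.get("updated")
    if updated and (today - updated).days > max_age_days:
        issues.append("out of date")
    return issues


record = {"name": "Ana", "email": "", "updated": date(2020, 1, 1)}
print(quality_issues(record, today=date(2024, 1, 1)))
```

Accuracy and meaning are much harder to test this way, because they need a trusted reference or a person who understands the context.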
What is DRM?
. . . the process of managing, controlling and protecting an organisation’s data asset, while supporting organisational goals. This makes the ‘data asset’ sound very well demarcated and defined. Is this generally the case?
Data resources exist throughout the “modern” organisation. They are considered to be assets as they have value.
How might we measure the value of data?
- Both tangible and intangible
- Both current and future
Also, the data resource may have value added to it through processes such as
- Relating
- Discovering/exploring
- Modelling
DRM may then be seen to be about maximising the value associated with the organisation’s data resources. This may take many forms!
Some DRM issues to consider…
- All organisations generate data
- All data is useful in some way or another
- Repurposing of data is a must, not just something to talk about
- Data has many perspectives and contexts
Everyone has a view…
- “It’s a cost and should be outsourced…”
- “We have all this data but we get nothing from it…”
- “This data is incorrect…”
- “The reports take 4 hours to generate…”
Data exists in heterogeneous formats…
- Agreement on common formats can be problematic
- Conversion between formats can be lossy
- But conversion must still be possible!
…and in multiple locations
- Data islands
- Redundant data
An issue caused by heterogeneous data formats
NASA lost a $125 million Mars orbiter because a Lockheed Martin engineering team used English units of measurement while the agency’s team used metric units. Lockheed Martin provided navigation commands for thrusters in English units although NASA uses the metric system.
The thrusters on the spacecraft, which were intended to control its rate of rotation, were controlled by a computer that underestimated the effect of the thrusters by a factor of 4.45. The software was working in pounds force, while the spacecraft expected figures in newtons; 1 pound force equals approximately 4.45 newtons.
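The size of the error follows directly from the conversion factor. The sketch below is an illustration only, not the actual flight or ground software: the `Force` class and the sample value of 10 are hypothetical. It shows how carrying the unit alongside the number makes the mismatch visible.

```python
from dataclasses import dataclass

# One pound-force expressed in newtons (standard definition).
LBF_TO_NEWTON = 4.4482216152605


@dataclass(frozen=True)
class Force:
    """A force value that always carries its unit with it."""
    value: float
    unit: str  # "N" or "lbf"

    def to_newtons(self) -> float:
        if self.unit == "N":
            return self.value
        if self.unit == "lbf":
            return self.value * LBF_TO_NEWTON
        raise ValueError(f"unknown force unit: {self.unit!r}")


reported = Force(10.0, "lbf")    # hypothetical figure supplied in pounds-force
naive = reported.value           # consumer wrongly treats the number as newtons
correct = reported.to_newtons()  # consumer converts using the declared unit
ratio = correct / naive
print(f"underestimated by a factor of {ratio:.2f}")
```

A bare number such as 10.0 says nothing about its unit, so both teams could be internally consistent and still disagree.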
Heterogeneous data sources
Data source | Relational, flat file, web… |
Data types | Salaries stored as integer, text? |
Units | Salaries stored per week, per month? |
Concepts | Are retired employees still ‘employees’? |
Data may not conform to fixed schema | Semi-structured information, e.g. spreadsheet |
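The type and unit rows of this table can be illustrated with a small normalisation step. This is a minimal sketch under stated assumptions: the record layout, the `period` field and the `annual_salary` helper are invented for the example. It converts salaries that arrive as text or numbers, per week or per month, into one common form.

```python
# Assumed number of pay periods in a year for each reporting unit.
PERIODS_PER_YEAR = {"week": 52, "month": 12, "year": 1}


def annual_salary(record):
    """Return the salary as a float amount per year."""
    amount = float(record["salary"])  # accepts int, float or numeric text
    return amount * PERIODS_PER_YEAR[record["period"]]


# Hypothetical records from three different source systems.
sources = [
    {"salary": "1000", "period": "week"},   # stored as text, per week
    {"salary": 4000, "period": "month"},    # stored as integer, per month
    {"salary": 50000.0, "period": "year"},  # stored as float, per year
]

for rec in sources:
    print(annual_salary(rec))
```

The conceptual row (are retired employees still "employees"?) cannot be resolved by code alone; it needs an agreed business definition.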
Relational database
A database structured to recognise relations between stored items of information: the information is presented to the user through a number of tables, and there are relations between the data.
Flat file
A flat file is more or less what its name suggests: everything is in one file. There is no index and there are no relations between the data, so to find something you need to read the whole file. This is slow and yields very little information.
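The difference between the two can be sketched in a few lines. The example below is illustrative: the employee data, the table name and both helper functions are made up. It uses Python's built-in `csv` and `sqlite3` modules to contrast a full scan of a flat file with a keyed lookup in a relational table.

```python
import csv
import io
import sqlite3

# A tiny flat file held in memory (hypothetical data).
FLAT_FILE = "id,name,dept\n1,Ana,Sales\n2,Ben,IT\n3,Cy,IT\n"


def find_in_flat_file(text, emp_id):
    """Read rows one by one until the wanted id turns up."""
    for row in csv.DictReader(io.StringIO(text)):
        if row["id"] == str(emp_id):
            return row["name"]
    return None


# Load the same rows into a relational table with a primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
rows = [
    (int(r["id"]), r["name"], r["dept"])
    for r in csv.DictReader(io.StringIO(FLAT_FILE))
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)


def find_in_database(emp_id):
    """Let the database locate the row through its primary key."""
    row = conn.execute(
        "SELECT name FROM employee WHERE id = ?", (emp_id,)
    ).fetchone()
    return row[0] if row else None


print(find_in_flat_file(FLAT_FILE, 2), find_in_database(2))
```

With three rows the difference is invisible; it matters once the file holds millions of records and every search has to start from the top.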
Example of structured and semi-structured data
In a semi-structured model, every record has a unique ID and is referenced by a pointer to its location on the disk. To search the database you need to go through all the pointers, which is inefficient because it takes too much time. That is one reason why relational databases are so popular.
Some DRM issues to consider…
The data resource is used by different users for varying uses
- CEO
- CFO
- End User
- Manager
- Data Manager
- Client
…who will have different (and often conflicting) requirements of the data resource.
The data resource needs to be readily available
- Unavailability of the data resource may be costly
- Think about the foreign exchange dealer who cannot access current prices and misses out on 5 minutes of trading
- Availability may be outside the organisation’s control
…it is also important to realise that availability does not always equal fulfilment of user requirements, e.g. relevance.
Repurposing/recycling of data:
- To maximise the value of the data resource it must be able to be:
- Reused
- Repurposed
- This has implications for data modelling and design.
DRM becomes even more important in the context of the modern organisation:
- IT (and the data resource) are a potential source of competitive/strategic advantage
- The data resource needs to be managed so that changes in user and organisational needs for data can be accommodated
- Organisations are required to be reactive to external forces (economic, political, social, changed legislation, e.g. SOX).
The data life cycle
Data exists, and must be managed beyond its initial creation. The way data is managed will depend on what stage of its “life” it has reached.
A view of the data life cycle is below:
Plan -> Specify -> Enable -> Create & Acquire -> Maintain & Use -> Archive & Retrieve -> Purge
All data lifecycle stages have associated costs and risks, but only the “use” stage adds business value. The nature, use, and management of the data will determine its life cycle.
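The stages can also be written down as an ordered type, which is one way a system might record where a data set currently sits. This is only a sketch: the `Stage` enum and the two helper functions are hypothetical names chosen for the example.

```python
from enum import Enum


class Stage(Enum):
    """Data life cycle stages, in the order given above."""
    PLAN = 1
    SPECIFY = 2
    ENABLE = 3
    CREATE_AND_ACQUIRE = 4
    MAINTAIN_AND_USE = 5
    ARCHIVE_AND_RETRIEVE = 6
    PURGE = 7


STAGES = list(Stage)  # Enum members iterate in definition order


def next_stage(stage):
    """Return the stage that follows, or None once data is purged."""
    position = STAGES.index(stage)
    if position + 1 < len(STAGES):
        return STAGES[position + 1]
    return None


def adds_business_value(stage):
    """Only the 'use' stage adds business value; the rest carry cost and risk."""
    return stage is Stage.MAINTAIN_AND_USE


print(next_stage(Stage.ENABLE).name)
```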
Data Life Cycle vs SDLC
The SDLC is not the same as the data life cycle. The SDLC describes the stages of a project, while the data life cycle describes the processes performed to manage data assets. However, the two life cycles are closely related, because data planning, specification and enablement activities are integral parts of the SDLC. Other SDLC activities are operational or supervisory in nature.
Data disposal???
“…
West Australian government agencies are too laissez faire with the disposal of old computers, according to a report by the WA Auditor General.
The Auditor General’s office bought 19 second-hand PCs which looked to be ex-government. Of those, 10 proved to be so. From four of the 10 hard drives, the team was able to retrieve information, some of it sensitive, including tax file numbers, salary information, superannuation information, home addresses, dates of birth, photos, personal e-mails, letters, resumes, performance reviews, and contact details.
…” ZDNet Australia (1st April 2008)
http://www.zdnet.com/article/wa-govt-slammed-for-bogus-data-disposal-policy/
Retrieved February 5th, 2019
Impact of the life cycle on data management
The data lifecycle must be managed to ensure accuracy and trustworthiness of the data:
- Processes must be implemented to support this lifecycle to ensure the continued quality of the data resource
- Alignment between these processes and the strategic objectives of the organisation is a must at both tactical and operational levels.
Accuracy and trustworthiness of the data
ICAC investigation into alteration of student records
-> The Independent Commission Against Corruption (ICAC) reported on its 2001 investigation into the unauthorised use and alteration of student records
-> ICAC found that a combination of technical deficiencies and procedural weaknesses in the UTS student records system led to the corrupt conduct it was investigating
http://icac.nsw.gov.au/documents/about-corruption/corruption-matters-newsletter/1274-corruption-matters-issue-no-21-september-2002/file Retrieved February 5th, 2019.
Corruption and Crime Commission of Western Australia
Report of an Inquiry into Unauthorised Access and Disclosure of Confidential Personal Information Held on the Electronic Databases of Public Sector Agencies
In September 2005 it undertook an inquiry to look at aspects of:
- the legislative and policy framework for dealing with unauthorised access and disclosure of confidential personal information;
- arrangements for the selection and supervision of staff with access to personal information of a sensitive or confidential nature; and
- the awareness of staff of their responsibilities to safeguard confidential personal information.
The Commission observed:
“The exact extent of the problem of misuse of computer systems through unauthorised access and disclosure is not known and it is widely suspected that a great deal goes undetected.
The anecdotal advice of those working in this area suggests that unauthorised access and disclosure occurs a great deal more than is ever officially reported or acted upon”
CCC investigates IELTS test corruption at Curtin University
A former Curtin University employee is among a group of nine people whom the Corruption and Crime Commission charged with 59 bribery offences in April 2011.
The charges relate to the manipulation of the International English Language Testing System (IELTS) conducted at Curtin University’s English Language Centre over a 12-month period to June 2010.
https://www.ccc.wa.gov.au/sites/default/files/IELTS%20bribery%20charges.pdf
Retrieved February 5th, 2019.
The CCC investigation will also determine whether the Curtin English Language Centre had policies in place to detect misconduct and examine whether the IELTS had been compromised at other testing centres.
“Only in knowing precisely how the manipulation of the IELTS testing was carried out and how it was concealed for so long will it be possible to properly examine those systemic issues” (CCC Counsel).
The “human element” of DRM
Much of the organisational data resource resides in so-called “organisational memory”
- The memory of the organisation
- Source of the tacit component of knowledge
- May be mined for rules and knowledge bases – real source of business rules
- It is the most evanescent of corporate assets (goes when people go)
- Yet most vital – minutes, reports, documentation.
The “Intelligent Enterprise”
The intelligent enterprise makes use of its data resource for competitive/strategic advantage
- Business Intelligence
The most general definition is that of strategies that help managers make decisions by presenting, crunching or summarising data:
- Enterprise Resource Planning (ERP) suites
- Customer Relationship Management (CRM)
Business intelligence tools
Originally Decision Support Systems (DSS) and more recently, Executive Information Systems (EIS) -> initially conceived for upper management
Now no clear distinction:
- Management roles less defined – all levels need access to corporate data
- Move to client/server and Internet platform
- Help managers make decisions about problems that may be quickly changing
Business intelligence (Gartner)
Gartner classify BI projects into four different styles:
- Departmental
- Enterprise
- B2B (Business to business)
- B2C (Business to consumer)
However, it is not just a matter of buying the latest Business Intelligence Tool (BIT) and “plugging it in”. Intelligent use of the data resource must be planned and integrated into the overall DRM strategy
- Possible disruption to existing systems (see next)
- Costly to implement
- Often difficult to justify
- Must be able to demonstrate benefit outweighs cost
Business-driven methodology and project management
Clear vision and planning
Committed management support & sponsorship
Data management and quality
Mapping solutions to user requirements
Performance considerations of the BI system
Robust and expandable framework
(Tarun K Vodapalli (2009). “Critical Success Factors of BI Implementation”)
Data resources management:
- Must involve all parts of IT discipline
- Must be reflexive & holistic
- Must be continuous
- Must be agile and adaptive within the strategy of the organisation
- Must involve people
- Must be driven by organisational need, rather than IT
What about the Real World?
- Badly managed
- Poorly understood as a resource
- Bottom-up perspective – we have all this data, now what do we do?
- Disparate data sources, different systems all storing the same data
- What is the value of the data and how do you “sell” it at different levels within organisations?
Summary
Data resources management is about conservation and exploitation of all the resources that have come into an organisation:
- Make sure that data isn’t lost through neglect, error or careless disposal
- Recycle and repurpose existing data where possible – make use of the structure and order that has been created
- Design new tasks to value-add or recontextualise existing data
- DRM must coexist with and extend existing systems