Monday, December 28, 2009

DBMS--{HELPFUL FOR BTech CSE STUDENTS & MTech ALSO}

DBMS

Unit – II
Data Base Management Systems – Information as a Resource – Meaning, Types and Component of Database – DBMS and RDBMS – Basic Concepts

Introduction
Data are a vital organisational resource that needs to be managed like other important business assets. Today’s e-business enterprises cannot survive or succeed without quality data about their internal operations and external environment. Hence the importance of data resource management.
Data resource management is a managerial activity that applies information system technologies such as database management, data warehousing, and other data management tools to the task of managing an organisation’s data resources to meet the information needs of its business stakeholders.

Foundation / Fundamental Data Concepts:
Data may be logically organized into characters, fields, records, files and databases, just as writing can be organized into letters, words, sentences, paragraphs and documents.

[Figure: logical data elements. Files (Payroll File, Benefits File) contain records (Employee Record 1 to 4); each record is made up of fields (Name, E.No, Salary).]

Record              Name    E.No   Salary
Employee Record 1   James   125    15000
Employee Record 2   Amit    321    12000
Employee Record 3   Jack    435    1000
Employee Record 4   Jones   456    12500

Character: The most basic logical data element is the character, which consists of a single alphabetic, numeric or other symbol. A bit or byte is not treated as the elementary data element here, because it describes the physical storage provided by the computer hardware. From the logical view, a character is the most basic element that can be observed and manipulated.

Field: The next higher level of data is the field, or data item. A field consists of a grouping of characters. For example, the grouping of alphabetic characters in a person’s name forms a name field.
Specifically, a data field represents an attribute (a characteristic or quality) of some entity (object, person, place or event).

Record: Related fields of data are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. An example is the payroll record for a person, which consists of data fields describing attributes such as the person’s name, social security number and rate of pay.
Fixed-length records contain a fixed number of fixed-length data fields; variable-length records contain a variable number of fields and field lengths.
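The difference between fixed-length and variable-length records can be sketched in a few lines of Python. This is only an illustration: the field widths (a 10-byte name and two 4-byte numbers) and the "|" delimiter are assumptions chosen for the example, not a layout taken from any particular DBMS.

```python
import struct

# Assumed fixed layout: 10-byte name, 4-byte employee number, 4-byte salary.
FIXED = struct.Struct("<10sII")

def pack_fixed(name, e_no, salary):
    """Every record occupies FIXED.size bytes; short names are padded."""
    return FIXED.pack(name.encode("ascii"), e_no, salary)

def pack_variable(name, e_no, salary):
    """Fields are separated by a delimiter, so the record length varies."""
    return f"{name}|{e_no}|{salary}".encode("ascii")

fixed_a = pack_fixed("James", 125, 15000)
fixed_b = pack_fixed("Amit", 321, 12000)
var_a = pack_variable("James", 125, 15000)
var_b = pack_variable("Amit", 321, 12000)
```

Both fixed records have the same size, so the n-th record can be located by simple arithmetic, while the variable records differ in length and need a delimiter or length prefix to be read back.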

File: A group of related records is a data file or table. Thus, an employee file would contain the records of the employees of a firm. Files are frequently classified by the application for which they are primarily used, such as a payroll file or an inventory file, or by the type of data they contain, such as a document file or a graphical image file.
Files are also classified by their permanence, for example, a payroll master file versus a payroll weekly transaction file.
A history file is a master file retained for backup purposes or for long term historical storage called archival storage.

Database: A database is an integrated collection of logically related records or objects. An object consists of data values describing the attributes of an entity, plus the operations that can be performed upon the data.
A database consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. The data stored in a database are independent of the application programs using them and of the type of secondary storage devices on which they are stored.
For example, a personnel database consolidates data formerly kept in separate files such as payroll files, personnel action files, and employee skill files.
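The hierarchy of characters, fields, records, files and databases described above can be sketched with ordinary Python data structures. The class name `EmployeeRecord` and the dictionary `personnel_db` are hypothetical names used only for this illustration; the sample values are the ones shown for the employee records earlier.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    """A record: related fields that describe one entity (an employee)."""
    name: str    # a field, made up of alphabetic characters
    e_no: int    # a field, made up of numeric characters
    salary: int

# A file (or table) is a group of related records.
payroll_file = [
    EmployeeRecord("James", 125, 15000),
    EmployeeRecord("Amit", 321, 12000),
]

# A database is an integrated collection of logically related files.
personnel_db = {"payroll": payroll_file, "benefits": []}

first = personnel_db["payroll"][0]
characters = list(first.name)  # the characters that make up the name field
```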

The Database Management Approach
The database management approach consolidates data records and objects into databases that can be accessed by many different application programs. In addition, an important software package called a Data Base Management System (DBMS) serves as a software interface between users and databases. This helps users easily access the records in a database.
Thus, database management involves the use of database management software to control how databases are created, interrogated and maintained to provide information needed by end users and their organizations.
For example, customer records and other common types of data are needed for several applications in banking, such as check processing, automated teller machines and installment loan accounting. These data can be consolidated into a common database, rather than being kept in separate files for each of those applications.

Thus, the database management approach involves three basic activities.

• Updating and maintaining common databases to reflect new business transactions and other events requiring changes to an organisation’s records.
• Providing information needed for each end user’s application by using application programs that share the data in common databases. This sharing of data is supported by the common software interface provided by a database management system package. Thus, end users and programmers do not have to know where or how data are physically stored.
• Providing an inquiry / response and reporting capability through DBMS software so that end users can use web browsers and the Internet or corporate intranets to easily interrogate databases, generate reports and receive quick responses to their ad hoc requests for information.

Using Database Management Software:
A Data Base Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and use of the databases of an organisation and its end users.

The four major uses of a DBMS are as follows:
• Database Development
• Database Interrogation
• Database Maintenance
• Application Development

Database Development:
Database management packages like Microsoft Access or Lotus Approach allow end users to easily develop the databases they need. However, large organisations with client/server or mainframe-based systems usually place control of enterprise-wide database development in the hands of database administrators (DBAs) and other database specialists. This improves the integrity and security of organisational databases.

Database developers use the data definition language (DDL) to develop and specify the data contents, relationships and structure of each database, and to modify these database specifications when necessary. Such information is cataloged and stored in a database of data definitions and specifications called a data dictionary, which is maintained by the DBA.

The data dictionary
- The data dictionary is another tool of database administration.
- A data dictionary is a computer-based catalog or directory containing metadata, that is, data about data.
- A data dictionary includes a software component to manage metadata (about the structure, data elements, and other characteristics of an organization’s databases).
- For example, it contains the names and descriptions of all types of data records and their interrelationships, as well as information outlining requirements for end users and for database maintenance and security.
- Data dictionaries can be queried by the DBA to report the status of any aspect of a firm’s metadata. The DBA can then make changes to the definitions of selected data elements.
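As a rough illustration of DDL and of querying a catalog of data definitions, the sketch below uses Python’s built-in sqlite3 module. SQLite is an assumption made for the example (the text does not name it), and its `sqlite_master` table and `PRAGMA table_info` are only a small-scale analogue of a full data dictionary. The `employee` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: it defines structure, it does not store any employee data.
conn.execute("""
    CREATE TABLE employee (
        e_no   INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary INTEGER
    )
""")

# SQLite records every definition in its own catalog table, sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute(
    "PRAGMA table_info(employee)")]
```

Here `tables` and `columns` hold data about data: the names of the tables and the name and type of each field, which is the kind of information a DBA would query from a data dictionary.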

Database interrogation:
- End users can use a DBMS to request information from a database using a query language or report generator.
- They can receive an immediate response in the form of video displays or printed reports. No difficult programming is required.
- The query language helps to get immediate responses to ad hoc data requests using a few short inquiries.
- The report generator feature allows end users to quickly specify a report format for information required in a report.

SQL Queries:
SQL (Structured Query Language) is a query language found in many database management packages. The basic form of an SQL query is:
SELECT ... FROM ... WHERE ...

After SELECT the end user has to list the data fields to be retrieved. After FROM the end user has to list the files / tables from which the data has to be retrieved. After WHERE the end user specifies the conditions that limit the search to only those records in which end user is interested.

Ex. SELECT Emp_Name FROM Emp_File WHERE Designation = 'MANAGER'
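The SELECT ... FROM ... WHERE form can be tried out with Python’s built-in sqlite3 module. This is a minimal sketch: the table name `emp_file`, the column names and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_file (emp_name TEXT, designation TEXT)")
conn.executemany(
    "INSERT INTO emp_file VALUES (?, ?)",
    [("James", "MANAGER"), ("Amit", "CLERK"), ("Jones", "MANAGER")],
)

# SELECT lists the fields, FROM names the table, WHERE limits the records.
managers = [row[0] for row in conn.execute(
    "SELECT emp_name FROM emp_file WHERE designation = ? ORDER BY emp_name",
    ("MANAGER",),
)]
```

After this runs, `managers` contains only the names of employees whose designation is MANAGER.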

Graphical & Natural queries:
Many end users have difficulty in correctly phrasing SQL and other database language queries. So most end user database management packages offer GUI (Graphical User Interface) point and click methods, which are easier to use and are translated by the software into SQL commands.
There are also packages available that use natural language query statements similar to conversational English (or other languages).

Database maintenance:
- The databases of an organisation have to be updated continuously and changes have to be recorded promptly.
- This database maintenance process is accomplished by transaction processing programs and other end-user application packages, with the support of the DBMS.
- End users and information specialists can also employ various utilities provided by the DBMS for database maintenance.

Application development:
- DBMS packages play a major role in application development.
- End users, systems analysts and other application developers can use 4GL programming languages and built-in software development tools provided by many DBMS packages to develop custom application programs.
- For example, end users can use a DBMS to easily develop the data entry screens, forms, reports or web pages of a business application.
- Using DBMS, the application programmers can include data manipulation language (DML) statements in their programs to perform necessary data-handling activities.
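The idea of embedding DML statements in an application program can be sketched as follows, again assuming SQLite through Python’s sqlite3 module. The `employee` table and the salary adjustment are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (e_no INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# The program issues DML statements; the DBMS handles storage and retrieval.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("INSERT INTO employee VALUES (125, 'James', 15000)")
    conn.execute("INSERT INTO employee VALUES (321, 'Amit', 12000)")
    conn.execute("UPDATE employee SET salary = salary + 500 WHERE e_no = 125")
    conn.execute("DELETE FROM employee WHERE e_no = 321")

rows = conn.execute("SELECT e_no, name, salary FROM employee").fetchall()
```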

TYPES OF DATABASES
1. Operational Databases:
These databases store detailed data needed to support the business processes and operations of e-business enterprises. They are also called subject area databases (SADB), transaction databases and production databases. Examples are a customer database and an inventory database.

2. Distributed Databases:
Many organisations replicate and distribute copies or parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on other company networks. Distributed databases may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases is done to improve database performance and security.

3. External Databases:
Access to a wealth of information from external databases is available for a fee from commercial online services, and with or without charge from many sources on the Internet, especially the World Wide Web. Data are available in the form of statistics on economic and demographic activity from statistical databanks. It is possible to download abstracts or complete copies of hundreds of newspapers, magazines, research papers, periodicals etc., from bibliographic and full-text databases.

Data Warehouse and Data Mining:
A data warehouse stores data that have been extracted from the various operational, external and other databases of an organisation. It is a central source of data that have been cleaned, transformed, and cataloged so they can be used by managers and other business professionals for data mining, online analytical processing and other forms of business analysis, market research and decision support.

A data warehouse can be subdivided into data marts, which hold subsets of data from the warehouse that focus on specific aspects of a company, such as a department or a business process.

Data from the various databases are captured, cleaned and transformed into data that can be better used for analysis. This acquisition process might include activities like consolidating data from several sources, filtering out unwanted data, correcting incorrect data, converting data to new data elements and aggregating data into new data subsets.
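The acquisition steps listed above (consolidating, filtering, correcting and aggregating) can be sketched in plain Python. The two source lists, their field names and the cleaning rules are all assumptions made for this illustration; real warehouse loading tools are far more elaborate.

```python
from collections import defaultdict

# Two hypothetical source databases with slightly inconsistent data.
sales_db = [{"region": "North", "amount": 100},
            {"region": "north", "amount": 50}]
web_db = [{"region": "South", "amount": 70},
          {"region": "South", "amount": None}]

def acquire(*sources):
    """Return total sales per region, cleaned and aggregated."""
    totals = defaultdict(int)
    for source in sources:                    # consolidate several sources
        for row in source:
            if row["amount"] is None:         # filter out unwanted data
                continue
            region = row["region"].title()    # correct inconsistent values
            totals[region] += row["amount"]   # aggregate into a new subset
    return dict(totals)

warehouse = acquire(sales_db, web_db)
```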

These data are then stored in the enterprise data warehouse, from where they can be moved into data marts or to an analytical data store that holds data in a more useful form for certain types of analysis. Metadata that define the data in the data warehouse are stored in a metadata repository and cataloged by a metadata directory. Finally, a variety of analytical software tools can be provided to query, report, mine and analyse the data for delivery to business end users through Internet and intranet web systems or other networks.

Data mining is a major use of data warehouse databases. In data mining, the data in a data warehouse are analysed to reveal hidden patterns and trends in historical business activity. This can be used to help managers make decisions about strategic changes in business operations to gain competitive advantages in the marketplace.

Data mining can discover new correlations, patterns, and trends in vast amounts of business data stored in data warehouses. Data mining software uses advanced pattern recognition algorithms, as well as a variety of mathematical and statistical techniques, to sift through mountains of data to extract previously unknown strategic business information.

Hypermedia Databases on the Web:
A website stores information in a hypermedia database consisting of hyperlinked pages of multimedia (text, graphic and photographic images, video clips, audio segments and so on).

The Web server software helps to access and transfer the web pages to the user. Thus web server software acts as a database management system to manage the transfer of hypermedia files for downloading.

Data Resource Management:
Data Resource Management includes database administration, data planning and data administration activities.

Database Administration:
Database administration is an important data resource management function responsible for the proper use of database management technology. Database administration includes responsibility for:
• Developing and maintaining the organisation’s data dictionary.
• Designing and monitoring the performance of databases.
• Enforcing standards for database use and security.

Database administrators and analysts work with systems developers and end users to provide their expertise to major systems development projects.

Data Planning:
Data planning is a corporate planning and analysis function that focuses on data resource management. It includes the responsibility for developing an overall data architecture for the firm’s data resources that ties in with the firm’s strategic mission and plans, and with the objectives and processes of its business units. Data planning is done by organisations that have made a formal commitment to long-range planning for the strategic use and management of their data resources.

Data Administration:
Data administration is another vital data resource management function. It involves administering the collection, storage and dissemination of all types of data in such a way that data become a standardised resource available to all end users in the organisation. The focus of data administration is the support of an organisation’s business processes and strategic business objectives. Data administration may also include responsibility for developing policies and setting standards for corporate database design, processing and security arrangements.

Benefits of Data Resource Management:
• Database management reduces the duplication of data and integrates data so that they can be accessed by multiple programs and users.
• DBM software is not dependent on the format of the data or the type of secondary storage hardware used.
• Business professionals can use inquiry / response and reporting capabilities to easily obtain the information they need from databases, data warehouses or data marts without complex programming.
• Software development is simplified because programs are not dependent on either the logical format of data or their physical storage location.
• Finally, the integrity and security of data are increased, since access to data and modification of data are controlled by data management software, data dictionaries, and a data administration function.

Challenges of Data Resource Management:
• Developing large databases of complex data types and installing data warehouses can be difficult and expensive.
• More hardware capability is required.
• Longer processing times may result from additional data and software complexity.
• Finally, if an organisation relies on centralized databases, its vulnerability to errors, fraud and failures is increased.

If a distributed database approach is used instead, inconsistency of data can arise.

Database Management:
In all information systems, data resources must be organized and structured in some logical manner so that they can be accessed easily, processed efficiently, retrieved quickly and managed effectively. Thus, data structures and access methods ranging from simple to complex have been devised to efficiently organize and access data stored by information systems.

Database Structures:
The relationships among the individual records stored in databases are based on one of several logical data structures or models. Database management system packages are designed to use a specific data structure to provide end users with quick, easy access to information stored in databases. The five fundamental database structures are:
• Hierarchical
• Network
• Relational
• Object-oriented
• Multidimensional

Hierarchical Structure:
In this structure, the relationships between records form a hierarchy or tree-like structure.

In the traditional hierarchical model, all records are dependent and arranged in multilevel structures consisting of one root record and any number of subordinate levels. Thus, all of the relationships among records are one-to-many, since each data element is related to only one element above it. The data element or record at the highest level of the hierarchy is called the root element. Any data element can be accessed by moving progressively downward from a root and along the branches of the tree until the desired record is located.
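
The downward search from the root described above can be sketched as a small tree in Python. The `Node` class, the `find` function and the department / project / employee levels are hypothetical names chosen for the illustration.

```python
class Node:
    """One record in the hierarchy."""
    def __init__(self, name):
        self.name = name
        self.parent = None    # each record has only one record above it
        self.children = []    # one-to-many: a record may have many children

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

def find(node, name, path=()):
    """Move downward from the given node; return the path to the record."""
    path = path + (node.name,)
    if node.name == name:
        return path
    for child in node.children:
        found = find(child, name, path)
        if found:
            return found
    return None

root = Node("Department")
project_a = root.add(Node("Project A"))
project_a.add(Node("Employee 1"))
root.add(Node("Project B"))

path = find(root, "Employee 1")
```

The returned path lists every record visited on the single route from the root to the target, which is why access in this model always starts at the root.
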

Network Structure:
• The network structure can represent more complex logical relationships and is still used by some mainframe DBMS packages.
• It allows many-to-many relationships among records; that is, the network model can access a data element by following one of several paths, because any data element or record can be related to any number of other data elements.
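
A many-to-many relationship of this kind can be sketched with two linked dictionaries. The employee and project names and the helper functions are assumptions made for the example; a real network DBMS stores such links as pointers between records.

```python
from collections import defaultdict

works_on = defaultdict(set)     # employee -> projects
staffed_by = defaultdict(set)   # project  -> employees

def link(employee, project):
    """Record the relationship in both directions."""
    works_on[employee].add(project)
    staffed_by[project].add(employee)

link("Employee 1", "Project A")
link("Employee 1", "Project B")
link("Employee 2", "Project A")

def colleagues(employee):
    """Follow the links employee -> project -> employee."""
    found = {other for p in works_on[employee] for other in staffed_by[p]}
    return sorted(found - {employee})
```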

Relational Structure:
• The relational model has become the most popular of the three database structures.
• It is used by most microcomputer DBMS packages, as well as by most midrange and mainframe systems.

• In the relational model, all data elements within the database are viewed as being stored in the form of simple tables.
• Database management system packages based on the relational model can link data elements from various tables to provide information to users, for example by using a common department number field to link tables.
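
Linking two tables through a common department number field can be sketched with Python’s sqlite3 module. The table names, column names and sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (e_no INTEGER PRIMARY KEY, name TEXT, dept_no INTEGER);
    INSERT INTO department VALUES (10, 'Payroll'), (20, 'Sales');
    INSERT INTO employee VALUES (125, 'James', 10), (321, 'Amit', 20);
""")

# The common dept_no field links each employee row to its department row.
rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_no = d.dept_no
    ORDER BY e.e_no
""").fetchall()
```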

Object Oriented Structure:
• The object oriented database model is considered to be one of the key technologies of a new generation of multimedia web-based applications.
• An object consists of data values describing the attributes of an entity, plus the operations that can be performed upon the data.
• This encapsulation capability allows the object-oriented model to better handle more complex types of data (graphics, pictures, voice, text) than other database structures.
• The object-oriented model also supports inheritance; that is, new objects can be automatically created by replicating some or all of the characteristics of one or more parent objects.
• For example, checking and savings account objects can both inherit the common attributes and operations of a parent bank account object.
• Object-oriented technology is used in CAD and many other applications.
• Object-oriented technology allows designers to develop product designs, store them as objects in an object-oriented database and replicate and modify them to create new product designs.
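
Encapsulation and inheritance for the bank account example can be sketched with Python classes. The method names and the interest rate are hypothetical; the sketch shows objects in a programming language, not the storage layer of an object-oriented DBMS.

```python
class BankAccount:
    """Parent object: common attributes plus the operations on them."""
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

class CheckingAccount(BankAccount):
    """Inherits owner, balance and deposit; adds its own operation."""
    def write_check(self, amount):
        self.balance -= amount

class SavingsAccount(BankAccount):
    """Inherits the same attributes and adds an interest rate."""
    def __init__(self, owner, balance=0, rate=0.04):
        super().__init__(owner, balance)
        self.rate = rate

    def add_interest(self):
        self.balance += self.balance * self.rate

checking = CheckingAccount("James", 100)
checking.deposit(50)        # operation inherited from BankAccount
checking.write_check(30)

savings = SavingsAccount("Amit", 1000, rate=0.05)
savings.add_interest()
```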

Multi-dimensional structure:
• The multidimensional structure is a variation of the relational model that uses multidimensional structures to organize data and express the relationship between data.
• These structures can be visualized as cubes of data and cubes within cubes of data.
• Each side of the cube is considered a dimension of the data.
• Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions.
For example, a single cell may contain the total sales for a product in a region for a specific sales channel in a single month.
• A major benefit of multidimensional databases is that they are a compact and easy-to-understand way to visualize and manipulate data elements that have many interrelationships.
• So, multidimensional databases have become the most popular database structure for the analytical databases that support online analytical processing (OLAP) applications, in which fast answers to complex business queries are expected.
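
A cube cell of the kind described above can be sketched as a dictionary keyed by one value per dimension. The four dimensions follow the example in the text (product, region, sales channel, month); the sample sales figures and the `roll_up` helper are assumptions made for the illustration.

```python
from collections import defaultdict

# Hypothetical sales transactions: product, region, channel, month, amount.
sales = [
    ("Widget", "North", "Web", "2009-12", 100),
    ("Widget", "North", "Web", "2009-12", 40),
    ("Widget", "South", "Retail", "2009-12", 70),
]

# Each cell holds aggregated data for one combination of dimension values.
cube = defaultdict(int)
for product, region, channel, month, amount in sales:
    cube[(product, region, channel, month)] += amount

def roll_up(cube, axis, value):
    """Sum every cell whose coordinate on the given axis equals value."""
    return sum(total for key, total in cube.items() if key[axis] == value)

cell = cube[("Widget", "North", "Web", "2009-12")]
north_total = roll_up(cube, 1, "North")
widget_total = roll_up(cube, 0, "Widget")
```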

Evaluation of database structures:
• The hierarchical data structure was a natural model for the database used for the structured, routine types of transaction processing.
• Data for these operations can easily be represented by groups of records in hierarchical relationship.
• However, there are many cases where information is needed about records that do not have hierarchical relationships. For example, in some organisations, employees from more than one department can work on more than one project, which cannot be represented in the hierarchical model.
• A network data structure could easily handle many-to-many relationships.
• It is thus more flexible than the hierarchical structure in support of databases for many types of business operations.
• However, like the hierarchical structure, because its relationships must be specified in advance, the network model cannot easily handle ad hoc requests for information.
• Relational databases, on the other hand, allow an end user to easily receive information in response to ad hoc requests.
• This is because not all of the relationships between the data elements in a relationally organized database need to be specified when the database is created.
• Database management software (such as Oracle 8, DB2, Access and Approach) creates new tables of data relationships using parts of the data from several tables.
• Thus, relational databases are easier for programmers to work with and easier to maintain than the hierarchical and network models.
• The major limitation of the relational model is that relational DBMSs cannot process large amounts of business transactions as quickly and efficiently as those based on the hierarchical and network models or on object-oriented models.
• Object-oriented databases are increasingly used in managing the hypermedia databases and Java applets on the WWW and on corporate intranets and extranets.
• OODBMS can easily manage the access and storage of objects such as documents and graphic images, video clips, audio segments and other subsets of web pages.

Accessing databases:
Efficient access to data is important. In database maintenance, records or objects have to be continually added, deleted or updated to reflect business transactions. Data must also be accessed rapidly so that information can be produced in response to end-user requests.

Key fields:
All data records usually contain one or more identification fields or keys that identify the record so it can be located. For example, the social security number of a person is often used as a primary key field that uniquely identifies the data records of individuals in student, employee and customer files and databases. Other methods also identify and link data records stored in several different database files.

For example, hierarchical and network databases use pointer fields. These are fields within a record that indicate (point to) the location of another record that is related to it in the same file or in another file. Hierarchical and network models use this method to link records so they can retrieve information from several different database files.

Relational database management packages use primary keys to link records. Each table (file) in a relational database must contain a primary key. This field (or fields) uniquely identifies each record in a file and must also be found in other related files.

Sequential access:
One method to access data is by sequential access.
• This method uses a sequential organisation, in which records are physically stored in a specified order according to a key field in each record.
• For example, payroll records would be placed in a payroll file in numerical order based on employee social security numbers.
• Sequential access is fast and efficient when dealing with large volumes of data that need to be processed periodically.
• However, it requires that all new transactions be sorted into proper sequence for sequential access processing.
• But this method is too slow to handle applications requiring immediate updating or responses.
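The points above can be sketched as a small batch update in Python. This is an illustration under stated assumptions: the master file is a list of (key, amount) pairs already held in key order, transactions are (key, change) pairs, and the function name sequential_update is hypothetical.

```python
def sequential_update(master, transactions):
    """Apply a batch of transactions to a key-ordered master file in one pass."""
    # Transactions must first be sorted into the same key order as the master.
    sorted_tx = sorted(transactions)
    updated = []
    t = 0
    for key, amount in master:
        # Simplification: transactions with no matching master key are skipped.
        while t < len(sorted_tx) and sorted_tx[t][0] < key:
            t += 1
        # Apply every transaction that carries the current key.
        while t < len(sorted_tx) and sorted_tx[t][0] == key:
            amount += sorted_tx[t][1]
            t += 1
        updated.append((key, amount))
    return updated
```

The whole file is read once from start to end, which suits periodic batch runs. A single urgent update would still require the same full pass, which is why the method is a poor fit for immediate responses.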

Direct access:
• In direct access method, records need not be arranged in any particular sequence on storage media.
• However, the computer must keep track of the storage location of each record using a variety of direct organisation methods so that data can be retrieved when needed.
• New transactions data do not have to be sorted, and processing that requires immediate response or updating is easily handled.

Given below are 3 widely used methods to accomplish such direct access processing.

1. Key transformation:
This method performs an arithmetic computation on a key field of a record (for example, a product number or social security number) and uses the number that results from the calculation as the address at which to store and access that record.
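A minimal sketch of key transformation follows. The arithmetic used here, the remainder after dividing the key by the number of storage slots, is one common choice and is an assumption, as are the slot count of 7 and the function names. Handling two keys that transform to the same address (by trying the next slot) is also an added assumption, since the text above does not cover it.

```python
NUM_SLOTS = 7  # hypothetical number of storage locations in the file


def key_to_address(key):
    """Transform a numeric key into a storage address (division remainder)."""
    return key % NUM_SLOTS


def store(slots, key, record):
    """Store a record at its computed address, probing forward on a clash."""
    address = key_to_address(key)
    for step in range(NUM_SLOTS):
        probe = (address + step) % NUM_SLOTS
        if slots[probe] is None or slots[probe][0] == key:
            slots[probe] = (key, record)
            return probe
    raise RuntimeError("file is full")


def fetch(slots, key):
    """Recompute the address from the key and read the record back."""
    address = key_to_address(key)
    for step in range(NUM_SLOTS):
        probe = (address + step) % NUM_SLOTS
        if slots[probe] is None:
            return None  # an empty slot means the key was never stored
        if slots[probe][0] == key:
            return slots[probe][1]
    return None
```

Starting from slots = [None] * NUM_SLOTS, a record is located by repeating the same calculation on its key, so no index and no sorted order are needed.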



2. Index method:
Another direct access method used to store and locate records involves the use of an index of record keys and their related storage addresses. A new data record is stored at the next available location, and its key and address are placed in the index. The computer uses this index whenever it must access a record.
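The index method can be sketched as follows. The class name IndexedFile is hypothetical, storage is modelled as a Python list whose positions stand in for storage addresses, and the index is a dictionary from key to address.

```python
class IndexedFile:
    """Records stored in arrival order, located through a key-to-address index."""

    def __init__(self):
        self.storage = []  # records, each at the next available location
        self.index = {}    # key -> address (position in self.storage)

    def add(self, key, record):
        address = len(self.storage)  # next available location
        self.storage.append(record)
        self.index[key] = address    # record the key and its address
        return address

    def get(self, key):
        address = self.index.get(key)
        return None if address is None else self.storage[address]
```

Records arrive in any order and are never sorted; only the index has to be consulted to find one.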

3. Indexed sequential access method:
In ISAM, records are stored in sequential order on a magnetic disk or other direct access storage device, based on the key field of each record. In addition, each database contains an index that maps one or more key fields of each data record to its storage location address. Thus, an individual record can be located directly by using its key fields to search the database index for its address.

• If a few records must be processed quickly, the index is used to directly access the record needed.
• But when a large number of records must be processed periodically, the sequential organisation of the file is used instead.
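Both ways of using an indexed sequential file can be sketched together. This is a simplified illustration, not a real ISAM implementation: the class name IsamFile is hypothetical, the file is a list kept in key order, and the index is a sorted list of keys searched with the standard bisect module.

```python
from bisect import bisect_left


class IsamFile:
    """Records kept in key order, with an index for direct lookups."""

    def __init__(self, records):
        # Sequential organisation: (key, data) pairs stored in key order.
        self.records = sorted(records, key=lambda r: r[0])
        # Index: keys in the same order, so position equals storage address.
        self.keys = [key for key, _ in self.records]

    def get(self, key):
        """Direct access: search the index, then read one record."""
        pos = bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.records[pos][1]
        return None

    def scan(self):
        """Sequential access: read every record in key order."""
        yield from self.records
```

A lookup such as get(321) touches only the index and one record, while scan() serves periodic batch processing of the whole file.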

Database development:
• Developing a small, personal database is relatively easy using microcomputer database management packages.
• However, developing a large database of complex data types can be a complex task.
• In many companies, developing and managing large corporate databases is the primary responsibility of the database administrator and database design analysts. They work with end users and systems analysts to model business processes and the data they require. Then they determine
a. What data definitions should be included in the database, and
b. What structure or relationships should exist among the data elements.

Data planning and data design:
Database development may start with a top-down data planning process. Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business processes of the enterprise.

Then they define the information needs of end users in a business process, such as the purchasing/receiving process.

Next, end users must identify the key data elements that are needed to perform their specific business activities. This frequently involves developing Entity Relationship Diagrams (ERDs) that model the relationships among the many entities involved in business processes.



1. Data planning
Develops a model of business processes.
Result: Enterprise model of business processes, with documentation.

2. Requirements specification
Defines the information needs of end users in a business process.
Result: Descriptions of users' needs, which may be represented in natural language or using the tools of a particular design methodology.

3. Conceptual design
Expresses all information requirements in the form of a high-level model.
Result: Conceptual data models, often expressed as entity relationship models.

4. Logical design
Translates the conceptual models into the data model of a DBMS.
Result: Logical data models, e.g. relational, network, hierarchical, multidimensional or object-oriented models.

5. Physical design
Determines the data storage structures and access methods.
Result: Physical data models, i.e. storage representations and access methods.

The data modeling process allows for many user views in which the relationships between data elements are identified. Each data model defines the logical relationships among the data elements needed to support a basic business process. For example: can a supplier supply more than one type of product? Can a customer have more than one type of account with us?

Answering such questions will identify data relationships that have to be represented in a data model that supports a business process. These data models then serve as logical frameworks (called schemas or subschemas) on which to base the physical design of databases and the development of application programs to support the business processes of the organisation.

• A schema is an overall logical view of the relationships among the data elements in a database.
• A subschema is a logical view of the data relationships needed to support specific end user application programs that will access that database.
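The difference between the two views can be sketched with Python's built-in sqlite3 module. Using a table definition to stand in for the schema and an SQL view to stand in for a subschema is an assumption made for illustration, as are the names employee and employee_directory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Overall logical view: every data element held about an employee.
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY,
        name   TEXT,
        salary INTEGER
    );
    -- Narrower logical view for one application: a directory without salaries.
    CREATE VIEW employee_directory AS
        SELECT emp_no, name FROM employee;
""")
conn.execute("INSERT INTO employee VALUES (125, 'James', 15000)")


def view_columns(connection, name):
    """Return the column names visible through a table or view."""
    cursor = connection.execute(f"SELECT * FROM {name}")
    return [column[0] for column in cursor.description]
```

An application program that reads only employee_directory sees the employee number and name but not the salary, even though all three are part of the overall logical view.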

Data models represent logical views of the data and relationships of the database. Physical database design takes a physical view of the data (also called internal view) that describes how data are to be physically stored and accessed on the storage devices of a computer system.


(End of Unit II)
