1、A HISTORICAL PERSPECTIVEFrom the earliest days of computers, storing and manipulating data have been a major application focus. The first general-purpose DBMS was designed by Charles Bachman at General Electric in the early 1960s and was called the Integrated Data Store. It formed the basis for the
2、network data model, which was standardized by the Conference on Data Systems Languages (CODASYL) and strongly influenced database systems through the 1960s. Bachman was the first recipient of ACMs Turing Award (the computer science equivalent of a Nobel prize) for work in the database area; he recei
3、ved the award in 1973. In the late 1960s, IBM developed the Information Management System (IMS) DBMS, used even today in many major installations. IMS formed the basis for an alternative data representation framework called the hierarchical data model. The SABRE system for making airline reservation
4、s was jointly developed by American Airlines and IBM around the same time, and it allowed several people to access the same data through computer network. Interestingly, today the same SABRE system is used to power popular Web-based travel services such as Travelocity!In 1970, Edgar Codd, at IBMs Sa
5、n Jose Research Laboratory, proposed a new data representation framework called the relational data model. This proved to be a watershed in the development of database systems: it sparked rapid development of several DBMSs based on the relational model, along with a rich body of theoretical results
6、that placed the field on a firm foundation. Codd won the 1981 Turing Award for his seminal work. Database systems matured as an academic discipline, and the popularity of relational DBMSs changed the commercial landscape. Their benefits were widely recognized, and the use of DBMSs for managing corpo
7、rate data became standard practice.In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm, and database systems continued to gain widespread use. The SQL query language for relational databases, developed as part of IBMs System R project, is now the standard query
8、 language. SQL was standardized in the late 1980s, and the current standard, SQL-92, was adopted by the American National Standards Institute (ANSI) and International Standards Organization (ISO). Arguably, the most widely used form of concurrent programming is the concurrent execution of database p
9、rograms (called transactions). Users write programs as if they are to be run by themselves, and the responsibility for running them concurrently is given to the DBMS. James Gray won the 1999 Turing award for his contributions to the field of transaction management in a DBMS.In the late 1980s and the
10、 1990s, advances have been made in many areas of database systems. Considerable research has been carried out into more powerful query languages and richer data models, and there has been a big emphasis on supporting complex analysis of data from all parts of an enterprise. Several vendors (e.g., IB
11、Ms DB2, Oracle 8, Informix UDS) have extended their systems with the ability to store new data types such as images and text, and with the ability to ask more complex queries. Specialized systems have been developed by numerous vendors for creating data warehouses, consolidating data from several da
12、tabases, and for carrying out specialized analysis.An interesting phenomenon is the emergence of several enterprise resource planning(ERP) and management resource planning (MRP) packages, which add a substantial layer of application-oriented features on top of a DBMS. Widely used packages include sy
13、stems from Baan, Oracle, PeopleSoft, SAP, and Siebel. These packages identify a set of common tasks (e.g., inventory management, human resources planning, financial analysis) encountered by a large number of organizations and provide a general application layer to carry out these tasks. The data is
14、stored in a relational DBMS, and the application layer can be customized to different companies, leading to lower Introduction to Database Systems overall costs for the companies, compared to the cost of building the application layer from scratch. Most significantly, perhaps, DBMSs have entered the
15、 Internet Age. While the first generation of Web sites stored their data exclusively in operating systems files, the use of a DBMS to store data that is accessed through a Web browser is becoming widespread. Queries are generated through Web-accessible forms and answers are formatted using a markup
16、language such as HTML, in order to be easily displayed in a browser. All the database vendors are adding features to their DBMS aimed at making it more suitable for deployment over the Internet. Database management continues to gain importance as more and more data is brought on-line, and made ever
17、more accessible through computer networking. Today the field is being driven by exciting visions such as multimedia databases, interactive video, digital libraries, a host of scientific projects such as the human genome mapping effort and NASAs Earth Observation System project, and the desire of com
18、panies to consolidate their decision-making processes and mine their data repositories for useful information about their businesses. Commercially, database manage- ment systems represent one of the largest and most vigorous market segments. Thusthes- tudy of database systems could prove to be richl
19、y rewarding in more ways than one!INTRODUCTION TO PHYSICAL DATABASE DESIGNLike all other aspects of database design, physical design must be guided by the nature of the data and its intended use. In particular, it is important to understand the typical workload that the database must support; the wo
20、rkload consists of a mix of queries and updates. Users also have certain requirements about how fast certain queries or updates must run or how many transactions must be processed per second. The workload description and users performance requirements are the basis on which a number of decisions hav
21、e to be made during physical database design.To create a good physical database design and to tune the system for performance in response to evolving user requirements, the designer needs to understand the workings of a DBMS, especially the indexing and query processing techniques supported by the D
22、BMS. If the database is expected to be accessed concurrently by many users, or is a distributed database, the task becomes more complicated, and other features of a DBMS come into play. DATABASE WORKLOADSThe key to good physical design is arriving at an accurate description of the expected workload.
23、 A workload description includes the following elements: 1. A list of queries and their frequencies, as a fraction of all queries and updates. 2. A list of updates and their frequencies. 3. Performance goals for each type of query and update.For each query in the workload, we must identify:Which rel
24、ations are accessed.Which attributes are retained (in the SELECT clause).Which attributes have selection or join conditions expressed on them (in the WHERE clause) and how selective these conditions are likely to be. Similarly, for each update in the workload, we must identify:Which attributes have
25、selection or join conditions expressed on them (in the WHERE clause) and how selective these conditions are likely to be.The type of update (INSERT, DELETE, or UPDATE) and the updated relation.For UPDATE commands, the fields that are modified by the update.Remember that queries and updates typically
26、 have parameters, for example, a debit or credit operation involves a particular account number. The values of these parameters determine selectivity of selection and join conditions.Updates have a query component that is used to find the target tuples. This component can benefit from a good physica
27、l design and the presence of indexes. On the other hand, updates typically require additional work to maintain indexes on the attributes that they modify. Thus, while queries can only benefit from the presence of an index, an index may either speed up or slow down a given update. Designers should ke
28、ep this trade-offer in mind when creating indexes.NEED FOR DATABASE TUNINGAccurate, detailed workload information may be hard to come by while doing the initial design of the system. Consequently, tuning a database after it has been designed and deployed is importantwe must refine the initial design
29、 in the light of actual usage patterns to obtain the best possible performance.The distinction between database design and database tuning is somewhat arbitrary.We could consider the design process to be over once an initial conceptual schema is designed and a set of indexing and clustering decision
30、s is made. Any subsequent changes to the conceptual schema or the indexes, say, would then be regarded as a tuning activity. Alternatively, we could consider some refinement of the conceptual schema (and physical design decisions affected by this refinement) to be part of the physical design process
31、.Where we draw the line between design and tuning is not very important.OVERVIEW OF DATABASE TUNINGAfter the initial phase of database design, actual use of the database provides a valuable source of detailed information that can be used to refine the initial design. Many of the original assumptions
32、 about the expected workload can be replaced by observed usage patterns; in general, some of the initial workload specification will be validated, and some of it will turn out to be wrong. Initial guesses about the size of data can be replaced with actual statistics from the system catalogs (althoug
33、h this information will keep changing as the system evolves). Careful monitoring of queries can reveal unexpected problems; for example, the optimizer may not be using some indexes as intended to produce good plans.Continued database tuning is important to get the best possible performance. TUNING T
34、HE CONCEPTUAL SCHEMAIn the course of database design, we may realize that our current choice of relation schemas does not enable us meet our performance objectives for the given workload with any (feasible) set of physical design choices. If so, we may have to redesign our conceptual schema (and re-
35、examine physical design decisions that are affected by the changes that we make).We may realize that a redesign is necessary during the initial design process or later, after the system has been in use for a while. Once a database has been designed and populated with data, changing the conceptual sc
36、hema requires a significant effort in terms of mapping the contents of relations that are affected. Nonetheless, it may sometimes be necessary to revise the conceptual schema in light of experience with the system. We now consider the issues involved in conceptual schema (re)design from the point of
37、 view of performance.Several options must be considered while tuning the conceptual schema:We may decide to settle for a 3NF design instead of a BCNF design.If there are two ways to decompose a given schema into 3NF or BCNF, our choice should be guided by the workload.Sometimes we might decide to fu
38、rther decompose a relation that is already in BCNF.In other situations we might denormalize. That is, we might choose to replace a collection of relations obtained by a decomposition from a larger relation with the original (larger) relation, even though it suffers from some redundancy problems. Alt
39、ernatively, we might choose to add some fields to certain relations to speed up some important queries, even if this leads to a redundant storage of some information (and consequently, a schema that is in neither 3NF nor BCNF).This discussion of normalization has concentrated on the technique of dec
40、omposition, which amounts to vertical partitioning of a relation. Another technique to consider is horizontal partitioning of a relation, which would lead to our having two relations with identical schemas. Note that we are not talking about physically partitioning the cuples of a single relation; r
41、ather, we want to create two distinct relations (possibly with different constraints and indexes on each).Incidentally, when we redesign the conceptual schema, especially if we are tuning an existing database schema, it is worth considering whether we should create views to mask these changes from u
42、sers for whom the original schema is more natural. TUNING QUERIES AND VIEWSIf we notice that a query is running much slower than we expected, we have to examine the query carefully to end the problem. Some rewriting of the query, perhaps in conjunction with some index tuning, can often ?x the proble
43、m. Similar tuning may be called for if queries on some view run slower than expected. When tuning a query, the first thing to verify is that the system is using the plan that you expect it to use. It may be that the system is not finding the best plan for a variety of reasons. Some common situations
44、 that are not handled efficiently by many optimizers follow:A selection condition involving null values.Selection conditions involving arithmetic or string expressions or conditions using the or connective. For example, if we have a condition E.age = 2*D.age in the WHERE clause, the optimizer may co
45、rrectly utilize an available index on E.age but fail to utilize an available index on D.age. Replacing the condition by E.age/2=D.age would reverse the situation.Inability to recognize a sophisticated plan such as an index-only scan for an aggregation query involving a GROUP BY clause. If the optimi
46、zer is not smart enough to and the best plan (using access methods and evaluation strategies supported by the DBMS), some systems allow users to guide the choice of a plan by providing hints to the optimizer; for example, users might be able to force the use of a particular index or choose the join
47、order and join method. A user who wishes to guide optimization in this manner should have a thorough understanding of both optimization and the capabilities of the given DBMS.(8)OTHER TOPICSMOBILE DATABASESThe availability of portable computers and wireless communications has created a new breed of
48、nomadic database users. At one level these users are simply accessing a database through a network, which is similar to distributed DBMSs. At another level the network as well as data and user characteristics now have several novel properties, which affect basic assumptions in many components of a D
49、BMS, including the query engine, transaction manager, and recovery manager.Users are connected through a wireless link whose bandwidth is ten times less than Ethernet and 100 times less than ATM networks. Communication costs are therefore significantly higher in proportion to I/O and CPU costs.Users locations are constantly changing, and mobile computers have a limited battery life. Therefore, the true communication costs is connection time and battery usage in additi
版权声明:以上文章中所选用的图片及文字来源于网络以及用户投稿,由于未联系到知识产权人或未发现有关知识产权的登记,如有知识产权人并不愿意我们使用,如有侵权请立即联系:2622162128@qq.com ,我们立即下架或删除。
Copyright© 2022-2024 www.wodocx.com ,All Rights Reserved |陕ICP备19002583号-1
陕公网安备 61072602000132号 违法和不良信息举报:0916-4228922