The aim of this chapter is to describe the main mathematical methods and applications in the average-case analysis of algorithms and data structures (Knuthian analysis techniques). It comprises two parts: first, we present basic combinatorial enumerations based on symbolic methods and asymptotic methods with an emphasis on complex-analysis techniques (such as singularity analysis, saddle-point methods, and Mellin transforms). Next, we show how to apply these general methods to the analysis of sorting, searching, tree data structures, hashing, and dynamic algorithms. The emphasis is on algorithms for which exact “analytic models” can be derived.
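To make the symbolic-plus-asymptotic pipeline concrete, here is a small standard example added for illustration (textbook material, not an excerpt from the chapter): counting binary trees with a generating function and reading off the growth rate by singularity analysis.

\[
B(z) = 1 + z\,B(z)^2 \quad\Longrightarrow\quad B(z) = \frac{1 - \sqrt{1 - 4z}}{2z},
\]

so the number of binary trees with $n$ internal nodes is the Catalan number $B_n = \frac{1}{n+1}\binom{2n}{n}$, and singularity analysis at the dominant singularity $z = 1/4$ gives

\[
B_n \sim \frac{4^n}{\sqrt{\pi}\, n^{3/2}}.
\]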
Arithmetic coding provides an effective mechanism for removing redundancy in the encoding of data. We show how arithmetic coding works and describe an efficient implementation that uses table lookup as a fast alternative to arithmetic operations. The reduced-precision arithmetic has a provably negligible effect on the amount of compression achieved. We can speed up the implementation further by use of parallel processing. We discuss the role of probability models and how they provide probability information to the arithmetic coder. We conclude with perspectives on the comparative advantages and disadvantages of arithmetic coding.
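As a point of reference, the sketch below (my own illustration, not the table-driven, reduced-precision coder described in the paper) shows the core interval-narrowing idea of arithmetic coding using exact arithmetic; a practical coder replaces the exact fractions with fixed-precision integers, renormalization, and, as in the paper, table lookup.

```python
from fractions import Fraction

def arithmetic_encode(message, model):
    """Idealized arithmetic encoder: repeatedly narrows the interval
    [low, high) in proportion to each symbol's probability and returns
    one number inside the final interval that identifies the message.
    Exact rational arithmetic stands in for the reduced-precision,
    table-driven arithmetic of a practical coder."""
    # cumulative probability interval assigned to each symbol
    intervals, cum = {}, Fraction(0)
    for symbol, prob in model.items():
        intervals[symbol] = (cum, cum + prob)
        cum += prob
    assert cum == 1, "model probabilities must sum to 1"

    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        sym_low, sym_high = intervals[symbol]
        width = high - low
        low, high = low + width * sym_low, low + width * sym_high
    return (low + high) / 2   # any value in [low, high) would do

if __name__ == "__main__":
    # hypothetical three-symbol source
    model = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
    print(arithmetic_encode("abca", model))
```

Decoding reverses the process by repeatedly locating the encoded value within the current symbol intervals; a real coder also needs an end-of-message convention (a dedicated symbol or an explicit length).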
We discuss the strategic directions and challenges in the management and use of storage systems--those components of computer systems responsible for the storage and retrieval of data. The performance gap between main and secondary memories shows no imminent sign of vanishing, and thus continuing research into storage I/O will be essential to reap the full benefit from the advances occurring in many other areas of computer science. In this report we identify a few strategic research goals and possible thrusts to meet those goals.
Video belongs to a class of information called continuous media. Continuous media are characterized by the essentially continuous manner in which the information is presented. This is in contrast to discrete media, in which there is no essential temporal component. Text, images, and graphics are examples of discrete media, while movies, sound, and computer animation are examples of continuous media. Even though a slide show is a time-based presentation of images, it is not a continuous medium, since each image is viewed as an individual item. On the other hand, a video clip, while also consisting of a sequence of images, is a continuous medium, since each image is perceived in the context of past and future images.
With continuous media, therefore, the temporal dimension becomes important. For example, a video sequence compressed with a constant image quality for every frame is often more desirable than one in which the image quality varies noticeably over time. However, because the compressibility of individual frames varies over time, maintaining a constant image quality results in a variation in coding rate over time. The process of controlling the coding rate to meet the requirements of a transmission channel or storage device, while maintaining a desired level of quality, is called bit rate control. In this monograph, we focus on the rate control of compressed video. Specifically, we present a new framework for allocating bits to the compression of pictures in a video sequence.
Existing optimal rate control techniques typically regulate the coding rate to minimize a sum-distortion measure. While these techniques can leverage the wealth of tools from least-mean-square optimization theory, they do not guarantee constant-quality video, an objective often mentioned in the literature. In this book, we develop a framework that casts rate control as a resource allocation problem with continuous variables, nonlinear constraints, and a novel lexicographic optimality criterion motivated by the goal of uniform video quality. With the lexicographic criterion, we propose a new concept of coding efficiency that better reflects the constancy in quality generally desired from a video coder.
Rigorous analysis within this framework reveals a set of necessary and sufficient conditions for optimality when coding at both constant and variable bit rates. With these conditions, we are able to construct polynomial-time algorithms for optimal bit rate control. Experimental implementations of these algorithms confirm the theoretical analysis and produce encodings that are more uniform in quality than those achieved with existing rate control methods. As evidence of the generality and flexibility of the framework, we show how to extend it to allocate bits among multiple variable bit rate bitstreams that are to be transmitted over a common constant bit rate channel, and to encompass the case of discrete variables.
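One way to write down the lexicographic criterion (the notation here is introduced for illustration rather than taken from the text): let an allocation $s = (s_1, \dots, s_N)$ assign a distortion (e.g., quantization) level $s_i$ to picture $i$, and let $\Phi$ be the set of allocations whose induced coding rates satisfy the buffer constraints of the channel. Writing $\operatorname{sort}(s)$ for the components of $s$ arranged in nonincreasing order, an allocation $s^{*} \in \Phi$ is lexicographically optimal if

\[
\operatorname{sort}(s^{*}) \preceq_{\text{lex}} \operatorname{sort}(s) \quad \text{for all } s \in \Phi,
\]

that is, the worst picture is made as good as possible, then the second worst, and so on. This is a stricter requirement than minimizing a sum such as $\sum_i s_i$, which can trade a few very poor pictures for many slightly better ones.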
On May 2-3, 2006, Purdue University, BlueCross BlueShield Association, and WellPoint, Inc. hosted 24 CEO-level healthcare executives, representing a diverse cross section of the healthcare supply chain, to design the U.S. healthcare-delivery system for the next generation. Participants were challenged to envision the ideal system for the future, without regard to the constraints of today’s technologies, infrastructure, or financial systems. The Regenstrief Center for Healthcare Engineering at Purdue University was tasked with presenting the summit discussion in the form of a white paper that represents the views of the summit participants.
The summit discussions led to three distinctive characteristics for the design of a healthcare-delivery system:
The design includes a set of innovations required to realize these characteristics as well as a set of enablers (or actions) to set the stage for the development and implementation of the new system.
This proposed design is based on a specific set of assumptions about the goals of healthcare delivery for the next generation and about the attributes of the consumer and the economic environment in that generation. These assumptions may not correspond to the goals and attributes of the current healthcare system.
This design is not a detailed prescription for change -- the complex and highly distributed nature of the healthcare system precludes the successful adoption of such a prescription. Instead, the document identifies necessary elements for the future system and aims to identify forces that can catalyze radical improvements throughout the system. Once initiated, these forces can promote those innovations and actions.
Slides for a talk (Adobe pdf format)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this book we discuss the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory.
For the batched problem of sorting, and for related problems such as permuting and the fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss prefetching, distribution, and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. Two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), when the internal data representations are compressed, or when the allocated amount of internal memory can change dynamically.
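To illustrate the merging paradigm in its simplest form, here is a single-disk sketch of my own (parameters such as run_size are assumed stand-ins for the internal memory capacity M), rather than the I/O-optimal multiway algorithms analyzed in the book:

```python
import heapq
import itertools
import tempfile

def external_sort(records, run_size=100_000):
    """Run formation followed by a k-way merge: sort as much as fits in
    memory at a time, spill each sorted run to a temporary file, then
    stream-merge the runs back together."""
    runs = []
    source = iter(records)
    while True:
        run = sorted(itertools.islice(source, run_size))  # one in-memory run
        if not run:
            break
        spill = tempfile.TemporaryFile("w+t")
        spill.writelines(line + "\n" for line in run)
        spill.seek(0)
        runs.append(spill)
    # heapq.merge performs the k-way merge lazily, reading the runs as streams
    return heapq.merge(*((line.rstrip("\n") for line in run) for run in runs))

if __name__ == "__main__":
    import random
    data = [str(random.randrange(10**6)) for _ in range(250_000)]
    assert list(external_sort(data, run_size=50_000)) == sorted(data)
```

A single disk suffices for this sketch; how the runs are laid out across multiple disks, and why striping alone is not enough for sorting, is exactly where the independent-disk prefetching and distribution techniques discussed in the text come in.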
Programming tools and environments are available for simplifying the EM programming task. During the course of the book, we report on some experiments in the domain of spatial databases using the TPIE system (Transparent Parallel I/O programming Environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
This book is an expanded version of an earlier survey article.
Slides for CPM '10 keynote talk (Adobe pdf)
The field of compressed data structures seeks to achieve fast search time while using a compressed representation, ideally one requiring less space than that occupied by the original input data. The challenge is to construct a compressed representation that provides the same functionality and speed as traditional data structures. In this invited presentation, we discuss some breakthroughs in compressed data structures over the course of the last decade that have significantly reduced the space requirements for fast text and document indexing. One interesting consequence is that, for the first time, we can construct data structures for text indexing that are competitive in time and space with the well-known technique of inverted indexes, but that provide more general search capabilities. Several challenges remain, and we focus in this presentation on two in particular: building I/O-efficient search structures when the input data are so massive that external memory must be used, and incorporating notions of relevance in the reporting of query answers.
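For readers unfamiliar with the machinery behind such indexes, the toy sketch below (added for illustration, not code from the presentation) shows backward search over the Burrows-Wheeler transform, the counting step at the heart of FM-index-style compressed text indexes. The occ function here scans the string, whereas a real compressed index answers it in constant time using rank structures over wavelet trees or compressed bitvectors.

```python
from collections import Counter

def bwt(text):
    """Burrows-Wheeler transform; text must end with a unique sentinel '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def backward_search(L, pattern):
    """Count the occurrences of pattern using backward search over the
    BWT string L.  C[c] is the number of characters in the text smaller
    than c; occ(c, i) counts the c's in L[:i]."""
    counts = Counter(L)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    def occ(c, i):
        return L[:i].count(c)   # O(n) here; rank structures make it O(1)

    lo, hi = 0, len(L)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

if __name__ == "__main__":
    print(backward_search(bwt("mississippi$"), "issi"))   # prints 2
```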
We live in challenging times -- with a sputtering economy, budget deficits, and political divide -- yet it is through such challenges that we sometimes see our potential and the way forward most clearly. Over the past year, we in the United States have navigated a long and divisive political process in setting the stage for a new healthcare system for our citizens. The goal of the new healthcare system is an important one: to provide for the long-term health and viability of the residents of the United States. However, who is looking out for the health of the United States itself? Who will provide for the long-term health and sustenance of our national economy, our standard of living, and our global leadership?
I submit that our universities -- and most especially our public universities -- play the role of improving our nation's long-term health. They perform the fundamental research that leads, years down the road, to a healthy and viable economy. In this presentation, I will discuss the pivotal role that universities play and why it is therefore so important to keep our universities strong and vital. My view is that the best way to keep universities strong and vital is to build and exploit synergies.
Slides for CIKM '12 keynote talk (Adobe pdf)
We describe recent breakthroughs in the field of compressed data structures, in which the data structure is stored in a compressed representation that still allows fast answers to queries. We focus in particular on compressed data structures to support the important application of pattern matching on massive document collections. Given an arbitrary query pattern in textual form, the job of the data structure is to report all the locations where the pattern appears. Another variant is to report all the documents that contain at least one instance of the pattern. We are particularly interested in reporting only the most relevant documents, using a variety of notions of relevance. We discuss recently developed techniques that support fast search in these contexts as well as under additional positional and temporal constraints.
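The toy sketch below, added here for concreteness, illustrates the document-retrieval variant in its most naive form: brute-force matching over the concatenated collection stands in for the compressed indexes discussed in the talk, and match frequency serves as a crude relevance measure.

```python
import bisect
from collections import Counter

def top_documents(docs, pattern, k=3):
    """Concatenate the documents with a separator, find every occurrence
    of the pattern, map each match position back to its document, and
    report the k documents with the most matches."""
    starts, pieces, pos = [], [], 0
    for doc in docs:
        starts.append(pos)
        pieces.append(doc)
        pos += len(doc) + 1                  # +1 for the separator
    text = "\x00".join(pieces) + "\x00"      # separator assumed absent in docs

    hits = Counter()
    i = text.find(pattern)
    while i != -1:
        hits[bisect.bisect_right(starts, i) - 1] += 1
        i = text.find(pattern, i + 1)
    return hits.most_common(k)               # (document id, match count) pairs

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "quick quick quick"]
    print(top_documents(docs, "quick"))      # document 2 ranked first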
Information and technology -- and their intersection in the area of information technology (IT) -- matter a lot and continue to change the world. Information technology is the breakthrough development that has opened all kinds of doors for society and civilization. This paper proposes information technology as a paradigm, both for advancing our agenda of research excellence at KU and as a basis for everything we do.
The University of Kansas is undergoing a transformation. It is driven by our strategic plan -- aptly named Bold Aspirations -- which guides and inspires us to raise the expectations we have for ourselves, the aspirations we have for our state, and the hopes we have for our world. We are in the third year of Bold Aspirations, and the level of change on campus so far is unprecedented.
Bold Aspirations outlines six important goals for the university. This paper relates specifically to Goal 4, which focuses on engaged scholarship: “to engage local, state, national, and global communities as partners in scholarly activities that have direct public impact.” As part of that goal, we seek to promote active entrepreneurship and vibrant external partnerships. A key component of this strategy was the creation of the Office of Corporate Partnerships, developed to diversify KU's research portfolio. The Office of Corporate Partnerships was introduced into KU's existing commercialization enterprise, and in the two years since the office's creation, we have already seen an increase in corporate and foundation research funding as a percentage of our overall research portfolio.