Computer Science

A course of study in computer science

topic text pages
Programming Structure and Interpretation of Computer Programs Abelson and Sussman 610
Algorithms and Data SRtructures Introduction to Algorithms Cormen etal 1126
Compilers Compilers: Principles, Techniques, and Tools Aho etal 987
Operating Systems Operating Systems Design and Implementation Tanenbaum and Woodhull 1080
Networking TCP/IP Illustrated, Vol. 1: The Protocols Stevens 538
Computation Introduction to the Theory of Computation Sipser 413
Databases and Concurrency Transaction Processing: Concepts and Techniques Gray and Reuter 973
Machine Learning Artificial Intelligence: A Modern Approach Russell and Norvig 975
Information Retrieval Introduction to Information Retrieval Manning etal 496
Types Types and Programming Languages Pierce 489
Statistics Introduction to Mathematical Statistics Hogg and Craig 542
Numerical Analysis Numerical Analysis: Mathematics of Scientific Computing Kincaid and Cheney 729

Programming

In 1980 Abelson and Sussman designed MIT's introductory programming class: 6.001. The text for class was published in 1984 and revised in 1996. In the 2008 academic year the course was replaced with 6.01, which taught a mix of computer science and electrical engineering and used Python as a programming language. There is no published text for the new course and even if there was it might not be better than SICP as an introduction to programming because of the EE component.

Part of the appeal of SICP is in the choice of the language: Scheme. SICP spends very little time explaining how the language works and it doesn't refer the reader to a separate tutorial on the language either. Scheme is a simple language and the features which are needed to read the book and do the exercises are simpler still. As a result SICP is almost entirely devoted to general programming concepts.

CPT by Van Roy and Haridi uses Oz as an educational language. They illustrate a variety of programming styles with it, much like Abelson and Sussman do with Scheme. CPT has a stronger emphasis on concurrency. Oz is a declarative language in which threads block when they attempt to read from a variable that has not been assigned a value yet. This makes it a nice language for writing concurrent programs.

SICP: Table of Contents

If you wanted to abbreviate your study of SICP, you could leave out sections 3.4, 4.2, and 4.3. Consider skipping them and coming back later in any case. If you wanted to make the study even shorter you could also leave out 4.4 and all of chapter 5.

Algorithms and Data Structures

Knuth has published 4 volumes of ACP and more are planned. With the volumes published so far ACP has 3000 pages of text, making it too voluminous for undergraduate study. ACP has a pleasing level of mathematical rigor compared to Cormen etal and it covers topics which Cormen etal does not, such as random number generation. There are 30 pages in ACP devoted to the topic, followed by 80 pages of statistical tests designed to determine whether a sequence of numbers is random. My favorite test is the Chi-square because it is the one used to detect loaded dice.

Knuth presents algorithms in pseudocode and in an assembly language of his own devising called MIXAL. The machine language is called MIX. Here is a MIX simulator and MIXAL compiler implemented in JavaScript.

Introduction to Algorithms: Selected Table of Contents

Here is a short course of study consisting of sections selected from the first 14 chapters. One can make it even shorter by skipping ch. 9 and ch. 14. Your goal is to be able to implement heapsort, quicksort, a hashtable, and a red-black tree in C.

Compilers

Operating Systems

I read The Design of the Unix Operating System by Bach about 20 years ago. This still seems like a good introduction to operating systems because of the simplicity of what it describes and the prevalence of Unix. Modern systems merge the disk buffer cache and process memory management, however, and swapping is less used.

The book by Tanenbaum seems like a good balance between practice and theory because it builds a small but complete operating systems. When studying Tanenbaums ideas about OSes, it is maybe worthwhile to remember what he said in the Tanenbaum vs. Torvalds flame war. His predictions about the HURD operating system or the decline of the Intel architecture haven't come to pass. The Plan 9 designers had some intelligent criticism of Tanenbaum's ideas.

Networking

For computer networking, perhaps it is best to simply study the protocols that are actually used. Although published in 1994, Stevens remains a good source. Note that RARP and BOOTP have been replaced by DHCP as the preferred mechanism for allocating IP addresses to machines dynamically. Also, telnet, rlogin, and ftp send passwords over the internet in plain text and thus have been replaced by applications in the ssh suite.

The Internet Protocol Suite recognizes four layers. The lowest is the link layer, which today is usually one of the technologies in the list below. PPP is used over DSL connections. In the link layer, each message is called a frame and the maximum frame size is called the MTU for the local network. One can run netstat -i to get the MTU for each of the interfaces on a machine.

Above the link layer is the internet layer, which is mostly the IP protocol. The transport layer rests on top of that and is usually TCP or UDP. The top level is the application layer. Example application protocols are HTTP, SMTP, SSH-2. Here are important protocols which have appeared since the publication of Stevens:

Computation

Databases and Concurrency

Machine Learning

Information Retrieval

Lucene is a popular choice for a search engine.

Types

Type theory is only of use when designing a programming language, and then only if the language considered is statically typed. Don't be too sorry if time constraints require you to eliminate the topic from your course of study.

Pierce uses OCaml and introduces the lambda calculus in ch. 5.

Statistics

Numerical Analysis

Floating point numbers are defined by IEEE 754. Single and double precision floats use 32 bits and 64 bits, respectively. The decimal precision of single and double floats are 7.22 and 15.95 digits. The maximum exponent in decimal is 38.23 and 307.95.

The number system includes two zeros (positive and negative) and two infinities (positive and negative). In addition there are two version of NaN: quiet and signaling. The idea is that operations on a quiet NaN succeed (usually producing another NaN) whereas operations on a signaling NaN produce and error. The signaling NaN is a new feature.

There are five error conditions when performing floating point operations which are indicated in the hardware by status flags:

  • invalid operation
  • division by zero
  • overflow
  • underflow
  • inexact
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License