
Three Required Programming Books

We were recently trying to hire some software engineers at work. Our usual approach with candidates involved a team interview session where the current developers all asked questions. A question that one of the developers on my team always asked was along the lines of, “What are three books that you think are important for all developers?” That’s not exactly how he asks it, but in my mind, I translate this into, “What three books would you expect any professional developer to be familiar with?”

It’s an interesting question and you can learn a few things about the candidate from the answer. I’ve thought about it, and I know what my answer is. I suspect it’s not the answer that would necessarily win me the most points if I were a candidate interviewing with that team. My answer certainly reveals two strong biases I have: I believe all professional programmers should be near-expert level C programmers (at least in terms of the language itself, not necessarily from a practical perspective of being able to successfully develop or manage a huge C project). I also believe that all professional developers should be familiar with The Unix Way. Because it is mostly The Right Way. Whatever the market says (clearly I’m about to disagree with the market…), it’s hard for me to consider Windows a serious enterprise application development and hosting platform, and it deserves little more than to be considered a passing fad.

Right. Back to the three books:

The C Programming Language, by Brian W. Kernighan and Dennis M. Ritchie. Given my first bias, this is an obvious choice. There has never been, nor will there likely ever be, as definitive or widely recognized a volume on any language. Read it. Know it. Love it. Everything that programmers do has C underneath it. Hiring involves far too many factors to boil down to a simple litmus test, but if life were that simple, I’d go so far as to say I wouldn’t hire any developer who hasn’t read K&R cover-to-cover at least once.

Advanced Programming In The Unix Environment, by W. Richard Stevens. Here you see my other bias playing out. I don’t really care what platform your current job is for. If you’re not able to dabble in Unix system programming (in C, of course), I’m not convinced that you have the same fundamental developer chops as people who can. This isn’t necessarily a book you need to read cover-to-cover once you’re far enough in to understand the general Unix Way. But if you haven’t actually implemented C code that does string manipulation, file I/O, network sockets, memory management, threads, safe concurrency with critical sections, etc., then higher-level languages and frameworks are a crutch and you are more likely to make bad decisions. If you know how to do it in C, though, you can use those higher-level languages for practicality and productivity while still knowing what their underlying implementation likely looks like, and make correct decisions accordingly. If you’re a Windows Guy… sorry, Unix system programming is just so much more kick-ass than Windows system programming. (By the way, university CS programs that don’t make their Operating Systems students write a Unix shell in C with program execution, pipe support, stdin and stdout redirection, and a few other features are really doing their students a disservice.)

The Art of Computer Programming, by Donald Knuth. This is on my list because it simply has to be. This is the definitive monograph on the subject. I’m the first to admit that I haven’t actually read it all. The first few parts of Volume 1 deserve an honest read-through. The rest is great to skim, picking out topics you’re interested in to read more in depth… but really, the overall exposure is what you’re after: getting used to thinking about algorithms the way Knuth talks about them. These volumes are perhaps the least practical books in my entire technical library, but if you put some effort into reading them, or parts of them, you will come away smarter than you used to be. The information density in these books is impressive. And it turns out they do have a practical side, too… on numerous occasions, I have wanted to do something and just picked an algorithm straight out of one of the books to implement.

It’s great if candidates (or any developers) have also read practical books on the languages, frameworks, and trends their company uses. In fact, reading the three books I listed does very little to prepare a developer to work in a real-world software company. However, without understanding these three books (the level of understanding required follows the order I listed them in: fully understand The C Programming Language, have a good handle on Advanced Programming In The Unix Environment, and get what you can by osmosis from The Art of Computer Programming), I honestly believe developers are at a disadvantage, and it will show in their software.

Automating Oracle Database Creation

Why?

I went through a period when, for some reason, I found myself creating lots of new Oracle databases on various systems. These databases were primarily on remote Solaris systems (because, as always, I don’t believe in running Oracle on Windows!).

The “obvious” way to create databases is with the Database Configuration Assistant (DBCA). However, I was unsatisfied with this approach for several reasons:

First, DBCA is a GUI tool and I only connect to the database server with SSH. To use DBCA, I ran a local X server and used X11 forwarding over SSH. Technically effective, but X over anything other than a fast local network is barely usable.

Second, I wanted to provision databases that were as “lean and mean” as possible. The databases were usually for development or quick testing of different applications, and most applications didn’t depend on much Oracle-specific functionality or advanced Oracle features. The databases that come out of DBCA always seemed a bit bloated to me. Furthermore, for applications that do use specific Oracle features (such as the embedded Java runtime, Streams, CDC, etc.), I want to know specifically what needs to be added to the base database to enable the functionality rather than just relying on an install-everything approach.

Finally, I believe anything you need to do server-side to deploy applications should be automated (or at least support the ability to automate the tasks). Creating the databases using the same automated script across my environments is much lower risk than remembering to click all the same settings in a GUI tool when I move through environments. I also found that databases I created with DBCA on different systems put their directories in different places depending on how Oracle was installed. Over time I’ve come to like a particular scheme for organizing multiple databases on a single server, so by scripting the process I can go to any server that I’ve created databases on and know exactly where to find everything.

With all of that in mind, I went in search of the deep dark secrets of creating Oracle databases with SQL*Plus scripts instead of DBCA. This really boils down to three steps:

  1. Prepare to create the database
  2. Create the database
  3. Run post-creation scripts

Preparing to create the database really just involves making the directory structure you want and writing the initialization parameter file (pfile) for the database you are going to create.

Then, creating the database is the big SQL statement to actually (duh!) create the database.

And finally, you need to run the SQL scripts that build the data dictionary views and standard PL/SQL packages (catalog.sql and catproc.sql). This is also the first good opportunity to migrate the pfile to an spfile.

How?

The approach I took is to write a shell script that creates the directory structure and outputs the SQL and shell scripts to create the individual database (in the database’s admin directory so that the creation scripts used for a particular database are tucked away in that particular database’s directory structure for future reference).
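One shell detail makes the “creation script creator script” approach work. In an unquoted here-document, $VARIABLE references are expanded while the generator runs. A backslash-escaped \$VARIABLE is written out literally and only expanded when the generated script runs. That is why create.sh gets the SID baked in yet still refers to $ORACLE_HOME at run time. Here’s a minimal sketch of the same pattern. It uses a throwaway directory from mktemp, and TEST and /opt/oracle are made-up placeholders, not real settings:

```sh
#!/bin/sh
# Sketch: generation-time vs run-time expansion in an unquoted here-document.
# TEST and /opt/oracle are hypothetical placeholder values.
DB_SID=TEST
ORACLE_HOME=/opt/oracle

OUT_DIR=$(mktemp -d)
trap 'rm -rf "$OUT_DIR"' EXIT   # clean up the scratch directory on exit

# $DB_SID and $ORACLE_HOME are substituted now, while this generator runs.
# \$ORACLE_HOME and \$ORACLE_SID are written literally and expanded later,
# when the generated script itself runs.
cat > "$OUT_DIR/run.sh" << __EOF__
#!/bin/sh
ORACLE_HOME=$ORACLE_HOME
ORACLE_SID=$DB_SID
export ORACLE_HOME ORACLE_SID
echo "would run: \$ORACLE_HOME/bin/sqlplus for \$ORACLE_SID"
__EOF__

cat "$OUT_DIR/run.sh"    # the generated file, with literal $ORACLE_HOME in the echo
sh "$OUT_DIR/run.sh"     # prints: would run: /opt/oracle/bin/sqlplus for TEST
```

Quoting the delimiter instead (<< '__EOF__') would turn off all expansion, which is handy for templates that should contain no generation-time values at all.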

The “creation script creator script” has some parameters you can change to indicate where Oracle is installed, and the rest of the script builds paths based on how I normally set things up and like to see them organized. Very briefly, the Oracle software is installed under /u01, all of my data files go under /u02/oradata/<SID>, and recovery files go under /u02/orarecovery (the flash recovery area, where Oracle creates a subdirectory for each database). I throw two control files under /u02 and stash the third under /u01 on the theory that /u01 and /u02 should be different LUNs. Any other administrative stuff goes under /u01/app/oracle/admin/<SID>.
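To make the scheme concrete, the hypothetical helper below prints where each piece lands for a given SID. It is only a sketch that mirrors the paths hard-coded in the script further down (an ORACLE_BASE of /u01/app/oracle and a 10.2.0/db_1 home). It doesn’t create anything:

```sh
#!/bin/sh
# Sketch: print the directory layout for one SID under the /u01 + /u02 scheme.
# oracle_layout is a hypothetical helper; the paths mirror the generator script.
oracle_layout() {
    sid=$1
    base=/u01/app/oracle
    echo "software: $base/product/10.2.0/db_1"
    echo "admin: $base/admin/$sid"
    echo "data: /u02/oradata/$sid"
    # Two control files on /u02, the third on /u01 (assumed to be a separate LUN)
    echo "control: /u02/oradata/$sid/${sid}_ctrl_01.ctl"
    echo "control: /u02/oradata/$sid/${sid}_ctrl_02.ctl"
    echo "control: $base/oradata/$sid/${sid}_ctrl_03.ctl"
    echo "recovery: /u02/orarecovery"
}

oracle_layout ORCL
```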

The SID of the database you want to create is the only command-line parameter to the script. If you want anything else to be different, you need to edit the script ahead of time. If you don’t change the database creation and parameter file templates in the script, you’ll end up with a character set of AL32UTF8 and the database configured to use roughly 512MB of RAM on the system: sga_target is set to 384MB, with PGA and process memory on top of that.
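The template’s sizes are raw byte counts. sga_target = 402653184 is 384MB, and db_recovery_file_dest_size = 2147483648 is 2GB. If you edit them, a small helper like the hypothetical to_bytes below saves doing the multiplication by hand:

```sh
#!/bin/sh
# Sketch: convert a size with a K/M/G unit into bytes for the init.ora template.
# to_bytes is a hypothetical helper, not part of the generator script.
to_bytes() {
    n=$1
    case $2 in
        K) echo $((n * 1024)) ;;
        M) echo $((n * 1024 * 1024)) ;;
        G) echo $((n * 1024 * 1024 * 1024)) ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
}

echo "sga_target = $(to_bytes 384 M)"                # 402653184
echo "db_recovery_file_dest_size = $(to_bytes 2 G)"  # 2147483648
```

Oracle also accepts K, M, and G suffixes on big-integer parameters such as sga_target and db_recovery_file_dest_size. Writing 384M or 2G directly in init.ora is therefore an equally valid alternative.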

So without further ado, here’s the script I use:

#!/bin/sh

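# The SID is required; without it every path below would collapse to its parent directory
if [ -z "$1" ]
then
        echo "usage: $0 SID" >&2
        exit 1
fi
DB_SID=$1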
DB_DOMAIN=mattwilson.org

ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=${ORACLE_BASE}/product/10.2.0/db_1
ORACLE_ADMIN=${ORACLE_BASE}/admin/${DB_SID}

DATA_PRIMARY=/u02/oradata/${DB_SID}
DATA_SECONDARY=/u01/app/oracle/oradata/${DB_SID}
DATA_RECOVERY=/u02/orarecovery

# Create admin directories
mkdir -p ${ORACLE_ADMIN}
for x in adump bdump cdump udump scripts
do
        mkdir -p ${ORACLE_ADMIN}/${x}
done

# Create data directories
mkdir -p $DATA_PRIMARY
mkdir -p $DATA_SECONDARY
mkdir -p $DATA_RECOVERY

# Create init.ora file for instance
cat - > ${ORACLE_ADMIN}/scripts/init.ora << __EOF__
db_name = $DB_SID
db_domain = $DB_DOMAIN

db_block_size = 8192
undo_management = auto
undo_tablespace = undotbs1

control_files = (${DATA_PRIMARY}/${DB_SID}_ctrl_01.ctl,
                 ${DATA_PRIMARY}/${DB_SID}_ctrl_02.ctl,
                 ${DATA_SECONDARY}/${DB_SID}_ctrl_03.ctl)

background_dump_dest = ${ORACLE_ADMIN}/bdump
core_dump_dest = ${ORACLE_ADMIN}/cdump
user_dump_dest = ${ORACLE_ADMIN}/udump
audit_file_dest = ${ORACLE_ADMIN}/adump

db_recovery_file_dest = $DATA_RECOVERY
db_recovery_file_dest_size = 2147483648

sga_target = 402653184
__EOF__

# Create database creation script
cat - > ${ORACLE_ADMIN}/scripts/create.sql << __EOF__
connect / as sysdba
set echo on
spool ${ORACLE_ADMIN}/scripts/create.log

startup nomount pfile=${ORACLE_ADMIN}/scripts/init.ora;

CREATE DATABASE "${DB_SID}"
MAXINSTANCES 1
MAXLOGHISTORY 1
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
CHARACTER SET AL32UTF8
NATIONAL CHARACTER SET UTF8
DATAFILE '${DATA_PRIMARY}/system01.dbf'
        SIZE 128M
        AUTOEXTEND ON
        NEXT 128M MAXSIZE UNLIMITED
        EXTENT MANAGEMENT LOCAL
SYSAUX DATAFILE '${DATA_PRIMARY}/sysaux01.dbf'
        SIZE 128M
        AUTOEXTEND ON
        NEXT 128M MAXSIZE UNLIMITED
UNDO TABLESPACE "UNDOTBS1" DATAFILE '${DATA_PRIMARY}/undotbs01.dbf'
        SIZE 128M
        AUTOEXTEND ON
        NEXT 16M MAXSIZE UNLIMITED
DEFAULT TEMPORARY TABLESPACE TEMP
        TEMPFILE '${DATA_PRIMARY}/temp01.dbf'
        SIZE 32M
        AUTOEXTEND ON
        NEXT 8M MAXSIZE UNLIMITED
DEFAULT TABLESPACE USERS DATAFILE '${DATA_PRIMARY}/users01.dbf'
        SIZE 64M
        AUTOEXTEND ON
        NEXT 64M MAXSIZE UNLIMITED
LOGFILE GROUP 1 ('${DATA_PRIMARY}/redo01.log') SIZE 64M,
        GROUP 2 ('${DATA_PRIMARY}/redo02.log') SIZE 64M,
        GROUP 3 ('${DATA_PRIMARY}/redo03.log') SIZE 64M;

@?/rdbms/admin/catalog.sql
@?/rdbms/admin/catproc.sql

-- No USER SYSTEM IDENTIFIED BY clause above, so 10g's default SYSTEM password (MANAGER) applies
connect system/manager
@?/sqlplus/admin/pupbld

connect / as sysdba
shutdown immediate;
connect / as sysdba
startup mount pfile=${ORACLE_ADMIN}/scripts/init.ora;
alter database archivelog;
alter database open;
create spfile='${ORACLE_HOME}/dbs/spfile${DB_SID}.ora'
        from pfile='${ORACLE_ADMIN}/scripts/init.ora';
shutdown immediate;
startup;

execute utl_recomp.recomp_serial();

exit;
__EOF__

# Create run script
cat - > ${ORACLE_ADMIN}/scripts/create.sh << __EOF__
#!/bin/sh
ORACLE_HOME=$ORACLE_HOME
ORACLE_SID=$DB_SID
export ORACLE_HOME ORACLE_SID
\$ORACLE_HOME/bin/sqlplus /nolog @${ORACLE_ADMIN}/scripts/create
__EOF__

chmod +x ${ORACLE_ADMIN}/scripts/create.sh

# All done!
echo -------------------------------------------------------------
echo Ready to run create database script.
echo Go to ${ORACLE_ADMIN}/scripts
echo Then run create.sh in that directory.
echo -------------------------------------------------------------

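Just save that as something like create-setup-script.sh, make it executable, run it with the new database’s SID as its only argument (for example, ./create-setup-script.sh ORCL), and follow the instructions it prints. You’re all set!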
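The SID ends up as db_name, so it’s worth checking before the script creates any directories. Oracle limits DB_NAME to 8 characters. The rest of the check below (a leading letter, then only letters and digits) is a conservative assumption. It also keeps the value safe inside the script’s here-documents, where a stray $ would be expanded. valid_sid is a hypothetical helper you could paste near the top of create-setup-script.sh:

```sh
#!/bin/sh
# Sketch: pre-flight check for the SID argument.
# 8-character limit: Oracle's DB_NAME rule.
# Leading letter + letters/digits only: a conservative assumption.
valid_sid() {
    case $1 in
        '' | [!A-Za-z]* | *[!A-Za-z0-9]*) return 1 ;;   # empty, bad first char, or bad char
    esac
    [ ${#1} -le 8 ]                                      # DB_NAME is at most 8 characters
}

for sid in ORCL DEV10G 9LIVES TOOLONGNAME 'a$b'; do
    if valid_sid "$sid"; then
        echo "$sid: ok"
    else
        echo "$sid: rejected"
    fi
done
```

The loop reports ok for ORCL and DEV10G and rejected for the other three. In the generator, a line such as valid_sid "$1" || { echo "usage: $0 SID" >&2; exit 1; } would stop it before DB_SID is used.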

Dear Google: can you please add two features to GMail for me?

For several years, I ran my own server to handle my email. At first it was a fun project, gave me good real-world experience, and provided flexibility that I wouldn’t have had with most hosted options. Procmail and mutt were my friends. Over time, though, keeping up with anti-spam measures became more of a burden than it was fun, and in the grand scheme of things I just didn’t feel like spending my free time caring for and feeding a production mail server.

The death knell for my own server was the introduction of Google Apps For Your Domain. Having played with regular GMail in the past, I liked the interface and its threading model, and I buy into the philosophy of searching email archives instead of trying to organize them. For those and other reasons, moving email to Google Apps sounded like a good option, so I set up a test domain and eventually moved mattwilson.org to Google Apps.

In short, I’ve been happy with the service, and its spam filter is amazingly accurate. So I’m a happy camper, but there is one area where I’d like to see a couple of improvements: handling mailing list subscriptions.

I subscribe to several mail lists, and GMail’s searching and conversation threading features particularly shine when reading list traffic. Each list gets its own label and messages “skip the inbox” so I can just go through and read the lists I’m interested in as I have time. But here’s where the problems arise:

First, GMail’s filters don’t let me reliably file messages from particular lists under a particular label (for GMail neophytes, think of labels as folders). For some lists I’ve subscribed to, the only way to tell that a message came from that list is to look for a specific header. Unfortunately, GMail can’t filter on headers, so messages from those lists can’t be filed correctly. Even for the majority of my lists, which I filter on the list address in the “To” field, I occasionally get messages in the inbox because the list was bcc’ed on that particular message. Another header still identifies the list, but I can’t act on it. So feature request one: I’d like to filter based on headers.
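For comparison, this kind of rule is easy with procmail on a self-hosted server. As a rough sketch of what header-based filtering means, here’s a hypothetical list_label function that reads one message on stdin and prints the list identifier from its List-Id header, which could then pick the label. Which header a given list uses varies. List-Id (RFC 2919) is an assumption here, and folded (multi-line) headers aren’t handled:

```sh
#!/bin/sh
# Sketch: extract the list identifier from a message's List-Id header.
# list_label is hypothetical; assumes List-Id and ignores folded header lines.
list_label() {
    awk '
        /^$/ { exit }                              # blank line ends the headers
        tolower($0) ~ /^list-id:/ {
            if (match($0, /<[^>]+>/)) {            # take the text inside <...>
                print substr($0, RSTART + 1, RLENGTH - 2)
            }
            exit
        }
    '
}

printf 'From: a@example.org\nTo: me@example.org\nList-Id: Example Users <users.example.org>\nSubject: hi\n\nbody\n' | list_label
# prints: users.example.org
```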

Second, I don’t read every message on every list. My workflow is to click on a label, scan the subject lines, and read the messages that look interesting. This leaves several unread conversations, and in the best case it takes three clicks to mark the remaining conversations as read. If I’ve been on vacation or not reading list traffic for a couple days and the messages expand past the first list screen, it takes more work to mark them as read. So feature request two: while browsing a label, I’d like a “Catch Up” or “Mark All As Read” button right up there next to the Delete button.

GMail is inherently a natural fit for managing an email account that subscribes to mail lists. The search is great, and the conversation interface is wonderful for following threads. With the addition of header-based filtering and a quick way to mark everything from a list as read, it would be truly fantastic.