File Descriptors – More than Meets the Eye – Pt. 3

A search for “what is a file descriptor” turns up:
http://en.wikipedia.org/wiki/File_descriptor

“In computer programming, a file descriptor (FD) is an abstract indicator for accessing a file” — the “file” keyword is clickable and it goes on to compare “files” to “documents” on a computer.

The meaning is elusive. Look at the “ways to obtain an FD”: a socket() call provides a file descriptor. It’s hard to think of a network socket as a file, or a document. And then there is the fact that a sockfs exists… I’m really not getting this filesystem stuff.

“Generally, a file descriptor is an index for an entry in a kernel-resident array data structure containing the details of open files. In POSIX this data structure is called a file descriptor table, and each process has its own file descriptor table. The process passes the file descriptor to the kernel through a system call, and the kernel will access the file on behalf of the process. The process itself cannot read or write the file descriptor table directly.

In Unix-like systems, file descriptors can refer to any Unix file type named in a file system. As well as regular files, this includes directories, block and character devices (also called “special files”), Unix domain sockets, and named pipes. File descriptors can also refer to other objects that do not normally exist in the file system, such as anonymous pipes and network sockets.”
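To make the “index into a per-process table” idea a bit more concrete, here is a minimal sketch in C. It assumes a POSIX system with /dev/null, and the helper name lowest_fd_is_reused is made up for illustration. POSIX says open() returns the lowest-numbered descriptor not currently open in the process, so closing a descriptor and opening again should hand back the same small integer, which is what you would expect from a slot in a table.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the Solaris source): returns 1 if the
 * descriptor number freed by close() is handed out again by the next
 * open(), 0 if not, and -1 on error.  Assumes no other thread is
 * opening or closing files at the same time.
 */
int
lowest_fd_is_reused(void)
{
        int first, second;

        first = open("/dev/null", O_RDONLY);
        if (first < 0)
                return (-1);
        close(first);                   /* slot 'first' is free again */

        second = open("/dev/null", O_RDONLY);
        if (second < 0)
                return (-1);
        close(second);

        return (first == second);
}
```

The number itself carries no meaning outside the process; another process can hold the same number for a completely different object.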

Notice they say ‘other objects’, not ‘files’, because nobody refers to a network socket as a file! There is some mysterious zone to these file descriptors, and it all happens at the kernel level.

I am digging to find out what this is all about. Where does a file descriptor come from? How can we “create” one? My mind’s eye landed on Solaris Doors: a door_create syscall returns a “Door descriptor”, a file descriptor without a name in the filesystem, whatever that is. Seriously, what is a filesystem anymore… It’s so abstract.

But I knew I had an opportunity to visualize how a file descriptor gets created, or at least see wtf Doors does to “create” one.

I’m using CScope on the Solaris 2.8 source tree. Here’s what I find:

osnet_volume/usr/src/uts/common/fs/doorfs/door_sys.c

int
door_create(void (*pc_cookie)(), void *data_cookie, uint_t attributes)
{
        int fd;
        proc_t *p = ttoproc(curthread);
        int err;

        if ((attributes & ~(DOOR_UNREF | DOOR_PRIVATE | DOOR_UNREF_MULTI)) ||
            ((attributes & (DOOR_UNREF | DOOR_UNREF_MULTI)) ==
            (DOOR_UNREF | DOOR_UNREF_MULTI)))
                return (set_errno(EINVAL));

        if ((err = door_create_common(pc_cookie, data_cookie, attributes, p,
            &fd, NULL)) != 0)
                return (set_errno(err));

        f_setfd(fd, FD_CLOEXEC);
        return (fd);
}

Note: FD_CLOEXEC means that the file descriptor is closed automatically when the process calls one of the exec functions.
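In userland the same flag is reachable through fcntl(). A minimal sketch, assuming a POSIX system; set_cloexec and has_cloexec are names invented here. The kernel's f_setfd(fd, FD_CLOEXEC) call in door_create appears to do the equivalent from inside the kernel, so a door descriptor starts life with the flag already set.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: turn on close-on-exec for fd while keeping any
 * other descriptor flags.  Returns 0 on success, -1 on error.
 */
int
set_cloexec(int fd)
{
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0)
                return (-1);
        return (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0);
}

/*
 * Hypothetical helper: returns 1 if close-on-exec is set on fd,
 * 0 if it is not, and -1 on error.
 */
int
has_cloexec(int fd)
{
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0)
                return (-1);
        return ((flags & FD_CLOEXEC) != 0);
}
```

A descriptor fresh from pipe() normally has the flag clear, so has_cloexec() should flip from 0 to 1 across a set_cloexec() call.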

osnet_volume/usr/src/uts/common/sys/thread.h

/*
 * proctot(x)
 *      convert a proc pointer to a thread pointer. this only works with
 *      procs that have only one lwp.
 *
 * proctolwp(x)
 *      convert a proc pointer to a lwp pointer. this only works with
 *      procs that have only one lwp.
 *
 * ttolwp(x)
 *      convert a thread pointer to its lwp pointer.
 *
 * ttoproc(x)
 *      convert a thread pointer to its proc pointer.
 *
 * lwptot(x)
 *      convert a lwp pointer to its thread pointer.
 *
 * lwptoproc(x)
 *      convert a lwp to its proc pointer.
 */
#define proctot(x)      ((x)->p_tlist)
#define proctolwp(x)    ((x)->p_tlist->t_lwp)
#define ttolwp(x)       ((x)->t_lwp)
#define ttoproc(x)      ((x)->t_procp)
#define lwptot(x)       ((x)->lwp_thread)
#define lwptoproc(x)    ((x)->lwp_procp)

osnet_volume/usr/src/uts/common/sys/door.h

/* Attributes originally obtained from door_create operation */
#define	DOOR_UNREF	0x01	/* Deliver an unref notification with door */
#define	DOOR_PRIVATE	0x02	/* Use a private pool of server threads */
#define	DOOR_UNREF_MULTI 0x10	/* Deliver unref notification more than once */

/* Attributes (additional) returned with door_info and door_desc_t data */
#define	DOOR_LOCAL	0x04	/* Descriptor is local to current process */
#define	DOOR_REVOKED	0x08	/* Door has been revoked */
#define	DOOR_IS_UNREF	0x20	/* Door is currently unreferenced */

[..]
extern kmutex_t door_knob;
extern kcondvar_t door_cv;
extern size_t door_max_arg;
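The attribute check at the top of door_create reads more easily when pulled out on its own. Here is a sketch in plain C, with the three flag values copied from the door.h excerpt above; door_attrs_ok is a made-up name, not a kernel function. As far as the code shows, a caller may only pass the three attributes "originally obtained from door_create", and DOOR_UNREF and DOOR_UNREF_MULTI cannot be combined.

```c
/* Values copied from the door.h excerpt. */
#define DOOR_UNREF              0x01
#define DOOR_PRIVATE            0x02
#define DOOR_UNREF_MULTI        0x10

/*
 * Hypothetical userland restatement of the test at the top of
 * door_create(): returns 1 if 'attributes' would pass the check,
 * 0 if door_create() would fail with EINVAL.
 */
int
door_attrs_ok(unsigned int attributes)
{
        const unsigned int settable =
            DOOR_UNREF | DOOR_PRIVATE | DOOR_UNREF_MULTI;
        const unsigned int unref_both = DOOR_UNREF | DOOR_UNREF_MULTI;

        if (attributes & ~settable)
                return (0);     /* a bit the caller may not set */
        if ((attributes & unref_both) == unref_both)
                return (0);     /* the two UNREF modes exclude each other */
        return (1);
}
```

So 0x04 (DOOR_LOCAL), for example, is rejected: it is one of the bits that only comes back from door_info, never one you pass in.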

Back in door_sys.c:

/*
 * Common code for creating user and kernel doors.  If a door was
 * created, stores a file structure pointer in the location pointed
 * to by fpp (if fpp is non-NULL) and returns 0.  Also, if a non-NULL
 * pointer to a file descriptor is passed in as fdp, allocates a file
 * descriptor representing the door.  If a door could not be created,
 * returns an error.
 */
static int
door_create_common(void (*pc_cookie)(), void *data_cookie, uint_t attributes,
    proc_t *p, int *fdp, file_t **fpp)
{
        door_node_t     *dp;
        vnode_t         *vp;
        struct file     *fp;
        extern  struct vnodeops door_vnodeops;
        static door_id_t index = 0;

        dp = kmem_zalloc(sizeof (door_node_t), KM_SLEEP);

        dp->door_target = p;
        dp->door_data = data_cookie;
        dp->door_pc = pc_cookie;
        dp->door_flags = attributes;
        vp = DTOV(dp);
        mutex_init(&vp->v_lock, NULL, MUTEX_DEFAULT, NULL);
        cv_init(&vp->v_cv, NULL, CV_DEFAULT, NULL);
        vp->v_op = &door_vnodeops;
        vp->v_type = VDOOR;
        vp->v_vfsp = &door_vfs;
        vp->v_data = (caddr_t)vp;
        VN_HOLD(vp);
        mutex_enter(&door_knob);
        dp->door_index = index++;
        /* add to per-process door list */
        door_list_insert(dp);
        mutex_exit(&door_knob);

        if (falloc(vp, FREAD | FWRITE, &fp, fdp)) {
                /*
                 * If the file table is full, remove the door from the
                 * per-process list, free the door, and return NULL.
                 */
                mutex_enter(&door_knob);
                door_list_delete(dp);
                mutex_exit(&door_knob);
                kmem_free(dp, sizeof (door_node_t));
                return (EMFILE);
        }
        if (fdp != NULL)
                setf(*fdp, fp);
        mutex_exit(&fp->f_tlock);

        if (fpp != NULL)
                *fpp = fp;
        return (0);
}

Note: KM_SLEEP tells kmem_zalloc that it may sleep (block) until memory is available, which is why the code does not check dp for NULL.

osnet_volume/usr/src/uts/common/sys/door.h

/*
 * Underlying 'filesystem' object definition
 */
typedef struct door_node {
        vnode_t         door_vnode;
        struct proc     *door_target;   /* Proc handling this doors invoc's. */
        struct door_node *door_list;    /* List of active doors in proc */
        struct door_node *door_ulist;   /* Unref list */
        void            (*door_pc)();   /* Door server entry point */
        void            *door_data;     /* Cookie passed during invocations */
        door_id_t       door_index;     /* Used as a uniquifier */
        door_attr_t     door_flags;     /* State associated with door */
        uint_t          door_active;    /* Number of active invocations */
        struct _kthread *door_servers;  /* Private pool of server threads */
} door_node_t;

/usr/include/sys/door.h

#define VTOD(v) ((struct door_node *)(v))
#define DTOV(d) ((struct vnode *)(d))
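Those two casts only work because door_vnode is the first member of door_node_t: C guarantees that a pointer to a struct, suitably converted, points at its first member. A toy sketch of the same trick follows; the toy_* types and macros are stand-ins invented for illustration, since the real structures have many more fields.

```c
#include <stddef.h>

/* Toy stand-ins for vnode_t and door_node_t. */
struct toy_vnode {
        int     v_type;
};

struct toy_door {
        struct toy_vnode door_vnode;    /* must stay the first member */
        int              door_index;
};

#define TOY_VTOD(v)     ((struct toy_door *)(v))
#define TOY_DTOV(d)     ((struct toy_vnode *)(d))

/*
 * Returns 1 if converting door -> vnode -> door lands on the same
 * object and a write through the vnode view shows up in the door.
 */
int
toy_round_trip(void)
{
        struct toy_door d = { { 0 }, 42 };
        struct toy_vnode *vp = TOY_DTOV(&d);

        vp->v_type = 7;                 /* 7 is VDOOR in vnode.h */
        return (TOY_VTOD(vp) == &d &&
            d.door_vnode.v_type == 7 &&
            offsetof(struct toy_door, door_vnode) == 0);
}
```

This is how one kmem_zalloc of a door_node_t gives door_create_common both the door and its vnode: they are the same block of memory viewed through two types.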

osnet_volume/usr/src/uts/common/sys/vnode.h

/*
 * The vnode is the focus of all file activity in UNIX.
 * A vnode is allocated for each active file, each current
 * directory, each mounted-on file, and the root.
 */

/*
 * vnode types.  VNON means no type.  These values are unrelated to
 * values in on-disk inodes.
 */
typedef enum vtype {
        VNON    = 0,
        VREG    = 1,
        VDIR    = 2,
        VBLK    = 3,
        VCHR    = 4,
        VLNK    = 5,
        VFIFO   = 6,
        VDOOR   = 7,
        VPROC   = 8,
        VSOCK   = 9,
        VBAD    = 10
} vtype_t;

/*
 * All of the fields in the vnode are read-only once they are initialized
 * (created) except for:
 *      v_flag:         protected by v_lock
 *      v_count:        protected by v_lock
 *      v_pages:        file system must keep page list in sync with file size
 *      v_filocks:      protected by flock_lock in flock.c
 *      v_shrlocks:     protected by v_lock
 */
/* XX64 Can fields be reordered? */
typedef struct vnode {
        kmutex_t        v_lock;                 /* protects vnode fields */
        ushort_t        v_flag;                 /* vnode flags (see below) */
        uint_t          v_count;                /* reference count */
        struct vfs      *v_vfsmountedhere;      /* ptr to vfs mounted here */
        struct vnodeops *v_op;                  /* vnode operations */
        struct vfs      *v_vfsp;                /* ptr to containing VFS */
        struct stdata   *v_stream;              /* associated stream */
        struct page     *v_pages;               /* vnode pages list */
        enum vtype      v_type;                 /* vnode type */
        dev_t           v_rdev;                 /* device (VCHR, VBLK) */
        caddr_t         v_data;                 /* private data for fs */
        struct filock   *v_filocks;             /* ptr to filock list */
        struct shrlocklist *v_shrlocks;         /* ptr to shrlock list */
        kcondvar_t      v_cv;                   /* synchronize locking */
        void            *v_locality;            /* hook for locality info */
} vnode_t;

osnet_volume/usr/src/uts/common/sys/types.h

typedef	char		*caddr_t;	/* ?<core address> type */

man mutex_init (this is mutex_init(3THR) on SunOS 5.8):

SYNOPSIS
     cc -mt [ flag... ] file...[ library... ]
     #include <thread.h>
     #include <synch.h>

     int mutex_init(mutex_t *mp, int type, void * arg);

     int mutex_lock(mutex_t *mp);

     int mutex_trylock(mutex_t *mp);

     int mutex_unlock(mutex_t *mp);

     int mutex_destroy(mutex_t *mp);

DESCRIPTION
     Mutual exclusion locks (mutexes)  prevent  multiple  threads
     from  simultaneously  executing  critical  sections  of code
     which access shared data (that is, mutexes are used to seri-
     alize the execution of threads). All mutexes must be global.
     A successful call for a mutex lock by way  of   mutex_lock()
     will  cause  another  thread that is also trying to lock the
     same mutex to block until the owner thread unlocks it by way
     of   mutex_unlock().  Threads  within  the  same  process or
     within other processes can share mutexes.

     Mutexes can synchronize threads within the same  process  or
     in  other   processes.  Mutexes  can  be used to synchronize
     threads between processes if the mutexes  are  allocated  in
     writable  memory  and shared among the cooperating processes
     (see mmap(2)), and have been initialized for this task.

  Initialize
     Mutexes are either intra-process or inter-process, depending
     upon  the  argument  passed implicitly or explicitly  to the
     initialization of that mutex. A statically  allocated  mutex
     does  not  need to be explicitly  initialized; by default, a
     statically allocated mutex is initialized   with  all  zeros
     and its scope is set to be within the calling process.

     For inter-process synchronization, a mutex needs to be allo-
     cated   in  memory shared between these processes. Since the
     memory for such a mutex must be allocated dynamically,   the
     mutex needs to be explicitly initialized using mutex_init().

     The  mutex_init() function initializes the mutex  referenced
     by   mp  with  the  type specified by  type. Upon successful
     initialization the state of the  mutex  becomes  initialized
     and  unlocked.  No  current  type uses arg although a future
     type may specify additional behavior parameters  by  way  of
     arg. type may be one of the following:

Something’s not right. The mutex_init in the Doors code takes four arguments, with different types. That man page describes the userland threads library; the kernel has its own mutex implementation:
osnet_volume/usr/src/uts/common/sys/mutex.h

/*
 * Public interface to mutual exclusion locks.  See mutex(9F) for details.
 *
 * The basic mutex type is MUTEX_ADAPTIVE, which is expected to be used
 * in almost all of the kernel.  MUTEX_SPIN provides interrupt blocking
 * and must be used in interrupt handlers above LOCK_LEVEL.  The iblock
 * cookie argument to mutex_init() encodes the interrupt level to block.
 * The iblock cookie must be NULL for adaptive locks.
 *
 * MUTEX_DEFAULT is the type usually specified (except in drivers) to
 * mutex_init().  It is identical to MUTEX_ADAPTIVE.
 *
 * MUTEX_DRIVER is always used by drivers.  mutex_init() converts this to
 * either MUTEX_ADAPTIVE or MUTEX_SPIN depending on the iblock cookie.
 *
 * Mutex statistics can be gathered on the fly, without rebooting or
 * recompiling the kernel, via the lockstat driver (lockstat(7D)).
 */
typedef enum {
        MUTEX_ADAPTIVE = 0,     /* spin if owner is running, otherwise block */
        MUTEX_SPIN = 1,         /* block interrupts and spin */
        MUTEX_DRIVER = 4,       /* driver (DDI) mutex */
        MUTEX_DEFAULT = 6       /* kernel default mutex */
} kmutex_type_t;

typedef struct mutex {
#ifdef _LP64
        void    *_opaque[1];
#else
        void    *_opaque[2];
#endif
} kmutex_t;

#ifdef _KERNEL

#define MUTEX_HELD(x)           (mutex_owned(x))
#define MUTEX_NOT_HELD(x)       (!mutex_owned(x) || panicstr)

extern  void    mutex_init(kmutex_t *, char *, kmutex_type_t, void *);
extern	void	mutex_destroy(kmutex_t *);
extern	void	mutex_enter(kmutex_t *);
extern	int	mutex_tryenter(kmutex_t *);
extern	void	mutex_exit(kmutex_t *);
extern	int	mutex_owned(kmutex_t *);
extern	struct _kthread *mutex_owner(kmutex_t *);

I found the SOB:
osnet_volume/usr/src/uts/common/os/mutex.c

/*
 * The iblock cookie 'ibc' is the spl level associated with the lock;
 * this alone determines whether the lock will be ADAPTIVE or SPIN.
 * The only exception is the case when 'ibc' is exactly LOCK_LEVEL,
 * which we treat as ADAPTIVE unless SPIN is explicitly requested.
 * At present, the only lock with this dubious property is reaplock.
 */
/* ARGSUSED */
void
mutex_init(kmutex_t *mp, char *name, kmutex_type_t type, void *ibc)
{
        mutex_impl_t *lp = (mutex_impl_t *)mp;

        ASSERT(ibc < (void *)KERNELBASE);       /* see 1215173 */

        if ((int)ibc >= ipltospl(LOCK_LEVEL) && ibc < (void *)KERNELBASE &&
            (SPIN_LOCK((int)ibc) || type == MUTEX_SPIN)) {
                ASSERT(type != MUTEX_ADAPTIVE && type != MUTEX_DEFAULT);
                MUTEX_SET_TYPE(lp, MUTEX_SPIN);
                LOCK_INIT_CLEAR(&lp->m_spin.m_spinlock);
                LOCK_INIT_HELD(&lp->m_spin.m_dummylock);
                lp->m_spin.m_minspl = (int)ibc;
        } else {
                ASSERT(type != MUTEX_SPIN);
                MUTEX_SET_TYPE(lp, MUTEX_ADAPTIVE);
                MUTEX_CLEAR_LOCK_AND_WAITERS(lp);
        }
}

I think I might die if I analyze that code to the bottom. So I won’t. I’ll just note that we initialize a mutex, and that we use a mutex to serialize access to data across threads. :)
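For a feel of what “serialize access to data across threads” means in practice, here is a small userland sketch. POSIX threads (pthread_mutex_t) are swapped in for the Solaris primitives so that it builds on most Unix-likes (compile with -pthread); the names counter_lock, worker and run_counter are invented for this example. The lock and unlock calls play roughly the role that mutex_enter(&door_knob) and mutex_exit(&door_knob) play around door_list_insert in the kernel code.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS        4
#define NITER           100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker bumps the shared counter NITER times under the lock. */
static void *
worker(void *arg)
{
        int i;

        (void) arg;
        for (i = 0; i < NITER; i++) {
                pthread_mutex_lock(&counter_lock);
                counter++;              /* only one thread at a time */
                pthread_mutex_unlock(&counter_lock);
        }
        return (NULL);
}

/*
 * Starts NTHREADS workers, waits for them, and returns the final
 * count, or -1 if not every thread could be created.
 */
long
run_counter(void)
{
        pthread_t tid[NTHREADS];
        int started = 0;
        int i;

        counter = 0;
        while (started < NTHREADS &&
            pthread_create(&tid[started], NULL, worker, NULL) == 0)
                started++;
        for (i = 0; i < started; i++)
                pthread_join(tid[i], NULL);
        return (started == NTHREADS ? counter : -1);
}
```

With the lock in place the result is always NTHREADS * NITER. Take the lock and unlock calls out and the increments can interleave, so the total may come up short.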
