COMP421
Unix Environment for Programmers
Lecture 11: File I/O___________________________________________
Jeff Wiegley, Ph.D.
Computer Science
jeffw@csun.edu
09/12/2005
"We should have some ways of connecting programs like garden hose--screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also." –M. D. McIlroy, October 11, 1964
Basic File I/O________________________________________
Operations on files are kept to a minimum.
Because nearly everything is treated as a file, it is important that these operations be well designed, well defined, efficient and unchanging.
Opening files__________________________________________
int open(pathname, flags, [mode]) opens or creates a file for reading, writing or both, depending on the bits set in flags. The optional mode gives the permissions for a newly created file.
flag bits include:
Example: opening a file for appending data to the file:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
/* O_CREAT requires the third (mode) argument; 0644 is rw-r--r-- */
int fd = open("testfile", O_RDWR|O_CREAT|O_APPEND, 0644);
Closing files:__________________________________________
The Unix kernel buffers I/O operations in memory to increase performance. (Disks, tapes and punch cards were very, very slow).
For this reason as well as other accounting or procedural reasons it is necessary to “close” file descriptors when you are done using them.
#include <unistd.h>
int close(int fd);
Closing file descriptors is very important.
Each process may hold only a limited number of open descriptors, so long-running programs such as web servers must close the ones they are no longer using or they won't be able to handle new requests.
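A minimal sketch of an open/write/close sequence with the error checks that the slide example leaves out. The helper name append_line is hypothetical and only serves the illustration.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one string to the file at path.
 * Returns 0 on success, -1 on any failure. */
int append_line(const char *path, const char *text)
{
    /* O_CREAT needs the mode argument; 0644 is rw-r--r-- (before umask). */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);

    /* Close even when the write failed, so the descriptor goes back
     * to the process's limited pool. */
    int rc = close(fd);

    if (n != (ssize_t)len || rc == -1)
        return -1;
    return 0;
}
```

A call such as append_line("testfile", "one more line\n") leaves no descriptor open afterwards, whether or not the write succeeded.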
Reading data:________________________________________
All reading is done with:
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
This is true of files, sockets and pipes and every other type of file. Other interprocess communication functions will be implemented on top of this.
Some things to be aware of:
Managing read():___________________________________
The amount of data read() will return is limited by either:
If you want to limit the amount of data that read() returns, pass it a smaller value of count (count==1 will cause single bytes to be read).
If there are fewer bytes buffered than count then read() will only return what is available.
If you need to read an exact amount of data then you must nest the read() within a while loop and count how many bytes are left to go.
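int need=572, gobbled;
char *p=buf;   /* next free position in buf */
/* parenthesize the assignment, and never ask for more than is still needed */
while (need>0 && (gobbled=read(fd,p,need))>0) { need-=gobbled; p+=gobbled; }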
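The loop above can be packaged as a reusable function. This is a sketch only: the name read_exact and its return convention are assumptions made for illustration, and it additionally retries when a signal interrupts the call (errno == EINTR).

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read exactly count bytes from fd into buf.
 * Returns the number of bytes actually read: count on success,
 * fewer if end-of-file arrived first, or -1 on error. */
ssize_t read_exact(int fd, void *buf, size_t count)
{
    char *p = buf;          /* where the next bytes go */
    size_t left = count;    /* bytes still wanted */

    while (left > 0) {
        ssize_t n = read(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;
        }
        if (n == 0)
            break;          /* end-of-file */
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)(count - left);
}
```

The caller compares the return value with count to tell a complete read from an early end-of-file.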
Writing to files:_______________________________________
write() is similar:
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
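Like read(), write() may transfer fewer bytes than requested (for example on pipes and sockets), so the same looping idea applies. A minimal sketch; the name write_all is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes
 * of buf have been written. Returns count on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;    /* next byte to send */
    size_t left = count;    /* bytes still to send */

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```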
Seeking to a particular position:___________________
Some files such as database files have a very specific structure and grow to very large sizes.
To make operations on such files (and on hardware devices) efficient, the lseek() function allows the file pointer to be repositioned very rapidly.
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fildes, off_t offset, int whence);
The file pointer associated with the file descriptor is positioned to a specific location.
The position obtained is controlled by whence:
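The standard whence values are SEEK_SET (offset counted from the start of the file), SEEK_CUR (from the current position) and SEEK_END (from the end of the file). A small sketch using all three; the helper names file_size and read_record are hypothetical, and the fixed-size record layout is an assumption made for the example.

```c
#include <sys/types.h>
#include <fcntl.h>      /* open(), for callers that create the descriptor */
#include <unistd.h>

/* Hypothetical helper: size of the file open on fd, leaving the
 * current offset where it was. Returns -1 on error. */
off_t file_size(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);    /* remember current position */
    if (here == (off_t)-1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);     /* offset of end-of-file */
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)  /* go back */
        return -1;
    return end;
}

/* Hypothetical helper: read fixed-size record number recno (0-based). */
ssize_t read_record(int fd, long recno, void *rec, size_t recsize)
{
    if (lseek(fd, (off_t)recno * (off_t)recsize, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, rec, recsize);
}
```

Jumping straight to record recno this way avoids reading everything that precedes it, which is what makes large structured files practical.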
Duplicating descriptors:_____________________________
Files can be shared simultaneously by multiple processes.
To facilitate sharing the functions dup() and dup2() exist to duplicate file descriptors.
#include <unistd.h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
After duplication the two file descriptors are interchangeable: they refer to the same open file and share one file offset, so performing an lseek() on one will be reflected in the other as well.
Two very useful purposes of dup()
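One common use, sketched here under assumptions, is redirection: making an existing descriptor number (such as STDOUT_FILENO) refer to a different file, and restoring it later. The helper names redirect_fd and restore_fd are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make descriptor target refer to the file at path.
 * Returns a saved duplicate of the original target (for restore_fd),
 * or -1 on error. */
int redirect_fd(int target, const char *path)
{
    int saved = dup(target);            /* keep the original reachable */
    if (saved == -1)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        close(saved);
        return -1;
    }
    if (dup2(fd, target) == -1) {       /* target now means the file */
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                          /* the extra descriptor is not needed */
    return saved;
}

/* Hypothetical helper: undo redirect_fd(). Returns 0 or -1. */
int restore_fd(int target, int saved)
{
    int rc = dup2(saved, target);
    close(saved);
    return rc == -1 ? -1 : 0;
}
```

For instance, saved = redirect_fd(STDOUT_FILENO, "out.log") sends subsequent output written to descriptor 1 into the file until restore_fd(STDOUT_FILENO, saved) is called.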
fcntl():______________________________________________
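The kernel maintains a table of open file descriptors for every process. Each entry leads, through an open file table entry, to the v-node of the underlying file.
The fcntl() function is basically a method for examining or modifying the state attached to a descriptor, such as the descriptor flags (close-on-exec), the file status flags (O_APPEND, O_NONBLOCK), the owner that receives I/O signals and record locks. It can also duplicate a descriptor.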
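Two typical fcntl() requests, shown as a sketch: F_GETFL reads the file status flags (including the access mode) and F_GETFD/F_SETFD read and change the per-descriptor flags. The helper names fd_is_writable and set_cloexec are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd was opened for writing, 0 if not,
 * -1 if fd is not a valid open descriptor. */
int fd_is_writable(int fd)
{
    int flags = fcntl(fd, F_GETFL);     /* file status flags */
    if (flags == -1)
        return -1;
    int mode = flags & O_ACCMODE;       /* isolate the access mode bits */
    return mode == O_WRONLY || mode == O_RDWR;
}

/* Hypothetical helper: mark fd to be closed automatically on exec.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);     /* descriptor flags */
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```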
ioctl():______________________________________________
Unix likes to treat everything as files but certain “devices” have needs beyond reading, writing and positioning.
ioctl() is the catch all function for making non-file-ish requests to devices represented by a file descriptor.
This includes operations such as tray ejection, tape mounting and turning on device-specific features such as DMA.
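ioctl() request codes are specific to the device and the system, so any example is necessarily an assumption about the platform. The sketch below uses FIONREAD, which Linux and the BSDs provide for pipes, sockets and terminals, to ask how many bytes are waiting to be read. The helper name bytes_waiting is hypothetical.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: number of bytes that could be read from fd
 * right now, or -1 if the request is not supported for this descriptor.
 * Assumes a system that defines FIONREAD in <sys/ioctl.h>. */
int bytes_waiting(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```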
Blocking and asynchronous computation:__________
read() normally blocks when no data is ready but everything else is fine.
write() normally blocks when buffer systems are already full.
This can make modern programs very difficult to write.
Modern programs tend to be asynchronous and perform many tasks simultaneously. GUIs are highly asynchronous. Early or non-threaded web servers handle multiple HTTP requests simultaneously.
If a read or write operation blocks then the entire application blocks. This can lead to deadlock scenarios.
There are a couple of ways to handle the problem...
Non-Blocking I/O:___________________________________
The file descriptor can be opened with O_NONBLOCK or O_NDELAY.
This forces read() and write() to fail if they would have otherwise blocked.
In such a case read() (or write()) will immediately return -1 and the global variable errno (declared in <errno.h>) will be set to EAGAIN.
This is an easy and straightforward way to achieve asynchronous processing.
It complicates the application a bit.
It doesn’t scale well to handle lots of file descriptors.
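A minimal sketch of the pattern, assuming the descriptor is already open and is switched to non-blocking mode afterwards with fcntl(). The helper names set_nonblocking and try_read, and the -2 return value for "would block", are hypothetical conventions.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to an already-open descriptor.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: one non-blocking read attempt.
 * Returns >0 bytes read, 0 at end-of-file, -1 on a real error,
 * and -2 when the call would have blocked (no data yet). */
ssize_t try_read(int fd, void *buf, size_t count)
{
    ssize_t n = read(fd, buf, count);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

The application has to call try_read() again later whenever it returns -2, which is where both the extra complexity and the polling cost come from.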
select():_____________________________________________
select() is a function that can “poll” a collection of file descriptors simultaneously and returns only when one of the descriptors is “ready”.
File descriptors are represented as sets (fd_set).
The application adds desired file descriptors to read, write or exception sets and the sets are passed to select() for polling.
int select(int n, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
n is the value of the largest file descriptor number plus 1!
timeout is a structure that can be used to limit the amount of time that the call to select() will block.
When select() returns the file descriptor sets can be tested to see which file descriptors are ready. File descriptors in the read set are guaranteed not to block a read if the set test yields true for that file descriptor.
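The steps above (build a set, call select(), test the set) can be sketched for two descriptors as follows. The helper name wait_readable, its millisecond timeout and its return convention are assumptions made for the example.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for
 * descriptor a or b to become readable. Returns the ready descriptor
 * (a is preferred when both are ready), -1 on timeout, -2 on error. */
int wait_readable(int a, int b, int timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);              /* start with an empty set */
    FD_SET(a, &readfds);            /* add the descriptors of interest */
    FD_SET(b, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int n = (a > b ? a : b) + 1;    /* largest descriptor plus 1 */
    int ready = select(n, &readfds, NULL, NULL, &tv);
    if (ready == -1)
        return -2;
    if (ready == 0)
        return -1;                  /* timeout expired */

    /* select() rewrote the set: only ready descriptors remain in it. */
    if (FD_ISSET(a, &readfds))
        return a;
    return b;
}
```

Because select() modifies the sets it is given, they must be rebuilt before every call, as this function does.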
select() continued:_________________________________
select() still complicates applications.
Works well on large sets of file descriptors.
Only provides limited asynchronous processing. (It will still block until at least one file descriptor is ready or until a timeout occurs.)
(Non-)Blocking I/O and CPU cycles:_____________
When a system call such as read() blocks, the kernel is aware of it and can put the process to sleep, where it doesn't use any CPU cycles. The kernel wakes the process when an event occurs that satisfies the blocking condition.
Non-blocking I/O thwarts this performance advantage because CPU cycles are consumed “polling”.
select() has the advantage that it blocks until a file descriptor is ready. A proper implementation of select() can still allow the kernel to suspend the execution of the application and use the CPU cycles for other running processes.
Buffered I/O (#include <stdio.h>):_________________
The system-level functions of read(), write(), etc. are implemented to be simple, efficient and general purpose.
One disadvantage of these routines is that they do no buffering of their own inside the program.
This makes it difficult to read specific items such as integers and lines. (read() and write() have no knowledge that integers are typically 4 bytes long or that lines end with \n).
To make life easier for the programmer, the library of functions declared in stdio.h was created to address this.
Common stdio.h functions:_______________________
stdio.h provides many functions, similar to those presented earlier, that are buffered. They don’t function on raw file descriptors; instead they operate on an abstract type called FILE * (a pointer to a FILE structure).
All of these functions complete their work without the caller having to write additional loops.
They understand numerical formats and can parse such data from strings into the numeric value represented, similar to Integer.parseInt() in Java.
These functions are built on top of the functions declared in <fcntl.h> and <unistd.h>, such as open(), read() and write().
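A short sketch of what the buffered, format-aware functions buy the programmer: fscanf() skips whitespace, gathers the digits and converts them, with no explicit read() loop or byte counting. The helper name sum_integers is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: add up the whitespace-separated decimal
 * integers found on the stream in. Stops at end-of-file or at the
 * first item that is not an integer. */
long sum_integers(FILE *in)
{
    long total = 0;
    int value;

    /* fscanf returns the number of items it converted (1 here). */
    while (fscanf(in, "%d", &value) == 1)
        total += value;
    return total;
}
```

Called as sum_integers(stdin), it works the same whether the numbers arrive one per line or several on a line.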