@node I/O Overview, I/O on Streams, Pattern Matching, Top
@c %MENU% Introduction to the I/O facilities
@chapter Input/Output Overview

Most programs need to do either input (reading data) or output (writing
data), or most frequently both, in order to do anything useful. @Theglibc{}
provides such a large selection of input and output functions
that the hardest part is often deciding which function is most
appropriate!

This chapter introduces concepts and terminology relating to input
and output. Other chapters relating to the GNU I/O facilities are:

@itemize @bullet
@item
@ref{I/O on Streams}, which covers the high-level functions
that operate on streams, including formatted input and output.

@item
@ref{Low-Level I/O}, which covers the basic I/O and control
functions on file descriptors.

@item
@ref{File System Interface}, which covers functions for operating on
directories and for manipulating file attributes such as access modes
and ownership.

@item
@ref{Pipes and FIFOs}, which includes information on the basic interprocess
communication facilities.

@item
@ref{Sockets}, which covers a more complicated interprocess communication
facility with support for networking.

@item
@ref{Low-Level Terminal Interface}, which covers functions for changing
how input and output to terminals or other serial devices are processed.
@end itemize

@menu
* I/O Concepts::                Some basic information and terminology.
* File Names::                  How to refer to a file.
@end menu

@node I/O Concepts, File Names, , I/O Overview
@section Input/Output Concepts

Before you can read or write the contents of a file, you must establish
a connection or communications channel to the file. This process is
called @dfn{opening} the file. You can open a file for reading, writing,
or both.
@cindex opening a file

The connection to an open file is represented either as a stream or as a
file descriptor. You pass this as an argument to the functions that do
the actual read or write operations, to tell them which file to operate
on. Certain functions expect streams, and others are designed to
operate on file descriptors.

When you have finished reading from or writing to the file, you can
terminate the connection by @dfn{closing} the file. Once you have
closed a stream or file descriptor, you cannot do any more input or
output operations on it.
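
As a minimal sketch of the open, operate, close cycle described above,
the following function opens a file as a stream, reads it to the end,
and closes it again. The name @code{count_bytes} is hypothetical and
chosen only for this illustration; @code{fopen}, @code{getc},
@code{ferror} and @code{fclose} are the standard stream functions.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in the file named NAME.
   Returns -1 if the file cannot be opened or a read error occurs.  */
long
count_bytes (const char *name)
{
  FILE *stream = fopen (name, "r");   /* Open the file for reading.  */
  long count = 0;
  int failed;

  if (stream == NULL)
    return -1;

  while (getc (stream) != EOF)        /* Read until end of file or error.  */
    count++;

  failed = ferror (stream);
  if (fclose (stream) != 0)           /* STREAM must not be used after this.  */
    failed = 1;

  return failed ? -1 : count;
}
```

A caller might write @code{count_bytes ("notes.txt")}, where the file
name is likewise only an example.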

@menu
* Streams and File Descriptors::    The GNU C Library provides two ways
                                     to access the contents of files.
* File Position::                   The number of bytes from the
                                     beginning of the file.
@end menu

@node Streams and File Descriptors, File Position, , I/O Concepts
@subsection Streams and File Descriptors

When you want to do input or output to a file, you have a choice of two
basic mechanisms for representing the connection between your program
and the file: file descriptors and streams. File descriptors are
represented as objects of type @code{int}, while streams are represented
as @code{FILE *} objects.

File descriptors provide a primitive, low-level interface to input and
output operations. Both file descriptors and streams can represent a
connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file. But, if
you want to do control operations that are specific to a particular kind
of device, you must use a file descriptor; there are no facilities to
use streams in this way. You must also use file descriptors if your
program needs to do input or output in special modes, such as
nonblocking (or polled) input (@pxref{File Status Flags}).
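
To illustrate one such special mode, here is a sketch that turns on
nonblocking input for a descriptor with @code{fcntl} and then shows that
reading from an empty pipe fails at once instead of waiting. The
function names @code{set_nonblocking} and
@code{empty_pipe_read_would_block} are hypothetical; the sketch assumes
a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on nonblocking mode for descriptor FD.
   Returns 0 on success and -1 on failure.  */
int
set_nonblocking (int fd)
{
  int flags = fcntl (fd, F_GETFL, 0);   /* Fetch the current status flags.  */

  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical demonstration: return 1 if reading from an empty
   nonblocking pipe fails immediately with EAGAIN or EWOULDBLOCK,
   0 if it behaves otherwise, and -1 if no pipe could be created.  */
int
empty_pipe_read_would_block (void)
{
  int fds[2];
  char c;
  int result = 0;

  if (pipe (fds) != 0)
    return -1;

  if (set_nonblocking (fds[0]) == 0
      && read (fds[0], &c, 1) == -1
      && (errno == EAGAIN || errno == EWOULDBLOCK))
    result = 1;

  close (fds[0]);
  close (fds[1]);
  return result;
}
```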

Streams provide a higher-level interface, layered on top of the
primitive file descriptor facilities. The stream interface treats all
kinds of files pretty much alike---the sole exception being the three
styles of buffering that you can choose (@pxref{Stream Buffering}).

The main advantage of using the stream interface is that the set of
functions for performing actual input and output operations (as opposed
to control operations) on streams is much richer and more powerful than
the corresponding facilities for file descriptors. The file descriptor
interface provides only simple functions for transferring blocks of
characters, but the stream interface also provides powerful formatted
input and output functions (@code{printf} and @code{scanf}) as well as
functions for character- and line-oriented input and output.
@c !!! glibc has dprintf, which lets you do printf on an fd.

Since streams are implemented in terms of file descriptors, you can
extract the file descriptor from a stream and perform low-level
operations directly on the file descriptor. You can also initially open
a connection as a file descriptor and then make a stream associated with
that file descriptor.
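
The following sketch shows both directions, assuming a POSIX system: it
opens a connection as a file descriptor with @code{open}, associates a
stream with it using @code{fdopen}, and uses @code{fileno} to recover
the descriptor from the stream. The function name
@code{write_via_fdopen} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create the file named NAME through a file
   descriptor, then write TEXT to it through a stream layered on that
   descriptor.  Returns 0 on success and -1 on failure.  */
int
write_via_fdopen (const char *name, const char *text)
{
  int fd = open (name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  FILE *stream;

  if (fd == -1)
    return -1;

  stream = fdopen (fd, "w");        /* Make a stream for the descriptor.  */
  if (stream == NULL)
    {
      close (fd);
      return -1;
    }

  /* fileno extracts the descriptor underlying the stream.  */
  if (fileno (stream) != fd || fputs (text, stream) == EOF)
    {
      fclose (stream);
      return -1;
    }

  /* Closing the stream also closes the underlying descriptor.  */
  return fclose (stream) == 0 ? 0 : -1;
}
```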

In general, you should stick with using streams rather than file
descriptors, unless there is some specific operation you want to do that
can only be done on a file descriptor. If you are a beginning
programmer and aren't sure what functions to use, we suggest that you
concentrate on the formatted input functions (@pxref{Formatted Input})
and formatted output functions (@pxref{Formatted Output}).

If you are concerned about portability of your programs to systems other
than GNU, you should also be aware that file descriptors are not as
portable as streams. You can expect any system running @w{ISO C} to
support streams, but @nongnusystems{} may not support file descriptors at
all, or may only implement a subset of the GNU functions that operate on
file descriptors. Most of the file descriptor functions in @theglibc{}
are included in the POSIX.1 standard, however.

@node File Position, , Streams and File Descriptors, I/O Concepts
@subsection File Position

One of the attributes of an open file is its @dfn{file position}, which
keeps track of where in the file the next character is to be read or
written. On @gnusystems{}, and all POSIX.1 systems, the file position
is simply an integer representing the number of bytes from the beginning
of the file.

The file position is normally set to the beginning of the file when it
is opened, and each time a character is read or written, the file
position is incremented. In other words, access to the file is normally
@dfn{sequential}.
@cindex file position
@cindex sequential-access files
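
A small sketch can make sequential access concrete. The hypothetical
function @code{position_after_reading} below opens a file, reads a given
number of characters, and reports the file position with @code{ftell}.
On a POSIX system, where the position is a byte count, the result equals
the number of characters actually read.

```c
#include <stdio.h>

/* Hypothetical demonstration: read up to COUNT characters from the
   file named NAME and return the resulting file position, or -1 if
   the file cannot be opened or the position cannot be determined.  */
long
position_after_reading (const char *name, int count)
{
  FILE *stream = fopen (name, "r");
  long position;
  int i;

  if (stream == NULL)
    return -1;

  /* Each successful getc advances the file position by one.  */
  for (i = 0; i < count; i++)
    if (getc (stream) == EOF)
      break;

  position = ftell (stream);
  fclose (stream);
  return position;
}
```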

Ordinary files permit read or write operations at any position within
the file. Some other kinds of files may also permit this. Files which
do permit this are sometimes referred to as @dfn{random-access} files.
You can change the file position using the @code{fseek} function on a
stream (@pxref{File Positioning}) or the @code{lseek} function on a file
descriptor (@pxref{I/O Primitives}). If you try to change the file
position on a file that doesn't support random access, you get the
@code{ESPIPE} error.
@cindex random-access files
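
One way to find out whether a descriptor supports random access is to
attempt a harmless seek and look for @code{ESPIPE}. The sketch below
does this with @code{lseek}; the function name @code{is_seekable} is
hypothetical, and the code assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if descriptor FD permits random
   access, 0 if it does not (lseek fails with ESPIPE), and -1 if
   lseek fails for some other reason, such as an invalid descriptor.  */
int
is_seekable (int fd)
{
  /* Seeking zero bytes from the current position changes nothing,
     but still fails on files that do not support seeking.  */
  if (lseek (fd, 0, SEEK_CUR) != (off_t) -1)
    return 1;
  return errno == ESPIPE ? 0 : -1;
}
```

Pipes, FIFOs and sockets are the typical cases for which this function
returns 0.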

Streams and descriptors that are opened for @dfn{append access} are
treated specially for output: output to such files is @emph{always}
appended sequentially to the @emph{end} of the file, regardless of the
file position. However, the file position is still used to control where in
the file reading is done.
@cindex append-access files
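
The sketch below demonstrates this on a POSIX system. It writes three
bytes to a file, reopens the file with @code{O_APPEND}, moves the file
position back to the beginning, and writes two more bytes. Because
output is appended regardless of the file position, the file ends up
five bytes long rather than three. The function name
@code{append_demo} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demonstration: return the final size of the file named
   NAME after an append-mode write made with the file position at the
   beginning of the file, or -1 on failure.  */
off_t
append_demo (const char *name)
{
  int fd = open (name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  off_t size;

  if (fd == -1)
    return -1;
  if (write (fd, "abc", 3) != 3)
    {
      close (fd);
      return -1;
    }
  close (fd);

  fd = open (name, O_RDWR | O_APPEND);   /* Reopen for append access.  */
  if (fd == -1)
    return -1;

  /* Move the file position to the start; the write still appends.  */
  if (lseek (fd, 0, SEEK_SET) == (off_t) -1 || write (fd, "de", 2) != 2)
    {
      close (fd);
      return -1;
    }

  size = lseek (fd, 0, SEEK_END);        /* The offset of the end is the size.  */
  close (fd);
  return size;
}
```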

If you think about it, you'll realize that several programs can read a
given file at the same time. In order for each program to be able to
read the file at its own pace, each program must have its own file
position, which is not affected by anything the other programs do.

In fact, each opening of a file creates a separate file position.
Thus, if you open a file twice even in the same program, you get two
streams or descriptors with independent file positions.

By contrast, if you open a descriptor and then duplicate it to get
another descriptor, these two descriptors share the same file position:
changing the file position of one descriptor will affect the other.
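
Both cases can be observed with a short sketch, assuming a POSIX system.
It opens the same file twice, duplicates the first descriptor with
@code{dup}, moves the file position of the first descriptor, and then
reports the positions seen through the other two. The function name
@code{position_sharing_demo} and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demonstration: seek to offset 3 on one descriptor for
   the file named NAME, then store the position of a second, separate
   opening in *REOPENED_POS and the position of a duplicate of the
   first descriptor in *DUPLICATE_POS.  Returns 0 on success, -1 on
   failure.  */
int
position_sharing_demo (const char *name, off_t *reopened_pos,
                       off_t *duplicate_pos)
{
  int first = open (name, O_RDONLY);
  int second = open (name, O_RDONLY);          /* A separate opening.  */
  int copy = first == -1 ? -1 : dup (first);   /* Shares FIRST's position.  */
  int status = -1;

  if (first != -1 && second != -1 && copy != -1
      && lseek (first, 3, SEEK_SET) == 3)
    {
      *reopened_pos = lseek (second, 0, SEEK_CUR);
      *duplicate_pos = lseek (copy, 0, SEEK_CUR);
      status = 0;
    }

  if (first != -1)
    close (first);
  if (second != -1)
    close (second);
  if (copy != -1)
    close (copy);
  return status;
}
```

On success, the separate opening still reports position 0, while the
duplicate reports position 3.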

@node File Names, , I/O Concepts, I/O Overview
@section File Names

In order to open a connection to a file, or to perform other operations
such as deleting a file, you need some way to refer to the file. Nearly
all files have names that are strings---even files which are actually
devices such as tape drives or terminals. These strings are called
@dfn{file names}. You specify the file name to say which file you want
to open or operate on.

This section describes the conventions for file names and how the
operating system works with them.
@cindex file name

@menu
* Directories::                 Directories contain entries for files.
* File Name Resolution::        A file name specifies how to look up a file.
* File Name Errors::            Error conditions relating to file names.
* File Name Portability::       File name portability and syntax issues.
@end menu

@node Directories, File Name Resolution, , File Names
@subsection Directories

In order to understand the syntax of file names, you need to understand
how the file system is organized into a hierarchy of directories.

@cindex directory
@cindex link
@cindex directory entry
A @dfn{directory} is a file that contains information to associate other
files with names; these associations are called @dfn{links} or
@dfn{directory entries}. Sometimes, people speak of ``files in a
directory'', but in reality, a directory only contains pointers to
files, not the files themselves.

@cindex file name component
The name of a file contained in a directory entry is called a @dfn{file
name component}. In general, a file name consists of a sequence of one
or more such components, separated by the slash character (@samp{/}). A
file name which is just one component names a file with respect to its
directory. A file name with multiple components names a directory, and
then a file in that directory, and so on.

Some other documents, such as the POSIX standard, use the term
@dfn{pathname} for what we call a file name, and either @dfn{filename}
or @dfn{pathname component} for what this manual calls a file name
component. We don't use this terminology because a ``path'' is
something completely different (a list of directories to search), and we
think that ``pathname'' used for something else will confuse users. We
always use ``file name'' and ``file name component'' (or sometimes just
``component'', where the context is obvious) in GNU documentation. Some
macros use the POSIX terminology in their names, such as
@code{PATH_MAX}. These macros are defined by the POSIX standard, so we
cannot change their names.

You can find more detailed information about operations on directories
in @ref{File System Interface}.

@node File Name Resolution, File Name Errors, Directories, File Names
@subsection File Name Resolution

A file name consists of file name components separated by slash
(@samp{/}) characters. On the systems that @theglibc{} supports,
multiple successive @samp{/} characters are equivalent to a single
@samp{/} character.

@cindex file name resolution
The process of determining what file a file name refers to is called
@dfn{file name resolution}. This is performed by examining the
components that make up a file name in left-to-right order, and locating
each successive component in the directory named by the previous
component. Of course, each of the files that are referenced as
directories must actually exist, be directories instead of regular
files, and have the appropriate permissions to be accessible by the
process; otherwise the file name resolution fails.

@cindex root directory
@cindex absolute file name
If a file name begins with a @samp{/}, the first component in the file
name is located in the @dfn{root directory} of the process (usually all
processes on the system have the same root directory). Such a file name
is called an @dfn{absolute file name}.
@c !!! xref here to chroot, if we ever document chroot. -rm

@cindex relative file name
Otherwise, the first component in the file name is located in the
current working directory (@pxref{Working Directory}). This kind of
file name is called a @dfn{relative file name}.

@cindex parent directory
The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
have special meanings. Every directory has entries for these file name
components. The file name component @file{.} refers to the directory
itself, while the file name component @file{..} refers to its
@dfn{parent directory} (the directory that contains the link for the
directory in question). As a special case, @file{..} in the root
directory refers to the root directory itself, since it has no parent;
thus @file{/..} is the same as @file{/}.

Here are some examples of file names:

@table @file
@item /a
The file named @file{a}, in the root directory.

@item /a/b
The file named @file{b}, in the directory named @file{a} in the root directory.

@item a
The file named @file{a}, in the current working directory.

@item /a/./b
This is the same as @file{/a/b}.

@item ./a
The file named @file{a}, in the current working directory.

@item ../a
The file named @file{a}, in the parent directory of the current working
directory.
@end table

@c An empty string may ``work'', but I think it's confusing to
@c try to describe it. It's not a useful thing for users to use--rms.
A file name that names a directory may optionally end in a @samp{/}.
You can specify a file name of @file{/} to refer to the root directory,
but the empty string is not a meaningful file name. If you want to
refer to the current working directory, use a file name of @file{.} or
@file{./}.
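
To check that two different file names resolve to the same file, a
program can compare the device and file serial numbers that
@code{stat} reports for each of them. The following is a minimal
sketch, assuming a POSIX system; the function name @code{same_file} is
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <sys/stat.h>

/* Hypothetical helper: return 1 if the file names A and B resolve to
   the same file, 0 if they resolve to different files, and -1 if
   either name cannot be resolved.  */
int
same_file (const char *a, const char *b)
{
  struct stat sa, sb;

  if (stat (a, &sa) != 0 || stat (b, &sb) != 0)
    return -1;

  /* Two names refer to the same file when both the device and the
     file serial number match.  */
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With the resolution rules described in this section,
@code{same_file ("/", "/..")} and @code{same_file (".", "./")} are both
expected to return 1.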

Unlike some other operating systems, @gnusystems{} don't have any
built-in support for file types (or extensions) or file versions as part
of their file name syntax. Many programs and utilities use conventions
for file names---for example, files containing C source code usually
have names suffixed with @samp{.c}---but there is nothing in the file
system itself that enforces this kind of convention.

@node File Name Errors, File Name Portability, File Name Resolution, File Names
@subsection File Name Errors

@cindex file name errors
@cindex usual file name errors
Functions that accept file name arguments usually detect the following
@code{errno} error conditions, which relate to file name syntax or to
trouble finding the named file. These errors are referred to throughout
this manual as the @dfn{usual file name errors}.

@table @code
@item EACCES
The process does not have search permission for a directory component
of the file name.

@item ENAMETOOLONG
This error is used when the total length of a file name is
greater than @code{PATH_MAX}, or when an individual file name component
has a length greater than @code{NAME_MAX}. @xref{Limits for Files}.

On @gnuhurdsystems{}, there is no imposed limit on overall file name
length, but some file systems may place limits on the length of a
component.

@item ENOENT
This error is reported when a file referenced as a directory component
in the file name doesn't exist, or when a component is a symbolic link
whose target file does not exist. @xref{Symbolic Links}.

@item ENOTDIR
A file that is referenced as a directory component in the file name
exists, but it isn't a directory.

@item ELOOP
Too many symbolic links were resolved while trying to look up the file
name. The system has an arbitrary limit on the number of symbolic links
that may be resolved in looking up a single file name, as a primitive
way to detect loops. @xref{Symbolic Links}.
@end table
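
As an illustration of how a program might observe these conditions, the
sketch below tries to open a file and returns the @code{errno} value on
failure, and a second function tests whether a value is one of the
errors listed in the table above. Both function names are hypothetical,
and the code assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open the file named NAME for reading.
   Returns 0 on success, or the errno value describing the failure.  */
int
open_error (const char *name)
{
  int fd = open (name, O_RDONLY);

  if (fd == -1)
    return errno;
  close (fd);
  return 0;
}

/* Hypothetical helper: return 1 if ERR is one of the usual file name
   errors, and 0 otherwise.  */
int
is_usual_file_name_error (int err)
{
  switch (err)
    {
    case EACCES:
    case ENAMETOOLONG:
    case ENOENT:
    case ENOTDIR:
    case ELOOP:
      return 1;
    default:
      return 0;
    }
}
```

For example, if @file{data.txt} is a regular file, then
@code{open_error ("data.txt/x")} is expected to return @code{ENOTDIR},
because a component used as a directory exists but is not a directory.
The file name here is only an example.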

@node File Name Portability, , File Name Errors, File Names
@subsection Portability of File Names

The rules for the syntax of file names discussed in @ref{File Names},
are the rules normally used by @gnusystems{} and by other POSIX
systems. However, other operating systems may use other conventions.

There are two reasons why it can be important for you to be aware of
file name portability issues:

@itemize @bullet
@item
If your program makes assumptions about file name syntax, or contains
embedded literal file name strings, it is more difficult to get it to
run under other operating systems that use different syntax conventions.

@item
Even if you are not concerned about running your program on machines
that run other operating systems, it may still be possible to access
files that use different naming conventions. For example, you may be
able to access file systems on another computer running a different
operating system over a network, or read and write disks in formats used
by other operating systems.
@end itemize

The @w{ISO C} standard says very little about file name syntax, only that
file names are strings. In addition to varying restrictions on the
length of file names and what characters can validly appear in a file
name, different operating systems use different conventions and syntax
for concepts such as structured directories and file types or
extensions. Some concepts such as file versions might be supported by
some operating systems and not by others.

The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning which characters are
permitted in file names and the length of file name and file name
component strings. However, on @gnusystems{}, any character except
the null character is permitted in a file name string, and
on @gnuhurdsystems{} there are no limits on the length of file name
strings.
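
Because these limits vary between systems and even between file
systems, a portable program can ask for them at run time instead of
assuming fixed values. The sketch below uses the POSIX
@code{pathconf} function with @code{_PC_NAME_MAX} to query the limit on
the length of a file name component; the function name
@code{component_limit} and its return convention are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* Request the POSIX declarations.  */
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: return the maximum length of a file name
   component in the directory DIR, -1 if the system imposes no limit,
   or -2 if the query itself fails.  */
long
component_limit (const char *dir)
{
  long limit;

  /* pathconf returns -1 both for "no limit" and for failure; only a
     failure changes errno, so clear it first to tell them apart.  */
  errno = 0;
  limit = pathconf (dir, _PC_NAME_MAX);
  if (limit == -1)
    return errno == 0 ? -1 : -2;
  return limit;
}
```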