Definitely Impure I/O (Haskell Notes 5)

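Preface
One question has nagged at me for a while: Haskell bills itself as a purely functional language, so how does it handle scenarios that are inherently impure (operations that necessarily have side effects, or that are side effects in themselves)?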

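Take (pseudo-)random numbers or I/O. A pure function that hands back a different random number each time it is called with the same arguments cannot exist, so how are such cases handled?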

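Haskell's approach is similar in spirit to React's component lifecycle methods such as componentDidMount(). React recommends (as a convention, not an enforced rule) keeping render() pure and moving side-effecting operations into lifecycle methods like componentDidMount(). In other words, lifecycle hooks separate the pure from the impure. Haskell draws a similar line: side effects live in values of the IO type, and do blocks are the syntax for combining them, which keeps the impure part fenced off from pure code.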

1. I/O action
Start with the type of a function:

> :t print
print :: Show a => a -> IO ()

print takes an argument of any type in the Show class and returns an IO (), which is called an I/O Action. IO is itself a type, as shown below:

> :k IO
IO :: * -> *
> :k IO ()
IO () :: *
> :i IO
newtype IO a
  = GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld
                  -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
    -- Defined in ‘GHC.Types’
instance Monad IO -- Defined in ‘GHC.Base’
instance Functor IO -- Defined in ‘GHC.Base’
instance Applicative IO -- Defined in ‘GHC.Base’
instance Monoid a => Monoid (IO a) -- Defined in ‘GHC.Base’

Judging by its kind, IO resembles Maybe :: * -> *: both take one concrete type as a parameter and return a concrete type (for example IO ()).
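To make the point that an I/O Action is an ordinary value of type IO a, here is a minimal sketch. The name greetings is made up for this illustration and does not come from the notes. Building a list of actions performs no I/O; output appears only for the action that main actually runs.

```haskell
-- A list of I/O Actions is just data; constructing it prints nothing.
greetings :: [IO ()]
greetings = [putStrLn "first", putStrLn "second"]

main :: IO ()
main = do
  print (length greetings)  -- counts the actions without running them
  last greetings            -- runs only the final action
```

Running this prints 2 and then second; the first action is never executed.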

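P.S. newtype is similar to a data declaration, with nearly the same syntax and usage. newtype is the more restrictive of the two (replacing newtype with data always works, but replacing data with newtype does not necessarily work). The specific restriction is: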

data can only be replaced with newtype if the type has exactly one constructor with exactly one field inside it.
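The restriction quoted above can be seen in a small sketch. The types Age and Shape are hypothetical examples, not taken from the notes: the first has exactly one constructor with one field, so either keyword works; the second has two constructors, so only data is accepted.

```haskell
-- One constructor with one field: newtype and data are both allowed here.
newtype Age = Age Int deriving (Show, Eq)

-- Two constructors (one of them with two fields): only data is allowed.
data Shape = Circle Double | Rect Double Double deriving (Show, Eq)

-- Unwrap the newtype by pattern matching, exactly as with data.
unAge :: Age -> Int
unAge (Age n) = n

main :: IO ()
main = do
  print (unAge (Age 30))
  print (Rect 2 3)
```

Changing data Shape to newtype Shape makes the compiler reject the declaration.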

2. User input
User input can be obtained through an I/O Action, for example:

main = do
  line <- getLine
  if null line
    then return ()
    else do -- do combines several actions into one
      putStrLn line
      main

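The example above is a simple echo program. getLine reads one line of input and has type IO String; the <- operator extracts the String and binds it to the name line. If the line is empty, nothing is done (return () yields an IO () and the program ends); otherwise putStrLn writes the line to standard output followed by a newline, and main is run again recursively.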

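Here main is the entry point (as in C), and do merges several I/O Actions into one, whose result is that of the last action in the block. The I/O Actions inside a do block are executed when the block itself is executed, so a do block serves two purposes: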

It can contain multiple statements, but the last one must be an I/O Action

It marks out an impure context in which I/O Actions can be executed

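To compare with JS: combining multiple statements resembles the comma operator, which returns the value of the last expression. Marking out an impure context resembles an async function: an I/O Action takes effect only inside a do block, much as await is only valid inside an async function.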
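As a rough sketch of what a do block stands for, the echo program can also be written with the Monad operators >>= and >> directly. These operators are not covered in the notes, so this is an illustration under that assumption, and the name echoLoop is made up for the example.

```haskell
-- The echo program without do notation.
-- (>>=) feeds the result of one action into a function producing the next;
-- (>>) runs two actions in order and ignores the first result.
echoLoop :: IO ()
echoLoop =
  getLine >>= \line ->
    if null line
      then return ()
      else putStrLn line >> echoLoop

main :: IO ()
main = echoLoop
```

The do version and this version describe the same chain of actions; do is the more readable spelling of it.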

P.S. In fact, there are 3 ways to execute an I/O Action:

Bind it to main, so that it becomes the program's entry point

Put it inside a do block that is itself executed

Type the I/O Action at the GHCi prompt and press Enter, e.g. putStrLn "hoho"

Running
main can be run in GHCi like any other value, for example:

> :l echo
[1 of 1] Compiling Main             ( echo.hs, interpreted )
Ok, modules loaded: Main.
> main
what?
what?

Entering an empty line exits; any other input is echoed back line by line.

It can also be compiled into an executable:

$ ghc --make ./echo.hs
[1 of 1] Compiling Main             ( echo.hs, echo.o )
Linking echo ...
$ ./echo
here
here

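3. Control.Monad
The Control.Monad module provides a number of functions that are handy in I/O code. They capture common patterns such as forever do and when condition do, and can simplify some scenarios.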

return
return wraps a value in an I/O Action; it does not exit the function. return and <- do opposite jobs (think boxing and unboxing):

main = do
  a <- return "hell"
  b <- return "yeah!"
  putStrLn $ a ++ " " ++ b

It has two uses:

Creating an I/O Action that does nothing, as in the then branch of the echo example

Customizing the result of a do block, for cases where the result of the last I/O Action should not be returned directly but post-processed first
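Both uses of return can be seen together in a short sketch. The names shout and readShouted are hypothetical helpers introduced only for this illustration.

```haskell
import Data.Char (toUpper)

-- Pure helper used to post-process the input.
shout :: String -> String
shout = map toUpper

readShouted :: IO String
readShouted = do
  return ()            -- a do-nothing action; execution simply continues
  line <- getLine
  return (shout line)  -- the block's result is the processed line

main :: IO ()
main = do
  loud <- readShouted
  putStrLn loud
```

Note that the first return () does not leave the block: getLine still runs after it.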

when
when is also a function:

Control.Monad.when :: Applicative f => Bool -> f () -> f ()

It takes a Boolean and an I/O Action (IO is an instance of Applicative). When the Boolean is True the result is that I/O Action; otherwise it is return (). So it is equivalent to:

when' c io = do
  if c
    then io
    else return ()

The type of this definition is:

when' :: Monad m => Bool -> m () -> m ()

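So when used for I/O, the second argument must have type IO (). That looks restrictive, but it suits conditional output well, since print and the other output functions all have that result type.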
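A minimal sketch of the conditional-output use case follows. The helper warnIfNegative is a made-up name for illustration.

```haskell
import Control.Monad (when)

-- Print a warning only for negative numbers.
warnIfNegative :: Int -> IO ()
warnIfNegative n = when (n < 0) $ putStrLn ("negative: " ++ show n)

main :: IO ()
main = mapM_ warnIfNegative [3, -1, 0, -7]
```

Only the two negative inputs produce output: negative: -1 and negative: -7.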

sequence

sequence :: (Traversable t, Monad m) => t (m a) -> m (t a)

This type signature looks rather complicated:

Traversable :: (* -> *) -> Constraint
Monad :: (* -> *) -> Constraint
-- pick one instance of each: list and IO
instance Traversable [] -- Defined in ‘Data.Traversable’
instance Monad IO -- Defined in ‘GHC.Base’

In the case of a list of I/O Actions (substituting IO for m and [] for t), the argument has type [IO a] and the result has type IO [a], so it is equivalent to:

sequence' [] = do
  return []
sequence' (x:xs) = do
  v <- x
  others <- (sequence' xs)
  return (v : others)

It runs every action in the list in order, collects the results into a list, and wraps that list in IO.

P.S. This feels a bit like Promise.all, which takes a group of promises and returns a new promise carrying their results.
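Here is a small sketch of sequence in use. The IO case matches the discussion above; the Maybe case is an extra illustration (an assumption beyond the notes) showing that the same function works for any Monad. All names are made up for the example.

```haskell
-- sequence in IO: run each action in order and collect the results.
collected :: IO [Int]
collected = sequence [return 1, return 2, return 3]

-- The same function in the Maybe monad: a single Nothing fails the lot.
allPresent :: Maybe [Int]
allPresent = sequence [Just 1, Just 2, Just 3]

oneMissing :: Maybe [Int]
oneMissing = sequence [Just 1, Nothing, Just 3]

main :: IO ()
main = do
  xs <- collected
  print xs          -- [1,2,3]
  print allPresent  -- Just [1,2,3]
  print oneMissing  -- Nothing
```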

mapM and mapM_

Control.Monad.mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
Control.Monad.mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()

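For a list in IO, mapM's first argument is a function from a to IO b, its second argument is an [a], and it returns IO [b], the same result type as sequence. It amounts to mapping the function over the [a] to get a list of I/O Actions and then applying sequence, for example:

> mapM (\x -> do return $ x + 1) [1, 2, 2]
[2,3,3]
> mapM print [1, 2, 2]
1
2
2
[(),(),()]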

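mapM_ is similar but discards the results and returns IO (), which suits print and other cases where the results of the I/O Actions are of no interest:

> mapM_ print [1, 2, 2]
1
2
2

forM

Control.Monad.forM :: (Traversable t, Monad m) => t a -> (a -> m b) -> m (t b)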

It takes the same arguments as mapM in the opposite order and does the same thing:

> forM [1, 2, 2] print
1
2
2
[(),(),()]

The difference is purely one of form. When the function argument is long, forM reads more clearly, for example:

import Control.Monad (forM)

main = do
  colors <- forM [1,2,3,4] (\a -> do
    putStrLn $ "Which color do you associate with the number " ++ show a ++ "?"
    getLine)
  putStrLn "The colors that you associate with 1, 2, 3 and 4 are: "
  mapM putStrLn colors

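P.S. forM (with the arguments swapped) would also work for the last line, but by convention forM is mostly used where the I/O Action is defined inline (for example, producing an IO [b] from an [a]).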
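The relationship among sequence, mapM and forM described above can be checked with a short sketch. It uses the Maybe monad instead of IO purely so that results can be compared with ==; that choice, and the names halve, viaMapM, viaSequence and viaForM, are assumptions made for the illustration.

```haskell
import Control.Monad (forM)

-- A function that can fail: halving succeeds only for even numbers.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

viaMapM, viaSequence, viaForM :: [Int] -> Maybe [Int]
viaMapM xs     = mapM halve xs
viaSequence xs = sequence (map halve xs)  -- map first, then sequence
viaForM xs     = forM xs halve            -- mapM with arguments swapped

main :: IO ()
main = do
  print (viaMapM [2, 4, 6])                           -- Just [1,2,3]
  print (viaMapM [2, 4, 6] == viaSequence [2, 4, 6])  -- True
  print (viaMapM [2, 3] == viaForM [2, 3])            -- True
```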

forever

Control.Monad.forever :: Applicative f => f a -> f b

In I/O code, it takes an I/O Action and returns an I/O Action that repeats it forever. The echo example can therefore be approximated as:

import Control.Monad

echo = Control.Monad.forever $ do
  line <- getLine
  if null line
    then return ()
    else putStrLn line

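forever offers no advantage for echo (worse, there is no way out of the loop short of interrupting it with Ctrl+C), but there is one scenario that forever do fits well: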

import Control.Monad
import Data.Char

main = forever $ do
  line <- getLine
  putStrLn $ map toUpper line

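That scenario is text processing (transformation): forever stops when the input runs out, because getLine then fails with an end-of-file error, for example: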

$ ghc --make ./toUpperCase.hs
[1 of 1] Compiling Main             ( toUpperCase.hs, toUpperCase.o )
Linking toUpperCase ...
$ cat ./data/lines.txt
hoho, this is xx.
who's that ?
$ cat ./data/lines.txt | ./toUpperCase
HOHO, THIS IS XX.
WHO'S THAT ?
toUpperCase: <stdin>: hGetLine: end of file

forever do converts the file contents to upper case line by line. Going one step further:

$ cat ./data/lines.txt | ./toUpperCase > ./tmp.txt
toUpperCase: <stdin>: hGetLine: end of file
$ cat ./tmp.txt
HOHO, THIS IS XX.
WHO'S THAT ?

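The result is written to the file as expected (the end-of-file message goes to stderr, so it does not end up in tmp.txt).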
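The end-of-file message in the transcripts above appears because getLine is called once more after the input has run out. One way to avoid it, sketched here as an alternative to forever rather than as the approach taken in these notes, is to check isEOF from System.IO before each read. The names upperLine and upperLoop are made up for the example.

```haskell
import Control.Monad (unless)
import Data.Char (toUpper)
import System.IO (isEOF)

-- Pure transformation applied to each line.
upperLine :: String -> String
upperLine = map toUpper

-- Loop until standard input is exhausted, then finish normally.
upperLoop :: IO ()
upperLoop = do
  eof <- isEOF
  unless eof $ do
    line <- getLine
    putStrLn (upperLine line)
    upperLoop

main :: IO ()
main = upperLoop
```

The trade-off is an explicit recursive loop in place of forever, in exchange for a clean exit when the input ends.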

4. System.IO
The getLine and putStrLn used so far are both functions from the System.IO module. Other commonly used ones include:

-- output
print :: Show a => a -> IO ()
putChar :: Char -> IO ()
putStr :: String -> IO ()
-- input
getChar :: IO Char
getLine :: IO String

print outputs a value and is equivalent to putStrLn . show, while putStr outputs a string without a trailing newline. The difference between the two:

> print "hoho"
"hoho"
> putStr "hoho"
hoho

P.S. For details on the I/O module, see System.IO

getContents

getContents :: IO String

getContents returns all of the user's input as a single string, so toUpperCase can be rewritten like this:

toUpperCase' = do
  contents <- getContents
  putStr $ map toUpper contents

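Instead of handling one line at a time, it takes the whole input and converts it in one go. Yet when this function is compiled and run, the input turns out to be processed line by line: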

$ ./toUpperCase
abc
ABC
efd
EFD

This has to do with input buffering; for details see Haskell: How getContents works?

Lazy I/O
A String is itself a lazy list, and getContents performs lazy I/O as well: it does not read the whole input into memory at once.

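In the toUpperCase' example, input is read a line at a time and the upper-case version is then printed, because the input data is only really needed at the moment of output. Everything before that is merely a promise, which is fulfilled only when it has to be, similar to a Promise in JS: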

function toUpperCase() {
  let io;
  let contents = new Promise((resolve, reject) => {
    io = resolve;
  });
  let upperContents = contents
    .then(result => result.toUpperCase());
  putStr(upperContents, io);
}
function putStr(promise, io) {
  promise.then(console.log.bind(console));
  io('line\nby\nline');
}
// test
toUpperCase();

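This paints a vivid picture: getContents, map toUpper and so on merely set up a chain of Promises, and the actual I/O and the toUpper computation happen only once putStr demands the result.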
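The claim that a String is a lazy list can be seen without any I/O at all. In this minimal sketch (the name firstThree is made up), map toUpper is applied to an infinite string, yet the program terminates because only the characters that are actually demanded get computed.

```haskell
import Data.Char (toUpper)

-- cycle builds an infinite String; mapping over it is still fine because
-- only the three characters demanded by take are ever computed.
firstThree :: String
firstThree = take 3 (map toUpper (cycle "ab"))

main :: IO ()
main = putStrLn firstThree  -- prints ABA
```

Lazy I/O applies the same idea to input: the characters of the string returned by getContents are read only as the consumer asks for them.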

interact

interact :: (String -> String) -> IO ()

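It takes a string-processing function as its argument and returns an I/O Action with an empty result. It is a great fit for text processing, for example: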

-- keep only the lines shorter than 3 characters
lessThan3Char = interact (\s -> unlines $ [line | line <- lines s, length line < 3])

This is equivalent to:

lessThan3Char' = do
  contents <- getContents
  let filtered = filterShortLines contents
  if null filtered
    then return ()
    else putStr filtered
  where
    filterShortLines = \s -> unlines $ [line | line <- lines s, length line < 3]

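That is considerably more verbose. The name interact says it all: it simplifies the most common interaction pattern, which is to read a string, process it, and write out the result.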
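Because interact takes an ordinary String -> String function, the processing step can be written and tested as pure code, with interact as the only impure part. A minimal sketch, with shortLines as a made-up name for the filter:

```haskell
-- Pure core: keep only the lines shorter than 3 characters.
shortLines :: String -> String
shortLines = unlines . filter ((< 3) . length) . lines

-- Impure shell: connect the pure function to stdin and stdout.
main :: IO ()
main = interact shortLines
```

For instance, shortLines "ab\nabcd\nx\n" evaluates to "ab\nx\n", which can be checked in GHCi without touching standard input.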

5. Reading and writing files
Read a file and display it as is:

import System.IO

main = do
  handle <- openFile "./data/lines.txt" ReadMode
  contents <- hGetContents handle
  putStr contents
  hClose handle

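This looks much like file I/O in C. The handle plays the role of a file pointer: open the file in read-only mode to get the handle, read the contents through it, and finally release it. Intuitively, we might try this: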

readTwoLines = do
  handle <- openFile "./data/lines.txt" ReadMode
  line1 <- hGetLine handle
  line2 <- hGetLine handle
  putStrLn line1
  putStrLn line2
  hClose handle

Everything works: it reads the first two lines of the file and prints them. So the handle's position really does advance.

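P.S. There are many similar hGet/hPut functions, such as hPutStr, hPutStrLn and hGetChar. They behave like the versions without the h prefix but take an extra handle argument, for example: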

hPutStr :: Handle -> String -> IO ()

Looking back at the types of these functions:

openFile :: FilePath -> IOMode -> IO Handle
hGetContents :: Handle -> IO String
hGetLine :: Handle -> IO String
hClose :: Handle -> IO ()

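openFile takes a FilePath and an IOMode and returns an IO Handle. With that Handle in hand, hGetContents or hGetLine can be asked for the file's contents, and hClose finally releases the resources associated with the handle. FilePath is just String (a type alias for it), and IOMode is an enumeration (four modes: read-only, write-only, append, and read-write):

> :i FilePath
type FilePath = String  -- Defined in ‘GHC.IO’
> :i IOMode
data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode
    -- Defined in ‘GHC.IO.IOMode’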

P.S. A file handle can be thought of as a bookmark, where the book is the whole file system. The metaphor is quite vivid.

withFile

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r

This looks like yet another pattern wrapped up in a function, so let's use it to simplify the file-reading example above:

readThisFile = withFile "./data/lines.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStr contents
  )

That looks cleaner. More and more of these common functional idioms keep turning up, and they all do one of two things:

Abstract a general pattern, including type abstractions such as Maybe/Either and common usage patterns such as forever do and interact

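Trim away everything but the key logic: utility functions such as withFile, map and filter strip out boilerplate (mechanical steps like openFile and hClose) so that the code can focus on what matters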

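So what withFile does is open the file at the given path in the given mode, pass the resulting handle to the file-processing function (the third argument), and close the handle at the end: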

withFile' path mode f = do
  handle <- openFile path mode
  result <- f handle
  hClose handle
  return result

Note that this shows an important use of return: hClose handle has to run before the result is returned, so there must be a way to return a custom value.
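One caveat about the hand-written withFile': if the callback throws an exception, hClose is never reached. A hedged sketch of a more careful variant uses bracket from Control.Exception, which runs the release action whether or not the body fails. The name withFileSafe and the demo file name are made up for this illustration.

```haskell
import Control.Exception (bracket)
import System.IO

-- Acquire the handle, run the callback, and always close the handle,
-- even if the callback throws.
withFileSafe :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFileSafe path mode = bracket (openFile path mode) hClose

main :: IO ()
main = do
  -- "bracket-demo.txt" is a throwaway file created just for this example.
  writeFile "bracket-demo.txt" "first line\nsecond line\n"
  firstLine <- withFileSafe "bracket-demo.txt" ReadMode hGetLine
  putStrLn firstLine
```

The demo reads with hGetLine rather than hGetContents so that the data is fully read before the handle is closed.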

readFile

readFile :: FilePath -> IO String

Given a file path it produces an IO String. The open/close steps are gone entirely, which makes reading a file very simple:

readThisFile' = do
  contents <- readFile "./data/lines.txt"
  putStr contents

writeFile

writeFile :: FilePath -> String -> IO ()

It takes a file path and the string to write and returns an I/O Action with an empty result, again with no handle to deal with:

writeThatFile = do
  writeFile "./data/that.txt" "contents in that file\nanother line\nlast line"

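If the file does not exist it is created automatically, and existing contents are overwritten, which is very convenient. It is equivalent to the more laborious manual approach: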

writeThatFile' = do
  handle <- openFile "./data/that.txt" WriteMode
  hPutStr handle "contents in that file\nanother line\nlast line"
  hClose handle

appendFile

appendFile :: FilePath -> String -> IO ()

The type is the same as writeFile, but AppendMode is used internally, so the content is appended to the end of the file.
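The difference between overwriting and appending can be seen by combining the three convenience functions in one short sketch. The file name append-demo.txt is a throwaway chosen for this example.

```haskell
-- Hypothetical scratch file used only by this demo.
demoPath :: FilePath
demoPath = "append-demo.txt"

main :: IO ()
main = do
  writeFile demoPath "first\n"    -- creates the file or overwrites it
  appendFile demoPath "second\n"  -- adds to the end, keeping "first"
  contents <- readFile demoPath
  putStr contents
```

Running it repeatedly always leaves exactly two lines in the file, because writeFile starts from scratch each time.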

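Other file functions
-- Creates and opens a file in the directory given by FilePath, named with the String template plus a random suffix; returns a pair of the temporary file's name and its handle
openTempFile :: FilePath -> String -> IO (FilePath, Handle)
-- Defined in System.Directory; deletes the given file
removeFile :: FilePath -> IO ()
-- Defined in System.Directory; renames the given file
renameFile :: FilePath -> FilePath -> IO ()

Note that removeFile and renameFile are defined in System.Directory (not in System.IO). Functions for creating, deleting, modifying and querying files, as well as permission management, live in System.Directory, for example doesFileExist, getAccessTime and findFile.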

P.S. For more file functions, see System.Directory
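These three functions are often combined to rewrite a file through a temporary copy. The following is a sketch of that pattern under stated assumptions: the helper name upperCaseInPlace, the template "temp" and the demo file name are all made up, and the temporary file is created in the current directory so that the rename stays on one file system.

```haskell
import Data.Char (toUpper)
import System.Directory (removeFile, renameFile)
import System.IO

-- Replace a file's contents with an upper-cased copy via a temporary file.
upperCaseInPlace :: FilePath -> IO ()
upperCaseInPlace path = do
  contents <- readFile path
  (tempName, tempHandle) <- openTempFile "." "temp"
  hPutStr tempHandle (map toUpper contents)  -- forces the whole input
  hClose tempHandle
  removeFile path
  renameFile tempName path

main :: IO ()
main = do
  -- "temp-demo.txt" is a throwaway file created just for this example.
  writeFile "temp-demo.txt" "hoho\n"
  upperCaseInPlace "temp-demo.txt"
  result <- readFile "temp-demo.txt"
  putStr result
```

Writing to the temporary file first means the original is only removed after the new contents have been written and closed.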

References
Haskell default io buffering

Buffering operations
