Introduction to Compilation - gcc toolchain

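Compiling a program is a complicated process. Beginners usually build their programs through an IDE, which hides the internal compilation steps. To learn how it really works, I plan to start by watching how gcc compiles a program and examining the intermediate files it produces along the way.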

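The material below comes from reading 「程式設計師的自我修養 - 連結、載入、程式庫」 (Programmer's Self-Cultivation: Linking, Loading and Libraries), with every result verified on my own computer.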

Let's start with the simplest introductory program: Hello World!

Next, let's compile it with gcc:


$gcc hello.c 

$./a.out 

Hello World!

This seemingly simple command actually goes through four stages:

Preprocessing
Compilation
Assembly
Linking

[Diagram]

Below I use gcc to compile the Hello World program one step at a time and show the result of each stage.

Preprocessing

The first stage mainly handles the preprocessor directives in the code, that is, the lines that begin with "#", such as "#include" and "#define".

The rules are:

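Delete every #define and expand all macro definitions.
Handle all conditional-compilation directives, such as "#if", "#ifdef", "#elif", "#else" and "#endif".
Handle "#include" by inserting the included file at the position of the directive. Note that this is done recursively: an included file may itself include other files.
Remove all comments, both "//" and "/* */".
Add line markers such as # 2 "hello.c" 2, so that the compiler can generate line-number information for debugging and can show line numbers in its error and warning messages.
Keep all #pragma directives, because the compiler needs them.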
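To see these rules in action, here is a small hypothetical file, pp_demo.c (the file name and its functions are made up for illustration), that uses each kind of directive. Running gcc -E pp_demo.c on it should show stdio.h pasted in, the macros expanded, the comments gone, line markers added and the #pragma line kept:

```c
/* pp_demo.c: run `gcc -E pp_demo.c` and compare the output with this file */
#include <stdio.h>                 /* replaced by the full text of stdio.h and everything it includes */

#define GREETING "Hello World!"    /* the #define line disappears; each use becomes the literal */
#define SQUARE(x) ((x) * (x))      /* function-like macro, expanded textually at every use */

#ifdef DEBUG                       /* only one of the two branches survives preprocessing */
#define TRACE(msg) fprintf(stderr, "trace: %s\n", msg)
#else
#define TRACE(msg) ((void)0)
#endif

#pragma GCC diagnostic ignored "-Wunused-parameter"   /* passed through unchanged for the compiler */

int square_plus_one(int n)
{
    TRACE("square_plus_one");      /* becomes ((void)0) unless compiled with -DDEBUG */
    return SQUARE(n) + 1;          /* becomes ((n) * (n)) + 1 */
}

const char *greeting(void)
{
    return GREETING;               /* becomes "Hello World!" */
}
```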

Either of these commands does it:


$gcc -E hello.c -o hello.i


$cpp hello.c > hello.i

Running the file command on hello.i shows that it is readable ASCII text, and it is still C source.
Let's open it in vim to see what is inside.

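A tiny program of fewer than 10 lines grows to 843 lines after preprocessing. The extra lines are the contents of the included headers, expanded recursively, plus the inserted line markers.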
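To check where the extra lines come from, a small helper can count the total lines and the line-marker lines in hello.i. This is a sketch: count_preprocessed is a hypothetical name, and it assumes line markers look like # 2 "hello.c" 2, that is, a '#', a space, then a digit.

```c
/* count_pp.c: count total lines and line-marker lines in a preprocessed file */
#include <stdio.h>
#include <string.h>

/* A line marker starts with '#', a space, then a digit, e.g.  # 2 "hello.c" 2 */
static int is_linemarker(const char *line)
{
    return line[0] == '#' && line[1] == ' ' && line[2] >= '0' && line[2] <= '9';
}

/* Reads fp to the end and stores the number of lines and of line markers. */
void count_preprocessed(FILE *fp, long *total, long *markers)
{
    char buf[4096];
    int at_line_start = 1;
    *total = 0;
    *markers = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (at_line_start) {
            ++*total;
            if (is_linemarker(buf))
                ++*markers;
        }
        /* A line longer than buf arrives in pieces; only the first piece starts a new line. */
        size_t len = strlen(buf);
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
}
```

To use it on the real file, open hello.i with fopen("hello.i", "r") and pass the FILE pointer in; the marker count hints at how often the preprocessor switched between files.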

Compilation

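At this stage, the preprocessed file goes through lexical analysis, syntax analysis, semantic analysis and optimization, which together produce the corresponding assembly code. This is the most complicated part of the whole process and needs a fuller explanation, so I will write up the details in a separate article later.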
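As a rough illustration of the first of these steps, lexical analysis, the toy lexer below splits a line of C into identifier, number and punctuation tokens. It is only a sketch (tokenize and token_kind are hypothetical names); gcc's real lexer also handles keywords, string literals, multi-character operators such as == and much more.

```c
/* lex_demo.c: a toy lexer, for illustration only */
#include <ctype.h>
#include <stddef.h>

enum token_kind { TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

/* Splits src into tokens, stores up to max token kinds in kinds[], returns the token count. */
size_t tokenize(const char *src, enum token_kind *kinds, size_t max)
{
    size_t n = 0;
    const char *p = src;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {              /* whitespace only separates tokens */
            p++;
            continue;
        }
        enum token_kind kind;
        if (isalpha((unsigned char)*p) || *p == '_') { /* identifier (keywords not distinguished) */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            kind = TOK_IDENT;
        } else if (isdigit((unsigned char)*p)) {       /* decimal integer literal */
            while (isdigit((unsigned char)*p))
                p++;
            kind = TOK_NUMBER;
        } else {                                       /* any other single character */
            p++;
            kind = TOK_PUNCT;
        }
        if (n < max)
            kinds[n] = kind;
        n++;
    }
    return n;
}
```

For example, "x = a + 42;" yields six tokens: x, =, a, +, 42 and ;. The parser then works on this token stream rather than on raw characters.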

The command is:


$gcc -S hello.i -o hello.s

The file command identifies the result as assembler source; it contains the generated assembly code.

The content is x86 assembly, with registers plus various directives and sections; I won't go into the details here.
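To find those sections in your own output, you could compile a file like the hypothetical sections.c below with gcc -S sections.c and look for directives such as .text, .data, .bss and .section .rodata in sections.s. Where each object lands is typical gcc behaviour rather than a guarantee. In particular, gcc versions before 10 (including the 4.9.2 used here) default to -fcommon, so the uninitialized global appears as a .comm directive instead of in .bss.

```c
/* sections.c: compile with `gcc -S sections.c` and inspect sections.s */
int initialized_global = 42;                   /* initialized data: expected in .data */
int zeroed_global;                             /* .bss, or .comm with gcc < 10 (-fcommon) */
static const char message[] = "Hello World!"; /* read-only data: expected in .rodata */

/* Function bodies are machine code and go to .text. */
int bump(void)
{
    zeroed_global += 1;
    return initialized_global + zeroed_global;
}

const char *get_message(void)
{
    return message;
}
```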

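In fact, modern gcc merges preprocessing and compilation into a single step, performed by a program called cc1.
Where is this program? On my laptop the platform is 64-bit, so it is under /usr/lib/gcc/x86_64-linux-gnu, and the gcc version is 4.9.2. (On any system, gcc -print-prog-name=cc1 prints its full path.)
So the command and its output are: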


$/usr/lib/gcc/x86_64-linux-gnu/4.9.2/cc1 hello.c 

 main
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data>   <*free_inline_summary>  Assembling functions:
 main
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 (50%) wall    1123 kB (66%) ggc
 phase parsing           :   0.01 (100%) usr   0.00 ( 0%) sys   0.01 (50%) wall     524 kB (31%) ggc
 preprocessing           :   0.01 (100%) usr   0.00 ( 0%) sys   0.01 (50%) wall     219 kB (13%) ggc
 TOTAL                 :   0.01             0.00             0.02               1709 kB

You can also go straight from the C file to assembly with:


$gcc -S hello.c -o hello.s

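Key concept:
The gcc command is only a wrapper around these back-end programs. Depending on the options it is given, it invokes cc1 (preprocessing and compilation), the assembler as, and the linker ld.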
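One way to picture the driver's job is to spell out, for a base name such as hello, the command for each stage used in this post and run them in order. The sketch below is a simplified model under that assumption (stage_command and run_all are hypothetical names). The real gcc calls cc1, as and collect2/ld with many more options, which gcc -v reveals, and for brevity the link step here goes through gcc rather than the long ld command shown later.

```c
/* stages.c: a toy "driver" that runs the four stages one after another */
#include <stdio.h>
#include <stdlib.h>

enum stage { STAGE_PREPROCESS, STAGE_COMPILE, STAGE_ASSEMBLE, STAGE_LINK };

/* Writes the command for one stage into buf; returns the snprintf result, or -1. */
int stage_command(enum stage s, const char *base, char *buf, size_t size)
{
    switch (s) {
    case STAGE_PREPROCESS:
        return snprintf(buf, size, "gcc -E %s.c -o %s.i", base, base);
    case STAGE_COMPILE:
        return snprintf(buf, size, "gcc -S %s.i -o %s.s", base, base);
    case STAGE_ASSEMBLE:
        return snprintf(buf, size, "as %s.s -o %s.o", base, base);
    case STAGE_LINK:
        return snprintf(buf, size, "gcc %s.o -o %s", base, base);
    }
    return -1;
}

/* Runs every stage through the shell, stopping at the first failure. */
int run_all(const char *base)
{
    char cmd[256];
    for (int s = STAGE_PREPROCESS; s <= STAGE_LINK; s++) {
        int n = stage_command((enum stage)s, base, cmd, sizeof cmd);
        if (n < 0 || (size_t)n >= sizeof cmd)   /* unknown stage or truncated command */
            return -1;
        printf("+ %s\n", cmd);
        if (system(cmd) != 0)
            return -1;
    }
    return 0;
}
```

Calling run_all("hello") in a directory containing hello.c should leave hello.i, hello.s, hello.o and the executable hello behind, one file per stage.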

Assembly

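The assembler converts assembly code into instructions the machine can execute. Almost every assembly statement corresponds to one machine instruction, so assembling is much simpler than compiling: there is no complex syntax, no semantics and no instruction optimization. The assembler just translates each statement according to a table that maps assembly instructions to machine instructions.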
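The table-lookup nature of assembling is easy to see with two x86-64 instructions. The sketch below (encode_mov_eax and encode_ret are hypothetical names) produces the same bytes that objdump -d shows for movl $0x2a, %eax and ret: opcode 0xB8 followed by the 32-bit immediate in little-endian order, then the single byte 0xC3. It handles only these two encodings; the real as covers the whole instruction set.

```c
/* encode_demo.c: encoding two instructions by table lookup */
#include <stddef.h>
#include <stdint.h>

/* movl $imm, %eax  ->  0xB8, then the 32-bit immediate, least significant byte first */
size_t encode_mov_eax(uint32_t imm, unsigned char *out)
{
    out[0] = 0xB8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (unsigned char)(imm >> (8 * i));
    return 5;   /* instruction length in bytes */
}

/* ret  ->  the single byte 0xC3 */
size_t encode_ret(unsigned char *out)
{
    out[0] = 0xC3;
    return 1;
}
```

Encoding movl $42, %eax followed by ret gives the six bytes b8 2a 00 00 00 c3.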

Either of these commands works:


$as hello.s -o hello.o


$gcc -c hello.s -o hello.o

Or let gcc produce the object file straight from the C file:


$gcc -c hello.c -o hello.o

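Checking it with the file command gives:
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
It is a relocatable file: it contains calls to functions defined in other files, and the addresses of those calls can only be determined at link time.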
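To see those unresolved calls, compile a file like the hypothetical greet.c below with gcc -c greet.c and list its relocations with readelf -r greet.o (or objdump -dr greet.o). You should find an entry naming puts, whose address is only known after linking. The exact relocation type, such as R_X86_64_PLT32 or R_X86_64_PC32, depends on the toolchain version, and there is typically another entry for the string literal.

```c
/* greet.c: compile with `gcc -c greet.c`, then run `readelf -r greet.o` */
#include <stdio.h>

/* puts is defined in the C library, not in this file, so the call instruction in greet.o
   holds a placeholder plus a relocation entry that names puts. */
int greet(void)
{
    return puts("Hello World!");
}
```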

Linking

Linking is exactly what the book focuses on most.
The ld linker produces the Hello World program that can actually run.
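Here is a toy model of one thing the linker does once addresses are known: patching a 32-bit PC-relative field, such as the operand of a call instruction, with S + A - P. Here S is the address of the target symbol, A is the addend stored in the relocation entry (typically -4 for a call), and P is the address of the field being patched. The function name and the addresses below are hypothetical, and unlike ld this sketch performs no overflow check.

```c
/* reloc_demo.c: applying one PC-relative relocation by hand */
#include <stddef.h>
#include <stdint.h>

/* Patches 4 bytes at section[offset] with S + A - P (truncated to 32 bits, little-endian). */
void apply_pc32(unsigned char *section, uint64_t section_addr, size_t offset,
                uint64_t symbol_addr, int64_t addend)
{
    uint64_t place = section_addr + offset;                               /* P */
    uint32_t value = (uint32_t)(symbol_addr + (uint64_t)addend - place);  /* S + A - P mod 2^32 */
    for (int i = 0; i < 4; i++)
        section[offset + i] = (unsigned char)(value >> (8 * i));
}
```

For a call at 0x401000 (e8 followed by four zero bytes) to a function at 0x401100, apply_pc32(text, 0x401000, 1, 0x401100, -4) writes fb 00 00 00: the CPU adds 0xfb to the address of the next instruction, 0x401005, and lands on 0x401100.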

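Command (the crt*.o startup files must be given with their full paths, which depend on the system; ld may also need -L options, for example for the directory that contains libgcc; without -o, ld writes the result to a.out):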


$ld -static crt1.o crti.o crtbeginT.o hello.o -start-group -lgcc -lgcc_eh -lc -end-group crtend.o crtn.o

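Use file to look at the executable. Note that the output below reports "dynamically linked", so it appears to come from a binary built with gcc's default settings; an executable produced by the -static command above would be reported as "statically linked".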

hello.o: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=a2f1d119391894eb7d6afb79a0962c2cd27380ce, not stripped

Turning on verbose mode (-v) when compiling with gcc shows which files are linked in; the output also includes the commands for the earlier stages.


$gcc hello.c -o hello -v

There is a related explanation on Stack Overflow.

You can also read man gcc for the definition of each compiler option:

-c : Compile or assemble the source files, but do not link. The linking stage simply is not done. The ultimate output is in the form of an object file for each source file.
By default, the object file name for a source file is made by replacing the suffix .c, .i, .s, etc., with .o.
Unrecognized input files, not requiring compilation or assembly, are ignored.
-S : Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.
By default, the assembler file name for a source file is made by replacing the suffix .c, .i, etc., with .s.
Input files that don't require compilation are ignored.
-E : Stop after the preprocessing stage; do not run the compiler proper. The output is in the form of preprocessed source code, which is sent to the standard output.
Input files that don't require preprocessing are ignored.
-o : Place output in file file. This applies to whatever sort of output is being produced, whether it be an executable file, an object file, an assembler file or preprocessed C code.
If -o is not specified, the default is to put an executable file in a.out, the object file for source.suffix in source.o, its assembler file in source.s, a precompiled header file in source.suffix.gch, and all preprocessed C source on standard output.

蓋瑞隨筆記
