From patchwork Sat Jan 20 23:49:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Randy Dunlap X-Patchwork-Id: 10176779 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1457D6023A for ; Sat, 20 Jan 2018 23:49:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 018D8205AF for ; Sat, 20 Jan 2018 23:49:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E9296205F8; Sat, 20 Jan 2018 23:49:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1206A205AF for ; Sat, 20 Jan 2018 23:49:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756734AbeATXtR (ORCPT ); Sat, 20 Jan 2018 18:49:17 -0500 Received: from bombadil.infradead.org ([65.50.211.133]:54101 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756726AbeATXtR (ORCPT ); Sat, 20 Jan 2018 18:49:17 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=Content-Transfer-Encoding: Content-Type:MIME-Version:Date:Message-ID:Cc:Subject:From:To:Sender:Reply-To: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=aMrSzQTVEsfUNktyUuoNAZ17JsbOCSawMQvuMDvKHcc=; b=UMqNtmZapQonZ9z893wTZSHjJ ssXNczpO2cpqz9Bm4Ok5b3hP8nPjQG9AqSUQnoLhXFOsm28FYUvH+Pay4aQQus62+oTj9Ps/UsgfZ HUSQHJL9KqD26s9AY6EwF7Ve63TNnTFj2qhLIGOEv5XzXVWnxJNYbnPtgtZe3cDmAOwypVF3IYzLH /5OPXGF5ydW/d1Z1hiIWnKG4wknxpqrJDGXwvqwvySbsJWwaCm2sijGj0dtQdVnDi1J99sVLQc99n QuttJ/E8BoHTg3xeul3OzkaIQc3wmZ4LNuemVSJzI7ssTTjF+9xQFba2g+pEOrrIbO60Cu15VjfcP gODLjEpKA==; Received: from static-50-53-52-16.bvtn.or.frontiernet.net ([50.53.52.16] helo=dragon.site) by bombadil.infradead.org with esmtpsa (Exim 4.89 #1 (Red Hat Linux)) id 1ed2su-0001ei-68; Sat, 20 Jan 2018 23:49:16 +0000 To: Linux-Sparse , Christopher Li From: Randy Dunlap Subject: [PATCH] Documentation: make data-structures.txt easier to read Cc: Josh Triplett Message-ID: <0db75caa-21ef-c5c9-2ae6-63cb115d1c1f@infradead.org> Date: Sat, 20 Jan 2018 15:49:15 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 Content-Language: en-US Sender: linux-sparse-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-sparse@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Randy Dunlap Make this file easier to read by removing the name at the beginning of each line. Put its source and attribution at the beginning of the file. Signed-off-by: Randy Dunlap Reviewed-by: Josh Triplett --- Documentation/data-structures.txt | 107 ++++++++++++++-------------- 1 file changed, 54 insertions(+), 53 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- sprs-2018-0119.orig/Documentation/data-structures.txt +++ sprs-2018-0119/Documentation/data-structures.txt @@ -1,54 +1,55 @@ +- A slightly edited irc discussion with Josh Triplett. +- Describes most data structures used in sparse. - As far as the parsing structures go... - The C parser exists in two main files: parse.c, which parses statements, and expression.c, which parses expressions. - parse.h contains the definition of struct statement, which represents a C statement. - That includes only those things which can't appear as an expression, which primarily includes control flow statements such as if, loops, switch/case, and goto. - expression.h contains the definition of struct expression, which represents a C expression. That has a lot more content, since most C constructs can appear in expressions. - A series of statements forms a compound statement (STMT_COMPOUND). - That appears as another struct statement which has a statement_list member. - A function body consists of a compound statement. - When you look at a loop body, if or else body, or case body, you'll notice that they just have a struct statement, not a statement_list; they can have multiple statements by using a compound statement. - Also note that all loops get turned into a single "iterator" statement. - for, while, and do-while. - A symbol, then, represents a name in a C file. A symbol might represent a variable, a function, a label, or various other things. - See symbol.h. - "struct symbol" represents one symbol. - As with the various other structures, it has some common data and a union of sub-structures for the parts that differ between different types. - Most of the interesting bits come in the NS_SYMBOL case. - Among other things, it has a struct statement for the body of a function (if any), a list of symbols for the arguments, an expression for a variable initializer, and so on. - Together, struct symbol, struct statement, and struct expression represent most of the abstract syntax tree for C. - So, that represents most of the "front-end" of Sparse: parsing C and generating that abstract syntax tree. - That much occurs in pretty much any program using the Sparse frontend. - The backend varies among programs. - For instance, the c2xml backend goes that far, then outputs XML. - The sparse static analysis backend has a few steps: it generates linearized bytecode, does some evaluation on that, and outputs some warnings. - Several other backends run that linearized bytecode stage. - The linearized bytecode itself has a set of nested structures. - linearize.h defines all of them. - At the top level, it has struct entrypoint. - That represents an entrypoint to the code, which would normally mean a function. - An entrypoint has a list of basic blocks. - struct basic_block. - A basic block represents a series of instructions with no branches. - Straight-line code. - A branch only occurs at the end of a basic block, and branches can only target the beginning of a basic block. - Typically, a conditional will consist of a basic block leading up to the branch, a basic block for the true case, a basic block for the false case, and a basic block where the two paths merge back together. - Either the true or the false case may not exist. - A loop will normally have a basic block for the loop body, which can branch to the top at the end or continue to the next basic block. - So basic blocks represent a node in the control flow graph. - The edges in that graph lead from one basic block to a basic block which can follow it in the execution of the program. - Each basic block has a series of instructions, "struct instruction". - "enum opcode" lists all the instructions. - Fairly high-level instruction set, corresponding directly to bits of C. - So you have an entrypoint, which has a graph of basic blocks, each of which has a list of instructions. - An entrypoint also has a pointer to the first instruction. - One last bit of trickiness: struct pseudo. - Have you ever heard of "static single assignment" or SSA form? - struct pseudo represents one of those single-assignment variables. - Each one has a pointer to the symbol it represents (which may have many pseudos referencing it). - Each one also has a pointer to the instruction that defines it. - That covers most of the major data structures in Sparse. - Now, given all that, some of the top-level stuff in sparse.c may make more sense. - For instance, the context checking works in terms of basic blocks. - Hopefully some of that helped you understand Sparse better. - +As far as the parsing structures go... +The C parser exists in two main files: parse.c, which parses statements, and expression.c, which parses expressions. +parse.h contains the definition of struct statement, which represents a C statement. +That includes only those things which can't appear as an expression, which primarily includes control flow statements such as if, loops, switch/case, and goto. +expression.h contains the definition of struct expression, which represents a C expression. That has a lot more content, since most C constructs can appear in expressions. +A series of statements forms a compound statement (STMT_COMPOUND). +That appears as another struct statement which has a statement_list member. +A function body consists of a compound statement. +When you look at a loop body, if or else body, or case body, you'll notice that they just have a struct statement, not a statement_list; they can have multiple statements by using a compound statement. +Also note that all loops get turned into a single "iterator" statement. +for, while, and do-while. +A symbol, then, represents a name in a C file. A symbol might represent a variable, a function, a label, or various other things. +See symbol.h. +"struct symbol" represents one symbol. +As with the various other structures, it has some common data and a union of sub-structures for the parts that differ between different types. +Most of the interesting bits come in the NS_SYMBOL case. +Among other things, it has a struct statement for the body of a function (if any), a list of symbols for the arguments, an expression for a variable initializer, and so on. +Together, struct symbol, struct statement, and struct expression represent most of the abstract syntax tree for C. +So, that represents most of the "front-end" of Sparse: parsing C and generating that abstract syntax tree. +That much occurs in pretty much any program using the Sparse frontend. +The backend varies among programs. +For instance, the c2xml backend goes that far, then outputs XML. +The sparse static analysis backend has a few steps: it generates linearized bytecode, does some evaluation on that, and outputs some warnings. +Several other backends run that linearized bytecode stage. +The linearized bytecode itself has a set of nested structures. +linearize.h defines all of them. +At the top level, it has struct entrypoint. +That represents an entrypoint to the code, which would normally mean a function. +An entrypoint has a list of basic blocks. +struct basic_block. +A basic block represents a series of instructions with no branches. +Straight-line code. +A branch only occurs at the end of a basic block, and branches can only target the beginning of a basic block. +Typically, a conditional will consist of a basic block leading up to the branch, a basic block for the true case, a basic block for the false case, and a basic block where the two paths merge back together. +Either the true or the false case may not exist. +A loop will normally have a basic block for the loop body, which can branch to the top at the end or continue to the next basic block. +So basic blocks represent a node in the control flow graph. +The edges in that graph lead from one basic block to a basic block which can follow it in the execution of the program. +Each basic block has a series of instructions, "struct instruction". +"enum opcode" lists all the instructions. +Fairly high-level instruction set, corresponding directly to bits of C. +So you have an entrypoint, which has a graph of basic blocks, each of which has a list of instructions. +An entrypoint also has a pointer to the first instruction. +One last bit of trickiness: struct pseudo. +Have you ever heard of "static single assignment" or SSA form? +struct pseudo represents one of those single-assignment variables. +Each one has a pointer to the symbol it represents (which may have many pseudos referencing it). +Each one also has a pointer to the instruction that defines it. +That covers most of the major data structures in Sparse. +Now, given all that, some of the top-level stuff in sparse.c may make more sense. +For instance, the context checking works in terms of basic blocks. +Hopefully some of that helped you understand Sparse better.