RMS recently released a reference manual for GNU C - I am finding it direct and succinct.
Edit: I used to teach C to a group of undergrads and one question that came up often was around the special significance of the main function and order of definition of functions.
The RMS book addresses this very succinctly -
Every C program is started by running the function named main. Therefore, the example program defines a function named main to provide a way to start it. Whatever that function does is what the program does. The main function is the first one called when the program runs, but it doesn’t come first in the example code. The order of the function definitions in the source code makes no difference to the program’s meaning.
If you understand basic concepts of programming but know nothing about C, you can read this manual sequentially from the beginning to learn the C language.
You know, I can think of a handful of examples where that statement about main is not true. To my understanding, C itself does not define that; the implementation does. And I think this is exactly what the other commenter is saying: these implied assumptions lead to issues in C development. Admittedly this is an extremely superficial assumption that isn't going to matter in 99% of cases, except of course in security and embedded systems.
I think the biggest issue with C is that it's taught as though the programs you are building are the foundation of everything, while brushing under the rug that in almost all coursework you are not building C on bare metal; you are almost always building on an OS, which runs on an implementation of hardware, and both matter enormously. Teaching that you are building on this all-powerful language makes for overconfident programmers. I've known so many developers who jump the gun and think they know exactly what will happen on the computer without understanding how high in the clouds they normally are.
But if you declare the foo function before main, it works correctly regardless of where it's defined. The definition just has the effect of also declaring it, which is what makes the difference.
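A minimal sketch of the working arrangement being described (foo here is just a hypothetical example function that happens to take a double, matching the calling-convention example discussed below):

#include <stdio.h>

int foo(double x);              /* declaration before main */

int main(void)
{
    printf("%d\n", foo(3));     /* 3 is converted to 3.0, as the prototype requires */
    return 0;
}

int foo(double x)               /* the definition can still come later in the file */
{
    return (int)(x * 2.0);
}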
(Of course you know all this much better than I do, but someone younger reading this thread might be mystified.)
That wasn't the statement in the book, though. The thing is, this is a serious misfeature which isn't present in other languages, and can cause bizarre problems for the poor newbie user.
This order dependency is why C source code tends to be written bottom-up, rather than the more natural top-down.
gcc 12.2 on Arch warns about it with no switches - sure RMS did not put a disclaimer on his claim but one can reasonably assume he meant if you write correct C the order will not matter. (Btw I have a feeling every statement about C would need to carry that disclaimer lol - main will be called if you did not LD_PRELOAD a .so that caused premature exit in init function etc.)
gcc /tmp/testf.c
/tmp/testf.c: In function ‘main’:
/tmp/testf.c:4:18: warning: implicit declaration of function ‘foo’ [-Wimplicit-function-declaration]
    4 |   printf("%d\n", foo(3));
      |                  ^~~
On x64, foo(double) expects its argument in xmm0, but foo(int) expects it in rdi. So the 3 passed in rdi is ignored and it operates on whatever happened to be in xmm0.
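For anyone who wants to reproduce that, here is a sketch of the kind of test file being discussed (the file contents are a guess based on the warning above; foo could equally be defined in a separate translation unit):

/* testf.c - compile with a plain `gcc testf.c` to see the warning */
#include <stdio.h>

int main(void)
{
    /* No declaration of foo is visible here, so it is implicitly
       declared as `int foo()` and the 3 is passed as an int
       (in rdi on x86-64). */
    printf("%d\n", foo(3));
    return 0;
}

/* The real foo takes a double, which it expects in xmm0 on x86-64,
   so it operates on whatever happens to be there: undefined
   behaviour. A prototype for foo placed before main fixes it. */
int foo(double x)
{
    return (int)(x * 2.0);
}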
I am stupid, too. I generally do not use git(1). I wrote a short script to download the manual as a single 639K text file. I do not use a GUI or mouse, so this makes it easier for me to work with code snippets. I also reformat the text to match personal preferences. Here is an example script using the popular curl program to make the HTTP requests.
#!/bin/sh
# requirements: curl, links, perl, texinfo (Linux), texi2html (BSD)
set -v;
x0=c-intro-and-ref.git;
x1=https://git.savannah.nongnu.org/cgit/;
test -d $x0||mkdir $x0;cd $x0||exit;
for x2 in Makefile c.texi cpp.texi fdl.texi fp.texi texinfo.tex;do
echo url=$x1$x0/plain/$x2;
echo output=$x2;
echo user-agent=\"\";
done|curl -K/dev/stdin -s;
case $(uname) in :)
;;*BSD) texi2html --no-headers --no-split --html c.texi
;;Linux) makeinfo --no-headers --no-split --html c.texi
esac;
# personal preference for ~15-inch screen: 4 space indent, 60-70 max chars per line
links -width 70 -dump c.html \
|sed '/^ *Link:/d;/^ *\*/s/\*//;s/^/    /;/Jump to: /{N;N;d;}' > c.txt;
exec less c.txt;
But that only saves like one or two keystrokes; it's probably good to know for shell trivia questions, though, or when implementing something that supposedly can handle shell continuation lines.
djb used this style when shell scripting and I believe that is where I picked it up.^1 I have not seen (m)any HN commenters publishing their work for public scrutiny who are more skilled in programming, including C, than djb. Other programmers, good ones, but not a large number, copied his C style.
Having taught C at university, I found the hardest bit was deciding how many "well, but..."s to give.
C feels like a very simple language, but there are lots of subtle corners which probably don't need teaching at first.
On the other hand, I think it's important to hammer home "undefined behaviour" early. There are too many guides that say writing outside the bounds of an array "writes to memory outside the array", which simply isn't true. It might write, it might not, who knows, undefined behaviour means all bets are off.
In general I feel a lot of practical teaching of C is teaching about undefined behaviour, which is something many other languages (Java, Python, Haskell, Rust) either don't have, or where they do, beginners won't stumble across it.
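A tiny example of the distinction (names and values are arbitrary): depending on the compiler, the flags, and what happens to sit next to the array, the store below may clobber an unrelated variable, may appear to do nothing, may crash, or may let the optimizer transform the surrounding code in surprising ways.

#include <stdio.h>

int main(void)
{
    int a[4] = {1, 2, 3, 4};
    int i = 4;

    a[i] = 42;    /* out of bounds: NOT simply "writes to memory outside
                     the array" - it is undefined behaviour, so all bets
                     are off */

    printf("%d\n", a[0]);
    return 0;
}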
A good tactic may be to teach D instead. It's rather close to C (especially with the -betterC flag which disables among other things the garbage collector) but fixes some of the more egregious flaws. No spiral rule. Proper modules. No preprocessor (but CTFE and mixins instead). It stays close enough to C that undefined behavior exists but far enough away to avoid nasal demons.
and the user has completely disappeared along with their comment/question.
Anyhow, you can think about the __GNUC__ part as an attempt to fast-forward the loop by testing size_t-length chunks for the character by XOR-ing the chunk with a repeated string of that character. The first loop aligns the pointer for the fast-forward. The final loop is then used to test the unaligned part, which when not using __GNUC__, would be the complete loop.
I felt the need to answer because I'd done the work.
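For anyone reading along without the source at hand, here is a rough sketch of the trick being described, in the shape of a memchr-like routine (this is an illustration, not the actual GNU code; the real thing gates the fast path on __GNUC__ and is considerably more careful): XOR each size_t-sized chunk with the character repeated in every byte, then test whether any byte of the result became zero.

#include <stddef.h>
#include <stdint.h>

/* True if any byte of w is zero (the classic "haszero" bit trick). */
static int word_has_zero_byte(size_t w)
{
    const size_t ones  = (size_t)-1 / 0xff;   /* 0x0101...01 */
    const size_t highs = ones << 7;           /* 0x8080...80 */
    return ((w - ones) & ~w & highs) != 0;
}

void *my_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;

    /* First loop: advance byte by byte until p is word-aligned. */
    while (n && ((uintptr_t)p % sizeof(size_t)) != 0) {
        if (*p == ch) return (void *)p;
        p++; n--;
    }

    /* Fast-forward: examine a whole word per iteration. */
    const size_t repeated = ((size_t)-1 / 0xff) * ch;   /* ch in every byte */
    while (n >= sizeof(size_t)) {
        size_t w = *(const size_t *)p ^ repeated;       /* matching bytes become 0 */
        if (word_has_zero_byte(w)) break;               /* locate it byte by byte below */
        p += sizeof(size_t); n -= sizeof(size_t);
    }

    /* Final loop: the unaligned tail (and pinpointing a match found above). */
    while (n) {
        if (*p == ch) return (void *)p;
        p++; n--;
    }
    return NULL;
}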
C should be taught like any other language. One should understand the historical context in which a language was born and why it gained popularity. A lot of "vintage" C code is indeed bad because elementary software engineering principles were not widely understood back then. Pascal code of the same vintage isn't significantly better, either. We've come a long way in collectively understanding the importance of modularity, cohesion and coupling, and of balancing complexity, and that smart isn't always better in the long run.
C still is, and will continue to be for the foreseeable future, the lingua franca, the least common denominator. Acknowledging how C comes with a very... interesting set of tradeoffs that make it uniquely well-suited for certain purposes and at the same time incredibly dangerous is a worthwhile proposition if one is aiming to truly understand C development.
True, but a good C course should explain exactly those concepts and what kind of problems (in relation to assembly and other HLLs) was C meant to solve back at the time.
I would highlight the following:
* Structured programming support, which means nested loops and conditionals without a primary need for "goto" jumps, enabling a sense of "depth" that is missing in the "flat" world of assembly
* Expression-oriented syntax, meaning that operators (even those having a side-effect) return a value of a certain type, and can be nested, again enabling recursive program structure versus a flattened one
* Global symbol allocation and resolution, which means that a programmer uses names rather than addresses to refer to global variables and functions
* Abstraction over function calling conventions, which enables the programmer not to worry about function prologues and epilogues and the order of pushing arguments on the stack or in registers
* Automatic storage management, meaning that a function-scope local variable is used by the programmer with its name and the compiler decides whether to put it in a register or at a certain offset in the stack frame
* Rudimentary integer-based type system that has the distinction between a scalar and a fixed-size collection of scalars laid-out sequentially (arrays), and special integers called "pointers", supporting a different set of operations (dereferencing to a certain type and adding or subtracting other integers from them, without any safety guarantees whatsoever)
Nothing more, nothing less. Not understanding these foundations is the source of major pain.
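To make the expression-orientation point concrete, the classic copy-input-to-output loop from K&R nests an assignment (which itself yields a value) inside a comparison inside the loop condition; a strictly statement-oriented language of the era would spread this across several statements:

#include <stdio.h>

int main(void)
{
    int c;

    /* getchar() yields a value, the assignment to c is itself an
       expression with that value, and the comparison consumes it
       in place. */
    while ((c = getchar()) != EOF)
        putchar(c);

    return 0;
}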
The expression-ness of C really sets it apart from even its successors. Declarations look like expressions, everything does. You can see in C code from the time how heavily expressions are used. C++, Java, etc. all added a bunch of new keywords to the syntax that make it much more like Pascal or other 'normal' languages. The culture of those languages leans much more towards statements as well.
Macro Assemblers, which were never a thing in UNIX, do offer support for structured programming, see MASM, TASM, or going back to the days C was born, something like HLASM on IBM mainframes.
Additionally many of the C features had already been sorted out in JOVIAL, NEWP, PL/I, BLISS among others about a decade before C was born.
C was solving the issues of UNIX v3 design, that is all.
Plenty of languages can be used to teach low level programming concepts.
> C still is, and will continue to be for the foreseeable future, the lingua franca, the least common denominator.
In the context of platform ABIs, sure. The widespread stabilization and ossification of C ABIs is a boon for the rest of the ecosystem but it's entirely at the expense of the C language/stdlib. Hence the performance advantage of projects like fmtlib.
Notwithstanding its ubiquity C is in many ways "The Sick Man of Asia". Every major C compiler is written in C++ with tooling heading the same way. The dominance of C++ in the heterogeneous space has accelerated this trend and spread it to many HPC libraries. Even foundational bits such as Microsoft's UCRT or llvm-libc are written in C++.
On the current trajectory C will become the next Fortran, i.e. a widely used language which is nonetheless unable to support itself.
A little. While I have heard it (from native speakers and non), it's less common. I also find it confusing because "least" can be interpreted as "lowest", or it can be interpreted in context of the following word: "common". "Least common" (meaning infrequent or most rare) changes the meaning to one different and misleading compared to "lowest", which is why I interpret this phrasing as sounding weird.
I think so? To me, least seems to be used for things like patience, distance, tidiness etc whereas lowest seems to be used more for money or countable things.
Just curious, is there a C systems programming book that gradually feeds the reader progressively more difficult systems programming projects? I read CSAPP and that is a good one, but the scope of the projects is a bit limited.
Try Bruce Molay's Understanding UNIX/LINUX Programming: A Guide to Theory and Practice.
It teaches various Linux/Unix concepts like signals, threads, file I/O, sockets, etc. In some chapters, we're guided to re-implement built-in tools like who, ls, sh, pwd. Very interesting from a developer's point of view.
C Programming: A Modern Approach was my first introduction to programming outside of writing a few Python scripts (I didn't know I could define my own functions, for example). With just it and MSVC I was able to pick up a lot of the general concepts and practices. The last project I made while studying the book was a clock for decimal time and French Republican dates in ncurses. This was only back in 2020, so it is still relevant.
I have nothing to say other than that I love C and I will always love C. That love affair began when I first walked through the tutorial chapter of K&R, and it never died down. I'm not sure what it is, but it is just beautiful to me.
We're 4 years away from the time when TFA predicts no new projects will begin with C.
I wonder how the claim holds up? Certainly not well in the embedded space, and Rust mainline kernel development seems still a few years away at least.
I'd argue there's some transition in space FSW (flight software) to C++, but that's not really significantly different from C with classes, since most C++ features and std:: are not allowed.
When do automotive, IoT, and other spaces anticipate transitioning?
In your case, how much compatibility is lost from using C99 over C89? (I'm comfortable with C99 and still write some low-level multiplatform code in it, but losing the ability to declare variables where they're first used is a bridge too far for me...)
IMO, targeting C89 is only warranted when you know for a fact it's needed. After all, not only is C99 a good improvement but toolchains stuck on C89 are not the type of toolchain I'd want to support.
> In your case, how much compatibility is lost from using C99 over C89?
Until somewhat recently, C89 was the latest version of C supported by the MSVC compiler. Support for C99 was always terribly broken, and Microsoft chose to claim it supported C99 while not supporting mandatory features, which meant they did not in fact support C99.
This sad state of affairs only changed significantly in 2020, with a low-key announcement.
They claimed support for C99 to the extent required for ISO C++ compliance, plus some features critical for a couple of their developers.
Likewise, even with the updated C11 and C17 support, it isn't 100% there, only good enough, because C++ is what matters for Windows developers when not using .NET.
FWIW I’ve seen space FSW instances where both std:: was allowed and forbidden. One instance was soft real-time FSW running on a customized Linux. Most (sane) C++ was allowed, exceptions were not disabled, just disallowed, std:: was allowed during initialization, but not past that, memory allocation was only allowed during initialization as well, although a stray allocation at runtime was only fatal in testing, not during flight. It was logged in both cases and considered a critical issue. PR reviews had strict checklist rules regarding allowed syntax/features. It was very productive using all of the necessary C++ features, within reason, to write code.
"The other way of teaching undefined behavior, by looking at its consequences, is something that we should spend a bit of time on, but it requires a different kind of thinking and we probably won’t expect the majority of students to pick up on all the subtleties — even seasoned professional C programmers are often unaware of these."
I feel that last phrase exemplifies why this should be given quite a lot of attention. It is also the sort of thing that will lead a student to a deeper insight into the language and how it differs from the other procedural languages that they are likely to already be familiar with (the same can be said for memory management, BTW.)
Such a course might be something of a grind, but if so, it would inculcate the right sort of attitude for programming competently in C!
For dessert, take a quick look at some Obfuscated C winners.
As I think about teaching my son programming, I think I would start with C and assembly because although it’s super low level, it gives you a fundamental understanding of how the machine works. The debugging and problem solving skills these languages impart will also be invaluable.
Honestly, I don't think C is a good match for "how the machine works".
If you look at the assembler an optimising compiler produces from C, it can take hours to learn how to map the C to the assembler. Also, all the business with "undefined behaviour" (which they will hit soon, and often) isn't how computers work either -- assembler has (very little) undefined behaviour, if you write to a random memory location it is written to (unlike in C where maybe it is, maybe it isn't).
My recommendation would be to teach them whatever is the quickest way to something they enjoy, whether that be Python, Minecraft, Roblox, JavaScript for web dev, or C if they (for some reason) really want to start by doing kernel programming or something like that.
You can learn all the "fine details" later, they have a lifetime to do it :)
>... it gives you a fundamental understanding of how the machine works.
ISO/GNU C is unfit for programming classes of devices as ubiquitous as smartphones, or virtually any type of SoC. There is a reason CUDA/OpenCL/ROCm/SYCL exist and why they can't be programmed like usual C if you want performance.
Genuinely curious: do you know of any C implementations that do not use a stack for such cases?
It seems so prevalent that locally-scoped variables are often referred to as "stack variables" in casual conversation, but I'm curious of cases where it's not true...
It was semantic, not necessarily a literal mapping to memory. It absolutely was a correct way of modeling and explaining how C programs can be understood to work.
There is a comment mentioning this. I do have a somewhat different view: you should minimize the number of "reinterpreting" casts (stealing the C++ terminology) in general, as they are often ill-advised in the first place. This practice frequently eliminates the need for -fno-strict-aliasing.
I disagree. That is, when teaching C, I never gave examples avoiding any aliasing, and if it somehow came up, I advised against it. But I don't think it's a good idea to invest teaching time in compiler options regarding aliasing; it's too specific for an introductory C course.
Fully agree: use strict aliasing only after profiling and on your bottleneck code, if necessary; otherwise, just use -fno-strict-aliasing (the Linux kernel uses it too).
`-fno-strict-aliasing` is even less formally defined than C itself is, so I don't think it's a good tool, you're just hoping it works for your program and not actually checking. UBSan/ASan cleanliness is a better goal.
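For a reader who hasn't met the issue, a minimal sketch of the kind of "reinterpreting" cast under discussion (the function names are made up, and this assumes unsigned and float have the same size, as they do on common platforms):

#include <stdio.h>
#include <string.h>

_Static_assert(sizeof(unsigned) == sizeof(float), "size mismatch");

/* Undefined behaviour under strict aliasing: reads a float object
   through an incompatible lvalue type. Whether it "works" depends on
   the compiler, flags like -fno-strict-aliasing, and luck. */
unsigned bits_bad(float f)
{
    return *(unsigned *)&f;
}

/* Well-defined alternative that needs no special flag: copy the
   object representation instead of reinterpreting the pointer. */
unsigned bits_ok(float f)
{
    unsigned u;
    memcpy(&u, &f, sizeof u);
    return u;
}

int main(void)
{
    printf("%08x %08x\n", bits_bad(1.0f), bits_ok(1.0f));
    return 0;
}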
C is an achievement that must be unlocked by programming in various other high-level languages for years.
Without that, a C course will be a pointless lecture.
It's high time we declare C "unsafe at any speed" and cease teaching it to every student. Only those who really need it (legacy C codebase, microcontroller with a toolchain that doesn't yet support Rust) need to learn it.
Just telling on yourself, "I can't write software without training wheels", "I couldn't be bothered to learn how pointers work", so flagrantly.
Rust is unspecified, lacks real battle testing, and lacks any substantial track record.
Comparisons to C are comical, and discussing Rust as being a C replacement as if it were a forgone conclusion is just... I mean I can't think of any way to phrase this that won't result in a ban - so use your imagination.
Rust is probably a fine language. A lot of y'all down the rabbit hole need a reality check though.
You can't write software without training wheels either, you just think you can.
I know enough about C to know that I can't write it safely 100% of the time, especially when you introduce things like parsing untrusted input and threading. Thinking you can do this safely, and thinking you don't make mistakes suggests you actually don't know as much about C as you think you do.
The number of subtle and unexpected things that cause UB is pretty concerning. Most of the software that we rely on day to day is filled with subtle bugs, many of which will eventually be exploited and used for RCE and other nasty things. I don't understand how that couldn't concern you!
To be clear I don't think we should stop teaching C or anything that extreme. I don't think it should stop being used completely either. Mostly just that we should prefer safe languages when possible and practical, or use hardening features when we do use unsafe languages, like bounds checking for example. A lot of times I think we shouldn't even prefer rust, a lot of userspace software can be written in a GC'd language without issues.
> Just telling on yourself, "I can't write software without training wheels"
I think both of these reactions go too far, in opposite directions. Of course the fact that C is difficult doesn't mean that we should stop teaching it. It could just as well mean the opposite, that we need to teach it better, certainly as long as it stays in widespread use. But at the same time, we should acknowledge that C is difficult even for experienced professionals, and that "not knowing how pointers work" isn't the main reason every large C and C++ codebase on earth has memory corruption vulnerabilities.
(I'm sure that's not literally true. Someone somewhere must've written a lot of perfect C code. But I think the usual posterchild for well-tested C is sqlite, and even sqlite has had memory corruption issues in the wild.)
I completely agree. Rust exists already so I do not understand why all the other languages need to continue existing. Only those that use Rust can be considered to be a good person since using Rust is the morally responsible thing to do.
The question then becomes: why do we want more evil people in the world? Does that make any sense? Because that's exactly what you get when you don't use Rust.
Rust doesn't even have a formal specification at this time. Technically speaking, everything in Rust is undefined behaviour. That's a hard sell for, e.g., people writing safety-critical systems.
Another problem recently was that compiling Rust required downloading a binary blob from Mozilla. That's a no-go for many projects.
C is not just unsafe in the sense that it supports unsafe behavior.
It's unsafe in the sense that it's nearly impossible for even clever, experienced programmers to write nontrivial amounts of code in it that don't have foot-shooting behavior in some form.
Let them learn C when they need to use C. Consider it a specialist language that is only used for certain tasks, like Fortran or COBOL -- not something everybody has to know.
https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00005...