28th May, 2006

Common Coding Mistakes, pt1

Sunday, 10:13 am in CodeGirl

Hang around the CodeGrrl forums for a while, and a few things become apparent. The same sloppy coding mistakes keep cropping up over and over again, causing the same problems over and over again. The mistakes are well-intentioned, in that people are generally trying to do the right thing; they just often don’t have enough knowledge of the whys and hows of what they’re trying to do.

With that in mind, I’ve attempted to assemble some of the most common mistakes I see young scripters making, explain why they’re mistakes, and offer some alternatives.


What It Is(n’t)

I suppose I should start at the start; what, exactly, is PHP? I once had a girl tell me it was “much harder than HTML and CSS” and yes, this is true. But why is it true?

HTML is what’s known as a markup language. I suppose the best analogy for it is that it’s an adjective language; it’s used to describe how something (in this case, text) looks. PHP, meanwhile, is a verb-language; it’s used to describe what something does.

Technically, PHP gets thrown into the basket of ‘scripting languages’ (as opposed to so-called ‘real’ programming languages like C), mostly because it’s loosely typed and looks after memory for you. Loosely typed means a variable isn’t tied to one kind of data; you can happily declare a variable and stick a string in it, then over-write it with a number, then turn it into an array. Separately, you don’t have to worry about managing system memory when writing PHP scripts (they’re scripts, not ‘programs’). PHP cleans up after itself when a script finishes, so the memory leaks that plague C programs are rarely your problem.

PHP is also an interpreted language. That means that you do not – here’s some more technical stuff – compile it into a binary in order to run it. In a compiled language, like C/C++, you must first write the code in a text file, then run it through a compiler. The compiler is a program that, very simply, checks the code for errors and – if that’s all okay – turns it into a binary file; in Windows, these are .exe files. Programs, in other words. You then run your binary; if it still doesn’t do what you want, you go back to your code, add some more bits and repeat the process. The compiler does a bunch of other stuff, too (such as linking), but that’s the basics of it.
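
Going back to the ‘loosely typed’ point for a moment, here is a minimal sketch of what it looks like in practice. The variable names are made up for the example; the only built-in used is gettype(), which reports what kind of value a variable currently holds.

```php
<?php
// Loose typing: the same variable can hold different kinds of value over time.
$thing = "forty-two";                 // starts life as a string
$types = array( gettype( $thing ) );

$thing = 42;                          // over-written with a number
$types[] = gettype( $thing );

$thing = array( $thing, "!" );        // and now it's an array
$types[] = gettype( $thing );

print implode( ', ', $types ) . "\n"; // string, integer, array
```

In a strictly typed language like C, each of those reassignments would be refused by the compiler.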

Unlike a compiled language, PHP is interpreted. This means that instead of running the code through a compiler in order to get a binary, we run the code through an interpreter, which gives us the output; no binary is created, and we must have the interpreter handy in order to see the output of our script. For PHP, the interpreter is hidden away behind our webserver; this is what it means when hosts say they are ‘running PHP’. Without the interpreter talking to the webserver, our PHP would just show up as flat text files. But on a server configured to run PHP, the webserver has been set up to recognise our file extension – usually .php, but also sometimes .phtml or .php3/.php4 – and send that file to the interpreter. The interpreter then ‘does’ the code, and sends the output – usually a HTML file – back to the webserver for display.
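
To make the ‘code in, HTML out’ idea a bit more concrete, here is a small sketch. The page title and member names are invented sample data; htmlspecialchars() is a standard PHP function that escapes characters like < and & so they display safely.

```php
<?php
// Hypothetical page data; in a real script this might come from a database.
$title   = "My Fanlisting";
$members = array( "Alice", "Bob", "Carol" );

// Build the HTML that the visitor's browser will eventually receive.
$html  = "<h1>" . htmlspecialchars( $title ) . "</h1>\n";
$html .= "<ul>\n";
foreach( $members as $name ) {
  $html .= "  <li>" . htmlspecialchars( $name ) . "</li>\n";
}
$html .= "</ul>\n";

// The interpreter hands this output back to the webserver.
print $html;
```

The visitor never sees any of the PHP; all that reaches their browser is the heading and the three-item list.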

This is also why you hear PHP being talked about as a server-side language; because the code is processed by your server before being handed over to the client (i.e. the person browsing your website). In contrast, HTML is a client-side language; it’s up to the individual client (i.e. your browser) to ‘interpret’ the HTML rather than the webserver. That’s why your webpage can look different from one visitor’s browser to the next.

Common Mistake #1: MySQL

Oh, right, MySQL. MySQL – and, in fact, any database – is yet another program that runs alongside our webserver and our PHP interpreter. In this case, it’s the PHP interpreter that ‘hands control’ over to MySQL when it encounters code that instructs it to do so; such as everyone’s favourite mysql_query(). The ‘hows’ of this handover, however, are something I often see causing problems.

So how does PHP ‘talk’ to MySQL? The exact low-level specifics of it aren’t really relevant, but from an abstract perspective, PHP does this by creating a link, which is analogous to PHP picking up the phone and making a call to MySQL. Now, MySQL is a bit like a helpdesk; it can accept multiple phone calls (from PHP or other sources) at any one time. It usually has a maximum of about 50 or so (this is the server’s max_connections setting). Every time you visit a webpage that uses a database, you are making a new phonecall. If someone else happens to be browsing the same website as you at the same time, that’s another phonecall. If someone is browsing another site on the same server? Yup, another phonecall. The PHP interpreter can make many phonecalls to MySQL, but by default each individual PHP script only makes one phonecall at a time. This is generally where people run into problems.

The problems generally arise when people are trying to integrate more than one script at a time; for example running Enth and WordPress on the same page. If you think of these kinds of webpages as a conversation, here is what usually happens:

PHP: [reading down its script] Oh crap, WordPress needs some data out of the database. [dials] Hello, MySQL?
MySQL: Hello mysql@localhost, how may I help you?
PHP: Yeah, hi. I need some data from your wordpress database.
MySQL: Sure! Let me just patch you through…
WordPress: Hi there, how can I help you?
PHP: Can you get me the data for the last four day’s worth of blog posts?
WordPress: Of course, sending now…
PHP: Thanks! [reads some more] Crap, now Enth needs something. [hangs up phone, redials] Hello MySQL?
MySQL: Hello mysql@localhost, how may I help you?
PHP: Yeah, hi. I need some data from your enth database.
MySQL: Sure! Let me just patch you through…
Enth: Hi there, how can I help you?
PHP: Can you get me the data for the last three joined fanlistings?
Enth: Of course, sending now…
PHP: Thanks! [reads some more] Damnit, now WordPress wants a list of categories; can you get that for me?
Enth: I’m sorry, I cannot find the data you require.

Wow, that was goofy.

Anyway, here we have a classic communications problem. If you think of the mysql_connect() command as the instruction to ‘dial’ up the MySQL server, and the mysql_select_db() command as PHP asking the MySQL switchboard to patch it through to the correct department, hopefully you can see where the problem with ‘mixing scripts’ comes from. That is, PHP scripts are not very smart, and by default only like being on one phonecall to MySQL at once. So what does a PHP script do when it encounters a second mysql_connect() – usually caused by including one script’s header file underneath another’s? It effectively drops the first ‘call’: either it reuses the same line and simply asks to be patched through to a different department, or it dials a new line and treats that one as the default from then on. So when the part of the script rolls around that requires it to get data out of the first database again, it gets very confused.

How to fix this?

Well, the simplest answer is to simply put all tables you’re going to use for any one page into the same database. That is, stick your Enth stuff and your WordPress stuff into the same database. Then it doesn’t really matter if you prematurely instruct PHP to ‘hang up’ its connection to MySQL, since the reconnect should be to the same database. Most scripts nowadays are written with a ‘prefix’ option for exactly this situation; that is, all Enth tables are prefixed with enth_ while all WordPress ones are prefixed by wp_. That stops table name clashes.
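
As a rough sketch of why prefixes stop clashes, imagine both scripts want a table called ‘categories’. The helper function below is purely hypothetical (neither Enth nor WordPress ships it, and the Enth table name is invented); it just shows the idea of gluing a per-script prefix onto the bare name.

```php
<?php
// Hypothetical helper (not part of Enth or WordPress): glue a script's
// prefix onto a bare table name.
function prefixed_table( $prefix, $table ) {
  return $prefix . $table;
}

// Both scripts want a table called 'categories'; the prefixes keep them apart.
$wp_table   = prefixed_table( 'wp_', 'categories' );
$enth_table = prefixed_table( 'enth_', 'categories' );

print "$wp_table\n";     // wp_categories
print "$enth_table\n";   // enth_categories

// Either table can now live in the same database and use the same connection.
$sql = "SELECT * FROM $wp_table";
print "$sql\n";
```

With the two names distinct, it no longer matters which script selected the database last.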

The other solution is to force PHP to make two (or more!) phonecalls at once. PHP can do this, but its initial instinct is not to. The way to force it is to use mysql_connect() (it won’t work with mysql_pconnect()) and stick an extra 1 on the end as a fourth argument.

Check the following script:

<?php

// Open two separate connections; the fourth argument forces a second
// link even though the connection details are identical.
$link1 = mysql_connect( 'localhost', 'user', 'password' );
$link2 = mysql_connect( 'localhost', 'user', 'password', 1 );

// Point each connection at its own database.
mysql_select_db( 'db1', $link1 );
mysql_select_db( 'db2', $link2 );

$sql1 = mysql_query( "SHOW TABLES", $link1 );
$sql2 = mysql_query( "SHOW TABLES", $link2 );

// List the tables in db1, then the tables in db2.
while( $t1 = mysql_fetch_row( $sql1 ) )
  print $t1[0] ."<br />\n";

print "<br />\n";

while( $t2 = mysql_fetch_row( $sql2 ) )
  print $t2[0] ."<br />\n";

mysql_close( $link1 );
mysql_close( $link2 );

?>

This script essentially forces PHP to open two connections to MySQL simultaneously, represented here by $link1 and $link2. The fourth argument to mysql_connect() tells PHP to open a brand new link rather than reuse an existing one; generally, if PHP sees two calls to mysql_connect() using the same connection data, it will keep only one connection and make the second a reference to the first. You can then use these two links to open two databases. By default, when you don’t say which link to use, PHP uses the most recently opened connection, and each connection only has one selected database at a time. Since we have two connections here, not one, we can work on two databases. Change the variables around and try the script. Now take out the ‘1’ and try it again.

See?

Common Mistake #2: Redeclaring Variables

I think this one is 90% caused by bad foundations laid down by CodeGrrl’s Build-a-Blog, though that might be finger pointing, and I’m sure I’ve seen it in a few other scripts too. So what is it?

How many times have you seen (or done) the following:

$sql = mysql_query( "SELECT * FROM sometable" )
  or die( mysql_error() );

while( $r = mysql_fetch_array( $sql ) ){
  $field1 = $r['field1'];
  $field2 = $r['field2'];
  $field3 = $r['field3'];
  $field4 = $r['field4'];

  print "$field1, $field2, $field3, $field4<br />";
}

Come on, fess up; I know you have.

So, what’s wrong with doing this? Nothing, technically, though it does sort of have the effect of announcing to the serious coders of the world that you’re, erm, a bit of a noob. Sorry, but… it does. Really.

Why? Because it’s wasting memory, at least in principle. Memory management is not really something we worry about much in PHP scripts unless we make dumb mistakes like calling MySQL queries that never end. This is a result of PHP managing memory for us (remember we talked about that above?). However, just because we don’t have to worry about something doesn’t mean we shouldn’t, and in my opinion it’s extremely sloppy coding to redeclare a bunch of perfectly good variables. Arrays and objects aren’t scary; we can work with them just as easily as we can any other thing. Sometimes easier; try running print_r() on a returned array/object some time and then think of just how useful that can be.

PHP is a very gentle language (if you don’t believe me, go write web applications in CGI Perl or – horrors – C++ some day), and has an extremely robust database interface. mysql_fetch_array() and its counterpart mysql_fetch_object() (my personal choice; no real reason, I’ve just always used it) have been programmed with special loving care to return the most human-readable output possible. So please learn to use them; it’s not hard.
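
Here is a rough sketch of working with a fetched row where it sits, rather than copying it into flat variables first. So that it runs without a MySQL server, the $rows array below is invented stand-in data shaped like what mysql_fetch_array() would hand back; the field names are assumptions too.

```php
<?php
// Stand-in for rows fetched from a database (hypothetical data and
// field names, so the example runs without a MySQL server).
$rows = array(
  array( 'title' => 'First post',  'author' => 'Dee', 'comments' => 3 ),
  array( 'title' => 'Second post', 'author' => 'Dee', 'comments' => 0 ),
);

$lines = array();
foreach( $rows as $r ) {
  // Use the array elements directly; no $title = $r['title'] step needed.
  $lines[] = "$r[title] by $r[author] ($r[comments] comments)";
}

print implode( "<br />\n", $lines ) . "\n";

// print_r() dumps a whole row, keys and all; handy when debugging.
print_r( $rows[0] );
```

The loop body is one line instead of four, and there is no second set of names to keep in sync with the table’s columns.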

Not redeclaring variables also extends into other areas. For example, calling functions with return values on variables. I commonly see this sort of thing:

$var2 = stripslashes( $var1 );

stripslashes() here could be any function call that returns a formatted version of the variable passed to it, where you do not want to keep the contents of the unformatted variable. In this sort of situation, you don’t have to create a second variable to hold the output. Instead you can do the following:

$var1 = stripslashes( $var1 );

Or even:

print "This is a stripped slash string: ". stripslashes( $var1 );

Tips and Tricks: Associative Arrays

Since I made such an impassioned plea for people not to redeclare associative arrays into ‘flat’ variables, I might as well give some tricks on how to use them.

First, what is an associative array? For that matter, what the hell is an array?

Most variables in PHP are ‘flat’; that is, they only hold one value:

$var1 = "Shiina Ringo";
$var2 = 42;
$var3 = '!';

An array, on the other hand, is one ‘wrapper’ variable that holds multiple values. Generally we use them to collate ‘like’ values; which is why they’re good for MySQL output. If we’re getting weblog posts out of a database, all the information about that post ‘relates’. The post’s title, the post’s date, the post’s text and so on. I guess you can think of arrays as a bit like folders in your operating system. Maybe you keep all your music files in a folder called My Music. You don’t put your pictures in My Music – they probably go in My Pictures – because they’re not music. You also (hopefully!) don’t stick all your files in the root of C:\. So it is with arrays; we use them to lump together stuff that, well, goes together.

There are two types of arrays; associative and indexed. Indexed arrays are ‘classic’ arrays; each element (that’s the different values) in the array is referenced by a number, starting with 0 and ending at n-1 (the number of elements total minus 1). They are written like $arrname[0], $arrname[1] and so on.
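
A quick sketch of an indexed array, using made-up sample values. count() is the built-in that tells you how many elements there are, which is how you find that n-1 position.

```php
<?php
// Sample data for illustration only.
$fruit = array( "apple", "banana", "cherry" );

$n     = count( $fruit );     // 3 elements in total...
$first = $fruit[0];           // ...numbered from 0...
$last  = $fruit[$n - 1];      // ...up to n-1

// Walk the array by number.
for( $i = 0; $i < $n; $i++ ) {
  print "$i: $fruit[$i]\n";
}
```

Asking for $fruit[3] here would be a mistake, because the last valid index is 2.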

The second type of array is called an associative array in PHP (in other languages they’re called other things, such as dictionaries); these are arrays that are referenced by a word, called a key, rather than a number.  They aren’t scary, and we can use the elements of associative arrays just like any other variable.

somefunction( $array['key1'], $array['key2'] );
$array['key3'] = stripslashes( $array['key3'] );

print "This is an array: $array[key1], $array[key2]";

And so on. Notice, however, one thing. When we use an associative array inside a double-quoted string (single-quoted strings don’t expand variables at all) we drop the quotes from around the key word. Technically, PHP has historically let us get away with leaving the quotes off outside strings, too; it reads a bare word as a constant and, if no constant of that name exists, falls back to treating it as a string (while grumbling with a notice; newer versions of PHP turn this into an outright error). However, it’s bad form to do this. Why? Because if the key of your associative array is inside quotes, PHP knows that it’s to be treated as a key and not something else. Keys don’t have to be literal words at all, either. Have a look at the following:

<?php
function getKeyNum(){
  return 'banana';
}

$arr = array(
  'apple' => "Mmm, apples!",
  'banana' => "Yuck!"
);

$k = getKeyNum();
print $arr[$k];
?>

Yup, you can use other variables to dynamically select which array key you want to access. (You can do it with function calls directly, too; $arr[getKeyNum()] works just as well.)

The general rule? Avoid confusing PHP, and know when to use quotes and when not to:

// outside of a string, use quotes
$arr['key'];

// inside a string, drop the quotes
$var = "$arr[key]";

// referencing dynamically, drop the quotes
$arr[$var];
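
To see those rules in action, and why a bare word outside a string is risky, here is a small sketch. The array contents are just sample data, and the constant named apple is defined purely to demonstrate the clash; note that on PHP 8 an undefined bare word is an error rather than a silent fallback, so the example defines the constant explicitly.

```php
<?php
$arr = array(
  'apple'  => "Mmm, apples!",
  'banana' => "Yuck!",
);

// Outside a string: quote the key.
$a = $arr['apple'];

// Inside a double-quoted string: a bare key, or a quoted key inside {braces}.
$b = "Simple: $arr[apple]";
$c = "Braces: {$arr['apple']}";

// Dynamic key from a variable or a function call: no quotes.
$k = 'banana';
$d = $arr[$k];
$e = $arr[strtolower( 'BANANA' )];

// The pitfall: outside a string, a bare word is really a constant.
define( 'apple', 'banana' );   // constant defined only for this demonstration
$f = $arr[apple];              // looks up the constant, so this is $arr['banana']
$g = "$arr[apple]";            // inside a string, the bare word is still the key 'apple'

print "$a\n$b\n$c\n$d\n$e\n$f\n$g\n";
```

The $f line is the one to watch; it quietly returns “Yuck!” because somebody, somewhere, defined a constant with the same name as the key.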

Final Words…

There are more things I could cover – so many more – but they will have to wait until our next instalment I think.

My final word, however, is this; don’t be intimidated. I’m sure some of you have gotten down to this point and are freaking out; “It’s all so much to remember!” The temptation to do things poorly or the ‘easy way’ is very strong, but please try and avoid it. Start from good foundations and it’s much, much easier to write good code. And good code is strong code, and strong code is much, much harder to hack than weak or sloppy code. It’s worth it in the long run, I swear.

Until next time; happy coding.

Comments

  1.

    Ever thought about becoming a tutor or something? :P

    Every single word in this post is like a stab in the face because like some spacktard, I can’t get my WordPress thingummybob to work. Gah.

    But you’re right about bananas. They are teh suck (OMG I used some vaguely l33t speak - that’s LOL to the proles that is etc to fade yawn).

  2.

    OMG I’ve been sitting on your emails for days now! :O  I’m so sorry! >:(

    Guh, bad Dee, forgetful Dee.

  3.

    Haha; honestly, don’t worry about it. S’not as if it’s desperately urgent or important or anything and s’not as if I really ought to be thinking about that, considering I’ve got to put up all my work on Wednesday for my final, final, Final (apparently, it’s final so I’ve been told) assessment that will determine whether or not the last three-and-a-half years of my life have been for a half-decent degree.

    So the cool kids have blogs. Pfft. SCENE! ;)

  4.

    All fixed; commence with installation!

    You spelt your own name wrong in the DB_USER variable. :P

  5.

    Spelt…? No way. No way. I don’t believe you. I don’t… no, no. No. I went back and forth, repeating and starting from scratch… It can’t be… You don’t realise what you’ve done, Dee. I’ve always said if I ever did something spectacularly fucktarded like spell my own name wrong online, I’d ban myself from the internet.

    I’m off to down a sachet or six of silica gel. ;)

    (Thanks, though)

  6.

    You missed the first i in the first instance of your name; it’s an easy thing to do, seriously. I’ve been coding for years, trust me on this; 90% of all coding errors come from dumb shit like that.

    Plus it’s always the last thing you check. “Ho ho ho, I wouldn’t spell my own name wrong!”

    Bzzt, denied. >:(

  7.

    Not Clear

    While I have little doubt that you know what you are talking about, either you are not writing clearly or I don’t see you telling us what to do INSTEAD of the bad coding mistakes you see.  That means, this little article provides no help to me whatsoever when it absolutely could.

  8.

    I’m… sorry, but your comment just has me absolutely baffled.  I honestly don’t know how to respond to it.

    The only thing I can assume is that you’re tripping up on the tl;dr of the article.  And yes, it’s extremely tl;dr, because that’s just how I write.  However, I’ve re-read this twice now since seeing your comment, and I can’t really see how this is ‘not clear’.  Maybe that’s my own failing (I’m notoriously bad at being able to assess the quality of my own writing), but considering that part of my professional career is having to write clearly, and my ability to do so is something that I have been praised on since high school…  hrm.

    I suppose I’m just confused because the vast majority of the words here are, in fact, telling people how and why.  It’s not lowest-common-denominator stuff, but then again it’s not supposed to be.  Nor is it really a step-by-step tutorial; it’s discussing three things that, I suppose, have more to do with the theory of programming rather than just the “now write x, now write y” of your average PHP tutorial.  The thing on multiple MySQL connections especially is reasonably ‘advanced’ as far as these things go (it’s about as complex resource management as you’re going to get in PHP).  Maybe the title is slightly misleading in that regard.

    Perhaps it would help if I knew what parts you were struggling on?  I suppose, to break it down, the first part (What It Is(n’t)) is discussing the difference between an interpreted language like PHP, a markup language like HTML, and a compiled language like C/++.

    The second part (Common Mistake #1: MySQL) is discussing how PHP handles connections to MySQL databases, and why your scripts will most likely die if you’re trying to run two scripts side-by-side that keep their data in separate databases.  I see this error a lot at CodeGrrl – usually people trying to run WordPress and a fanlisting script at the same time – and AFAIK I’m the only person there yet (or at least was at the time of initial writing) who’s ever answered why it happens rather than just providing the stock-standard solution (put both scripts in the same db).  If you’ve never encountered this error before, this section will make much more sense if you try the example code.  There’s no simple ‘fix’ to this, incidentally, because (oddly enough) there is no simple fix to this.  You either understand the problem or you don’t.

    The third section, Common Mistake #2: Redeclaring Variables, is talking about some general bad-form coding that I see everywhere.  In a nutshell, it is telling you that instead of doing this sort of thing:

    $myName = $array['myName'];

    print "$myName";

    You should suck it up, learn to code properly, and do this:

    print "$array[myName]";

    And then rabbits on for a bit about why doing the former will make all the cool kids laugh about you behind your back.  Like the previous two sections, it touches on the notion of resource/memory management in programming; something that is vital in ‘hard’ programming languages like C/++, but is often abused terribly in ‘soft’ languages like PHP.

    And finally, the fourth section (Tips and Tricks: Associative Arrays) is just telling people not to be scared of associative arrays.

    Does that clear things up a little?

    Edit: Incidentally, I fixed up the dead sk.fan link.  My bad for dropping a script in the middle of writing it…

  9.

    Yes, that does clear it up. Forgot to mention that the part I wasn’t getting was the redeclaring variables.  Now that I understand what you were aiming to say, the reason I declare the variables like that (the way you don’t suggest) is simply because when I type it all out multiple times, I end up forgetting an apostrophe or something and I get super annoyed. I found that doing it that way helps with that.

    Do you know how much difference it makes for resource or memory to do it the “proper way”?  I don’t really care what people think of me based on how I code so if it doesn’t really make that much of a difference, I’m not going to change.  Considering I know how to do it, I just choose to do it the first way to lessen on my mistakes when I type fast.

  10.

    Arrays Are Luff!

    Do you know how much difference it makes for resource or memory to do it the “proper way”?

    Not in explicit numbers, but you can work it out using common sense.  In a nutshell, if you have five variables, and then you dump the value of those five variables into another five variables then on paper you have doubled the memory allocation your script requires (because you’re telling it to store twice the data).  In practice PHP is a bit smarter than that; it copies values lazily, so the data is only really duplicated once you change one of the copies.

    The question is, does this actually matter?  With any compiled application the answer would be, “Hell yes!”… but PHP isn’t a compiled language.  It’s interpreted, and realistically, for the kinds of scripts that you or I are likely to be writing, there is absolutely no noticeable difference to the user no matter what you do.  If you notice, down the bottom of the page there is a timer; even on the bulkiest of sk.log pages (and the code here is far from optimised) it will rarely crack 2 seconds execution time.  It’s pretty exceptional for it to hit 1 second.  Your users simply cannot tell (because human brains are slow) the difference between a page that loads in 0.1 seconds and one that loads in 0.2 seconds.  So from a user perspective, it’s moot.

    Does it make a difference to the machine you’re running the script on?  Again, probably not; you’re talking such tiny scripts using such tiny amounts of processing power, that you are just never going to kill the PHP interpreter.  Ever.[1]  And PHP does its own garbage collection (it destroys all variables at the end of every script execution), so you’re not running the risk of memory leaks or whatever.  About the only time I could think that this sort of behaviour might affect the execution of a script is if you were redeclaring a variable that was holding a huge amount of data (e.g. a whole image file encoded in binary)… but even then, I’d be surprised.

    I mean, you can also simply call unset() on the array once you’ve cannibalised all the values out of it, leaving you back at square one in the memory allocation stakes.

    Having said all that, there is a big But…, and it’s got to do with Programming Theory 101.

    I’m going to make an assumption here and guess that you’ve never learnt ‘hard’ programming in an institutionalised environment (i.e. school), so just bear with me for a moment while I rabbit on a bit, because there’s some background…

    Anyway, there’s this anecdotal story that every CompSci student learns in their first C++ class, and it’s about a control structure called goto.  goto is, I suppose, like the # symbol for a named internal link in an HTML page, only for programming languages.  You use it like this:

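    #include <iostream>

    int main() {

      int i = 0;
      START:                 // label: the 'start point' we jump back to
      i++;
      std::cout << "Counter is at: " << i << "\n";
      goto START;            // jump back up; this loop never ends

      return 0;              // never reached
    }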

    What this is doing, is setting a variable called i, naming a ‘start point’, incrementing the counter, printing the value of the counter, then using goto to return to our start point.  It’s a very, very basic control structure that is replaced in more mature languages by while() and for().

    But not in BASIC; goto is the workhorse control structure in classic BASIC.  Once Upon a Time, BASIC was written as My First Programming Language; a gentle introduction into CompSci for beginners.  Some of those people who’d started on BASIC then moved onto harder languages like C… and started running into problems.  Because BASIC was, well, basic and what works for a ten-line exercise doesn’t always work for a several thousand line application.  goto was one of those things; notorious for reducing code into unreadable spaghetti.  Now, no C teacher in their right mind tells their students to use goto, but people had picked up the ‘habit’ in BASIC programming and carried it over to C.

    This is why I don’t like the whole thing about redeclaring variables.  It doesn’t matter too much in a language like PHP, but it has the same ‘feel’ to it as goto does, and what seems harmless now has the potential to get someone into real trouble in a tighter language, especially when you start getting into pointers and the difference between a bitwise and logical copy.  To people who’ve come from a formal programming background, it looks inexcusably sloppy.  And – as you’ve said yourself – lazy, and lazy coding is bad coding; full stop, end of story.  (I should know; I’m a master of it.)

    I dunno, maybe this is a Real Programmer™ thing; I can see how someone who doesn’t come from a formalised CompSci background would consider all this pointlessly snotty and elitist, but…  I really did have to learn this stuff the hard way.  And it is important; quality code is Srs Bizness.  Start getting lazy in one area and, well, it creeps.

    It’s not a habit I’d ever teach to others.

    [1] Not quite true; I think I’ve done it once, but I was at the time using PHP to connect to a db2 database to retrieve massive amounts of data which I was then displaying graphically.

  11.

    Just to be clear, I didn’t say it was lazy. I said it reduced my errors and in all actuality, I’m typing more so I don’t refer to it as lazy coding.  However, I get what you’re saying.  I’m a bit of a perfectionist so I kind of want to do it the proper way, as you call it.  I’ll just have to watch my coding better and I’m pretty good at spotting a missing apostrophe and whatnot. I have problems later on when I try to print the variable sometimes and I think that is why I stopped trying to do it that way.  My laptop is kaput right now or else I’d attempt it again and let you know what I mean.

    P.S. You are right, I didn’t learn any of my programming in a school environment. I am all self-taught (minus a HTML Programming course in HS that I took after I already knew HTML and it didn’t even teach about DOCTYPES and such).

    Anyway, thanks for explaining to me about this.  I needed it “dumbed down” I guess because you were being a bit too technical for me.  

  • v-s.net v0.6 and all content (unless noted) © Dee.
  • sk.log v0.6 spat this out in 2.138 seconds.
  • 77 / 181,758
artistic-twobyfour