Programming Language Syntax Comparison - part 4: JavaScript, AJAX, and Security

Wonderful toys

When the number of ways a user needs to interact with an interface increases, designing a sensible interface becomes more complex. If I had continued with the same style I had been using on some of the simpler pages, it would have taken several pages in sequence to let a user enter data in a sane manner. This creates several problems:

  • Having to persist more data going forward, over several pages
  • Forcing the user to travel back and forth between pages when editing lots of data
  • Requiring the user to retain information about choices they made from the previous page

That is bad design. So I rolled up my sleeves and got around to trying to implement JavaScript and AJAX for the first time. At first, I thought this would let me build just two pages instead of several, but eventually I managed to get everything onto one page.

Side by side of the multiple views on the Syntax edit page

Each of those three tables is defined with a separate PHP controller and view, and then loaded onto the main page using AJAX. The user never has to reload the page to bring up another language, or to bring up a new set of data to be edited. Even updating rows in the database from these interfaces can be done without a page load. The main 'syntax editing' page itself has a few empty divs/tables to load those UIs onto, as well as a bunch of supporting JavaScript functions.
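As a rough sketch of that loading pattern (not the site's actual code), the helper below fetches an HTML fragment from a controller URL and drops it into a placeholder element. The name `loadView`, the URL `views/overview.php`, the `lang` parameter, and the element ids are all assumptions made for illustration.

```javascript
// Hypothetical helper: fetch a rendered view and place it into a placeholder element.
// `fetchFn` is injectable so the function can be exercised outside a browser.
async function loadView(container, url, params = {}, fetchFn = fetch) {
  const query = new URLSearchParams(params).toString();
  const response = await fetchFn(query ? `${url}?${query}` : url);
  if (!response.ok) {
    throw new Error(`Failed to load ${url}: HTTP ${response.status}`);
  }
  // The server-side view is expected to return an HTML fragment, not a full page.
  container.innerHTML = await response.text();
  return container;
}

// Example wiring in a browser (ids and URL are assumed):
// document.getElementById('language-select').addEventListener('change', (e) => {
//   loadView(document.getElementById('overview'), 'views/overview.php', { lang: e.target.value });
// });
```

Because only the placeholder's contents are replaced, the rest of the page (and its script state) survives each load.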

The basic user interface flow works like this...

  • Select a programming language from the drop down
  • User can interact with an overview of all of the rows in the database for that language.
  • The Major Group names on the left can be clicked to filter the list of minor groups on the right. (When the database has been fully populated, there will be about 23 major groups, and over 100 minor groups.)
  • The major group columns show whether a database row exists for the language and whether that row is marked as 'has content'; clicking them inserts the row or toggles the flag in the database, respectively.
  • Clicking on edit brings up the third UI, which actually lets you edit the 'coding syntax' for the language, along with a template showing the general form that code should be in.
  • Each row of 'coding syntax' for each subsection can then be updated individually to the database.

I made the design decision that users would only be able to view and edit the fields in one major section at a time. Giving them the option to edit more would just put too much info on the screen to be useful. I've even added in some ctrl+keypress events to make it easier to add some of the simple HTML formatting to the textarea. This brings up the next topic...
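Before moving on, here is a minimal sketch of how such a shortcut could be wired up. The key bindings (Ctrl+Enter, Ctrl+b) and the names `SHORTCUTS`, `insertAtSelection`, and `handleShortcut` are assumptions for illustration, not the bindings the site actually uses.

```javascript
// Hypothetical shortcut table: Ctrl+<key> inserts the given markup.
const SHORTCUTS = new Map([
  ['Enter', '<br />'],
  ['b', '<b></b>'],
]);

// Pure helper: returns the new text and caret position after inserting
// `snippet` over the current selection [start, end).
function insertAtSelection(text, start, end, snippet) {
  return {
    text: text.slice(0, start) + snippet + text.slice(end),
    caret: start + snippet.length,
  };
}

// keydown handler for a <textarea>; ignores anything that is not a known Ctrl+key combo.
function handleShortcut(event) {
  const snippet = event.ctrlKey ? SHORTCUTS.get(event.key) : undefined;
  if (snippet === undefined) return false;
  event.preventDefault(); // stop the browser's own Ctrl+key behaviour
  const area = event.target;
  const result = insertAtSelection(area.value, area.selectionStart, area.selectionEnd, snippet);
  area.value = result.text;
  area.selectionStart = area.selectionEnd = result.caret;
  return true;
}

// In a browser: textarea.addEventListener('keydown', handleShortcut);
```

Keeping the string manipulation in a separate pure function makes it easy to check without a browser.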

Securing against injection

Right now the design is such that the database stores HTML-formatted text. This text is grabbed from the database and inserted 'as is' where it is displayed as HTML on the main page. In order for an administrator to edit the content, they need to safely insert HTML markup into the web form and submit it to the database. So I tried a little experiment to see what would happen if someone attempted to send malicious code to the database...

First I attempted to inject some php code using "Hi! <?php print("hacked!"); ?>" into my textarea. Pressing the 'test' button, which will bring up the current content in a new window as it will appear on the site, the only value I could see was "Hi!". After inspecting the html element, it appears that it automatically html commented out the <?php ?> line for me. Interesting...

Ok, for the next test I tried injecting a script to the database with the following text. "Hi! <script>alert("hacked!");</script>". Sure enough, when loading the test window I got a pop up alert meaning my submitted text had actually run as a script on the site. Not good.

My first instinct was that there must be some special HTML command that would disable scripts from running within a certain <div>. But after a good bit of searching, it seems that there is nothing as simple or easy as that. I did end up reading a lot of interesting sites about how to protect oneself from injection attacks. Going by their guidelines, it seems most of my site was already covered with escaping or parameter passing.

Eventually I found out that there was an existing PHP library called HTML Purifier that filters HTML extensively and in a safe way. They even had a demo page where you could see how it would affect different code. Attempting a few of my amateurish hacks, I saw it covered the basic cases I was worried about while preserving the main HTML markup I was interested in keeping.

With this library added to my website, I was now a lot more secure. For all that, I would still rather keep these administrative tools private for now. But I feel slightly safer about eventually deploying them.

Making sure my marked up language was safe was only part of the problem though...

Preserving Style

The other half of the issue is making sure that data is preserved no matter how many times it travels between the HTML form and the database. So if I use the website to update a row in the database, load that data back into the web form, and then update it again, no data should be lost. This includes line breaks, spaces, HTML markup, and HTML encoding.

In addition to that, I want the text that displays in the editable textarea to be relatively readable, with proper spacing and line returns, so that it is easier to edit when someone goes back in to change it.

Unfortunately this started out not to be the case... One of the biggest culprits was the "<" sign. Anytime this symbol was followed directly by another character, it would refuse to appear in the resulting text when passing the string to be displayed or inserted. This was probably a byproduct of HTML Purifier and whatever natural escaping the browser was doing. For instance "#include <stdio.h>" would drop everything after the "#include" when it displayed.

So one of the things I had been doing with JavaScript is adding some keypress commands so users can quickly add <br /> or the <span class="comments"> markup. I created a new event so that users could add the HTML entity for the left bracket, inserting it as &lt; for these cases. It would then be on the user to use that entity command for a literal bracket, while still allowing them to preserve markup such as <b> as is.

Problems with recursion with problems with recursion with...

This worked fine at first. The textarea would show "&lt;stdio.h>" and insert that properly into the database. The database would return it properly for display on the main page. The main page would display the &lt; as '<', which is what we want.

But the next time the text was loaded back into the textarea for editing... it displayed as "<stdio.h>" again. Meaning if you tried to update that field in the database again, even without changing any data, the <stdio.h> would vanish from all subsequent displays...

I can't just regex the < away either, as this would likely also regex the '<' in '<b>' and prevent any of my markup from working. At least not without a completely ridiculous number of exception cases, which would break every time I wanted to add a new feature or didn't pre-format my text juuuuust right.
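To make that concrete, here is a small sketch of both regex approaches. The tag allowlist (b, br, span) and the function names are assumptions for illustration; the point is only how quickly the exceptions pile up.

```javascript
// Naive approach: escape every '<'. This protects the include, but it also
// turns the <b> markup into literal text.
function escapeEveryBracket(text) {
  return text.replace(/</g, '&lt;');
}

// Allowlist approach: escape '<' unless it starts a known tag.
// Every newly supported tag has to be added to this pattern by hand.
function escapeUnknownTags(text) {
  return text.replace(/<(?!\/?(?:b|br|span)\b)/g, '&lt;');
}

const stored = '#include <stdio.h> <b>// bold comment</b>';
const naive = escapeEveryBracket(stored);
const selective = escapeUnknownTags(stored);
console.log(naive);     // #include &lt;stdio.h> &lt;b>// bold comment&lt;/b>
console.log(selective); // #include &lt;stdio.h> <b>// bold comment</b>

// The allowlist still misfires on code that merely looks like a tag:
// here '<b' is a comparison, but it is left unescaped as if it were markup.
console.log(escapeUnknownTags('if (a <b) return;')); // if (a <b) return;
```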

Eventually I was able to solve the problem by wrapping the htmlpurify($content) call, which removes all of the malicious scripts, in htmlspecialchars() before displaying it in the textarea. What this does is actually convert the '&' in '&lt;' to its HTML entity value. So the HTML that shows up in the edit textarea looks like we want it to again: "&lt;stdio.h>".
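The round trip can be simulated outside the browser. The sketch below is not the site's PHP; `escapeHtml` is a rough JavaScript stand-in for htmlspecialchars() with quotes escaped, and `decodeOnce` is a simplified, assumed model of the browser decoding entities a single time when it parses the textarea content.

```javascript
// Rough equivalent of PHP's htmlspecialchars() with ENT_QUOTES.
const ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

// Simplified model of the browser parsing textarea content:
// each entity is decoded exactly once, in a single pass.
const ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', '#039': "'" };
function decodeOnce(html) {
  return html.replace(/&(amp|lt|gt|quot|#039);/g, (match, name) => ENTITIES[name]);
}

// What the database holds after the user typed the entity by hand.
const stored = '#include &lt;stdio.h><br />';

// Without the extra escaping, the textarea shows a bare '<' again,
// so the next save would lose the include.
const shownRaw = decodeOnce(stored);
console.log(shownRaw); // #include <stdio.h><br />

// With the extra escaping, decoding once gives back exactly what was stored.
const shownEscaped = decodeOnce(escapeHtml(stored));
console.log(shownEscaped === stored); // true
```

The design point is that one layer of escaping is added for every layer of decoding the text will pass through, so the editor sees the same characters that are stored.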

There are probably still a lot more finicky issues like this involving other special character combinations I haven't encountered yet. I'm still dealing with a few edge cases of preserving newline spacing in the edit form, and regexing the white space so it properly displays indentation. I dread what will happen when I try to insert JavaScript and PHP language syntax into the database...

Content Complete

In addition to all of the logic and functionality implementation I've been working on, I did some cleaning in my HTML structure so that files are easier to locate for developers. I've also started implementing some css styling to clean up some of the inline code as well.

Really, despite some of the quirks with the data entry process, the site is otherwise content complete. I could probably focus now on doing data entry for the 120 sub categories for each language. The code examples already exist for the three languages from the hardcoded version.

There are still a few more advanced or convenience features I would like to add to the admin tools, and I am a bit concerned that the JavaScript portion of my content editing page is reaching 250+ lines... but at this point, doing the data entry and then polishing up the CSS should be enough to make the user side of the site fully functional.

 

 
