The 2013 Academy Awards ceremony has brought the chaos in the VFX industry to the forefront. This week on Divergent Opinions, Mike and I covered this story, as well as a bunch of cool research.
MOOCs: Solving the wrong problem
Over the last six months, the world of “e-learning” has been totally and completely overrun by the MOOC, or massive open online course. In my years involved with technology in higher education, I’ve never seen another concept suck the oxygen out of the academic technology space in this fashion. While not fundamentally new, the MOOC has totally dominated the e-learning conversation and dramatically shifted higher education agendas.
I’m a proponent of the core concepts of the MOOC. Reducing barriers to knowledge is a good thing. An education model based on keeping knowledge locked up in (forgive me) an ivory tower is fundamentally unsustainable in a digital world. I also believe that large, lecture-based classes can be inefficient uses of student and instructor time, and there are plenty of cases of subpar instruction in higher education.
The dramatic rise of the MOOC within the mainstream press and higher education discourse has resulted in traditional institutions rushing to join the party. It is this action that I believe is shortsighted, poorly reasoned, and potentially destructive.
Let me begin by articulating some of my assumptions. The term “MOOC” (we could have a whole separate article on how horrible the acronym is, but it’s too late now) means different things to different people. To some extent, it has become a “unicorn” term – a way for the uninformed to sound prescient without needing to back their words with knowledge. The types of courses I’m addressing here are those taught in the traditional large lecture hall, usually introductory courses. These courses typically fill a lecture hall with disinterested freshmen, and are often taught by a disinterested graduate student. Coursework generally consists of readings, discussion sections, and multiple choice and short answer tests.
The MOOC movement instead takes the lectures, breaks them into bite-sized chunks, and offers them presented by articulate, informed professors. Students would watch these lectures online, participate in online discussion with other students, and then take standard assessments. Because this type of course can be infinitely scaled, it can be opened to any and all interested parties – traditional students as well as those learning simply “for fun”. A single course can effectively support an unlimited number of students with negligible overhead.
I should note that, although I tend to think about these issues in terms of the institution I’m most familiar with – the University of Minnesota – I believe the lessons and missteps I’ll discuss are being repeated throughout the higher education system.
Dissecting the MOOC
As I stated at the beginning, I think the MOOC concept is perfectly fine. Taken on its own, the typical MOOC outlined above is a reasonable approach to replacing large lecture coursework. If it means lectures are better designed, more polished, and delivered with more passion and energy, that’s even better.
I do take issue with the notion that the MOOC is a righteous democratization of knowledge, opening it up to those learners simply interested for the joy of learning. I think this is a complete red herring argument, presented by MOOC backers to deflect valid criticism. The statistics bear this out: retention rates for MOOCs are consistently in the 10-15% range. It turns out that although most people think learning for fun sounds like a good idea, the reality is that it rarely happens. Moreover, the MOOC format does not lend itself to answering a specific question or providing a specific skill – training and tutorial organizations like lynda.com and the Khan Academy are far better suited to these types of needs. If my end goal is to learn Photoshop or integral calculus, I’m unlikely to participate in an ongoing, sequenced course like a typical MOOC.
That said, I don’t otherwise find serious fault with the way MOOCs are being run by firms like Coursera and Udacity. If they’re able to sustainably produce a product that users are interested in, great.
The higher education response
The response from higher ed, on the other hand, is seriously flawed.
Let’s begin with the language. The companies behind the MOOC use language that assumes that traditional classroom instruction is broken. To hear them describe it, the typical lecture hall is a post-apocalyptic wasteland of bored students and depressed faculty, droning on, with little to no learning taking place. Rather than responding with an impassioned, evidence-based defense of this form of instruction, higher education has by and large accepted this characterization and decided that MOOCs are indeed the answer. Institutions have entered this discussion from a defensive, reactionary position.
Are large lecture courses perfect? Certainly not. In fact, I’m a huge proponent of the blended model, which seeks to shift lecture delivery to polished, segmented, online videos, with class time dedicated to deep dives and discussion. But I do think large lecture courses have value as well. A good lecturer can make the experience engaging and participatory, even in a lecture hall with hundreds of students. And, the simple act of trudging across campus to sit in a lecture hall for 50 minutes acts as a cognitive organizer – a waypoint on a student’s week. Keep in mind, these courses are typically made up primarily of eighteen year old freshmen, struggling with life on campus and figuring out how to organize their time and lives.
These large lecture courses almost always include a group discussion component. This is a time for students to participate in a mediated, directed discussion. More importantly, this is a time for the nurturing and growth that I believe is critical to the undergraduate experience. Seeing classmates, hearing their inane or insightful questions, listening to them articulate the things that they’re struggling with – these are all important but difficult to quantify parts of learning, which are not captured by a solitary or participation-optional online experience. Even a highly functional synchronous online discussion is inherently less likely to go down an unexpected rabbit hole in the exciting and invigorating way that a great classroom discussion can – even in a large lecture hall.
Instead of making the case for traditional courses and shifting the conversation to ways to make them even better (and more affordable), institutions have rushed to turn their own courses into MOOCs, and offer them to the world. This brings me to my next serious concern about this movement.
Why does anyone care about your MOOC? If the MOOC concept is taken to its logical end, the world needs a single instance of any given course. Institutions seem to be unwilling to acknowledge this. Instead, there’s an assumption they’ll build the courses and the world will beat a path to their door. Why would I choose to take a course from the University of Minnesota, when I can take the same course from Harvard or Stanford? Are institutions ready to compete at the degree of granularity this type of environment allows?
The rush to create MOOCs reminds me a bit of the Little Free Library system. I lack the metrics to convincingly argue this point, but as an outside observer, it seems that people are generally more interested in creating the libraries than using them, and it feels like the number of libraries equals or exceeds the number of users. I believe that the MOOC land rush risks a similar imbalance. If every school, from the prestigious to the second or third tier, rushes to offer MOOC forms of their courses, there is likely to be an abundance of supply, without the requisite demand.
Heating Buildings is Expensive. MOOCs are by and large the stereotypical “bubble” product. There’s little to no business model behind them, either for their corporate backers or participating institutions. Although that’s probably fine for the companies delivering the courses – their overhead costs are relatively low – it’s a huge issue for institutions with massive infrastructure and staff overhead. If we’re moving to a model where the cost of coursework is dramatically reduced, it presents existential threats for the other missions of the institution, or the very existence of the institution altogether. While it’s assumed that institutions will continue to charge as part of the degree-granting process, nobody seems to think that a course delivered via a MOOC should be priced similarly to a course delivered in a classroom.
How does the institution benefit? How does offering a MOOC make the institution a better place? Less expensive options for receiving course credit are certainly a benefit for students, but that is a separate issue. Higher education is far too expensive, but MOOCs are not the sole solution. In general, the value proposition for the University is esoteric – simply offering the course supposedly means the world will enhance the content by bringing a wider audience, and its existence will enhance your institution’s standing in the world. These are, at best, hopelessly idealistic. Because there’s no business model to speak of for MOOCs, Universities are left shouldering the costs of creating the courses with little to no expectation of having that value returned.
An alternative path
Having great courses delivered by great instructors with great instructional design is… great. I’d much rather take Introduction to Economics from Paul Krugman than a grad student. Having that type of content available is inarguably better for the world.
I believe the path forward is to leverage the best content, and combine it with the high-touch, in-person experience that makes undergraduate education so important, particularly for traditional students. Mediated discussions, in-person office hours, and writing assignments graded by people with (ostensibly) actual writing skills are the types of growth activities that create the high functioning, well-rounded people our society needs.
It’s also crucial for higher education to begin pushing back against the language of the broken classroom. Although institutions are indeed broken in innumerable ways, by and large, instruction actually works pretty well.
It’s critical as well that a clear distinction is drawn between MOOCs and online education in general. Along with Jude Higdon, I teach an online course which is in many ways the anti-MOOC. Our students do large amounts of writing, which is copiously annotated by the instructors. Students participate in synchronous chat sessions and office hours with the instructors. Although lecture content is reused, the interpersonal interaction is real, genuine, and frequent. This concept is, by design, not scalable. But I believe the benefits in terms of breadth and depth offered to students by this experience are demonstrably better than those offered by a MOOC. Institutions need to be honest about these tradeoffs.
A MOOC is not a magic bullet. It will not solve higher education’s substantial woes. It will create new woes.
Your MOOC will almost certainly not make a dent in the universe. The world will not beat a path to your door, and you still need to pay to maintain the doors.
Pebble, Day One
My Pebble watch arrived yesterday. This was a project that I kickstarted back in May of 2012. While moderately behind schedule, they’ve delivered a functioning, elegant product which does what it was supposed to.
It’ll take some time for the Pebble ecosystem to develop. Right now, in addition to serving as a watch, the Pebble can control music playback on my iPhone and display notifications (for example, show me a text message as it comes in). Eventually, I suspect we’ll see whole new types of applications for this breed of glance-able, connected device.
Already, I’m finding the notification functionality pretty attractive. It’s great to not have to pull my phone out of my pocket (particularly when all bundled up in the winter) to see who’s calling or check an iMessage. The build quality seems excellent, and the whole device works pretty slick. I’m still getting used to the whole notion of wearing a watch, having not done so since the 90s (!!), but so far I can give it a thumbs up.
Divergent Opinions: Bad Uses for Brain Stimulation
This week on Divergent Opinions, we round up the news from the week, with a focus on some new and some not-so-new codec news.
Waiting for Energon Cubes
Energy storage is a key component in our inevitable move away from fossil fuels. If we ever want renewables to take over for base-load demand (having a wind farm keep your fridge running even when the wind isn’t blowing), or drive long distances in plug-in electrics, we’ll need a serious revolution in energy storage.
This is a field I’m very excited about, both in the near term and the long term. It’s an area where there are still big problems to be solved, with lots of opportunities for real ground-up innovation in basic physics, materials science, chemistry and manufacturing.
There’s a need to begin developing our language around energy storage, and to develop a more thorough understanding of the tradeoffs involved. This has been made abundantly clear by the coverage surrounding the battery issues of the Boeing 787 Dreamliner. Most mainstream press has been unable or unwilling to cover the science behind the battery issues, or to accurately explain the decision making that led to the selection of the type of batteries involved.
When we talk about energy storage, there are two key factors to keep in mind: energy density and specific energy. Energy density is how much energy you can fit into a given space (megajoules/liter), and specific energy (megajoules/kg) is how much energy you can “fit” in a given mass.
Let’s look at a concrete example. The Tesla Roadster relies on a large battery pack, made up of many lithium ion cells. The battery pack weighs 450 kilograms, and has a volume of approximately 610 liters. It stores 53 kilowatt hours of energy (190 MJ). So, it has a specific energy of 0.42 MJ/kg and an energy density of roughly 0.31 MJ/liter.
For comparison, let’s look at the Lotus Elise (I could have said “my Lotus Elise” but I didn’t, because I’m classy), which is fundamentally the same car running on gasoline. It can carry 11 gallons of gasoline in its fuel tank. Gasoline has a specific energy of 46 MJ/kg, and an energy density of 36 MJ/liter (those of you screaming about efficiency, hold on). The 29 kilograms of gasoline in a full tank represent 1334 MJ of energy, approximately 7 times more than the 450 kilogram Tesla battery pack. Frankly, it’s a wonder the Tesla even moves at all!
Now, it’s important to add one more layer of complexity here. Internal combustion engines aren’t particularly efficient at actually moving your car. They’re very good at turning gasoline into heat. The very best gasoline engines achieve approximately 30% efficiency at their peak, so of that 1334 MJ in the Lotus’ tank, perhaps only 400 MJ are actually used to move the car. The rest is used to cook the groceries that you probably shouldn’t have put in the trunk. The electric drivetrain in the Tesla, on the other hand, is closer to 85% efficient.
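If you want to play with the numbers yourself, here’s a quick back-of-the-envelope calculation using the figures above. The efficiency values are the same rough assumptions, so treat the output as ballpark only:

<?php
// Rough comparison of usable (post-efficiency) energy on board.
// Figures are from the post; efficiencies are ballpark assumptions.
$teslaPackMJ   = 53 * 3.6;    // 53 kWh battery pack, about 190 MJ
$gasolineMJ    = 29 * 46;     // 29 kg of gasoline at 46 MJ/kg, about 1334 MJ

$evEfficiency  = 0.85;        // assumed electric drivetrain efficiency
$iceEfficiency = 0.30;        // assumed peak gasoline engine efficiency

printf("Tesla usable energy: %.0f MJ\n", $teslaPackMJ * $evEfficiency);   // ~162 MJ
printf("Lotus usable energy: %.0f MJ\n", $gasolineMJ * $iceEfficiency);   // ~400 MJ

Once efficiency is accounted for, the gasoline car still carries roughly two and a half times the usable energy – a big gap, but much smaller than the raw 7x figure suggests.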
That’s a quick example of why understanding some of the engineering, science, and math behind energy storage is important – without means for comparison, it can be difficult to grasp the tradeoffs that have been made, and why products end up being designed the way that they are.
I’ll dig deeper into the specific types of battery technologies on the market and the horizon in a future post. At the moment, they’re all in roughly the same ballpark for energy density and specific energy, and simply offer different tradeoffs in terms of charge times, safety, and longevity.
Batteries are not the only way to store energy though, and aren’t nearly as sexy as some of the alternatives.
Fuel cells have fallen out of vogue a bit over the last few years. While Honda forges on, most of the excitement seems to have been supplanted, for now, by acceptance that the lack of a large-scale hydrogen distribution network dooms them to a chicken-and-egg fate for the foreseeable future. Fuel cells operate by combining stored hydrogen with oxygen from the air to release energy. Because hydrogen can be made by electrolyzing water, fuel cells are a feasible way of storing energy generated by renewable sources.
Due to the increased efficiency of an electric drivetrain and the high specific energy (though lower energy density) of hydrogen, a fuel cell drivetrain can rival gasoline for overall system efficiency. Unfortunately, they achieve all of this using a variety of exotic materials, resulting in costs that are completely unrealistic (think hundreds of thousands of dollars per car) and look likely to remain there for the foreseeable future. That said, just today brought word of a fuel cell technology-sharing deal between BMW and Toyota – perhaps there’s still some life in this space.
There’s another type of energy storage, which excites me most of all – these are the “use it or lose it” short term energy storage technologies, which are designed primarily to replace batteries in hybrid drivetrains, or to smooth short term power interruptions in fixed installations.
I’d like to explore these in more depth in the future, but for now, a quick survey is appropriate. The technology I’m most interested in is kinetic energy storage in the form of flywheels. At its most basic, you take a wheel, get it spinning, and then couple it to a generator to convert the motion back into electricity.
This is an old technology. Traditionally, you used very heavy wheels, spinning relatively slowly. This type of system is sometimes used in place of batteries for short term power in data centers. In the last few years, flywheels have gotten interesting for smaller-scale applications as well, thanks to modern materials science. A small amount of mass spinning very fast can store the same amount of energy as a large amount of mass spinning very slowly. Modern materials and manufacturing mean it’s realistic to build a hermetically sealed flywheel which can spin at tens of thousands of RPM. Ricardo has done just that, as has Torotrak.
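The reason speed wins is that a flywheel’s kinetic energy scales linearly with mass but with the square of rotational speed (E = ½Iω², with I = ½mr² for a solid disc). Here’s a quick illustrative calculation – the two flywheels below are invented for the example, not real products:

<?php
// Kinetic energy of a solid-disc flywheel: E = 1/2 * I * w^2, with I = 1/2 * m * r^2.
// Both flywheels are hypothetical, purely to illustrate the mass/speed tradeoff.
function flywheelEnergyMJ(float $massKg, float $radiusM, float $rpm): float
{
    $inertia = 0.5 * $massKg * $radiusM ** 2;   // moment of inertia of a solid disc
    $omega   = $rpm * 2 * M_PI / 60;            // rotational speed in rad/s
    return 0.5 * $inertia * $omega ** 2 / 1e6;  // joules converted to megajoules
}

printf("1000 kg disc, 0.5 m radius,  3,000 rpm: %.1f MJ\n", flywheelEnergyMJ(1000, 0.5, 3000));
printf("  10 kg disc, 0.25 m radius, 60,000 rpm: %.1f MJ\n", flywheelEnergyMJ(10, 0.25, 60000));

Both store about 6 MJ – the small, fast disc matches the one that weighs a hundred times more.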
These systems have the advantage of being relatively lightweight, simple and low-cost. While they don’t store a large amount of energy, they’re ideal for regenerative braking and increasing the overall efficiency of an ICE drivetrain.
Another category of energy storage is thermal storage. These are what they sound like – means to store heat (from solar most often) for extended periods of time. This is another old technology, with some interesting new twists. Remember that gasoline engines turn lots of their energy into heat. Some manufacturers are experimenting with systems which can convert some of that heat into energy, using good old fashioned steam.
A final type of storage which doesn’t fit nicely into any category is compressed air. This week, Peugeot-Citroen (PSA) unveiled their compressed air hybrid drivetrain. This system uses compressed air pressurized by an onboard pump, driven through regenerative braking and other “waste” energy capture. While more complex than a flywheel, total energy storage is much greater as well, and PSA is talking of 30% reductions in emissions thanks to this technology. Tata has also experimented with cars using the MDI compressed air drivetrain, which is designed to be “fueled” by an offboard compressor.
As I noted at the beginning, part of what makes me excited about this space is that it’s not a solved problem. There are loads of companies all around the world creating innovative solutions. Most of them will probably fade away, but some have a reasonable chance of replacing or supplementing the “status quo” energy storage options we have today. Interestingly as well, no one country is dominating the research in this space. The UK, in keeping with tradition, has a large number of very small companies working on projects (their “cottage industries” are often actually housed in cottages!), while the US does this sort of development primarily via research institutions, and other countries rely on government-run labs.
Until all are one, bah weep grana weep ninny bon.
Yet Another Blog Reboot
I’ve lost track of how many blogs I’ve had over the last 15 years. While most of them have been personal or travel-oriented in nature, I’ve maintained Discrete Cosine – on and off – since January of 2006.
Originally, Discrete Cosine was intended as a video-industry oriented site – news, reviews and tips on things I was doing while working in the TV Studios at the University of Minnesota. Over time, it followed my migration into academic-technology oriented work, and then became fully dormant when I left the University in August of 2011.
I’ve now migrated the old Discrete Cosine content to the WordPress platform, and that’s where you find yourself now. Rather than start fresh, I’m considering this a bit of a reboot – I like the Discrete Cosine name too much to leave it behind.
For now, this site will just be a place for me to post thoughts about technology (and related stuff) which are too long to fit in a tweet.
I’ll continue to post more personal content on the major social networks, and my travel blogging will continue to live at travel.discretecosine.com.
We’ll see how long this lasts, but for now, it’s a start. Please bear with me as I get things a bit more organized.
An introduction to reverse engineering
(This blog is still in hibernation, but I needed somewhere to post this)
Reverse engineering is one of those wonderful topics, covering everything from simple “guess how this program works” problem solving, to poking at silicon with scanning electron microscopes. I’m always hugely fascinated by articles that walk through the steps involved in these types of activities, so I thought I’d contribute one back to the world.
In this case, I’m going to be looking at the export bundle format created by the Tandberg Content Server, a device for recording video conferences. The bundle is intended for moving recordings between Tandberg devices, but it’s also the easiest way to get all of the related assets for a recorded conference. Unfortunately, there’s no parser available to take the bundle files (.tcb) and output the component pieces. Well, that just won’t do.
For this type of reverse engineering, I basically want to learn enough about the TCB format to be able to parse out the individual files within it. The only tools I’ll need in this process are a hex editor, a notepad, and a way to convert between hex and decimal (the OS X calculator will do fine if you’re not the type to do it in your head).
Step 1: Basic Research
After Googling around to see if this was a solved issue, I decided to dive into the format. I brought a sample bundle into my trusty hex editor (in this case Hex Fiend).
A few things are immediately obvious. First, the first four bytes are the letters TCSB. Another quick visit to Google confirms this header type isn’t found elsewhere, and there’s essentially no discussion of it. Going a few bytes further, we see “contents.xml.” And a few bytes after that, we see what looks like plaintext XML. This is a pretty good clue that the TCB file is a simple container, with each file stored alongside its filename. Let’s scan a bit further and see if we can confirm that.
In this segment, we see the end of the XML, and something that could be another filename – “dbtransfer” – followed by what looks like gibberish. That doesn’t help too much. Let’s keep looking.
Great – a .jpg! Looking a bit further, we see the letters “JFIF,” which is recognizable as part of a JPEG header. If you weren’t already familiar with that, a quick google for “jpg hex header” would clear up any confusion. So, we’ve got the basics of the file format down, but we’ll need a little bit more information if we’re going to write a parser.
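Checking a magic header like this programmatically takes only a few lines. A quick sketch – the bundle filename here is made up:

<?php
// Sanity check: does the file start with the "TCSB" magic bytes?
$fh    = fopen('recording.tcb', 'rb');   // hypothetical sample bundle
$magic = fread($fh, 4);
fclose($fh);

echo ($magic === 'TCSB') ? "Looks like a Tandberg bundle\n" : "Unknown format: " . bin2hex($magic) . "\n";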
Step 2: Finding the pattern
We can make an educated guess that a file like this has to provide a few hints to a decoder. We would either expect a table of contents, describing where in the bundle each individual file was located, some sort of stop bit marking the boundary between files, byte offsets describing the locations of files, or a listing of file lengths.
There isn’t any sign of a table of contents. Let’s start looking for a stop bit, as that would make writing our parser really easy. What I’m going to do is pull out all of the data between two prospective files, and I want two sets to compare.
I’ve placed asterisks to flag the bytes corresponding to the filenames, since those are known.
1E D1 70 4C 25 06 36 4D 42 E9 65 6A 9F 5D 88 38 0A 00 *64 62 74 72 61 6E 73 66 65 72* 42 06 ED 48 0B 50 0A C4 14 D6 63 42 F2 BF E3 9D 20 29 00 00 00 00 00 00 DE E5 FD
01 0C 00 *63 6F 6E 74 65 6E 74 73 2E 78 6D 6C* 9E 0E FE D3 C9 3A 3A 85 F4 E4 22 FE D0 21 DC D7 53 03 00 00 00 00 00 00
The first line corresponds to the “dbtransfer” entry, the second to the “contents.xml” entry. Let’s trim the first entry to match the second.
38 0A 00 64 62 74 72 61 6E 73 66 65 72 42 06 ED 48 0B 50 0A C4 14 D6 63 42 F2 BF E3 9D 20 29 00 00 00 00 00 00
01 0C 00 63 6F 6E 74 65 6E 74 73 2E 78 6D 6C 9E 0E FE D3 C9 3A 3A 85 F4 E4 22 FE D0 21 DC D7 53 03 00 00 00 00 00 00
It looks like we’ve got three bytes before the filename, followed by 18 bytes, followed by six bytes of zero. Unfortunately, there’s no obvious pattern of bits which would correspond to a “break” between segments. However, looking at those first three bytes, we see a 0x0A and a 0x0C – two small values in the same position. That’s 10 and 12 in decimal. Interesting – the 12 entry corresponds with “contents.xml” (12 characters) and the 10 entry corresponds with “dbtransfer” (10 characters). Could that byte describe the length of the filename? Let’s look at our much longer JPG entry to be sure.
70 4A 00 77 77 77 5C 73 6C 69 64 65 73 5C 64 37 30 64 35 34 63 66 2D 32 39 35 62 2D 34 31 34 63 2D 61 38 64 66 2D 32 66 37 32 64 66 33 30 31 31 35 65 5C 74 68 75 6D 62 6E 61 69 6C 73 5C 74 68 75 6D 62 6E 61 69 6C 30 30 2E 6A 70 67
0x4A – 74, corresponding to a 74 character filename. Looks like we’re in business.
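It looks like the filename is stored with its length right in front of it (a little-endian 16-bit value, as the byte counting later will confirm). Reading one looks roughly like this – a sketch, where $fh is an open file handle positioned at the length field:

<?php
// Read a length-prefixed filename: a little-endian 16-bit length,
// followed by that many bytes of the name itself.
function readFilename($fh): string
{
    $len = unpack('v', fread($fh, 2))[1];   // 'v' = unsigned 16-bit little-endian
    return fread($fh, $len);
}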
At this point, it’s worth an aside to talk about endianness. I happen to know that the Tandberg Content Server runs Windows on Intel, so I went into this with the assumption that the format was little-endian. However, if you’re not sure, it’s always worth looking at words backwards and forwards, just in case.
So we know how to find our filename. Now how do we find our file data? Let’s go back to our JPEG. We know that JPEGs start with 0xFFD8FFE0, and a quick trip to Google also tells us that they end with 0xFFD9. We can use that to pull a sample jpeg out of our TCB, save it to disk, and confirm that we’re on the right track.
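Carving that test JPEG out of the bundle takes just a few lines – a rough sketch, with a made-up bundle filename:

<?php
// Carve the first JPEG out of the bundle by searching for its start/end markers.
$data  = file_get_contents('recording.tcb');     // hypothetical sample bundle
$start = strpos($data, "\xFF\xD8\xFF\xE0");      // JPEG SOI + JFIF APP0 marker
$end   = strpos($data, "\xFF\xD9", $start) + 2;  // JPEG EOI marker, inclusive
file_put_contents('test.jpg', substr($data, $start, $end - $start));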
This is one of those great steps in reverse engineering – concrete proof that you’re on the right track. Everything seems to go quicker from this point on.
So, we know we’ve got a JPEG file in a continuous 2177 byte segment. We know that the format uses byte lengths to describe filenames – maybe it also uses byte lengths to describe file lengths. Let’s look for 2177 – which, stored little-endian, appears as the bytes 81 08 – near our JPEG.
Well, that’s a good sign. But, it could be coincidental, so at this point we’d want to check a few other files to be sure. In fact, looking further into the file, we find some larger .mp4 files which don’t quite match our guess. It turns out that the file length is a 32-bit value, not a 16-bit value – with our two JPEGs, the upper bytes just happened to be zeros.
Step 3: Writing a parser
“Bbbbbut…”, I hear you say! “You have all these chunks of data you don’t understand!”
True enough, but all I care about is getting the files out, with the proper names. I don’t care about creation dates, file permissions, or any of the other crud that this file format likely contains.
Let’s look at the first two files in this bundle. A little bit of byte counting shows us the pattern that we can follow. We’ll treat the first file as a special case. After that, we seek 16 bytes from the end of file data to find the filename length (2 bytes), then we’re at the filename, then we seek 16 bytes to find the file length (4 bytes) and seek another 4 bytes to find the start of the file data. Rinse, repeat.
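For illustration, a stripped-down version of that loop might look like the sketch below. The offsets follow the pattern described above, but the function and variable names are mine, the first-entry offset is left to the caller, and there’s no error handling – this is not the actual parser.

<?php
// Minimal sketch of the extraction loop described above. Offsets follow the
// pattern worked out in this post; treat it as an illustration only.
function extractBundle(string $path, string $outDir, int $firstEntryOffset): void
{
    $fh = fopen($path, 'rb');
    fseek($fh, $firstEntryOffset);                   // position of the first filename length

    while (($lenBytes = fread($fh, 2)) !== false && strlen($lenBytes) === 2) {
        $nameLen = unpack('v', $lenBytes)[1];        // 16-bit LE filename length
        $name    = fread($fh, $nameLen);             // the filename itself
        fseek($fh, 16, SEEK_CUR);                    // 16 bytes we don't care about
        $fileLen = unpack('V', fread($fh, 4))[1];    // 32-bit LE file length
        fseek($fh, 4, SEEK_CUR);                     // 4 more bytes we don't care about
        $data    = fread($fh, $fileLen);             // the file contents

        $flatName = basename(str_replace('\\', '/', $name));
        file_put_contents($outDir . '/' . $flatName, $data);

        fseek($fh, 16, SEEK_CUR);                    // 16 bytes of trailer before the next entry
    }
    fclose($fh);
}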
I wrote a quick parser in PHP, since the eventual use for this information is part of a larger PHP-based application, but any language with basic raw file handling would work just as well.
tcsParser.txt
This was about the simplest possible type of reverse engineering – we had known data in an unknown format, without any compression or encryption. It only gets harder from here…
Presentation on the HTML5 video tag
A few weeks back, I was given the opportunity to present at MinneWebcon. My talk, “<video> will be your friend”, focused on the legal issues and implementation possibilities surrounding the HTML5 video tag.
I’ve put my slides online, if you want to take a look. I’ve also recorded the first half of the lecture as part of a test of our Mocha class capture application. I’ll be recording the second half Real Soon Now.
Podcast Producer 2 tip – running xgrid jobs as logged in user
So I’ve been playing with an interesting “feature” in PCP2 – the “chapterize” command generates different results when it can talk to the window server versus when it can’t. In my case, it generates much better results in the former case.
“But,” you say, “my PCP2 xgrid jobs can’t talk to the window server!”
Very true. However, you can change the user that PCP2 uses to submit Xgrid jobs, and Xgrid will run the job with that user’s permissions, provided everything is signed in via single sign-on to the same Kerberos domain.
So, now we’ve got PCP2 jobs running as a real user. Next, log into the GUI as that user.
Now, when PCP2 workflows run, they’ll be able to talk to the window server, and at least in the case of “chapterize,” use what appears to be the “Good” code path. Faster, more accurate, more delightful.
Quicktime Eats Cookies
It appears that when embedding Quicktime in a webpage viewed with Safari 4 on Snow Leopard, Quicktime no longer passes cookies to the server. So, if you’re having the Quicktime plugin load a file that uses cookie data to verify permissions, you’ll need to move to a query string model.
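One way to handle that is to have the page that generates the embed sign the media URL with a short-lived token, rather than relying on the session cookie. A rough sketch – the host, path, and token scheme here are all made up:

<?php
// Instead of relying on cookies, append a short-lived signed token to the media URL.
// The server verifying the token must share the same secret. All names are illustrative.
$secret  = 'shared-secret';
$expires = time() + 300;                          // token valid for five minutes
$path    = '/media/lecture01.mov';
$token   = hash_hmac('sha256', $path . $expires, $secret);

$src = "https://media.example.edu{$path}?expires={$expires}&token={$token}";
echo '<embed src="' . htmlspecialchars($src) . '" type="video/quicktime">';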
This only happens when Safari is running in 64-bit mode, so I imagine it has to do with the “plugins running as separate entities” crash protection that Snow Leopard adds.
This does not appear to impact the Flash plugin.