In some ways this is Part One on this subject: my IO subsystem is nowhere near finished, and I have literally just put something together so that I can get things loaded into my test framework. The basic idea is one I'll probably base a few things on, though, so it is worth quickly writing about.
Loading Data
As we know, unless you are going fully hard-coded procedural with your project, at some point you are going to need to load things. It is a pretty fundamental operation, but one with an array of solutions in the C++ world.
The two I have been toying with, trying to make up my mind between, have been Async IO (likely via IOCP on Windows) or Memory Mapped IO.
I've done some experiments in the past with the former, hooking up IOCP callbacks to a task system based around Intel's Threading Building Blocks and it certainly works well but I'm not sure it is the right fit; while I'm interested in being able to stream things in an async manner other solutions could well exist for the async part of the problem when coupled with another IO solution.
Which brings us to memory mapped IO, which in some ways is the fundamental IO system for Windows, being built upon (and a part of) the virtual memory subsystem. While not async, and risking stalling a thread due to page faults, it does bring with it the useful ability to be able to open views in to an already open file, perfect for directly reading from an archive for example.
A Mapping we will go
Memory mapped IO on Windows is also pretty simple;
- Open target file
- Create a file mapping
- Create a view in to that file mapping
- Use the returned pointer
Then, when you are done, you unmap the view and close the two handles referencing the opened file mapping and file you want to use. (If you were doing archive-like access then you might not do the latter two steps until program end however.)
The code itself is pretty simple, certainly if we want to open a full file for mapping;
char * OpenFile(const std::wstring &filename)
{
    HANDLE fileHandle = ::CreateFile(filename.c_str(), GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE, 0,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
    if (fileHandle == INVALID_HANDLE_VALUE)
        return nullptr;
    int fileSize = query_file_size(fileHandle);
    if (fileSize <= 0)
        return nullptr;
    // Passing 0 for both size arguments uses the file's current size as the mapping size
    HANDLE fileMappingHandle = ::CreateFileMapping(fileHandle, 0, PAGE_READONLY, 0, 0, 0);
    // CreateFileMapping signals failure with NULL, not INVALID_HANDLE_VALUE
    if (fileMappingHandle == nullptr)
        return nullptr;
    char* data = static_cast<char*>(::MapViewOfFile(fileMappingHandle, FILE_MAP_READ, 0, 0, fileSize));
    if (!data)
        return nullptr;
    return data;
}
It gets a bit more complicated if you want offsets and the like, however for our initial purposes it will do.
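To give a flavour of what offsets involve: MapViewOfFile requires the file offset of a view to be a multiple of the system allocation granularity (reported by GetSystemInfo in dwAllocationGranularity). So to read from an arbitrary position you map from the aligned offset below it and skip forward. A minimal sketch of just that arithmetic, kept free of Windows calls so it stands alone; the ViewRequest type and makeViewRequest function are hypothetical names, and the 64 KiB default is an assumed value rather than one queried from the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: describes the view we actually have to map in
// order to read `length` bytes starting at `offset` in the file.
struct ViewRequest
{
    std::uint64_t alignedOffset; // offset to hand to the mapping call
    std::size_t   viewSize;      // number of bytes to map
    std::size_t   delta;         // where the wanted data starts inside the view
};

// `granularity` must be a power of two; 65536 is an assumed default here.
inline ViewRequest makeViewRequest(std::uint64_t offset, std::size_t length,
                                   std::uint64_t granularity = 65536)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);

    // Round the offset down to the nearest multiple of the granularity
    const std::uint64_t alignedOffset = offset & ~(granularity - 1);
    const std::size_t delta = static_cast<std::size_t>(offset - alignedOffset);

    // The view has to cover the skipped bytes as well as the wanted ones
    return { alignedOffset, length + delta, delta };
}
```

The pointer handed back to the caller would then be the mapped base address plus delta, while the base address itself is what eventually has to be unmapped.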
Enter C++
There is, of course, an obvious problem with the above; no clean up code and all you get back is a char * which doesn't really help; at best you can undo the mapping of the file but those handles are lost to the ether.
So what can we do?
One approach would be to wrap the data in an object and have it automatically clean up for us; so our function returns an instance of that class with the associated destructor and access functions.
class DirectFileHandle
{
public:
    DirectFileHandle(char * data, HANDLE file, HANDLE mapping) : data(data), file(file), mapping(mapping) {}
    ~DirectFileHandle()
    {
        ::UnmapViewOfFile(data);
        ::CloseHandle(mapping);
        ::CloseHandle(file);
    }
    char * getData() { return data; }
    // copy and move functions (these need care: a defaulted copy would
    // leave two objects closing the same handles)
    // plus declarations for holding the data and two handle pointers
};
Not a complete class, but you get the idea I'm sure.
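To show what the missing parts involve, here is a minimal sketch of the ownership rules such a class needs: copying disabled, and moves that leave the source owning nothing. It uses an integer id as a stand-in for the view and handles so it runs without touching the OS; OwnedResource and releaseCount are hypothetical names for illustration, not part of my code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the OS clean up calls: counts releases so the
// behaviour can be checked.
inline int& releaseCount()
{
    static int count = 0;
    return count;
}

class OwnedResource
{
public:
    explicit OwnedResource(int id) : id(id) {}
    ~OwnedResource() { reset(); }

    // Copying would end in a double release, so it is disabled
    OwnedResource(const OwnedResource&) = delete;
    OwnedResource& operator=(const OwnedResource&) = delete;

    // Moving transfers ownership and leaves the source empty
    OwnedResource(OwnedResource&& other) noexcept : id(std::exchange(other.id, 0)) {}
    OwnedResource& operator=(OwnedResource&& other) noexcept
    {
        if (this != &other)
        {
            reset();
            id = std::exchange(other.id, 0);
        }
        return *this;
    }

    bool valid() const { return id != 0; }

private:
    void reset()
    {
        if (id != 0)
        {
            ++releaseCount(); // real code would unmap the view and close the handles here
            id = 0;
        }
    }

    int id; // 0 means "owns nothing"
};
```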
However that's a lot of work, plus the introduction of a type, in order to just track data and clean up.
Is there something easier we can do?
std::unique_ptr to the rescue!
As mentioned all we are really doing is holding a pointer and, when it dies, needing to clean up some state which we don't really need access to any more.
Fortunately in std::unique_ptr we have a class designed to do just that; clean up state when it goes out of scope. We can even provide it with a custom deletion function to do the clean up for us.
So what does our new type look like?
std::unique_ptr<char, std::function<void(char*)>>;
As before the primary payload is the char* but we directly associate that with a clean up function which will be called when the unique_ptr goes out of scope.
From there it is a simple matter of changing our function's signature to return that type and updating our final return statement with my new favourite C++ syntax;
return{ data, [=](char* handle)
    {
        ::UnmapViewOfFile(handle);
        ::CloseHandle(fileMappingHandle);
        ::CloseHandle(fileHandle);
        return;
    }
};
As with the code from the previous entry we don't need to state the type here as the compiler already knows it.
The capture-by-copy default of the lambda ensures we have a copy of the handle objects come clean up time and the address of the data is supplied via the call back.
But what about those error cases? In those cases we change the returns to be return{nullptr, [](char*) { return; }}; effectively returning a null pointer.
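The behaviour described above can be checked without any Windows calls. The following is a minimal sketch under stated assumptions: openFake and releaseLog are hypothetical stand-ins, a heap buffer plays the part of the mapped view and two integers play the part of the handles. It shows the deleter running with its own copies of the captured values once the pointer goes out of scope, and that std::unique_ptr does not invoke the deleter at all when it holds a null pointer.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

using FileDataPointer = std::unique_ptr<char, std::function<void(char*)>>;

// Hypothetical stand-in for the Win32 clean up calls: records what was released
inline std::vector<std::string>& releaseLog()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical open function; `succeed == false` models the error path
inline FileDataPointer openFake(bool succeed)
{
    if (!succeed)
        return { nullptr, [](char*) {} }; // never invoked for a null pointer

    char* data = new char[6]{ 'h', 'e', 'l', 'l', 'o', '\0' };
    int mappingId = 1; // stand-ins for the two handles
    int fileId = 2;

    // Capture by copy: the lambda keeps its own mappingId and fileId after
    // the locals in this function are gone
    return { data, [=](char* p)
        {
            delete[] p;
            releaseLog().push_back("mapping " + std::to_string(mappingId));
            releaseLog().push_back("file " + std::to_string(fileId));
        } };
}
```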
The usage so far
A quick example of this in usage can be taken from my test program, which I'm using to test and build up functionality as I go;
int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
    _In_opt_ HINSTANCE hPrevInstance,
    _In_ LPWSTR lpCmdLine,
    _In_ int nCmdShow)
{
    sol::state luaState;
    luaState.open_libraries(sol::lib::base, sol::lib::package);
    luaState.create_named_table("Bonsai"); // table for all the Bonsai stuff
    Bonsai::Windowing::Initialise(luaState);
    using DirectFileHandle = Bonsai::File::DirectFile::DirectFileHandle;
    DirectFileHandle data = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
    luaState.script(data.get());
    std::function<bool ()> updateFunc = luaState["Update"];
    while (updateFunc())
    {
        Sleep(0);
    }
    return 0;
}
A few custom libraries in there, however the key element is dead centre with the file open function and the usage of the returned value on the next line to feed the Lua script wrapper.
The Problem
There is, however, a slight issue with the interface; we have no idea of the size of data being returned.
Now, in this case it isn't a problem; one of the nice features of memory mapped files on Windows (at least) is that, for security reasons, memory pages handed to user space by the kernel are zero initialised. We can see that by catching things in a debugger and looking at the memory pointed at by data.get() which is, as expected, the file content followed by a bunch of nulls filling the rest of the memory page. (That padding only exists when the file size is not an exact multiple of the page size, so it isn't something to rely on in general.)
Given that setup we are good when we are loading in string based data but what if we need something else or simply want the size?
At this point it is tempting to throw in the towel and head back towards a class, but a simpler option does exist; our old friend std::pair which in this case will let us pair a size_t with the pointer to the data handler.
The Solution
So, first of all we need to perform some type changes;
using FileDataPointer = std::unique_ptr<char, std::function<void(char*)>>;
using DirectFileHandle = std::pair<size_t, FileDataPointer>;
What was our DirectFileHandle before now becomes FileDataPointer and DirectFileHandle is now a pair with the data we require. Right now I've decided to order it as 'size' and 'pointer' but it could just as easily be the reverse of that.
After that we need to make some changes to our function;
FILEIO_API DirectFileHandle OpenFile(const std::wstring &filename)
{
    // as before
    if (fileHandle == INVALID_HANDLE_VALUE)
        return{ 0, FileDataPointer{ nullptr, [](char*) {return; } } };
The function signature itself doesn't need to change thanks to our redefining of our alias, however the return statements in the code do.
Previously we could just directly construct the std::unique_ptr and the compiler would just figure it out, however if we try that with the new type it seems to change deduction rules and we get errors;
return{ 0, {nullptr, [](char*) {return; } } };
// The above results in the error below from MSVC in VS17
error C2440: 'return': cannot convert from 'initializer list' to 'std::pair<::size_t,Bonsai::File::DirectFile::FileDataPointer>'
note: No constructor could take the source type, or constructor overload resolution was ambiguous
The compiler has decided what we have supplied it with is an initialiser list and as such tries to find a constructor to convert it, but as none exists it produces an error.
(I believe this is a legitimate problem and not a case of 'early compiler syndrome')
So, we have to supply the type of the std::unique_ptr in order to sort the types out. This change is repeated all down the function at the various return points, including the final one, where the main difference is that we return the real file size and not the 0 place holder.
After that we need to make a change to the usage site as now we have a pair being returned and not a wrapped pointer to use;
auto data = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
luaState.script(data.second.get());
In this case nothing much changes but we now have the size around if we need it.
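As an illustration of what having the size buys us, here is a minimal sketch that reads exactly the number of bytes reported instead of relying on trailing nulls. It is portable rather than Windows code: openFromMemory is a hypothetical stand-in for OpenFile that copies a string into a heap buffer with no terminator, and toString is a hypothetical helper; the two aliases match the ones defined above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <memory>
#include <string>
#include <utility>

using FileDataPointer  = std::unique_ptr<char, std::function<void(char*)>>;
using DirectFileHandle = std::pair<std::size_t, FileDataPointer>;

// Hypothetical stand-in for OpenFile: the buffer has no terminator, so the
// size is the only way to know where the data ends
inline DirectFileHandle openFromMemory(const std::string& contents)
{
    if (contents.empty())
        return { 0, FileDataPointer{ nullptr, [](char*) {} } };

    char* data = new char[contents.size()];
    std::memcpy(data, contents.data(), contents.size());

    // The inner type is spelled out, as in the real function
    return { contents.size(), FileDataPointer{ data, [](char* p) { delete[] p; } } };
}

// Build a std::string from exactly `first` bytes of the payload
inline std::string toString(const DirectFileHandle& handle)
{
    return handle.second ? std::string(handle.second.get(), handle.first)
                         : std::string();
}
```

The same pointer-plus-length pairing is what you would want for binary data, where a null byte can legitimately appear in the middle of the payload.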
But we aren't quite done...
Now, if we had a full C++17 compiler to use we could make use of one final thing; structured bindings.
Structured bindings give us syntax to unpack return values into separate variables; we can do something like this already with std::tie but structured bindings allow us to both declare and assign at the same time.
// C++14 way
using FileDataPointer = Bonsai::File::DirectFile::FileDataPointer;
// Declare the variables up front
FileDataPointer ptr;
size_t size;
// Now unpack the return value
std::tie(size, ptr) = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
luaState.script(ptr.get());
// C++17 way
// Declare and define at the same time
auto [size, ptr] = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
luaState.script(ptr.get());
That, however, is for a future compiler update; for now we can stick to the std::pair method which at least allows us to switch to the C++17 syntax as and when the compilers can handle it.
Summing up
In a real setup you would check for nulls before using the result, however this code demonstrates the principle nicely I feel.
(I also know it works as I made a slight error on the first run where my lambda captured by reference, meaning at callback time I got a nice crash during shutdown as the handle it was trying to reference was no longer valid.)
So there we have it, a simple C++17 based Memory Mapped File IO solution - I'll be building on this over time in order to build something a bit more complex, but as a proof of concept it works well.