やったー!


How to split a string in Lua

Simple case

There are numerous examples online that show how to split a string in Lua. Most of them are tailored to one specific use and handle neither general separators nor edge cases.

Like this one:

---A very dumb split string function.
---@param str string
---@param sep? string
---@return string[]
local function split(str, sep)
    sep = sep or "%s"
    local t = {}
    for s in string.gmatch(str, "([^" .. sep .. "]+)") do
        t[#t + 1] = s
    end
    return t
end

In most cases this is probably enough, and it has the advantage of being very fast. But it treats the separator as a set of single characters and silently drops empty fields. If you need a more general solution that handles multi-character or pattern separators and respects UTF-8 encoding, the example above will not work.
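
To make these limits concrete, here is a small sketch that repeats the simple splitter under the hypothetical name simple_split (so the snippet runs on its own) and feeds it a few inputs it mishandles. The inputs are illustrative, and the source file is assumed to be saved as UTF-8.

```lua
-- Copy of the simple splitter above, renamed (hypothetical name) so that
-- this snippet is self-contained.
local function simple_split(str, sep)
    sep = sep or "%s"
    local t = {}
    for s in string.gmatch(str, "([^" .. sep .. "]+)") do
        t[#t + 1] = s
    end
    return t
end

-- 1. Empty fields disappear, because "+" swallows runs of separators.
local a = simple_split("a,,b", ",")
print(#a, a[1], a[2]) --> 2  a  b

-- 2. A multi-character separator is treated as a set of single characters.
local b = simple_split("a::b:c", "::")
print(#b) --> 3, not 2: the lone ":" also splits

-- 3. An empty separator builds the malformed pattern "([^]+)" and raises an error.
local ok = pcall(simple_split, "abc", "")
print(ok) --> false

-- 4. A multi-byte separator is matched byte by byte, which can cut
--    neighbouring characters in half.
local c = simple_split("абв", "б")
print(#c, #c[1], #c[2]) --> 2  1  1 (each part is a single stray byte)
```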

Common solution

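Here is a function that covers the edge cases tested below. It is not as performant as I would like. Its results are modeled on JavaScript’s String.prototype.split(), with one difference: an empty string split by an empty separator yields a table containing one empty string rather than an empty array.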

---Generic split function for splitting the string, taking into account
---UTF-8 encoding.
---@param str string Input string.
---@param sep? string Separator pattern or string (default empty string).
---@param n? number Maximum number of substrings to return (default -1): if less than zero,
---all substrings are returned. If 0, an empty table is returned.
---@param offset? number UTF-8 bytes offset (default 1)
---@param plain? boolean Turns off the pattern matching facilities.
---@return string[]
local function split(str, sep, n, offset, plain)
    sep = sep or ""
    offset = offset or 1
    n = n or -1

    ---Result value
    ---@type string[]
    local t = {}

    if n == 0 then
        return t
    end

    local len = utf8.len(str)
    -- If empty string, then return table with single element containing empty string.
    if len == 0 then
        t[#t + 1] = ""
        return t
    end

    local i = 1
    local start = 1
    while true do
        local sepBegin, sepEnd = str:find(sep, start, plain)
        if not sepBegin then
            t[#t + 1] = str:sub(start)
            break
        elseif sepEnd < sepBegin then
            -- If empty separator, then explode string considering UTF8
            t[#t + 1] = str:sub(
                    utf8.offset(str, start),
                    utf8.offset(str, sepBegin + offset) - offset
            )
            if sepBegin < len then
                start = sepBegin + 1
            else
                break
            end
        else
            if sepBegin > start then
                t[#t + 1] = str:sub(start, sepBegin - offset)
            else
                t[#t + 1] = ""
            end
            start = sepEnd + offset
        end
        if n == i then
            break
        end
        i = i + 1
    end

    return t
end
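
The empty-separator branch relies on standard-library behavior that is easy to miss. A minimal sketch, assuming Lua 5.3 or later (for the utf8 library) and a UTF-8 encoded source file; the sample string is illustrative:

```lua
local s = "月は" -- two characters, three bytes each

-- An empty pattern matches at the start position and reports an end that
-- is one less than the begin, hence the `sepEnd < sepBegin` check.
local b, e = s:find("", 1)
print(b, e) --> 1  0

-- utf8.len counts characters, while # counts bytes.
print(utf8.len(s), #s) --> 2  6

-- utf8.offset maps a character index to the byte where that character
-- starts; index len + 1 maps to one byte past the end of the string.
print(utf8.offset(s, 2), utf8.offset(s, 3)) --> 4  7

-- Slicing from one character's first byte to just before the next
-- character extracts exactly one character.
local first = s:sub(utf8.offset(s, 1), utf8.offset(s, 2) - 1)
print(first) --> 月
```

This is why the function compares positions against utf8.len(str) rather than #str when it explodes a string into characters.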

Testing common solution

The tests use the Laura testing library.

local laura = require("laura")
local expect = laura.expect
local describe = laura.describe
local it = laura.it
local split = require("./split")

describe("function split()", function()
    local testPairs = {
        { "", "", { "" } },
        { "a,b,c", "def", { "a,b,c" } },
        { "a,b,c", ",", { "a", "b", "c" } },
        { " xyz ", "", { " ", "x", "y", "z", " " } },
        { "abc def", "", { "a", "b", "c", " ", "d", "e", "f" } },
        { "абв где", "", { "а", "б", "в", " ", "г", "д", "е" } },
        { " a b c", " ", { "", "a", "b", "c" } },
        { "Hello Mike, Hello Jane", "Hello", { "", " Mike, ", " Jane" } },
        {
            "a man a plan a canal panama",
            "a ",
            { "", "man ", "plan ", "canal panama" },
        },
        { "Миру - Мир!", "Мир", { "", "у - ", "!" } },
        {
            "月は明るく輝いているa",
            "",
            {
                "月",
                "は",
                "明",
                "る",
                "く",
                "輝",
                "い",
                "て",
                "い",
                "る",
                "a",
            },
        },
        {
            "hello,world.and.dots",
            "[.,]",
            { "hello", "world", "and", "dots" },
        },
        {
            "/home/user/config",
            "[\\/]",
            { "", "home", "user", "config" },
        },
        { "===", "=", { "", "", "", "" } },
    }

    for _, pair in ipairs(testPairs) do
        local name = string.format(
            'should split "%s" with separator "%s"',
            pair[1],
            pair[2]
        )
        it(name, function()
            expect(split(pair[1], pair[2])).toDeepEqual(pair[3])
        end)
    end

    local testPairsN = {
        { "hello world", " ", 0, {} },
        { "hello world", " ", -1, { "hello", "world" } },
        { "hello world", " ", -999, { "hello", "world" } },
        { "a,b,c,d", ",", 2, { "a", "b" } },
        { "a,b,c,d", ",", 1, { "a" } },
        { "a,b,c,d", ",", 49, { "a", "b", "c", "d" } },
    }

    for _, pair in ipairs(testPairsN) do
        local name = string.format(
            'should split "%s" %d times with separator "%s"',
            pair[1],
            pair[3],
            pair[2]
        )
        it(name, function()
            expect(split(pair[1], pair[2], pair[3])).toDeepEqual(pair[4])
        end)
    end
end)

--[[
SUMMARY
20 of 20 passing
0 failing
0 skipping

time: ~13ms / mem: 302.93KB @ 2025-02-18 15:41:14
pass
--]]
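
The tests above do not exercise the plain parameter. Since split passes it straight to str:find, its effect can be sketched with string.find alone; the sample string is illustrative:

```lua
local s = "a.b.c"

-- As a pattern, "." matches any character, so the first match is the
-- very first character.
local b1, e1 = s:find(".", 1)
print(b1, e1) --> 1  1

-- With plain = true the dot is taken literally.
local b2, e2 = s:find(".", 1, true)
print(b2, e2) --> 2  2

-- Escaping the magic character with "%" has the same effect.
local b3, e3 = s:find("%.", 1)
print(b3, e3) --> 2  2
```

So splitting on a literal dot would need either plain set to true or the escaped separator "%.".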