Here is a quick reminder: if you are ever parsing usernames or other user-generated content, check whether your parser can handle non-Latin text.
The Problem
Recently I ran into an issue where the regex in my parsing code simply did not work on non-Latin alphabets. For example, suppose I wanted to parse the display-name from this string: display-name=CoalTheTroll;emotes=;flags=;id=3ceab6bd-de3f-4d05-8038-5cebdb2af1c7; :tmi.twitch.tv USERNOTICE #cohhcarnage
The typical code would look like this:
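fun userNoticeParsing(text: String): String {
    // Only matches ASCII letters, digits, and underscores
    val displayNamePattern = "display-name=([a-zA-Z0-9_]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    // Throws a NullPointerException when nothing matches
    return displayNameMatch?.groupValues?.get(1)!!
}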
A simple solution (some might say lazy) is to stop worrying about which character sets to allow. With regex, we simply say: match every character after display-name= up to the next semicolon. The code would look like this:
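fun userNoticeParsing(text: String): String {
    // Capture everything after "display-name=" up to the next ';'
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    // Fall back to a placeholder when no display name is found
    return displayNameMatch?.groupValues?.get(1) ?: "username"
}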
With the regex above, display-name=([^;]+), we are saying: match display-name=, then match one or more characters that are not a ;, stopping at the first ;. The () parentheses define a capture group, which makes it easy to retrieve just the part we actually want. Lastly, we use the ?: operator to say: if no match is found, return "username".
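As a rough illustration of how the capture group and the fallback behave, the sketch below exposes the raw group values. The helper name displayNameGroups and the shortened sample strings are assumptions made for this example.

```kotlin
// Hypothetical helper: returns every group of the first match, or null when nothing matches.
fun displayNameGroups(text: String): List<String>? {
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    return displayNamePattern.find(text)?.groupValues
}

fun main() {
    val groups = displayNameGroups("display-name=CoalTheTroll;emotes=;flags=;")
    // Index 0 is the whole match, including the "display-name=" prefix.
    println(groups?.get(0)) // display-name=CoalTheTroll
    // Index 1 is the first capture group: just the name.
    println(groups?.get(1)) // CoalTheTroll

    // An empty display-name leaves nothing for [^;]+ to match, so there is no match
    // and the ?: operator supplies the fallback.
    val empty = displayNameGroups("display-name=;emotes=;")
    println(empty?.get(1) ?: "username") // username
}
```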
Now, even with display names written in non-Latin characters, such as Mandarin, our code will work.
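A quick way to check this is to feed the function a message whose display-name is written in Chinese characters. The message below is made up for illustration: it reuses the earlier example string with only the display-name value swapped out.

```kotlin
fun userNoticeParsing(text: String): String {
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    return displayNameMatch?.groupValues?.get(1) ?: "username"
}

fun main() {
    // Made-up message: only the display-name value differs from the earlier example.
    val text = "display-name=煤炭巨魔;emotes=;flags=;id=3ceab6bd-de3f-4d05-8038-5cebdb2af1c7; :tmi.twitch.tv USERNOTICE #cohhcarnage"
    println(userNoticeParsing(text)) // 煤炭巨魔

    // With no display-name tag at all, the fallback is returned.
    println(userNoticeParsing("emotes=;flags=;")) // username
}
```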
Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.
Top comments (3)
I have hidden a comment trying to convince users to click on a sketchy link. SCAMMER NO SCAMMING!!!
Simplifying the regex to match all characters after display name= seems like a pragmatic approach. Have you considered potential downsides or edge cases with this method?
I appreciate the clarity and thoroughness of your explanation regarding the challenges of parsing non-Latin based Twitch usernames in Kotlin. Your wordle unlimited solution, while labeled by some as simple, is indeed pragmatic and effective.