Pre-SIP: Sharp (string) interpolation

How would the destination type be inferred in your example?

It would make sense that this would work:

val foo: SharpLiteralParser = #5m/s^2#
// Expands to:
val foo: SharpLiteralParser = summon[SharpLiteralParser].apply("5m/s^2")

This should also work:

def foo(slp: SharpLiteralParser): ???

foo(#5m/s^2#)
// Expands to:
foo(summon[SharpLiteralParser].apply("5m/s^2"))

But if it’s just val foo = #5m/s^2#, then it’s not clear to me where SharpLiteralParser would be coming from.
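To make the question concrete, here is a minimal sketch of one way the expected type could drive the expansion. It swaps the single SharpLiteralParser for a hypothetical type class with one instance per target type; `SharpLiteral`, `sharp`, `Acceleration` and `setGravity` are names made up for illustration, not part of the proposal.

```scala
// Hypothetical type class (an assumption, not the proposal's design):
// one instance per target type.
trait SharpLiteral[T] {
  def parse(literal: String): T
}

final case class Acceleration(metersPerSecondSquared: Int)

object Acceleration {
  private val Pattern = """(\d+)m/s\^2""".r

  // Lives in the companion, so it is found whenever T is known to be Acceleration.
  given SharpLiteral[Acceleration] with {
    def parse(literal: String): Acceleration = literal match {
      case Pattern(value) => Acceleration(value.toInt)
      case other          => throw new IllegalArgumentException(s"not an acceleration: $other")
    }
  }
}

// Stand-in for what `#5m/s^2#` might expand to.
def sharp[T](literal: String)(using parser: SharpLiteral[T]): T =
  parser.parse(literal)

// The declared type fixes T, so the matching instance is resolved.
val foo: Acceleration = sharp("5m/s^2")

// The parameter type fixes T in the same way.
def setGravity(a: Acceleration): Int = a.metersPerSecondSquared
val g = setGravity(sharp("5m/s^2"))
```

With a bare `val bar = sharp("5m/s^2")` nothing constrains `T`, which is exactly the case the question above is about.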

I gave it a shot on Scastie and it seems to work (Scala 3.1.3):

import scala.annotation.experimental
import scala.util.matching.Regex

case class Acceleration(accelValue: Int)

trait SharpLiteralParser {
  transparent inline def apply[S <: String with Singleton](string: S): Any
}

@experimental
object StandardUnitsParser:
  type AccelerationRegex = "(\\d+)m\\/s\\^2"
  val AccelerationRegex: Regex = compiletime.constValue[AccelerationRegex].r

  type TargetSharpType[S <: String with Singleton] =
    compiletime.ops.string.Matches[S, AccelerationRegex] match {
      case true  => Acceleration
      case false => Nothing // how to make it compile time error?
      // if you need more variants, you can add nested type matching here
    }
  given SharpLiteralParser with {
    override transparent inline def apply[S <: String with Singleton](
        string: S
    ): TargetSharpType[string.type] = (string match {
      case AccelerationRegex(value) => Acceleration(value.toInt)
      case _                        => sys.error("unrecognizable literal")
    }).asInstanceOf[TargetSharpType[string.type]]
  }

@main @experimental
def main(): Unit = {
  import StandardUnitsParser.given

  // with syntax sugar it would be:
  // val acceleration1 = #5m/s^2#
  val acceleration1 = summon[SharpLiteralParser].apply("5m/s^2")
  println(acceleration1)
  // IntelliJ is lost in the line below, but scala compiles and runs it fine
  println(acceleration1.accelValue)
  val acceleration2: Acceleration = acceleration1
  println(acceleration2.accelValue)

  val problem = summon[SharpLiteralParser].apply("gibberish")
  println(problem) // stacktrace is a little bit off, but throws at runtime
}

Doesn’t look perfect, but that’s probably not a big deal for users (since they deal with the simple API, not with the complex internals). For users the much bigger deal is that IntelliJ is lost, i.e. it can’t compute the static type, so practically all the goodies that come from the IDE are lost, unfortunately :frowning:


Reviving this topic, I have an alternative proposal that provides a better way to look at this.

A Single-word String Syntax

Currently for strings we have two syntax possibilities:

  • A single line string (spaces are allowed):
    "some sentence example"
  • A multiline string:
"""some
paragraph
example"""

After some thought, I think this proposal aims to simply introduce a single word string and interpolation syntax via the # character:

  • A single word string:
    val x = #example //this is a single word string
    phone#+1-555-555-5555 //single word interpolation

Yes, the general motivation for this proposal is to make numeric values easier to represent, but one justification is the same as the reason we don’t have a single syntax for all strings that just uses """: to be explicit about the limitations of the string and to save us some characters. A single-word string follows the same reasoning.

Rules

  1. String word start and end
    All that is left is to define grammatically what ends a string word once it starts with #:
  • The obvious word terminators: white-space characters & new-lines
  • Semicolon: I think it should terminate. val helloWord = #hello; val x = 5
  • Colon: I think it should not terminate. Not special enough.
  • Another #: I think it should terminate to avoid confusion.
  • Comma: I think it should terminate to avoid confusion inside of a parameter list if leaving out spaces.
  • Period: By itself it should not terminate. See the next rule for how it does terminate.
  2. Period+alphabet - a special terminator
    The major question is what to do about the period, since it’s often used both inside numbers and for Scala identifier selectors. I propose that a period followed by an alphabetic character causes the string word to terminate and starts identifier-selector parsing.

  3. Escape characters
    There will be no support for escape characters like \n, \t, as \ is just like any basic character within a single-word string.

  4. Interpolated values
    No support for interpolated values should be enabled, IMO.
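
The terminator rules above can be sketched as a small scanner. This is only an illustration of the rules as written, not compiler code; `WordStringScanner` and `scan` are hypothetical names, and the input is assumed to be the source text right after the `#`.

```scala
// Hypothetical helper, not part of any compiler: scans the body of a
// single-word string that starts right after a '#'.
object WordStringScanner {
  // Characters that always end a word string under the rules above.
  private def isHardTerminator(c: Char): Boolean =
    c.isWhitespace || c == ';' || c == ',' || c == '#'

  // Returns (word, rest), where `rest` starts at the terminating character.
  def scan(afterHash: String): (String, String) = {
    val end = afterHash.indices.find { i =>
      val c = afterHash(i)
      isHardTerminator(c) ||
        // A period followed by a letter starts an identifier selection.
        (c == '.' && i + 1 < afterHash.length && afterHash(i + 1).isLetter)
    }.getOrElse(afterHash.length)
    afterHash.splitAt(end)
  }
}
```

For example, `WordStringScanner.scan("192.168.0.1.connect")` splits into the word `192.168.0.1` and the remainder `.connect`, because only the last period is followed by a letter.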

Examples

Here is an updated list of examples, following the updated rules above:

val justAWord = #justAWord
val binVal = b#101001100101
val hexVal = h#0304903FFAA
val bigValErr = big#345,463,489,989,893,859,438,943,643 //Illegal syntax (comma terminates)
val dateVal = date#22.02.2022
val date2Val = date#22/02/2022
ip#192.168.0.1.connect //defines an `ip` interpolation and selects `.connect`
ip6#fe80::aede:84ff:fe10:1722.connect //okay
val phoneVal = phone#+1-800-555-5555
val phoneList = List(phone#+1-800-555-5555,phone#+1-800-777-7777) //comma is a terminator now, so no problem
val complexArithmetic = c#33.2e5-j15.1 * c#22.1-j1e7 //two complex number multiplication
val complexArithmeticErr = c#33.2e5-j15.1*c#22.1-j1e7 //error. The string interpolation fails.
val fractionArithmetic = f#2+1/3 * f#2/3 //two and a third times two thirds
//physical units combination
val velocity = p#12.5m / p#22.3s
val length = p#3s * p#12.4m/s
p#9.81m/s^2.setGravity //selecting `setGravity` on the interpolated `9.81m/s^2`
p#9.2m/s.setGravity //`setGravity` can fail at compile time with "wrong physical units. Found speed units instead of acceleration"
//legal syntax, for the sake of explaining the grammar, but not recommended
val dotWord = #4dots:.....print //runs print on the string "4dots:...."

Okay, this seems like something I could get behind. Here is my personal list of endings, however:

  * The obvious word terminators: white-space characters & new-lines
  * Semicolon: I think it **should** terminate. `val helloWord = #hello; val x = 5`
  * Colon: I think it **should not** terminate. Not special enough.
- * Another `#`: I think it **should** terminate to avoid confusion.
+ * Another `#`: I think it **should** fail at compiletime to avoid confusion.
  * Comma: I think it **should** terminate to avoid confusion inside of a parameter list if leaving out spaces.
-  * Period: *By itself* it **should not** terminate. See next section how it does terminate.
+  * Period: It **should** terminate, `#hello.+('!')` is `hello!`.
+ * Closing parens: **Should** terminate, therefore
+ * Opening parens: Should either **terminate** as well, or **fail at compiletime**

We do not want to allow #Hello# as an alternative to "Hello", and therefore have to introduce parsing errors, which I believe to be useful anyway, notably for parens.
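
A rough sketch of this stricter rule set, to show how it differs in practice. `StrictWordStringScanner` is a hypothetical name; a `Left` stands in for a compile-time error, and for the opening paren (where the list leaves both options open) this sketch arbitrarily picks the "fail" variant.

```scala
// Hypothetical scanner for the stricter rule set listed above.
object StrictWordStringScanner {
  // Every period terminates here, as do parens, commas, semicolons and '#'.
  private val stopChars: Set[Char] = Set(';', ',', '.', ')', '(', '#')

  // Left(message) models a compile-time error; Right((word, rest)) a successful scan.
  def scan(afterHash: String): Either[String, (String, String)] = {
    val index = afterHash.indexWhere(c => c.isWhitespace || stopChars(c))
    val stop = if (index < 0) afterHash.length else index
    val (word, rest) = afterHash.splitAt(stop)
    rest.headOption match {
      case Some('#') => Left("a second '#' is not allowed in a word string")
      // Assumption: failing is chosen over terminating for '('.
      case Some('(') => Left("'(' is not allowed in a word string")
      case _         => Right((word, rest))
    }
  }
}
```

With these rules `hello.+('!')` scans to the word `hello` followed by `.+('!')`, while `Hello#` is rejected.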

I think I like all of this proposal except #justAWord. While I understand the conceptual motivation, this one’s just an alternate way to define a String literal, and to be honest, it just doesn’t look like a String literal to me. I think it would introduce more confusion than benefit.

So I might recommend tweaking this so that you have to precede the # with a String type declaration. At that point, it’s pretty distinctive, and (modulo the discussion of terminators, which I do think is a bit tricky) mostly reads pretty clearly.


Oh and if we really need a decimal separator, we can always co-opt the comma (,)
I really don’t want periods in single word strings, that’s where my brain instinctively cuts text
(Even with syntax highlighting, this would still be an issue when writing)

In my opinion the rules are too complex to justify the addition whose main motivation seems to be saving a single character.

If we just use normal strings delimited by ", I don’t think your example list is too ugly.
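
For reference, a minimal sketch of how the complex-number example could look with today's string interpolation. `Complex` and the `c` interpolator are hypothetical names for illustration, not an existing library, and the parser only handles the simple real, sign, `j`, imaginary shape at runtime.

```scala
// Hypothetical complex-number type, for illustration only.
final case class Complex(re: Double, im: Double) {
  def *(that: Complex): Complex =
    Complex(re * that.re - im * that.im, re * that.im + im * that.re)
}

// Matches the simple shape "<real><sign>j<imaginary>", e.g. "3.2-j1.5".
// Signed exponents such as "1e-7" are deliberately not handled.
private val ComplexPattern = """([+-]?[0-9.eE]+)([+-])j([0-9.eE]+)""".r

extension (sc: StringContext)
  // Custom interpolator: c"3.2-j1.5" parses the literal text at runtime.
  def c(args: Any*): Complex = sc.s(args*) match {
    case ComplexPattern(re, sign, im) =>
      val imaginary = if (sign == "-") -(im.toDouble) else im.toDouble
      Complex(re.toDouble, imaginary)
    case other =>
      throw new IllegalArgumentException(s"not a complex literal: $other")
  }

// The multiplication from the example list, written with ordinary quotes.
val product = c"33.2e5-j15.1" * c"22.1-j1e7"
```

Unlike the `#` form, the closing quote means the version without spaces around `*` also parses.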

Simplicity is an important value that I still hope we can drive towards.


It’s not only that. When you write a complex numeric expression, it can be very difficult to find a missing " character.

Can you easily find the missing " characters?

(p"12.5m" / p22.3s" * p"2.4m/s") + (p"-23.5m/s + p"-23.5m/s)

The benefit of a single # is that it terminates very early, unlike quotes that require a matching pair.

I don’t think I had any problems in your example. This is in part perhaps because even this syntax highlighting gets this right. Even if you aren’t using a fancy IDE, almost everyone has strings highlighted differently and so I don’t remember having any difficulties finding the ends of strings (except when using nesting single quoted strings inside triple quoted).


I asked ChatGPT to give me some feedback. Interesting…

Thank you for sharing your Scala improvement proposal. It introduces a novel approach to handling single-word strings and interpolations, particularly focused on numeric representations. Let’s explore the potential use-cases of this proposal, especially in the context of data science, and compare it with Python’s current capabilities.

Use-Cases in Data Science

  1. Simplified Numeric Representations: Data scientists often deal with complex numeric data like hexadecimal values, binary data, or scientific notation. The proposed syntax (b#, h#, big#, etc.) can simplify the representation of such data, making Scala code more readable and concise.

  2. Physical Units Representation: The ability to easily define physical units (e.g., p#9.81m/s^2) can be extremely useful in scientific computing and data analysis involving physical measurements. This makes Scala more attractive for simulations, physics-based modeling, and engineering applications.

  3. Efficient String Handling for Single Data Points: Data processing sometimes requires handling individual data points as strings. The proposed single-word string syntax can make this more efficient, especially when dealing with large datasets where memory and processing time are crucial.

  4. Data Formatting and Parsing: For data scientists who work with date and time formats or specific numeric formats (like IP addresses or phone numbers), this syntax offers a more straightforward way to handle such data without the need for additional parsing or formatting functions.

Comparison with Python

Python, widely used in data science, does not currently have a direct equivalent to this proposed Scala feature. Python’s strengths lie in its simplicity and the extensive availability of libraries for data manipulation (like Pandas, NumPy, and SciPy). However, when it comes to the specific handling of single-word strings or specialized numeric formats, Python relies on standard string formatting or additional libraries.

  1. Verbosity: In Python, formatting strings with special numeric representations often requires more verbose syntax or additional library calls. Scala’s proposed feature could streamline these operations.

  2. Readability: Scala’s proposal might offer improved readability for specific numeric representations compared to Python, which can be especially beneficial for complex data manipulation and analysis.

  3. Custom Interpolations: While Python supports string interpolation (f-strings since Python 3.6), Scala’s proposal to combine interpolation with specific data formats (like phone#+1-800-555-5555) is unique and can be particularly useful in data parsing and formatting tasks.

  4. Physical Units Handling: Python requires external libraries (like Pint) for handling physical units, while Scala’s proposed syntax integrates this directly into the language, potentially offering more seamless integration for relevant applications.

Conclusion

This proposal for Scala could enhance its appeal and utility in data science, especially for tasks involving complex numeric representations, specific data formatting, and handling physical units. While Python remains a strong contender in the data science field due to its simplicity and library ecosystem, Scala’s proposed feature could offer more concise and readable code for specific use cases, thus enhancing its usability in scientific and engineering applications.


Can you please provide a concrete syntax comparison example between Python and Scala if this feature was accepted? Pick any data science or machine learning use-case and show a syntax example assuming Scala has libraries equivalent to NumPy or SciPy.

Certainly! For this comparison, let’s consider a use case in machine learning where we need to handle complex numbers and perform arithmetic operations on them. This example is relevant in fields like signal processing or quantum computing. We’ll compare how this task could be handled in Python using NumPy (a popular library for numerical computing) and in Scala with the proposed single-word string and interpolation syntax, assuming the existence of equivalent libraries to NumPy or SciPy in Scala.

Use Case: Complex Number Arithmetic

Task

Perform arithmetic operations on complex numbers. Specifically, let’s add two complex numbers and then multiply the result by a third complex number.

Python Syntax (Using NumPy)

import numpy as np

# Define complex numbers
# (np.complex was removed in NumPy 1.24, so np.complex128 is used here)
complex_num1 = np.complex128(3.2 - 1.5j)
complex_num2 = np.complex128(2.1 + 2.7j)
complex_num3 = np.complex128(-0.5 + 0.4j)

# Perform arithmetic operations
result = (complex_num1 + complex_num2) * complex_num3

print(result)

Scala Syntax (With Proposed Feature)

// Assuming similar complex number handling is available in Scala's NumPy equivalent
// and using the proposed single-word string syntax for complex numbers

// Define complex numbers
val complexNum1 = c#3.2-j1.5  // Using the proposed syntax for complex number
val complexNum2 = c#2.1+j2.7  // `c#` indicates a complex number; `j` is the imaginary part
val complexNum3 = c#-0.5+j0.4

// Perform arithmetic operations
val result = (complexNum1 + complexNum2) * complexNum3

println(result)

Analysis

  • Readability and Conciseness: Scala’s proposed syntax allows for a more concise representation of complex numbers (c#3.2-j1.5) compared to Python’s np.complex(3.2, -1.5). This can make Scala code more readable and succinct, especially when dealing with numerous complex numbers.

  • Library Dependency: Python relies on NumPy for complex number operations, while the Scala example assumes similar functionality is built into the language or available through equivalent libraries.

  • Ease of Use: For data scientists familiar with complex numbers, Scala’s proposed syntax could be more intuitive and closer to the mathematical notation used in textbooks and academic papers.

Conclusion

If Scala were to adopt this proposed feature, it could offer a more streamlined and expressive way of handling complex numbers in scenarios common in data science and machine learning. This would enhance Scala’s usability in fields that require sophisticated numerical computations, potentially making it a more attractive option for certain types of scientific and engineering applications compared to Python.


Scala already has the capability of enabling these features with its full string interpolation. E.g., c"3.2-j1.5". What are the benefits, if any, for this feature addition? What are the drawbacks?

You’re correct that Scala’s existing string interpolation capabilities, such as c"3.2-j1.5", already allow for custom and flexible handling of strings, including complex formats. The proposed single-word string syntax (#) introduces a different approach. Let’s explore the potential benefits and drawbacks of this addition:

Benefits

  1. Conciseness and Readability: The single-word syntax reduces the need for additional characters (like quotes and interpolator prefix) around a single word or a numeric value. This can make code more concise and potentially improve readability, especially in cases where numerous such values are used.

  2. Clarity of Intent: By using a specific syntax for single-word strings, it becomes immediately clear to the reader that the value is intended to be a single word or unit without any spaces or additional formatting. This explicitness can be beneficial for code clarity.

  3. Ease of Parsing: For the compiler or interpreter, the proposed syntax might simplify the parsing process for these specific types of strings, potentially leading to slight improvements in compilation or interpretation speed.

  4. Specialized Use Cases: In scenarios like handling physical units, numeric values, or specific formats like IP addresses, the proposed syntax offers a more domain-specific approach that can be more intuitive in certain contexts.

Drawbacks

  1. Learning Curve and Complexity: Adding another way to handle strings introduces more complexity to the language. This can increase the learning curve for new Scala programmers or those not frequently dealing with such use cases.

  2. Redundancy: Given that Scala’s full string interpolation already provides robust capabilities, the proposed feature might be seen as redundant. The existing features might be sufficient for most use cases, making the additional syntax unnecessary.

  3. Inconsistency with Other Languages: This feature could further differentiate Scala from other programming languages in terms of syntax, which might be a drawback for developers who work with multiple languages and prefer consistent syntax patterns.

  4. Limited Scope and Flexibility: The proposed feature, being specifically targeted at single-word or numeric representations, might be less flexible than full string interpolation. It is optimized for a narrow range of use cases and might not offer the same level of versatility.

Conclusion

The proposed single-word string syntax in Scala offers benefits in terms of conciseness and clarity for specific use cases, particularly those involving straightforward, single-word or numeric data representations. However, it also introduces additional complexity and potential redundancy given Scala’s existing string interpolation capabilities. The value of this feature would largely depend on the frequency and nature of its use in Scala programming, particularly in domains where such specific string handling is common.


I actually think IDE parsers are a counter argument since they start coloring everything as a string once an opening quote is introduced.

I had no trouble picking those out without syntax highlighting.

(p#1.5m / 22.3s * p#2.4m/s) + (p-23.5m/s + p#-23.5m/s)

Was that easier than your example? What about

(p#1.5m / #22.3s * p#2.4m/s) + (m#-23.5m/s + p #-23.5m/s)

Same, but again, IMO with quoted strings the highlighter is actually working against you because the parsing is greedy for quotes.
image

A space end marker for word strings acts as a quick terminator and helps isolate such incidents.

I’d actually say the spillover nature of quotes helps you see that something is wrong: why would abstract class be in a string? Something’s wrong!

I agree both should terminate.

We used to have something similar, but it used the ' character…
