You have been invited to join the Taaalk "Repopulating the Taaalk database from archive.org".
class Taaalk < ApplicationRecord
  belongs_to :user          # the user who created the Taaalk
  has_many :speakers        # Taaalk speakers != users, this is so an individual
                            # user can have different profiles depending on who
                            # they are talking to
  has_many :messages
  has_one_attached :image   # the Taaalk's optional background image
end

class Speaker < ApplicationRecord
  belongs_to :taaalk
  belongs_to :user
  has_many :messages
  has_one_attached :image   # profile image
  has_rich_text :biography  # a speaker's bio; rich text used to enable links
                            # (e.g. robertheaton.com in your bio above)
end

class Message < ApplicationRecord
  belongs_to :user
  belongs_to :taaalk
  belongs_to :speaker
  has_rich_text :content       # the raw input from the rich_text_area I'm typing this
                               # message in now. We can probably ignore.
                               # e.g. Message.last.content.body.to_s
                               # "<div class=\"trix-content\">\n
                               #   <div>Hi!</div>\n
                               # </div>\n"
  has_rich_text :safe_content  # the raw content which has gone through my javascript
                               # parser which adds "text bubbles" to each block.
                               # safe_content is used to render messages in front end.
                               # e.g. Message.last.safe_content.body.to_s
                               # "<div class=\"trix-content\">\n
                               #   <div class=\"tlk-bubble-holder\"><div class=\"tlk-bubble\">Hi!</div></div>\n
                               # </div>\n"
end

class User < ApplicationRecord
  has_many :taaalks
  has_many :speakers
  has_many :messages
  has_one_attached :image
  has_rich_text :biography
  validates :username, presence: true
  validates :email, presence: true
end
[
  {
    "slug": "my-taaaalk-slug",
    "interviewer": {
      "name": "Alice"
    },
    "interviewee": {
      "name": "Bob"
    },
    "messages": [
      {
        "created": 1234567890,
        "author": "interviewer",
        "contents": "Hello Bob, how are you?"
      },
      // ...etc...
    ]
  },
  // ...etc...
]
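As a rough sketch of how a script might consume data in this shape, the snippet below parses a small inline sample with Ruby's standard json library and pairs each message with its author's name. The sample is made up to match the format above, and it leaves out the // ...etc... placeholders because strict JSON does not allow comments.

```ruby
require 'json'

# Hypothetical sample in the format sketched above (placeholders removed).
sample = <<~JSON
  [
    {
      "slug": "my-taaaalk-slug",
      "interviewer": { "name": "Alice" },
      "interviewee": { "name": "Bob" },
      "messages": [
        { "created": 1234567890, "author": "interviewer", "contents": "Hello Bob, how are you?" }
      ]
    }
  ]
JSON

taaalks = JSON.parse(sample)

lines = taaalks.flat_map do |taaalk|
  taaalk["messages"].map do |message|
    # "author" names the key of the speaker hash ("interviewer" or "interviewee").
    name = taaalk[message["author"]]["name"]
    "#{name}: #{message["contents"]}"
  end
end

puts lines
```

Storing the author as a key into the same object keeps each message small, at the cost of one extra lookup when reading it back.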
$ ruby html_articles_to_json.rb
Dir.each_child("t") do |d|
  dir = "t/" + d
  puts dir
  Dir.each_child(dir) do |f|
    # each_child yields bare file names, so join with the directory before reading
    file_data = File.read(File.join(dir, f))
    puts file_data[0, 30]
  end
end
require 'nokogiri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(DATA_FROM_FILE)

# Search for nodes by css
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end
require 'nokogiri'

taaalks = []

Dir.each_child("t") do |d|
  dir = "t/" + d
  Dir.chdir(dir) do
    taaalk = {}
    data = Nokogiri::HTML(File.open("index.html"))
    taaalk[:title] = data.css('h1')[0].text
    taaalk[:speakers] = []
    data.css('.spkr-info').each do |spkr|
      speaker = {}
      spkr.css('h3 a').each do |s|
        speaker[:profile_path] = s['href']
        speaker[:name] = s.text
      end
      speaker[:twitter_handle] = spkr.css('.twitter-handle').text
      speaker[:bio] = spkr.css('.trix-content').inner_html
      taaalk[:speakers] << speaker
    end
    taaalks << taaalk
  end
end

pp taaalks
# => [...,
#  {:title=>"Bitcoin Maxima & Other Crypto Things",
#   :speakers=>
#    [{:profile_path=>"/u/joshua-summers",
#      :name=>"Joshua Summers",
#      :twitter_handle=>" JoshSummers1234",
#      :bio=>
#       "\n" +
#       " <div>I'm the founder of Taaalk ✌️. Confused by crypto. Hopefully not for long.</div>\n"},
#     {:profile_path=>"/u/thomas-hartman",
#      :name=>"Thomas Hartman",
#      :twitter_handle=>" thomashartman1",
#      :bio=>
#       "\n" +
#       " <div>Bitcoin investor, wealth manager, and project consultant. <br>blog: <a href=\"https://standardcrypto.wordpress.com/\">https://standardcrypto.wordpress.com/</a>\n" +
#       "</div>\n"}]},
#  ...]
taaalks = Dir["./t/*/index.html"].map do |path|
  page = Nokogiri::HTML(File.open(path))

  speakers = page.css('.spkr-info').map do |spkr_info|
    # css returns a NodeSet, so take the first matching link
    info_header = spkr_info.css('h3 a')[0]
    {
      name: info_header.text,
      profile_path: info_header['href'],
      twitter_handle: spkr_info.css('.twitter-handle').text,
      bio: spkr_info.css('.trix-content').inner_html
    }
  end

  {
    title: page.css('h1')[0].text,
    speakers: speakers,
  }
end
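The version above builds each array with map rather than filling an empty one. A minimal side-by-side of the two styles, using made-up data and a hypothetical initials field purely for illustration, might look like this:

```ruby
names = ["Joshua Summers", "Thomas Hartman"]

# Style 1: create an empty array, then fill it inside each.
filled = []
names.each do |name|
  filled << { name: name, initials: name.split.map { |part| part[0] }.join }
end

# Style 2: map creates and fills the array in one step; the block's
# return value (here a hash literal) becomes the element.
mapped = names.map do |name|
  { name: name, initials: name.split.map { |part| part[0] }.join }
end

puts filled == mapped
```

Both produce the same array; the map form simply has no intermediate state to keep track of.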
#map. My thought process is too linear/these functions are not deeply engrained in me enough; I think, "I need an array that I can fill"; I don't think "I can create and fill the array at the same time". I'm making the same mistake with my hashes. Instead of defining an empty one to fill, I should define them on the fly. It is also much easier to read.

Dir.glob("...") method, or its shorthand Dir["..."]. Is there a reason you are starting your string with "./t/" and not simply "t/"?

./, I primarily do it to emphasise that the path is deliberately meant to be relative to the current directory and that I didn't forget a leading / or ~/. But I think it's personal taste.

require 'nokogiri'

taaalks = Dir["t/*/index.html"].map do |path|
  page = Nokogiri::HTML(File.open(path))

  speakers = page.css('.spkr-info').map do |spkr_info|
    info_header = spkr_info.css('h3 a')[0]
    {
      name: info_header.text,
      id: info_header.attr('class').delete("^0-9"),
      side: spkr_info.attr('class').gsub('spkr-info spkr-info-',''),
      profile_path: info_header['href'],
      twitter_handle: spkr_info.css('.twitter-handle').text.strip,
      bio: spkr_info.css('.trix-content').inner_html,
    }
  end

  messages = page.css('.tlk-blob').map do |tlk_blob|
    {
      speaker_id: tlk_blob.css('div[class^="name-spkr-"]')[0].attr('class').delete("^0-9"),
      created_at: tlk_blob.css('div[class="tlk-blob-date"]').text.partition("(")[0].strip,
      message: tlk_blob.search('div[target="spkr-color-"]').children.to_html
    }
  end

  {
    title: page.css('h1')[0].text,
    speakers: speakers,
    messages: messages,
  }
end
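On the "./t/" versus "t/" question, a small experiment suggests the two globs find the same files and differ only in the prefix of the returned paths. This is a sketch that builds a throwaway directory tree; the slugs are invented for the demonstration.

```ruby
require 'tmpdir'
require 'fileutils'

with_dot = nil
without_dot = nil

Dir.mktmpdir do |root|
  Dir.chdir(root) do
    # Hypothetical archive layout: t/<slug>/index.html
    %w[first-taaalk second-taaalk].each do |slug|
      FileUtils.mkdir_p(File.join("t", slug))
      File.write(File.join("t", slug, "index.html"), "<h1>#{slug}</h1>")
    end

    # Both patterns are resolved relative to the current directory.
    with_dot    = Dir["./t/*/index.html"].sort
    without_dot = Dir["t/*/index.html"].sort
  end
end

puts with_dot
puts without_dot
```

The explicit sort is there because the order of glob results is not something to rely on across Ruby versions.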
side and id to each speaker. side is a string with a value of left or right, which is used in the Taaalk show.html.erb file to determine if someone's "bubbles" are left or right aligned. For example, in this Taaalk you are right and I am left.

id seemed like the obvious way to join speakers with their messages.

id of the message's speaker, we are extracting the created at date, and ignoring any edit dates; for example, when a post has been edited you get the following string in the HTML: "23:34, 30 Jan 22 (edit: 23:49, 30 Jan 22)". This is why I partition at "(": .partition("(")[0].strip.

# individual message example
{
  :speaker_id=>"129",
  :created_at=>"18:20, 05 Feb 21",
  :message=>
    "\n" +
    " <div class=\"tlk-blob-msg\">\n" +
    "<div class=\"trix-content\">\n" +
    " <div class=\"tlk-bubble-holder\"><div class=\"tlk-bubble\">...</div></div>\n" +
    "</div>\n" +
    "\n" +
    " </div>\n" +
    " "
}
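To make the date handling concrete, here is a minimal sketch that strips the edit suffix from the two example strings above and then parses the result into a Time. Treating the timestamps as UTC is an assumption made only for this example, since the strings themselves carry no time zone.

```ruby
require 'time'

edited   = "23:34, 30 Jan 22 (edit: 23:49, 30 Jan 22)"
unedited = "18:20, 05 Feb 21"

# Keep only the text before the first "(" and trim the trailing space.
created_strings = [edited, unedited].map { |raw| raw.partition("(")[0].strip }

# String#split yields the same values for these two inputs.
same_with_split = created_strings == [edited, unedited].map { |raw| raw.split("(")[0].strip }

# Assumption: interpret the timestamps as UTC by appending an explicit offset.
created_times = created_strings.map do |text|
  Time.strptime("#{text} +0000", "%H:%M, %d %b %y %z").utc
end

created_times.each { |time| puts time.iso8601 }
```

Parsing at extraction time is optional; keeping the raw string in the JSON and parsing during import would work just as well.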
\n characters will cause some sort of problem in future, but it feels good enough to work with for now.

.rb file? Or something else?

String#split instead of #partition since you don't need to keep the ( character itself. I think split is a simpler and more common method, and when you used partition instead my immediate thought was "oh there must be a reason why he used this instead of split".

require 'json'

puts(taaalks.to_json)
.json file in your app somewhere. The import script is going to be another Ruby script that we run from the Heroku shell, so at some point we'll need to figure out how to access the file from there.

require 'nokogiri'
require 'json'

taaalks = Dir["t/*/index.html"].map do |path|
  page = Nokogiri::HTML(File.open(path))

  speakers = page.css('.spkr-info').map do |spkr_info|
    info_header = spkr_info.css('h3 a')[0]
    {
      name: info_header.text,
      id: info_header.attr('class').delete("^0-9"),
      side: spkr_info.attr('class').gsub('spkr-info spkr-info-',''),
      profile_path: info_header['href'],
      twitter_handle: spkr_info.css('.twitter-handle').text.strip,
      bio: spkr_info.css('.trix-content').inner_html,
    }
  end

  messages = page.css('.tlk-blob').map do |tlk_blob|
    {
      speaker_id: tlk_blob.css('div[class^="name-spkr-"]')[0].attr('class').delete("^0-9"),
      created_at: tlk_blob.css('div[class="tlk-blob-date"]').text.split("(")[0].strip,
      message: tlk_blob.search('div[target="spkr-color-"]').children.to_html
    }
  end

  {
    title: page.css('h1')[0].text,
    speakers: speakers,
    messages: messages,
  }
end

File.open('taaalks.json', 'w') { |file| file.write(taaalks.to_json) }
.partition to .split, I created a taaalks.json file and wrote taaalks.to_json into it (instead of puts-ing taaalks and copying that output into a JSON file).

taaalks.json into the root directory of the Taaalk project... now I think about it, I'm not sure this is the right location. My plan was to make a taaalk_importer.rb script; however, if I am going to be running this in the Heroku console, I probably can't do ruby taaalk_importer.rb. I guess I need to write a rake task, or create a class which I can run, e.g. TaaalkImporter.call.
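For the "class which I can run" option, a bare-bones sketch of the shape might look like the following. It is plain Ruby with no Rails dependencies, so instead of creating records it only summarises what it would import; the default path, the summary format, and the sample data are all assumptions for illustration.

```ruby
require 'json'
require 'tmpdir'

class TaaalkImporter
  # Class-level entry point so it can be run as TaaalkImporter.call.
  def self.call(path = "taaalks.json")
    new(path).call
  end

  def initialize(path)
    @path = path
  end

  def call
    taaalks = JSON.parse(File.read(@path), symbolize_names: true)
    taaalks.map do |taaalk|
      # In the real app, this is where database records would be created.
      {
        title: taaalk[:title],
        speakers: taaalk[:speakers].size,
        messages: taaalk[:messages].size,
      }
    end
  end
end

# Usage with a throwaway file containing made-up data.
summary = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "taaalks.json")
  sample = [{ title: "Example", speakers: [{ name: "Alice" }], messages: [{ message: "Hi" }, { message: "Hello" }] }]
  File.write(path, sample.to_json)
  summary = TaaalkImporter.call(path)
end

p summary
```

A rake task could then be a thin wrapper that calls the same class, which keeps the import logic in one place whichever way it ends up being run.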