[00:10] *** megaminxw has quit IRC (Quit: WeeChat 1.9.1)
[00:11] *** svchfoo3 has quit IRC (Read error: Operation timed out)
[00:12] *** Jonimus has quit IRC (Read error: Operation timed out)
[00:12] *** Jonimus has joined #archiveteam-bs
[00:12] *** swebb sets mode: +o Jonimus
[00:17] *** BlueMax has joined #archiveteam-bs
[01:05] *** BlueMax has quit IRC (Leaving)
[01:44] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[01:45] *** BlueMax has joined #archiveteam-bs
[02:16] *** Soni has quit IRC (Remote host closed the connection)
[02:23] *** Soni has joined #archiveteam-bs
[02:46] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
[03:31] *** qw3rty114 has joined #archiveteam-bs
[03:37] *** qw3rty113 has quit IRC (Read error: Operation timed out)
[03:52] *** odemg has quit IRC (Ping timeout: 246 seconds)
[04:06] *** Odd0002_ has joined #archiveteam-bs
[04:06] *** odemg has joined #archiveteam-bs
[04:08] *** Odd0002 has quit IRC (Read error: Operation timed out)
[04:08] *** Odd0002_ is now known as Odd0002
[05:16] so i got a weird tape
[05:16] it starts with dead by midnight on abc that aired 1997-11-23
[05:17] then for the 2nd half of the tape it's nbc the pretender and profiler
[05:18] it must have taped over something that was exactly 2 hours from a previous recording
[05:42] *** Asparagir has quit IRC (Asparagir)
[06:36] *** schbirid has joined #archiveteam-bs
[08:50] *** ta9le has joined #archiveteam-bs
[09:06] This forum is under fire for GDPR: http://tulpa.palstani.com/ I have a conflict of interest, if anyone is interested in archiving it.
[11:58] *** Spydar007 has quit IRC (se.hub irc.efnet.nl)
[12:00] *** Spydar007 has joined #archiveteam-bs
[12:46] All this GDPR fallout, heh.
[12:49] Shows who cares about their users' privacy and who doesn't.
[13:17] *** Darkstar has quit IRC (Ping timeout: 1212 seconds)
[13:21] *** Darkstar has joined #archiveteam-bs
[13:22] *** BlueMax has quit IRC (Leaving)
[13:40] *** Odd0002_ has joined #archiveteam-bs
[13:40] I wonder... all the tulpa forums. o_O
[13:41] *** Odd0002 has quit IRC (Ping timeout: 252 seconds)
[13:41] *** Odd0002_ is now known as Odd0002
[13:46] *** REiN^ has joined #archiveteam-bs
[13:53] Would you !a https://tulpa.io/ --igset forums ? I have a conflict of interest, again.
[13:57] https://wiki.tulpa.info/ has a GDPR cookie notice.
[14:04] *** odinho has joined #archiveteam-bs
[14:06] The casparcg.com/forum job is finished. What do we normally do after? Can I post the URL https://archive.fart.website/archivebot/viewer/job/8qdw1 (or something better?) to the forum? Some people would be interested, but the site admins might not like it. On the other hand they actually won't have liability for ArchiveTeam's backup, so they will probably be fine with it (just not say so).
[14:06] *** Mateon1 has quit IRC (Ping timeout: 268 seconds)
[14:07] *** Mateon1 has joined #archiveteam-bs
[14:08] odinho: I guess you can post the link there if you want. Be aware that not all data has been uploaded to IA yet; there's usually a delay of a few hours to a day or two.
[14:08] However, more useful for most people would probably be the mention that the forums are available in the Wayback Machine.
[14:09] JAA: Okay, and this dump of info would actually be ingested and used by Wayback Machine later? For some (like me) having it locally makes sense because you can grep in it. I don't think Archive.org has a public full text search?
[14:09] That's correct.
[14:10] And you'll want to use zgrep (or gunzip the files first) since the WARCs are compressed.
[14:11] And you might also want to filter out the content of interest (to you) first so it doesn't grep through images etc. every time.
[14:11] (Well, maybe not images due to binary data detection, but scripts, stylesheets, etc.)
[14:12] Sure, that makes sense. I'll have a look once it's downloaded :) Thanks!
[14:34] *** Stilett0 has joined #archiveteam-bs
[14:35] *** Stilett0 has quit IRC (Client Quit)
[16:10] Ciao, Ciao
[16:48] *** wp494 has quit IRC (Read error: Operation timed out)
[16:52] *** wp494 has joined #archiveteam-bs
[16:52] *** svchfoo1 sets mode: +o wp494
[17:51] Welp, gonna attempt to backup every single Steam game I own
[17:51] which already sounds like it's either a good idea or a bad one
[19:18] Has anyone profiled wpull, to find out why it eats so much CPU?
[19:19] Mostly HTML parsing, I believe.
[19:20] It's pretty extreme.
[19:35] Jens: https://github.com/ludios/grab-site/issues/117
[19:35] you can use the programs mentioned to see the rest of the bottlenecks
[20:01] ivan: interesting programs
[20:10] html5lib is known to be very slow - it was written as a proof of concept of the new html5 parser while writing the spec. I worked with one of the creators, he said it would be much better to use a web engine's HTML parser. Like the one we had in Presto (Opera), there were quite a lot of us wanting to just open source that part. These days there's sure to be a lot of much faster and compliant html5 parsers.
[20:11] ivan: Are you also ivan on Freenode?
[20:11] https://github.com/chfoo/wpull/issues/364
[20:12] Jens: yes
[20:13] o hai :D
[20:21] *** schbirid has quit IRC (Quit: Leaving)
[20:26] I didn't know Servo's HTML parser was called html5ever, that's a cool name. There are some Python bindings for it, but none that seem maintained.
[20:30] *** plue has quit IRC (Quit: leaving)
[20:42] *** odemg has quit IRC (Read error: Operation timed out)
[20:43] Yeah. Maybe something like rustypy could be used, but I have zero experience with anything related to Rust.
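A minimal sketch of how the profiling question above could be approached for any Python crawler, using only the standard library's cProfile and pstats. The functions `parse_links` and `crawl` are hypothetical stand-ins for a parsing-heavy workload; they are not wpull code, and the regex is only a placeholder for a real HTML parser.

```python
import cProfile
import io
import pstats
import re


def parse_links(html):
    """Hypothetical stand-in for an expensive HTML-parsing step (not wpull code)."""
    return re.findall(r'href="([^"]+)"', html)


def crawl(pages):
    """Hypothetical stand-in for a crawl loop that parses many pages."""
    links = []
    for html in pages:
        links.extend(parse_links(html))
    return links


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the costliest call chains are listed first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


# Made-up input pages, only to give the profiler something to measure.
pages = ['<a href="/page/%d">x</a>' % i for i in range(1000)]
links, report = profile_call(crawl, pages)
print(report)
```

If a parser dominates the cumulative-time column in a report like this, that would be consistent with the "mostly HTML parsing" guess; the sketch itself says nothing about wpull's actual numbers.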
[20:58] *** odemg has joined #archiveteam-bs
[21:13] *** plue has joined #archiveteam-bs
[21:15] *** wtrx has joined #archiveteam-bs
[21:30] Hi! Does anyone know what happened to the data that was saved in this 2016 project? https://www.archiveteam.org/index.php?title=Virgin_Media
[21:31] This redditor is trying to track down a very specific file that was likely included in the data https://old.reddit.com/r/DataHoarder/comments/8l84go/looking_for_file_wfchtmlzip_can_anyone_help/
[21:40] *** RichardG has quit IRC (Read error: Connection reset by peer)
[21:56] *** Mateon1 has quit IRC (se.hub efnet.portlane.se)
[21:56] *** dxrt_ has quit IRC (se.hub efnet.portlane.se)
[21:56] *** Gfy has quit IRC (se.hub efnet.portlane.se)
[21:56] *** Aoede has quit IRC (se.hub efnet.portlane.se)
[21:56] *** espes___ has quit IRC (se.hub efnet.portlane.se)
[21:56] *** Rai-chan has quit IRC (se.hub efnet.portlane.se)
[21:56] *** svchfoo1 has quit IRC (se.hub efnet.portlane.se)
[21:56] *** omglolbah has quit IRC (se.hub efnet.portlane.se)
[21:58] *** Mateon1 has joined #archiveteam-bs
[21:58] *** dxrt_ has joined #archiveteam-bs
[21:58] *** Gfy has joined #archiveteam-bs
[21:58] *** Aoede has joined #archiveteam-bs
[21:58] *** espes___ has joined #archiveteam-bs
[21:58] *** Rai-chan has joined #archiveteam-bs
[21:58] *** svchfoo1 has joined #archiveteam-bs
[21:58] *** omglolbah has joined #archiveteam-bs
[21:58] *** efnet.portlane.se sets mode: +oo dxrt_ svchfoo1
[21:58] *** dxrt sets mode: +o dxrt_
[22:05] gotta go, but I'll check the logs to see if anyone has an answer
[22:06] *** wtrx has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[22:07] *** RichardG has joined #archiveteam-bs
[22:20] the virginmedia stuff was done in the channel #virginsacrifice, which i do not have logs of
[22:20] unfortunately
[22:24] pinging the people i see talking about virginmedia personal pages in my logs of #archiveteam: arkiver SketchCow HCross SimpBrain
[22:24] *** jschwart has quit IRC (Quit: Konversation terminated!)
[22:31] *** Spydar007 has quit IRC (se.hub irc.efnet.nl)
[22:38] ah, found it
[22:38] https://archive.org/download/archiveteam_20161107222558
[22:38] https://archive.org/download/archiveteam_20161107222599
[22:40] "wfc-html" does not appear in the .cdx.gz index files for either of these items
[22:49] *** acridAxid has quit IRC (marauder)
[22:50] *** acridAxid has joined #archiveteam-bs
[22:55] *** wtrx has joined #archiveteam-bs
[22:56] @astrid appreciate you taking the time to track that down - thanks!
[22:56] sorry i couldn't be of more help :(
[22:58] No worries! It's always spooky when something can just disappear from the internet entirely like that :/
[22:59] again, appreciate the help - thanks for searching through the file index too
[23:08] it happens ... the internet only remembers things that you want it to forget
[23:10] *** wtrx has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[23:28] *** SmileyG_ has joined #archiveteam-bs
[23:30] *** SmileyG has quit IRC (Read error: Operation timed out)
[23:45] *** ndiddylap has joined #archiveteam-bs
[23:49] heh
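A minimal sketch of the kind of check described at [22:40], searching a gzip-compressed CDX index for a substring. It assumes the index is plain text with one capture per line; the function name `find_in_cdx`, the sample file, and every sample entry below are made up for illustration and are not taken from the Virgin Media items.

```python
import gzip
import os
import tempfile


def find_in_cdx(path, needle):
    """Return the lines of a gzip-compressed CDX file that contain needle."""
    matches = []
    # errors="replace" keeps the scan going if a URL contains odd bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                matches.append(line.rstrip("\n"))
    return matches


# Made-up sample index (hypothetical entries, example.com only).
sample_lines = [
    " CDX N b a m s k r M S V g",
    "com,example)/index.html 20161107000000 http://example.com/index.html "
    "text/html 200 - - - 1234 0 sample.warc.gz",
    "com,example)/files/notes.zip 20161107000001 "
    "http://example.com/files/notes.zip application/zip 200 - - - 5678 1234 "
    "sample.warc.gz",
]

path = os.path.join(tempfile.mkdtemp(), "sample.cdx.gz")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    handle.write("\n".join(sample_lines) + "\n")

hits = find_in_cdx(path, "notes.zip")
misses = find_in_cdx(path, "wfc-html")
print(len(hits), "hit(s) for notes.zip;", len(misses), "hit(s) for wfc-html")
```

An empty result only shows that the string is absent from the index that was searched. A file stored under a different name, or one that was never captured, would look the same.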