#archiveteam-bs 2018-05-23,Wed


Time Nickname Message
00:10 🔗 megaminxw has quit IRC (Quit: WeeChat 1.9.1)
00:11 🔗 svchfoo3 has quit IRC (Read error: Operation timed out)
00:12 🔗 Jonimus has quit IRC (Read error: Operation timed out)
00:12 🔗 Jonimus has joined #archiveteam-bs
00:12 🔗 swebb sets mode: +o Jonimus
00:17 🔗 BlueMax has joined #archiveteam-bs
01:05 🔗 BlueMax has quit IRC (Leaving)
01:44 🔗 ta9le has quit IRC (Quit: Connection closed for inactivity)
01:45 🔗 BlueMax has joined #archiveteam-bs
02:16 🔗 Soni has quit IRC (Remote host closed the connection)
02:23 🔗 Soni has joined #archiveteam-bs
02:46 🔗 CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
03:31 🔗 qw3rty114 has joined #archiveteam-bs
03:37 🔗 qw3rty113 has quit IRC (Read error: Operation timed out)
03:52 🔗 odemg has quit IRC (Ping timeout: 246 seconds)
04:06 🔗 Odd0002_ has joined #archiveteam-bs
04:06 🔗 odemg has joined #archiveteam-bs
04:08 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
04:08 🔗 Odd0002_ is now known as Odd0002
05:16 🔗 godane so i got a weird tape
05:16 🔗 godane it starts with dead by midnight on abc that aired 1997-11-23
05:17 🔗 godane then for the 2nd half of the tape it's nbc: the pretender and profiler
05:18 🔗 godane it must have taped over something that was exactly 2 hours from a previous recording
05:42 🔗 Asparagir has quit IRC (Asparagir)
06:36 🔗 schbirid has joined #archiveteam-bs
08:50 🔗 ta9le has joined #archiveteam-bs
09:06 🔗 lindalap This forum is under fire for GDPR: http://tulpa.palstani.com/ I have a conflict of interest, if anyone is interested in archiving it.
11:58 🔗 Spydar007 has quit IRC (se.hub irc.efnet.nl)
12:00 🔗 Spydar007 has joined #archiveteam-bs
12:46 🔗 lindalap All this GDPR fallout, heh.
12:49 🔗 JAA Shows who cares about their users' privacy and who doesn't.
13:17 🔗 Darkstar has quit IRC (Ping timeout: 1212 seconds)
13:21 🔗 Darkstar has joined #archiveteam-bs
13:22 🔗 BlueMax has quit IRC (Leaving)
13:40 🔗 Odd0002_ has joined #archiveteam-bs
13:40 🔗 lindalap I wonder... all the tulpa forums. o_O
13:41 🔗 Odd0002 has quit IRC (Ping timeout: 252 seconds)
13:41 🔗 Odd0002_ is now known as Odd0002
13:46 🔗 REiN^ has joined #archiveteam-bs
13:53 🔗 lindalap Would you !a https://tulpa.io/ --igset forums ? I have a conflict of interest, again.
13:57 🔗 lindalap https://wiki.tulpa.info/ has a GDPR cookie notice.
14:04 🔗 odinho has joined #archiveteam-bs
14:06 🔗 odinho The casparcg.com/forum job is finished. What do we normally do afterwards? Can I post the URL https://archive.fart.website/archivebot/viewer/job/8qdw1 (or something better?) to the forum? Some people would be interested, but the site admins might not like it. On the other hand, they won't be liable for Archive Team's backup, so they will probably be fine with it (they just won't say so).
14:06 🔗 Mateon1 has quit IRC (Ping timeout: 268 seconds)
14:07 🔗 Mateon1 has joined #archiveteam-bs
14:08 🔗 JAA odinho: I guess you can post the link there if you want. Be aware that not all data has been uploaded to IA yet; there's usually a delay of a few hours to a day or two.
14:08 🔗 JAA However, more useful for most people would probably be the mention that the forums are available in the Wayback Machine.
14:09 🔗 odinho JAA: Okay, and this dump of info would actually be ingested and used by the Wayback Machine later? For some (like me) having it locally makes sense because you can grep in it. I don't think archive.org has a public full-text search?
14:09 🔗 JAA That's correct.
14:10 🔗 JAA And you'll want to use zgrep (or gunzip the files first) since the WARCs are compressed.
14:11 🔗 JAA And you might also want to filter out the content of interest (to you) first so it doesn't grep through images etc. every time.
14:11 🔗 JAA (Well, maybe not images due to binary data detection, but scripts, stylesheets, etc.)
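A minimal sketch of JAA's suggestion in Python, using only the standard library rather than a WARC library such as warcio: walk the records of a `.warc.gz`, keep only HTTP responses whose `Content-Type` is `text/html`, and grep those bodies so scripts and stylesheets are skipped. It assumes the usual `.warc.gz` layout (each record gzipped as its own member, which `gzip.open` reads transparently) and that bodies were stored without chunked or gzip content-encoding; such bodies would need decoding first. The function names are hypothetical.

```python
import gzip
import re
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content block) for each record of a decompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line[:40]!r}")
        headers: Dict[str, str] = {}
        while True:
            raw = stream.readline()
            if raw.strip() == b"":  # blank line (or EOF) ends the header block
                break
            name, _, value = raw.decode("utf-8", errors="replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers.get("content-length", "0")))


def html_bodies(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (target URI, raw body) for HTTP responses served as text/html."""
    # gzip.open reads the one-gzip-member-per-record layout of .warc.gz transparently.
    with gzip.open(path, "rb") as stream:
        for headers, block in iter_warc_records(stream):
            if headers.get("warc-type") != "response":
                continue  # skip request, metadata and warcinfo records
            http_head, _, body = block.partition(b"\r\n\r\n")
            if re.search(rb"(?im)^content-type:\s*text/html", http_head):
                yield headers.get("warc-target-uri", ""), body


def grep_warc(path: str, pattern: str) -> Iterator[Tuple[str, str]]:
    """Yield (URI, matching line) for every line of an HTML page matching `pattern`."""
    regex = re.compile(pattern)
    for uri, body in html_bodies(path):
        for line in body.decode("utf-8", errors="replace").splitlines():
            if regex.search(line):
                yield uri, line.strip()
```

Usage, with a hypothetical file name: `for uri, line in grep_warc("forum-00000.warc.gz", r"CasparCG"): print(uri, line)`. For repeated searches, writing the filtered HTML bodies out once and grepping those is cheaper than re-reading the WARCs every time.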
14:12 🔗 odinho Sure, that makes sense. I'll have a look once it's downloaded :) Thanks!
14:34 🔗 Stilett0 has joined #archiveteam-bs
14:35 🔗 Stilett0 has quit IRC (Client Quit)
16:10 🔗 SketchCow Ciao, Ciao
16:48 🔗 wp494 has quit IRC (Read error: Operation timed out)
16:52 🔗 wp494 has joined #archiveteam-bs
16:52 🔗 svchfoo1 sets mode: +o wp494
17:51 🔗 ta9le Welp, gonna attempt to back up every single Steam game I own
17:51 🔗 ta9le which already sounds like it's either a good idea or a bad one
19:18 🔗 Jens Has anyone profiled wpull, to find out why it eats so much CPU?
19:19 🔗 JAA Mostly HTML parsing, I believe.
19:20 🔗 Jens It's pretty extreme.
19:35 🔗 ivan Jens: https://github.com/ludios/grab-site/issues/117
19:35 🔗 ivan you can use the programs mentioned to see the rest of the bottlenecks
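A minimal sketch of how a CPU profile of HTML parsing could be gathered with Python's built-in cProfile and pstats. These are not necessarily the programs mentioned in the linked grab-site issue, and the parser here is the standard library's `html.parser`, standing in for whatever wpull actually uses. `LinkExtractor`, `extract_links` and `profile_parsing` are hypothetical names.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser
from typing import List


class LinkExtractor(HTMLParser):
    """Collect href/src values, a rough stand-in for a crawler's link extraction."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def profile_parsing(html: str, repeat: int = 200, top: int = 10) -> str:
    """Run extract_links `repeat` times under cProfile; return the top rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        extract_links(html)
    profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return report.getvalue()


if __name__ == "__main__":
    page = '<a href="/t/1">topic</a><img src="/i.png">' * 1000
    print(profile_parsing(page))
```

The same report can be produced for a whole program with `python -m cProfile -o out.prof script.py` and then loaded with `pstats.Stats("out.prof")`. Sorting by cumulative time shows which call trees (for example, parsing versus network I/O) dominate.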
20:01 🔗 arkiver ivan: interesting programs
20:10 🔗 odinho html5lib is known to be very slow; it was written as a proof of concept of the new HTML5 parsing algorithm while the spec was being written. I worked with one of its creators, and he said it would be much better to use a web engine's HTML parser, like the one we had in Presto (Opera). Quite a lot of us wanted to open source just that part. These days there are surely many much faster, spec-compliant HTML5 parsers.
20:11 🔗 Jens ivan: Are you also ivan on Freenode?
20:11 🔗 JAA https://github.com/chfoo/wpull/issues/364
20:12 🔗 ivan Jens: yes
20:13 🔗 Jens o hai :D
20:21 🔗 schbirid has quit IRC (Quit: Leaving)
20:26 🔗 odinho I didn't know Servo's HTML parser was called html5ever, that's a cool name. There are some Python bindings for it, but none seem to be maintained.
20:30 🔗 plue has quit IRC (Quit: leaving)
20:42 🔗 odemg has quit IRC (Read error: Operation timed out)
20:43 🔗 JAA Yeah. Maybe something like rustypy could be used, but I have zero experience with anything related to Rust.
20:58 🔗 odemg has joined #archiveteam-bs
21:13 🔗 plue has joined #archiveteam-bs
21:15 🔗 wtrx has joined #archiveteam-bs
21:30 🔗 wtrx Hi! Does anyone know what happened to the data that was saved in this 2016 project? https://www.archiveteam.org/index.php?title=Virgin_Media
21:31 🔗 wtrx This redditor is trying to track down a very specific file that was likely included in the data https://old.reddit.com/r/DataHoarder/comments/8l84go/looking_for_file_wfchtmlzip_can_anyone_help/
21:40 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
21:56 🔗 Mateon1 has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 dxrt_ has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 Gfy has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 Aoede has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 espes___ has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 Rai-chan has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 svchfoo1 has quit IRC (se.hub efnet.portlane.se)
21:56 🔗 omglolbah has quit IRC (se.hub efnet.portlane.se)
21:58 🔗 Mateon1 has joined #archiveteam-bs
21:58 🔗 dxrt_ has joined #archiveteam-bs
21:58 🔗 Gfy has joined #archiveteam-bs
21:58 🔗 Aoede has joined #archiveteam-bs
21:58 🔗 espes___ has joined #archiveteam-bs
21:58 🔗 Rai-chan has joined #archiveteam-bs
21:58 🔗 svchfoo1 has joined #archiveteam-bs
21:58 🔗 omglolbah has joined #archiveteam-bs
21:58 🔗 efnet.portlane.se sets mode: +oo dxrt_ svchfoo1
21:58 🔗 dxrt sets mode: +o dxrt_
22:05 🔗 wtrx gotta go, but I'll check the logs to see if anyone has an answer
22:06 🔗 wtrx has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
22:07 🔗 RichardG has joined #archiveteam-bs
22:20 🔗 astrid the virginmedia stuff was done in the channel #virginsacrifice, which i do not have logs of
22:20 🔗 astrid unfortunately
22:24 🔗 astrid pinging the people i see talking about virginmedia personal pages in my logs of #archiveteam: arkiver SketchCow HCross SimpBrain
22:24 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
22:31 🔗 Spydar007 has quit IRC (se.hub irc.efnet.nl)
22:38 🔗 astrid ah, found it
22:38 🔗 astrid https://archive.org/download/archiveteam_20161107222558
22:38 🔗 astrid https://archive.org/download/archiveteam_20161107222599
22:40 🔗 astrid "wfc-html" does not appear in the .cdx.gz index files for either of these items
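A minimal sketch of the kind of check astrid describes: scanning a `.cdx.gz` index for a substring in the captured URLs. A CDX file starts with a header line whose letters name the columns (for example ` CDX N b a m s k r M S V g`, where `a` is the original URL), followed by one space-separated line per capture. The sketch assumes that header is present; `search_cdx` is a hypothetical name.

```python
import gzip
from typing import Iterator


def search_cdx(path: str, needle: str) -> Iterator[str]:
    """Yield the original URL of every CDX line whose URL contains `needle` (case-insensitive)."""
    needle = needle.lower()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        # The first line names the columns, e.g. " CDX N b a m s k r M S V g".
        header = f.readline().split()
        if not header or header[0] != "CDX":
            raise ValueError("no CDX header line found")
        url_col = header[1:].index("a")  # 'a' is the original-URL column
        for line in f:
            cols = line.split()
            if len(cols) > url_col and needle in cols[url_col].lower():
                yield cols[url_col]
```

Usage, with a hypothetical local copy of an item's index: `for url in search_cdx("item.cdx.gz", "wfc-html"): print(url)`. An empty result means no captured URL in that item contains the string, which is what a search of the index alone can establish.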
22:49 🔗 acridAxid has quit IRC (marauder)
22:50 🔗 acridAxid has joined #archiveteam-bs
22:55 🔗 wtrx has joined #archiveteam-bs
22:56 🔗 wtrx @astrid appreciate you taking the time to track that down - thanks!
22:56 🔗 astrid sorry i couldn't be of more help :(
22:58 🔗 wtrx No worries! It's always spooky when something can just disappear from the internet entirely like that :/
22:59 🔗 wtrx again, appreciate the help - thanks for searching through the file index too
23:08 🔗 astrid it happens ... the internet only remembers things that you want it to forget
23:10 🔗 wtrx has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
23:28 🔗 SmileyG_ has joined #archiveteam-bs
23:30 🔗 SmileyG has quit IRC (Read error: Operation timed out)
23:45 🔗 ndiddylap has joined #archiveteam-bs
23:49 🔗 ivan heh
