#archiveteam-ot 2019-06-11,Tue


Time Nickname Message
00:10 🔗 systwi has joined #archiveteam-ot
00:16 🔗 BlueMax has joined #archiveteam-ot
00:44 🔗 anarcat is there a way to get cloaks on efnet?
00:44 🔗 anarcat since this is -ot after all
00:45 🔗 ivan don't think so
00:45 🔗 anarcat too bad
00:45 🔗 anarcat this is the only network that leaks my client IP
00:46 🔗 * Flashfire Starts researching anarcat
00:50 🔗 icedice has quit IRC (Leaving)
00:58 🔗 markedL has quit IRC (The Lounge - https://thelounge.chat)
00:58 🔗 asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
01:03 🔗 t3 http://www.pc-collection.com/collection-ibm.php
01:06 🔗 asdf0101 has joined #archiveteam-ot
01:06 🔗 markedL has joined #archiveteam-ot
01:06 🔗 t3 I'm trying to find the model of the IBM computer I used to have.
01:07 🔗 t3 I think it's from 1996 or 1997. It looks very similar http://www.pc-collection.com/images/i/ibm/IBM_Aptiva_2193-420.jpg but it has an AMD processor and it doesn't have the two ports on the bottom part.
01:08 🔗 t3 And it didn't come with blue speakers. They were beige/grey, like the color of the computer, monitor, and peripherals.
01:08 🔗 t3 But it still had those blue accents.
01:08 🔗 t3 The computer still had those blue accents.
01:09 🔗 t3 I think it originally shipped with Windows 95, but was upgraded to Windows 98.
01:09 🔗 t3 I mean upgraded to Windows 98 SE.
01:10 🔗 t3 It had a disc drive on the top and a floppy disk drive in the middle, like in the picture.
01:14 🔗 t3 So I'm positive it is an IBM Aptiva computer. I know it had an AMD processor. It probably also had the Windows 95 sticker on the front.
01:42 🔗 kiska1 has quit IRC (Read error: Operation timed out)
02:03 🔗 kiska1 has joined #archiveteam-ot
02:03 🔗 Fusl sets mode: +o kiska1
02:11 🔗 chirlu has joined #archiveteam-ot
02:16 🔗 BlueMax has quit IRC (Quit: Leaving)
03:12 🔗 systwi Does archivebot obey robots.txt?
03:13 🔗 Flashfire systwi if by obey you mean use it to find more content to archive then yes
03:14 🔗 systwi I meant if a robots.txt file tells web crawlers to not archive their contents, does archivebot obey that? I fear some of the pages I tossed into AB were not saved if this is the case.
03:16 🔗 Flashfire If a site is manually excluded from wayback the contents won't end up there but all robots.txt are ignored by AB
03:22 🔗 ivan systwi: it ignores robots.txt
03:22 🔗 Fusl https://www.archiveteam.org/index.php?title=Robots.txt
03:23 🔗 Fusl ArchiveBot understands robots.txt (please read the article) but does not match any directives. It uses it for discovering more links such as sitemaps however.
03:23 🔗 ivan pipeline/archivebot/seesaw/wpull.py
03:23 🔗 ivan 28: '--no-robots',
03:25 🔗 systwi Cool, thank god
03:25 🔗 systwi That puts me at ease
03:26 🔗 systwi On a different note, I think we should grab everything ProJared; his site, YT videos, social media content, etc. There have been some allegations of him cheating on his wife circulating on the net which makes me wonder if he's going to delete everything.
03:27 🔗 ivan I have his YouTube
03:29 🔗 systwi That's good to know. I assume it's all encompassing--or at least as all encompassing as possible?
03:29 🔗 ivan what youtube-dl grabs
03:30 🔗 ivan missing the comments and chat replays
03:30 🔗 odemg has quit IRC (Ping timeout: 265 seconds)
03:39 🔗 BlueMax has joined #archiveteam-ot
03:45 🔗 systwi ivan: Check this tool out: https://github.com/philbot9/youtube-comment-scraper-cli
03:45 🔗 systwi I use this to save them to a .json file and it works pretty well
03:45 🔗 systwi Not sure about YT chat replays though
03:45 🔗 systwi Twitch chat replays, though, can be scraped too. If you want to know of a tool let me know
03:46 🔗 ivan I used https://github.com/egbertbouman/youtube-comment-downloader yesterday
03:49 🔗 systwi Hmm, interesting, I wonder how different these two are
03:54 🔗 systwi ivan: Do you know, does it also save information like if the comment is pinned, loved, like/dislike status, post date, edited (Y/N), etc.?
03:55 🔗 ivan youtube-comment-downloader saves a relative date but probably none of those other things
03:56 🔗 ivan maybe these should be saved with a browser
03:56 🔗 ivan or maybe the html from the ajax endpoint needs to be concatenated
03:57 🔗 systwi If you did want to try it out, I know that youtube-comment-scraper-cli saves all of that, minus the pinned and loved. Maybe/hopefully those were added in a newer version, IDK
03:57 🔗 systwi I believe the time stamp is written unix-style
04:08 🔗 systwi ivan
04:08 🔗 ivan thanks
04:41 🔗 dhyan_nat has joined #archiveteam-ot
08:56 🔗 Odd0002_ has joined #archiveteam-ot
08:57 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
08:57 🔗 Odd0002_ is now known as Odd0002
09:43 🔗 killsushi has quit IRC (Quit: Leaving)
10:09 🔗 BlueMaxim has joined #archiveteam-ot
10:13 🔗 BlueMaxim has quit IRC (Client Quit)
10:19 🔗 BlueMax has quit IRC (Ping timeout: 615 seconds)
10:43 🔗 JAA systwi: I grabbed a bunch of ProJared Twitter stuff last month when this whole thing blew up.
10:43 🔗 JAA (At the request of Ryz)
10:51 🔗 Flashfire !a https://m.youtube.com/watch?feature=youtu.be&v=egIqLe4b9_A
11:14 🔗 wp494 has quit IRC (Ping timeout: 492 seconds)
11:14 🔗 wp494 has joined #archiveteam-ot
15:08 🔗 systwi JAA: Cool, glad to know some of it is saved.
15:48 🔗 sep332 has joined #archiveteam-ot
16:54 🔗 Dj-Wawa has joined #archiveteam-ot
16:59 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
17:10 🔗 anarcat still unclear to me what the distinction between here and -bs is
17:10 🔗 anarcat but whatever
17:10 🔗 anarcat anyone has experience archiving Trac sites?
17:10 🔗 anarcat preferably to static HTML?
17:13 🔗 JAA anarcat: -bs is for AT activities, coordination of small projects, etc. This channel is for more or less anything.
17:14 🔗 anarcat ok
17:14 🔗 anarcat well, this is bordeline
17:14 🔗 JAA I've handled some ArchiveBot jobs for Trac sites, and it's quite a nightmare.
17:14 🔗 anarcat borderline
17:14 🔗 anarcat ah crap
17:14 🔗 anarcat so torproject.org has been looking at switching to gitlab for a while now
17:14 🔗 JAA Well, just the usual "hey, here's another 100000 possible views of the same content!" crap.
17:14 🔗 anarcat and i heard noises it might actually happen, which means i'll need to deal with the Trac
17:15 🔗 JAA With the appropriate ignores, it should archive just fine.
17:15 🔗 JAA Unlike GitLab, it's actually usable without JS.
17:17 🔗 anarcat yeah
17:17 🔗 anarcat you don't happen to have an igset around, do you? ;)
17:17 🔗 anarcat still have*
17:19 🔗 JAA Nope
17:19 🔗 anarcat ack
17:20 🔗 JAA I'll happily merge one though. ;-)
17:20 🔗 anarcat hehe
17:20 🔗 anarcat well to be honest, i don't even know what to use for crawling anymore
17:21 🔗 anarcat i don't want to end up with a WARC file in this case
17:21 🔗 anarcat i'd probably make a static HTML site, hosted indefinitely
17:21 🔗 anarcat i don't think i can pull off the "magic htaccess" redirection game
17:23 🔗 JAA Yeah, depends on what the goal is. However, if the original site disappears, we might still want to grab a copy for the Wayback Machine regardless of whether you continue to host a mirror somewhere.
17:23 🔗 anarcat then maybe this is -bs material after all
17:24 🔗 JAA As for a static HTML mirror, maybe wget --mirror will be useful for this. Still needs a lot of ignores though for all the differently sorted lists etc.
17:24 🔗 JAA I don't remember what wget's options for ignoring things are, but I think recent versions have a --reject-regex option as well.
17:24 🔗 JAA wpull doesn't have --mirror.
17:25 🔗 JAA Maybe you can emulate it with other options, I've never used it, so I can't tell you more about that.
17:26 🔗 anarcat why --mirror?
17:26 🔗 anarcat looks like it was somewhat crawled, but a long time ago https://archive.fart.website/archivebot/viewer/domain/trac.torproject.org
17:26 🔗 anarcat should i just kick a crawl now?
17:27 🔗 anarcat then we can tweak an igset as we go?
17:27 🔗 JAA I need to leave for a bit now, how about we throw it in in an hour or so?
17:27 🔗 anarcat sure, no probs
17:28 🔗 anarcat can we extract the igset from archivebot later?
17:28 🔗 anarcat so we can reuse it?
17:28 🔗 JAA Yeah, just a matter of a few greps on the IRC logs. :-)
17:29 🔗 anarcat ah
17:29 🔗 anarcat that's not the best :p
17:36 🔗 Anthony1 has joined #archiveteam-ot
18:12 🔗 JAA -> #archivebot
19:16 🔗 jut Anybody here from warmer climates? How do you survive 35C heat?
19:17 🔗 astrid shade, water
19:18 🔗 astrid the human body is capable of adapting its metabolism to a wide variety of climates but in my experience it takes it a week or two of no aircon to figure out how to deal with heat
19:20 🔗 anarcat moving north
19:22 🔗 Fusl jut: 40C inside, i have an air conditioner pointing directly at my face
19:22 🔗 anarcat yuck
19:23 🔗 jut Yeah it's northern europe, air conditioning is rare
19:29 🔗 jut Also, our school year has been extended by 3 weeks, and our schools are not built to be occupied during the summer.
19:29 🔗 VoynichCr jut: use winter wallpapers, it helps
19:31 🔗 astrid like, pictures of snow scenes?
19:31 🔗 VoynichCr yeah
19:32 🔗 VoynichCr (lol, it was a joke)
19:33 🔗 jut You know I will try
20:09 🔗 Anthony1 has quit IRC (Ping timeout: 260 seconds)
20:43 🔗 dhyan_nat has joined #archiveteam-ot
21:06 🔗 n00b659 has joined #archiveteam-ot
21:06 🔗 n00b659 has left
21:08 🔗 n00b has joined #archiveteam-ot
21:10 🔗 n00b has left
21:11 🔗 n00b680 has joined #archiveteam-ot
21:11 🔗 n00b680 has left
22:27 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
22:28 🔗 Jens has quit IRC (Remote host closed the connection)
22:28 🔗 Jens has joined #archiveteam-ot
22:51 🔗 BlueMax has joined #archiveteam-ot
