#archiveteam-ot 2019-06-11,Tue

Time	Nickname	Message
00:10 ^🔗		systwi has joined #archiveteam-ot
00:16 ^🔗		BlueMax has joined #archiveteam-ot
00:44 ^🔗	anarcat	is there a way to get cloaks on efnet?
00:44 ^🔗	anarcat	since this is -ot after all
00:45 ^🔗	ivan	don't think so
00:45 ^🔗	anarcat	too bad
00:45 ^🔗	anarcat	this is the only network that leaks my client IP
00:46 ^🔗	*	Flashfire Starts researching anarcat
00:50 ^🔗		icedice has quit IRC (Leaving)
00:58 ^🔗		markedL has quit IRC (The Lounge - https://thelounge.chat)
00:58 ^🔗		asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
01:03 ^🔗	t3	http://www.pc-collection.com/collection-ibm.php
01:06 ^🔗		asdf0101 has joined #archiveteam-ot
01:06 ^🔗		markedL has joined #archiveteam-ot
01:06 ^🔗	t3	I'm trying to find the model of the IBM computer I used to have.
01:07 ^🔗	t3	I think it's from 1996 or 1997. It looks very similar http://www.pc-collection.com/images/i/ibm/IBM_Aptiva_2193-420.jpg but it has an AMD processor and it doesn't have the two ports on the bottom part.
01:08 ^🔗	t3	And it didn't come with blue speakers. They were beige/grey, like the color of the computer, monitor, and peripherals.
01:08 ^🔗	t3	But it still had those blue accents.
01:08 ^🔗	t3	The computer still had those blue accents.
01:09 ^🔗	t3	I think it originally shipped with Windows 95, but was upgraded to Windows 98.
01:09 ^🔗	t3	I mean upgraded to Windows 98 SE.
01:10 ^🔗	t3	It had a disc drive on the top and a floppy disk drive in the middle, like in the picture.
01:14 ^🔗	t3	So I'm positive it is an IBM Aptiva computer. I know it had an AMD processor. It probably also had the Windows 95 sticker on the front.
01:42 ^🔗		kiska1 has quit IRC (Read error: Operation timed out)
02:03 ^🔗		kiska1 has joined #archiveteam-ot
02:03 ^🔗		Fusl sets mode: +o kiska1
02:11 ^🔗		chirlu has joined #archiveteam-ot
02:16 ^🔗		BlueMax has quit IRC (Quit: Leaving)
03:12 ^🔗	systwi	Does archivebot obey robots.txt?
03:13 ^🔗	Flashfire	systwi if by obey you mean use it to find more content to archive then yes
03:14 ^🔗	systwi	I meant if a robots.txt file tells web crawlers to not archive their contents, does archivebot obey that? I fear some of the pages I tossed into AB were not saved if this is the case.
03:16 ^🔗	Flashfire	If a site is manually excluded from the Wayback Machine, the contents won't end up there, but AB ignores all robots.txt files
03:22 ^🔗	ivan	systwi: it ignores robots.txt
03:22 ^🔗	Fusl	https://www.archiveteam.org/index.php?title=Robots.txt
03:23 ^🔗	Fusl	ArchiveBot understands robots.txt (please read the article) but does not match any directives. It uses it for discovering more links such as sitemaps however.
03:23 ^🔗	ivan	pipeline/archivebot/seesaw/wpull.py
03:23 ^🔗	ivan	28: '--no-robots',
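A minimal Python sketch of the behaviour described above: disallow rules are not applied, but `Sitemap:` lines are still harvested as extra URLs to crawl. This is illustrative only and is not ArchiveBot's or wpull's actual code; `extract_sitemaps` and the sample robots.txt are hypothetical.

```python
def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return the URLs on 'Sitemap:' lines; every other directive is ignored."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt: the Disallow rule is read but never enforced.
SAMPLE = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

print(extract_sitemaps(SAMPLE))
```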
03:25 ^🔗	systwi	Cool, thank god
03:25 ^🔗	systwi	That puts me at ease
03:26 ^🔗	systwi	On a different note, I think we should grab everything ProJared; his site, YT videos, social media content, etc. There have been some allegations of him cheating on his wife circulating on the net which makes me wonder if he's going to delete everything.
03:27 ^🔗	ivan	I have his YouTube
03:29 ^🔗	systwi	That's good to know. I assume it's all encompassing--or at least as all encompassing as possible?
03:29 ^🔗	ivan	what youtube-dl grabs
03:30 ^🔗	ivan	missing the comments and chat replays
03:30 ^🔗		odemg has quit IRC (Ping timeout: 265 seconds)
03:39 ^🔗		BlueMax has joined #archiveteam-ot
03:45 ^🔗	systwi	ivan: Check this tool out: https://github.com/philbot9/youtube-comment-scraper-cli
03:45 ^🔗	systwi	I use this to save them to a .json file and it works pretty well
03:45 ^🔗	systwi	Not sure about YT chat replays though
03:45 ^🔗	systwi	Twitch chat replays, though, can be scraped too. If you want to know of a tool let me know
03:46 ^🔗	ivan	I used https://github.com/egbertbouman/youtube-comment-downloader yesterday
03:49 ^🔗	systwi	Hmm, interesting, I wonder how different these two are
03:54 ^🔗	systwi	ivan: Do you know, does it also save information like if the comment is pinned, loved, like/dislike status, post date, edited (Y/N), etc.?
03:55 ^🔗	ivan	youtube-comment-downloader saves a relative date but probably none of those other things
03:56 ^🔗	ivan	maybe these should be saved with a browser
03:56 ^🔗	ivan	or maybe the html from the ajax endpoint needs to be concatenated
03:57 ^🔗	systwi	If you did want to try it out, I know that youtube-comment-scraper-cli saves all of that, minus the pinned and loved. Maybe/hopefully those were added in a newer version, IDK
03:57 ^🔗	systwi	I believe the time stamp is written unix-style
04:08 ^🔗	systwi	ivan
04:08 ^🔗	ivan	thanks
04:41 ^🔗		dhyan_nat has joined #archiveteam-ot
08:56 ^🔗		Odd0002_ has joined #archiveteam-ot
08:57 ^🔗		Odd0002 has quit IRC (Read error: Operation timed out)
08:57 ^🔗		Odd0002_ is now known as Odd0002
09:43 ^🔗		killsushi has quit IRC (Quit: Leaving)
10:09 ^🔗		BlueMaxim has joined #archiveteam-ot
10:13 ^🔗		BlueMaxim has quit IRC (Client Quit)
10:19 ^🔗		BlueMax has quit IRC (Ping timeout: 615 seconds)
10:43 ^🔗	JAA	systwi: I grabbed a bunch of ProJared Twitter stuff last month when this whole thing blew up.
10:43 ^🔗	JAA	(At the request of Ryz)
10:51 ^🔗	Flashfire	!a https://m.youtube.com/watch?feature=youtu.be&v=egIqLe4b9_A
11:14 ^🔗		wp494 has quit IRC (Ping timeout: 492 seconds)
11:14 ^🔗		wp494 has joined #archiveteam-ot
15:08 ^🔗	systwi	JAA: Cool, glad to know some of it is saved.
15:48 ^🔗		sep332 has joined #archiveteam-ot
16:54 ^🔗		Dj-Wawa has joined #archiveteam-ot
16:59 ^🔗		dhyan_nat has quit IRC (Read error: Operation timed out)
17:10 ^🔗	anarcat	still unclear to me what the distinction between here and -bs is
17:10 ^🔗	anarcat	but whatever
17:10 ^🔗	anarcat	does anyone have experience archiving Trac sites?
17:10 ^🔗	anarcat	preferably to static HTML?
17:13 ^🔗	JAA	anarcat: -bs is for AT activities, coordination of small projects, etc. This channel is for more or less anything.
17:14 ^🔗	anarcat	ok
17:14 ^🔗	anarcat	well, this is bordeline
17:14 ^🔗	JAA	I've handled some ArchiveBot jobs for Trac sites, and it's quite a nightmare.
17:14 ^🔗	anarcat	borderline
17:14 ^🔗	anarcat	ah crap
17:14 ^🔗	anarcat	so torproject.org has been looking at switching to gitlab for a while now
17:14 ^🔗	JAA	Well, just the usual "hey, here's another 100000 possible views of the same content!" crap.
17:14 ^🔗	anarcat	and i heard noises it might actually happen, which means i'll need to deal with the Trac
17:15 ^🔗	JAA	With the appropriate ignores, it should archive just fine.
17:15 ^🔗	JAA	Unlike GitLab, it's actually usable without JS.
17:17 ^🔗	anarcat	yeah
17:17 ^🔗	anarcat	you don't happen to have an igset around, do you? ;)
17:17 ^🔗	anarcat	still have*
17:19 ^🔗	JAA	Nope
17:19 ^🔗	anarcat	ack
17:20 ^🔗	JAA	I'll happily merge one though. ;-)
17:20 ^🔗	anarcat	hehe
17:20 ^🔗	anarcat	well to be honest, i don't even know what to use for crawling anymore
17:21 ^🔗	anarcat	i don't want to end up with a WARC file in this case
17:21 ^🔗	anarcat	i'd probably make a static HTML site, hosted indefinitely
17:21 ^🔗	anarcat	i don't think i can pull off the "magic htaccess" redirection game
17:23 ^🔗	JAA	Yeah, depends on what the goal is. However, if the original site disappears, we might still want to grab a copy for the Wayback Machine regardless of whether you continue to host a mirror somewhere.
17:23 ^🔗	anarcat	then maybe this is -bs material after all
17:24 ^🔗	JAA	As for a static HTML mirror, maybe wget --mirror will be useful for this. Still needs a lot of ignores though for all the differently sorted lists etc.
17:24 ^🔗	JAA	I don't remember what wget's options for ignoring things are, but I think recent versions have a --reject-regex option as well.
17:24 ^🔗	JAA	wpull doesn't have --mirror.
17:25 ^🔗	JAA	Maybe you can emulate it with other options, I've never used it, so I can't tell you more about that.
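A rough Python sketch of the wget approach JAA describes: it only assembles the command line and checks sample URLs against the ignore patterns, without running wget. `--mirror` is shorthand for `-r -N -l inf --no-remove-listing`. The patterns, the example host, and the helper names are assumptions for illustration; a real Trac site would need its own list, tuned against the URLs actually seen.

```python
import re
import shlex

# Hypothetical ignore patterns for a Trac site (assumptions, not a tested igset).
REJECT_PATTERNS = [
    r"[?&](order|desc|sort|asc)=",  # differently sorted views of the same list
    r"[?&]format=",                 # alternate export formats of a page
    r"[?&]action=(diff|history)",   # wiki diff and history views
]
REJECT_REGEX = "|".join(REJECT_PATTERNS)


def wget_mirror_command(start_url: str) -> list[str]:
    """Build (but do not run) a wget command for a static HTML mirror."""
    return [
        "wget",
        "--mirror",            # recursive, timestamped, unlimited depth
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # save pages with an .html suffix
        "--page-requisites",   # also fetch CSS, images, scripts
        "--no-parent",         # never ascend above the start URL
        f"--reject-regex={REJECT_REGEX}",
        start_url,
    ]


def is_rejected(url: str) -> bool:
    # Approximate check only: wget defaults to POSIX regexes, Python's re is
    # used here because these simple patterns behave the same in both.
    return re.search(REJECT_REGEX, url) is not None


print(shlex.join(wget_mirror_command("https://trac.example.org/")))
```

Checking candidate URLs with `is_rejected` before starting a long crawl is a cheap way to see whether a pattern is too broad.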
17:26 ^🔗	anarcat	why --mirror?
17:26 ^🔗	anarcat	looks like it was somewhat crawled, but a long time ago https://archive.fart.website/archivebot/viewer/domain/trac.torproject.org
17:26 ^🔗	anarcat	should i just kick a crawl now?
17:27 ^🔗	anarcat	then we can tweak an igset as we go?
17:27 ^🔗	JAA	I need to leave for a bit now, how about we throw it in in an hour or so?
17:27 ^🔗	anarcat	sure, no probs
17:28 ^🔗	anarcat	can we extract the igset from archivebot later?
17:28 ^🔗	anarcat	so we can reuse it?
17:28 ^🔗	JAA	Yeah, just a matter of a few greps on the IRC logs. :-)
17:29 ^🔗	anarcat	ah
17:29 ^🔗	anarcat	that's not the best :p
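The "few greps on the IRC logs" could look roughly like this Python sketch. It assumes the log is tab-separated (time, nickname, message) as in this viewer, and that ignores were added with a `!ig <job id> <pattern>` or `!ignore <job id> <pattern>` message; the job id `abc123` and the sample lines are made up.

```python
import json


def extract_ignores(log_lines, job_id):
    """Collect ignore patterns issued for one job, in order, without repeats."""
    patterns = []
    for line in log_lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # not a time/nick/message line
        words = parts[2].strip().split(None, 2)
        if (
            len(words) == 3
            and words[0] in ("!ig", "!ignore")
            and words[1] == job_id
            and words[2] not in patterns
        ):
            patterns.append(words[2])
    return patterns


# Hypothetical log excerpt; times, nicks, and job ids are invented.
SAMPLE_LOG = [
    "18:20\tJAA\t!ig abc123 [?&]sort=",
    "18:21\t\tsomeone has joined #archivebot",
    "18:25\tanarcat\t!ig abc123 [?&]format=",
    "18:26\tJAA\t!ig zzz999 /timeline",
    "18:30\tJAA\t!ig abc123 [?&]sort=",
]

print(json.dumps(extract_ignores(SAMPLE_LOG, "abc123"), indent=2))
```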
17:36 ^🔗		Anthony1 has joined #archiveteam-ot
18:12 ^🔗	JAA	-> #archivebot
19:16 ^🔗	jut	Anybody here from warmer climates? How do you survive 35C heat?
19:17 ^🔗	astrid	shade, water
19:18 ^🔗	astrid	the human body is capable of adapting its metabolism to a wide variety of climates but in my experience it takes it a week or two of no aircon to figure out how to deal with heat
19:20 ^🔗	anarcat	moving north
19:22 ^🔗	Fusl	jut: 40C inside, i have an air conditioner pointing directly at my face
19:22 ^🔗	anarcat	yuck
19:23 ^🔗	jut	Yeah it's northern europe, air conditioning is rare
19:29 ^🔗	jut	Also, our school year has been extended by 3 weeks, and our schools are not built to be occupied during the summer.
19:29 ^🔗	VoynichCr	jut: use winter wallpapers, it helps
19:31 ^🔗	astrid	like, pictures of snow scenes?
19:31 ^🔗	VoynichCr	yeah
19:32 ^🔗	VoynichCr	(lol, it was a joke)
19:33 ^🔗	jut	You know I will try
20:09 ^🔗		Anthony1 has quit IRC (Ping timeout: 260 seconds)
20:43 ^🔗		dhyan_nat has joined #archiveteam-ot
21:06 ^🔗		n00b659 has joined #archiveteam-ot
21:06 ^🔗		n00b659 has left
21:08 ^🔗		n00b has joined #archiveteam-ot
21:10 ^🔗		n00b has left
21:11 ^🔗		n00b680 has joined #archiveteam-ot
21:11 ^🔗		n00b680 has left
22:27 ^🔗		dhyan_nat has quit IRC (Read error: Operation timed out)
22:28 ^🔗		Jens has quit IRC (Remote host closed the connection)
22:28 ^🔗		Jens has joined #archiveteam-ot
22:51 ^🔗		BlueMax has joined #archiveteam-ot
