Time |
Nickname |
Message |
00:10
🔗
|
|
systwi has joined #archiveteam-ot |
00:16
🔗
|
|
BlueMax has joined #archiveteam-ot |
00:44
🔗
|
anarcat |
is there a way to get cloaks on efnet? |
00:44
🔗
|
anarcat |
since this is -ot after all |
00:45
🔗
|
ivan |
don't think so |
00:45
🔗
|
anarcat |
too bad |
00:45
🔗
|
anarcat |
this is the only network that leaks my client IP |
00:46
🔗
|
* |
Flashfire Starts researching anarcat |
00:50
🔗
|
|
icedice has quit IRC (Leaving) |
00:58
🔗
|
|
markedL has quit IRC (The Lounge - https://thelounge.chat) |
00:58
🔗
|
|
asdf0101 has quit IRC (The Lounge - https://thelounge.chat) |
01:03
🔗
|
t3 |
http://www.pc-collection.com/collection-ibm.php |
01:06
🔗
|
|
asdf0101 has joined #archiveteam-ot |
01:06
🔗
|
|
markedL has joined #archiveteam-ot |
01:06
🔗
|
t3 |
I'm trying to find the model of the IBM computer I used to have. |
01:07
🔗
|
t3 |
I think it's from 1996 or 1997. It looks very similar http://www.pc-collection.com/images/i/ibm/IBM_Aptiva_2193-420.jpg but it has an AMD processor and it doesn't have the two ports on the bottom part. |
01:08
🔗
|
t3 |
And it didn't come with blue speakers. They were beige/grey, like the color of the computer, monitor, and peripherals. |
01:08
🔗
|
t3 |
But it still had those blue accents. |
01:08
🔗
|
t3 |
The computer still had those blue accents. |
01:09
🔗
|
t3 |
I think it originally shipped with Windows 95, but was upgraded to Windows 98. |
01:09
🔗
|
t3 |
I mean upgraded to Windows 98 SE. |
01:10
🔗
|
t3 |
It had a disc drive on the top and a floppy disk drive in the middle, like in the picture. |
01:14
🔗
|
t3 |
So I'm positive it is an IBM Aptiva computer. I know it had an AMD processor. It probably also had the Windows 95 sticker on the front. |
01:42
🔗
|
|
kiska1 has quit IRC (Read error: Operation timed out) |
02:03
🔗
|
|
kiska1 has joined #archiveteam-ot |
02:03
🔗
|
|
Fusl sets mode: +o kiska1 |
02:11
🔗
|
|
chirlu has joined #archiveteam-ot |
02:16
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
03:12
🔗
|
systwi |
Does archivebot obey robots.txt? |
03:13
🔗
|
Flashfire |
systwi if by obey you mean use it to find more content to archive then yes |
03:14
🔗
|
systwi |
I meant if a robots.txt file tells web crawlers to not archive their contents, does archivebot obey that? I fear some of the pages I tossed into AB were not saved if this is the case. |
03:16
🔗
|
Flashfire |
If a site is manually excluded from the Wayback Machine, the contents won't end up there, but all robots.txt files are ignored by AB |
03:22
🔗
|
ivan |
systwi: it ignores robots.txt |
03:22
🔗
|
Fusl |
https://www.archiveteam.org/index.php?title=Robots.txt |
03:23
🔗
|
Fusl |
ArchiveBot understands robots.txt (please read the article) but does not match any directives. It uses it for discovering more links such as sitemaps however. |
03:23
🔗
|
ivan |
pipeline/archivebot/seesaw/wpull.py |
03:23
🔗
|
ivan |
28: '--no-robots', |
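
The behaviour described above (no Allow/Disallow matching, but robots.txt still used to discover sitemaps) can be sketched roughly as follows. This is a minimal illustration and not ArchiveBot's or wpull's actual code; the function name `sitemap_urls` and the example.org URL are made up for the example.

```python
from typing import List


def sitemap_urls(robots_txt: str) -> List[str]:
    """Collect URLs from Sitemap: lines and ignore every other directive."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Only the Sitemap directive is used; Disallow/Allow are never applied.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content, for illustration only.
example = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.org/sitemap.xml
"""

print(sitemap_urls(example))
```

The Disallow line in the sample is read and discarded, so /private/ would still be crawled; only the sitemap URL is kept as an extra starting point.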
03:25
🔗
|
systwi |
Cool, thank god |
03:25
🔗
|
systwi |
That puts me at ease |
03:26
🔗
|
systwi |
On a different note, I think we should grab everything ProJared; his site, YT videos, social media content, etc. There have been some allegations of him cheating on his wife circulating on the net which makes me wonder if he's going to delete everything. |
03:27
🔗
|
ivan |
I have his YouTube |
03:29
🔗
|
systwi |
That's good to know. I assume it's all encompassing--or at least as all encompassing as possible? |
03:29
🔗
|
ivan |
what youtube-dl grabs |
03:30
🔗
|
ivan |
missing the comments and chat replays |
03:30
🔗
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
03:39
🔗
|
|
BlueMax has joined #archiveteam-ot |
03:45
🔗
|
systwi |
ivan: Check this tool out: https://github.com/philbot9/youtube-comment-scraper-cli |
03:45
🔗
|
systwi |
I use this to save them to a .json file and it works pretty well |
03:45
🔗
|
systwi |
Not sure about YT chat replays though |
03:45
🔗
|
systwi |
Twitch chat replays, though, can be scraped too. If you want to know of a tool let me know |
03:46
🔗
|
ivan |
I used https://github.com/egbertbouman/youtube-comment-downloader yesterday |
03:49
🔗
|
systwi |
Hmm, interesting, I wonder how different these two are |
03:54
🔗
|
systwi |
ivan: Do you know, does it also save information like if the comment is pinned, loved, like/dislike status, post date, edited (Y/N), etc.? |
03:55
🔗
|
ivan |
youtube-comment-downloader saves a relative date but probably none of those other things |
03:56
🔗
|
ivan |
maybe these should be saved with a browser |
03:56
🔗
|
ivan |
or maybe the html from the ajax endpoint needs to be concatenated |
03:57
🔗
|
systwi |
If you did want to try it out, I know that youtube-comment-scraper-cli saves all of that, minus the pinned and loved. Maybe/hopefully those were added in a newer version, IDK |
03:57
🔗
|
systwi |
I believe the time stamp is written unix-style |
04:08
🔗
|
systwi |
ivan |
04:08
🔗
|
ivan |
thanks |
04:41
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
08:56
🔗
|
|
Odd0002_ has joined #archiveteam-ot |
08:57
🔗
|
|
Odd0002 has quit IRC (Read error: Operation timed out) |
08:57
🔗
|
|
Odd0002_ is now known as Odd0002 |
09:43
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
10:09
🔗
|
|
BlueMaxim has joined #archiveteam-ot |
10:13
🔗
|
|
BlueMaxim has quit IRC (Client Quit) |
10:19
🔗
|
|
BlueMax has quit IRC (Ping timeout: 615 seconds) |
10:43
🔗
|
JAA |
systwi: I grabbed a bunch of ProJared Twitter stuff last month when this whole thing blew up. |
10:43
🔗
|
JAA |
(At the request of Ryz) |
10:51
🔗
|
Flashfire |
!a https://m.youtube.com/watch?feature=youtu.be&v=egIqLe4b9_A |
11:14
🔗
|
|
wp494 has quit IRC (Ping timeout: 492 seconds) |
11:14
🔗
|
|
wp494 has joined #archiveteam-ot |
15:08
🔗
|
systwi |
JAA: Cool, glad to know some of it is saved. |
15:48
🔗
|
|
sep332 has joined #archiveteam-ot |
16:54
🔗
|
|
Dj-Wawa has joined #archiveteam-ot |
16:59
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
17:10
🔗
|
anarcat |
still unclear to me what the distinction between here and -bs is |
17:10
🔗
|
anarcat |
but whatever |
17:10
🔗
|
anarcat |
anyone has experience archiving Trac sites? |
17:10
🔗
|
anarcat |
preferably to static HTML? |
17:13
🔗
|
JAA |
anarcat: -bs is for AT activities, coordination of small projects, etc. This channel is for more or less anything. |
17:14
🔗
|
anarcat |
ok |
17:14
🔗
|
anarcat |
well, this is bordeline |
17:14
🔗
|
JAA |
I've handled some ArchiveBot jobs for Trac sites, and it's quite a nightmare. |
17:14
🔗
|
anarcat |
borderline |
17:14
🔗
|
anarcat |
ah crap |
17:14
🔗
|
anarcat |
so torproject.org has been looking at switching to gitlab for a while now |
17:14
🔗
|
JAA |
Well, just the usual "hey, here's another 100000 possible views of the same content!" crap. |
17:14
🔗
|
anarcat |
and i heard noises it might actually happen, which means i'll need to deal with the Trac |
17:15
🔗
|
JAA |
With the appropriate ignores, it should archive just fine. |
17:15
🔗
|
JAA |
Unlike GitLab, it's actually usable without JS. |
17:17
🔗
|
anarcat |
yeah |
17:17
🔗
|
anarcat |
you don't happen to have an igset around, do you? ;) |
17:17
🔗
|
anarcat |
still have* |
17:19
🔗
|
JAA |
Nope |
17:19
🔗
|
anarcat |
ack |
17:20
🔗
|
JAA |
I'll happily merge one though. ;-) |
17:20
🔗
|
anarcat |
hehe |
17:20
🔗
|
anarcat |
well to be honest, i don't even know what to use for crawling anymore |
17:21
🔗
|
anarcat |
i don't want to end up with a WARC file in this case |
17:21
🔗
|
anarcat |
i'd probably make a static HTML site, hosted indefinitely |
17:21
🔗
|
anarcat |
i don't think i can pull off the "magic htaccess" redirection game |
17:23
🔗
|
JAA |
Yeah, depends on what the goal is. However, if the original site disappears, we might still want to grab a copy for the Wayback Machine regardless of whether you continue to host a mirror somewhere. |
17:23
🔗
|
anarcat |
then maybe this is -bs material after all |
17:24
🔗
|
JAA |
As for a static HTML mirror, maybe wget --mirror will be useful for this. Still needs a lot of ignores though for all the differently sorted lists etc. |
17:24
🔗
|
JAA |
I don't remember what wget's options for ignoring things are, but I think recent versions have a --reject-regex option as well. |
17:24
🔗
|
JAA |
wpull doesn't have --mirror. |
17:25
🔗
|
JAA |
Maybe you can emulate it with other options, I've never used it, so I can't tell you more about that. |
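
To illustrate the "lots of ignores" point, here is a small sketch of regex-based URL filtering of the kind a crawler's reject or ignore option applies. The patterns and the trac.example.org URLs are assumptions chosen for illustration, not a tested ignore set for any real Trac site.

```python
import re

# Hypothetical ignore patterns; assumptions for illustration only.
IGNORE_PATTERNS = [
    r"[?&](order|desc|sort)=",     # re-sorted views of the same list
    r"[?&]format=",                # alternate export formats of a page
    r"[?&]action=(diff|history)",  # diff and history views
]
IGNORES = [re.compile(p) for p in IGNORE_PATTERNS]


def should_fetch(url: str) -> bool:
    """Return False when the URL matches any ignore pattern."""
    return not any(rx.search(url) for rx in IGNORES)


# Hypothetical candidate URLs.
candidates = [
    "https://trac.example.org/ticket/1",
    "https://trac.example.org/query?status=new&order=priority",
    "https://trac.example.org/wiki/Start?action=diff&version=3",
]
for url in candidates:
    print(url, should_fetch(url))
```

Patterns like these would normally be refined while the crawl runs, by watching which URL families keep multiplying.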
17:26
🔗
|
anarcat |
why --mirror? |
17:26
🔗
|
anarcat |
looks like it was somewhat crawled, but a long time ago https://archive.fart.website/archivebot/viewer/domain/trac.torproject.org |
17:26
🔗
|
anarcat |
should i just kick a crawl now? |
17:27
🔗
|
anarcat |
then we can tweak an igset as we go? |
17:27
🔗
|
JAA |
I need to leave for a bit now, how about we throw it in in an hour or so? |
17:27
🔗
|
anarcat |
sure, no probs |
17:28
🔗
|
anarcat |
can we extract the igset from archivebot later? |
17:28
🔗
|
anarcat |
so we can reuse it? |
17:28
🔗
|
JAA |
Yeah, just a matter of a few greps on the IRC logs. :-) |
17:29
🔗
|
anarcat |
ah |
17:29
🔗
|
anarcat |
that's not the best :p |
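
The "few greps on the IRC logs" idea could look something like the sketch below. It assumes, purely for illustration, that ignores were added with a command shaped like `!ig <job id> <pattern>` and that each log line is plain text containing that command; the exact command name and the log layout are assumptions, as are the job ids in the sample.

```python
import re

# Assumed command shape: "!ig <job id> <pattern>" (or "!ignore ...").
IG_LINE = re.compile(r"!ig(?:nore)?\s+(?P<job>\S+)\s+(?P<pattern>\S.*)$")


def ignores_for_job(lines, job_id):
    """Return the distinct ignore patterns issued for one job, in order."""
    patterns = []
    for line in lines:
        match = IG_LINE.search(line.rstrip())
        if not match or match.group("job") != job_id:
            continue
        pattern = match.group("pattern")
        if pattern not in patterns:  # skip repeats of the same pattern
            patterns.append(pattern)
    return patterns


# Hypothetical log lines, for illustration only.
sample_log = [
    "18:20 <op> !ig abc123 [?&]order=",
    "18:21 <someone> unrelated chatter",
    "18:22 <op> !ig zzz999 /attachment/",
    "18:23 <op> !ig abc123 [?&]order=",
]
print(ignores_for_job(sample_log, "abc123"))
```

The resulting list could then be pasted into a reusable ignore set instead of being re-derived from the logs each time.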
17:36
🔗
|
|
Anthony1 has joined #archiveteam-ot |
18:12
🔗
|
JAA |
-> #archivebot |
19:16
🔗
|
jut |
Anybody here from warmer climates? How do you survive 35C heat? |
19:17
🔗
|
astrid |
shade, water |
19:18
🔗
|
astrid |
the human body is capable of adapting its metabolism to a wide variety of climates but in my experience it takes it a week or two of no aircon to figure out how to deal with heat |
19:20
🔗
|
anarcat |
moving north |
19:22
🔗
|
Fusl |
jut: 40C inside, i have an air conditioner pointing directly at my face |
19:22
🔗
|
anarcat |
yuck |
19:23
🔗
|
jut |
Yeah it's northern europe, air conditioning is rare |
19:29
🔗
|
jut |
Also, our school year has been extended by 3 weeks, and our schools are not built to be occupied during the summer. |
19:29
🔗
|
VoynichCr |
jut: use winter wallpapers, it helps |
19:31
🔗
|
astrid |
like, pictures of snow scenes? |
19:31
🔗
|
VoynichCr |
yeah |
19:32
🔗
|
VoynichCr |
(lol, it was a joke) |
19:33
🔗
|
jut |
You know I will try |
20:09
🔗
|
|
Anthony1 has quit IRC (Ping timeout: 260 seconds) |
20:43
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
21:06
🔗
|
|
n00b659 has joined #archiveteam-ot |
21:06
🔗
|
|
n00b659 has left |
21:08
🔗
|
|
n00b has joined #archiveteam-ot |
21:10
🔗
|
|
n00b has left |
21:11
🔗
|
|
n00b680 has joined #archiveteam-ot |
21:11
🔗
|
|
n00b680 has left |
22:27
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
22:28
🔗
|
|
Jens has quit IRC (Remote host closed the connection) |
22:28
🔗
|
|
Jens has joined #archiveteam-ot |
22:51
🔗
|
|
BlueMax has joined #archiveteam-ot |