#archiveteam-bs 2017-12-21,Thu


Time Nickname Message
00:05 🔗 zino jrwr, not that it helps, but I'm sending all my sympathy and good thoughts.
00:05 🔗 jrwr Im going to hack up some cgi-bin
00:05 🔗 jrwr and compile php7 on this box
00:05 🔗 jrwr wouldn't be the first time I've worked around shit like this (LOOKING AT YOU, FERAL)
00:56 🔗 jrwr so I turned on file based cache
00:57 🔗 jrwr it should help /some/
00:57 🔗 jrwr I'm working with apache some to get this working; it's being a PITA, I suspect an apache module is doing this
01:08 🔗 BnAboyZ has joined #archiveteam-bs
01:27 🔗 Somebody2 BTW, regarding WARC uploads going into the Wayback Machine -- I've now gotten confirmation that it is still a trusted-uploaders-only process (which isn't surprising).
01:28 🔗 Somebody2 JAA is trusted, and ivan as well, presumably.
01:42 🔗 jrwr So SketchCow, I'm stuck, the PHP is too old to update mediawiki, Apache is not behaving with the CGI override due to mod_security being forced on, and overall the entire account is limited to 7 (confirmed) concurrent connections (that's what's causing the resource limit pages currently)
01:42 🔗 jrwr I've added the static file cache and it is helping
01:44 🔗 SketchCow There'll be some roughness as we figure out what to do.
01:44 🔗 jrwr Ya
01:44 🔗 jrwr its using 2.6 as its kernel....
01:44 🔗 SketchCow But if I have intelligent requests for the host, I'm sure they can help.
01:46 🔗 jrwr Ok, So the main one is can I have my limits increased for the number of CGI scripts run at one time. I keep getting resource limit errors on top of this error log: [Wed Dec 20 20:41:37 2017] [error] mod_hostinglimits:Error on LVE enter: LVE(527) HANDLER(application/x-httpd-php5) HOSTNAME(archiveteam.org) URL(/index.php) TID(318310) errno (7) Read more: http://e.cloudlinux.com/MHL-E2BIG min_uid (0)
01:46 🔗 SketchCow Well, assemble them all in one place for me.
01:47 🔗 SketchCow I mean after a day of looking it COMPLETELY over
01:47 🔗 SketchCow And then I'll bring it to TQ and see what they thing
01:47 🔗 SketchCow think
01:47 🔗 jrwr Ok
01:47 🔗 SketchCow No sense in piecemealing
01:47 🔗 SketchCow Also, let me glance at the cpanel
01:49 🔗 jrwr Ok
01:50 🔗 jrwr Ya, its pretty much those two issues I have with it, I'm compiling them into a google sheet for tracking
01:58 🔗 jacketcha Wow. I was having problems compiling WARC files in javascript and was going to ask if there was a preexisting API for something like that, but I can barely even read what you guys are saying.
01:58 🔗 jrwr Im poking the poor wiki very hard
01:59 🔗 jrwr whats up jacketcha
01:59 🔗 jrwr WARC reading in javascript, hrm
01:59 🔗 jacketcha yeah
01:59 🔗 jrwr Well, the WARC standard is pretty simple overall
01:59 🔗 jrwr it's all about the indexing and lookups that make it fast, and JavaScript is not that great at that.
02:00 🔗 jrwr https://www.npmjs.com/package/node-warc
02:00 🔗 jrwr have some node
02:00 🔗 jrwr its /javascript/
02:00 🔗 jacketcha thanks
02:01 🔗 jacketcha I was planning to add it to my chrome extension
02:01 🔗 jrwr ah
02:02 🔗 jrwr im logging off for now SketchCow, See ya in the morning
02:13 🔗 robink has quit IRC (Ping timeout: 246 seconds)
02:15 🔗 jacketcha ok, so can somebody explain to me how warc files work
02:15 🔗 jacketcha sorry for being dumb
02:15 🔗 jacketcha i whonestly have no idea
02:15 🔗 jacketcha *honestly
02:16 🔗 Frogging do you have a more specific question?
02:20 🔗 jacketcha How is the data structured? I am going to assume that it isn't just copying in the HTML source code after the headers are added.
02:20 🔗 Frogging It stores the full response headers and body
02:21 🔗 Frogging That includes responses containing binary data, HTML, CSS, plain text, whatever
02:22 🔗 jacketcha Is there any specific order to that?
02:24 🔗 Frogging Records can be in any order AFAIK
02:25 🔗 jacketcha Great, so it'll work just fine with asynchronous saving. Thanks, that was actually really helpful.
02:28 🔗 Frogging http://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/index.html
02:28 🔗 jacketcha thanks!
02:29 🔗 Frogging there's WARC 1.1 (which is the latest version) linked there too
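A minimal Python sketch of the record structure Frogging describes, using the warcio library (the filename is made up): each record carries its own WARC headers, the archived HTTP headers, and the raw body.

    from warcio.archiveiterator import ArchiveIterator

    # walk every record in a (possibly gzipped) WARC file
    with open('example.warc.gz', 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == 'response':
                uri = record.rec_headers.get_header('WARC-Target-URI')
                ctype = record.http_headers.get_header('Content-Type')
                body = record.content_stream().read()  # HTML, binary, whatever
                print(uri, ctype, len(body))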
02:38 🔗 bithippo has joined #archiveteam-bs
02:43 🔗 bithippo Thinking about grabbing Imgur. All of it. Anything I should keep in mind prior to putting it in cold storage?
02:44 🔗 bithippo (iterating over every permutation of image urls based on how Imgur generates image urls)
02:47 🔗 jacketcha Is the way Imgur generates urls known?
02:47 🔗 jacketcha Better question, is the source of data for the RNG Imgur uses known?
02:48 🔗 robink has joined #archiveteam-bs
02:50 🔗 bithippo https://blog.imgur.com/2013/01/18/more-characters-in-filenames/
02:50 🔗 bithippo "Choosing 5 characters from 26 lowercase letters + 26 uppercase letters + 10 numerical digests, leaves us with 916,132,832 possible combinations (625). Upgrading to 7 characters gives us 3,521,614,606,208 (3.52 trillion) possibilities."
02:53 🔗 bithippo 404->check back in the future, 200->WARC gz
02:53 🔗 bithippo Etag header on a request is the MD5 of the image
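A minimal sketch of the probe bithippo outlines, assuming the i.imgur.com URL pattern and the ETag behaviour described above (the ID is invented):

    import requests

    url = 'https://i.imgur.com/aBcDe12.jpg'        # hypothetical 7-char ID
    r = requests.head(url, allow_redirects=False)
    if r.status_code == 200:
        # per the discussion, the ETag is the MD5 of the image
        print('hit, md5 =', r.headers.get('ETag', '').strip('"'))
    elif r.status_code == 404:
        print('unused ID, recheck later')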
02:53 🔗 jacketcha That is still 3 trillion web requests
02:53 🔗 bithippo ¯\_(ツ)_/¯
02:54 🔗 bithippo Alternatives? Besides waiting until Imgur runs out of runway and then it's more pressing :/
02:55 🔗 bithippo (ie Twitpic 2.0)
02:56 🔗 bithippo My only frustration is that the URL isn't deterministic from a hash of the image, so it's possible an image exists, is deleted, and then replaced without any way to know
02:58 🔗 jacketcha Unless it was already archived
02:58 🔗 jacketcha Look, it's a good idea in practice, but here's the thing
02:58 🔗 bithippo Ahh, truth
02:58 🔗 jacketcha Imgur gets around 17.3611111111 new images per second
02:59 🔗 jacketcha That would place it at around 2687000000 images today
03:01 🔗 bithippo Le sigh.
03:01 🔗 jacketcha It gets worse
03:02 🔗 jacketcha That means you have a roughly 0.076300228743465619423705658937702969914635168416766807608280491104220976146849283303277891127907878357259164917744009861455363906114286574925748247085571170136781627322552461470824865159611513732687195% chance of getting an image every time you send a request
03:03 🔗 jacketcha Don't be fooled by the high precision, the accuracy of your plan is very low.
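The long number above is just the ratio of the two estimates; a rough check:

    images = 2687000000                # jacketcha's estimate of images hosted
    namespace = 62 ** 7                # possible 7-character IDs
    print(100.0 * images / namespace)  # ~0.0763% chance per random probe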
03:03 🔗 jacketcha But, there is a way to make it higher.
03:04 🔗 jacketcha Much higher, in fact
03:04 🔗 jacketcha If you can figure out the source of the data that Imgur uses for its random number generation algorithms, you can at least grab the newest images
03:06 🔗 bithippo Sounds workable using their API to get latest image paths and then working backwards
03:07 🔗 jacketcha possibly
03:08 🔗 jacketcha But if it is randomly generated, even pseudorandomly generated, you're still screwed.
03:09 🔗 jacketcha or
03:09 🔗 jacketcha you know
03:09 🔗 jacketcha you could just email them
03:09 🔗 jacketcha and ask
03:10 🔗 bithippo "Hi. I will take one Imgur pls. Will be over with hard drives shortly."
03:10 🔗 bithippo Appreciate the input!
03:11 🔗 jacketcha No problem
03:19 🔗 jacketcha hold up
03:20 🔗 jacketcha If a Warrior or ArchiveBot finds a WARC file, is it added to the collection of WARC files or is it added into the WARC file of the site it is located on?
03:23 🔗 godane i'm up to 18k items this month
03:23 🔗 godane this year has been slower than last year
03:23 🔗 godane i just hope i can get it past 100k for the year
03:24 🔗 godane https://archive.org/details/@chris85?&and[]=addeddate:2017
03:24 🔗 godane it's 97,682 so far
03:24 🔗 jacketcha woah
03:24 🔗 jacketcha maybe I should start counting mine
03:24 🔗 jacketcha and putting it in actual warc files
03:33 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
03:45 🔗 robink has quit IRC (Ping timeout: 246 seconds)
03:48 🔗 Somebody2 jacketcha: if a HTTP request returns a WARC file, and that HTTP request and response is being stored into a WARC file,
03:48 🔗 Somebody2 then, yes, you'll have nested WARC-formatted data
03:48 🔗 Somebody2 AFAIK, no WARC-recording tool will automatically un-nest it (and that would probably not be a good idea in any case)
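As an illustration of Somebody2's point, a hedged warcio-based sketch (filename made up) that flags response records whose payload is itself WARC data; note it misses nested WARCs that were served gzipped, since those start with the gzip magic bytes instead:

    from warcio.archiveiterator import ArchiveIterator

    with open('crawl.warc.gz', 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != 'response':
                continue
            if record.content_stream().read(5) == b'WARC/':
                # the archived body is itself WARC data, left nested as-is
                print('nested WARC:',
                      record.rec_headers.get_header('WARC-Target-URI'))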
03:55 🔗 jacketcha wait
03:56 🔗 jacketcha so that means that there possibly could be an archive of the entire internet floating around the wayback machine somewhere, but nobody would ever know because it was nested.
04:01 🔗 SketchCow This is.....
04:01 🔗 SketchCow See, this is one of the things
04:01 🔗 SketchCow You are asking... well, you're asking for a college course in how WARC works
04:01 🔗 SketchCow It's sort of on topic and sort of off
04:01 🔗 SketchCow It's certainly sucking all the air out of the room
04:01 🔗 SketchCow It's nice to see people talking
04:01 🔗 jacketcha so nested warc files are basically politics
04:02 🔗 jacketcha got it
04:11 🔗 bithippo has quit IRC (Ping timeout: 260 seconds)
04:14 🔗 SketchCow No.
04:14 🔗 SketchCow You're wandering into a welding shop going "So.... why cold welds"
04:16 🔗 jacketcha that seems very accurate
04:20 🔗 bithippo has joined #archiveteam-bs
04:33 🔗 kyounko has joined #archiveteam-bs
04:55 🔗 qw3rty117 has joined #archiveteam-bs
05:01 🔗 qw3rty116 has quit IRC (Read error: Operation timed out)
05:22 🔗 bithippo has quit IRC (Quit: Page closed)
05:29 🔗 Stiletto has quit IRC (Read error: Operation timed out)
05:34 🔗 Stilett0 has joined #archiveteam-bs
05:35 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
05:36 🔗 BlueMaxim has joined #archiveteam-bs
06:02 🔗 wp494 has quit IRC (Ping timeout: 250 seconds)
06:02 🔗 wp494 has joined #archiveteam-bs
06:12 🔗 zgrant has left
06:15 🔗 wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
06:16 🔗 wp494 has joined #archiveteam-bs
06:27 🔗 kimmer1 has joined #archiveteam-bs
06:29 🔗 midas2 has quit IRC (Ping timeout: 1212 seconds)
06:33 🔗 kimmer12 has quit IRC (Ping timeout: 633 seconds)
06:47 🔗 midas2 has joined #archiveteam-bs
07:24 🔗 wp494_ has joined #archiveteam-bs
07:29 🔗 ZexaronS has quit IRC (Read error: Connection reset by peer)
07:30 🔗 ZexaronS has joined #archiveteam-bs
07:31 🔗 wp494 has quit IRC (Read error: Operation timed out)
07:39 🔗 wp494_ has quit IRC (Ping timeout: 633 seconds)
07:41 🔗 odemg has quit IRC (Read error: Operation timed out)
07:46 🔗 odemg has joined #archiveteam-bs
07:53 🔗 wp494 has joined #archiveteam-bs
08:03 🔗 robink has joined #archiveteam-bs
08:06 🔗 jacketcha I wonder how many times jquery has been archived
08:07 🔗 jacketcha By this point, there must be at least a hundred copies made of it each week
09:09 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
09:09 🔗 jacketcha has joined #archiveteam-bs
09:16 🔗 Mateon1 has quit IRC (Ping timeout: 245 seconds)
09:16 🔗 Mateon1 has joined #archiveteam-bs
09:35 🔗 jrwr SWEET BABY JESUS
09:35 🔗 jrwr somehow, I got php7 to run
09:35 🔗 jrwr on this holy shit old host
09:39 🔗 PurpleSym Is this a dedicated server, jrwr?
09:39 🔗 jrwr not even close
09:40 🔗 jrwr it's a shared host running linux 2.6 on an old cpanel
09:40 🔗 jrwr running a good old apache + php 5.3
09:40 🔗 jrwr I override the mod_security and mod_suphp all to fux to get PHP scripts to run with a custom statically linked php binary I made
09:41 🔗 PurpleSym Wtf? 2.6 EOL’d years ago.
09:43 🔗 jrwr I'm making do with what I have
09:43 🔗 jrwr its where its staying
09:43 🔗 jrwr I'm making its own little world on this webhost
09:43 🔗 * jrwr compiles memcached
09:45 🔗 jrwr now
09:45 🔗 jrwr comes the fun part
09:45 🔗 jrwr I'm going to update mediawiki
09:46 🔗 jacketcha good luck
09:47 🔗 jacketcha i can't even update windows without doing a ritual to please the tech gods
09:50 🔗 jrwr this is dark magic
09:50 🔗 jrwr php does not like doing this
09:52 🔗 jrwr I have a plan
09:52 🔗 jrwr to compile php with memcached, and then run a little memcached server so mediawiki can cache objects
09:53 🔗 jacketcha nah
09:53 🔗 jacketcha you don't even need php
09:54 🔗 jacketcha just do what I do and use 437 IFTTT applets as your server
09:54 🔗 jacketcha with a touch of github pages
09:54 🔗 jrwr lol
09:54 🔗 jrwr this is the archiveteam wiki I'm working on
09:55 🔗 jacketcha Is the ArchiveTeam wiki archived?
09:55 🔗 Igloo Mostly :p
09:57 🔗 jacketcha You know what I want to try? My school has unlimited storage on all google accounts under its organization. I wonder how far they would let me push that.
09:58 🔗 jrwr Its staying where it is for now
09:58 🔗 jrwr for ~reasons~
09:58 🔗 jacketcha is it because you missed a semicolon somewhere but there isn't a really good php linter yet
10:00 🔗 jacketcha oh no
10:01 🔗 jacketcha i just remembered i have midterms
10:01 🔗 jacketcha gn
10:19 🔗 jrwr oh man
10:19 🔗 jrwr thats a ton better
10:19 🔗 jacketcha has quit IRC (Remote host closed the connection)
10:19 🔗 jrwr The Archive Team wiki is now running on mediawiki 1.30.0
10:21 🔗 jacketcha has joined #archiveteam-bs
10:24 🔗 jacketcha has quit IRC (Remote host closed the connection)
10:29 🔗 jacket has joined #archiveteam-bs
10:43 🔗 fie has joined #archiveteam-bs
10:48 🔗 fie has quit IRC (Read error: Connection reset by peer)
11:07 🔗 pizzaiolo has joined #archiveteam-bs
11:14 🔗 jrwr Igloo: better huh?
11:16 🔗 Igloo jrwr: miles and miles
11:16 🔗 jrwr Ya
11:17 🔗 jrwr Response times are sub 200ms
11:17 🔗 jrwr Before they were 1400ms
11:20 🔗 JAA jrwr: Well done! Much, much better. <3
11:20 🔗 jrwr Thanks
11:21 🔗 JAA Somebody2: Ah, makes sense. Thanks for checking.
11:21 🔗 jrwr I woke up from a strange dream at 3am (flying an airplane and somewhat crashing it)
11:22 🔗 jrwr And then had a brainwave on how to get php working correctly
11:22 🔗 jrwr Been up since then, work is going to be hell today
12:08 🔗 BlueMaxim has quit IRC (Leaving)
12:40 🔗 Igloo Ok, So it looks like we can iterate through the numbers
12:41 🔗 JAA For user profiles, maybe. For characters, no way.
12:41 🔗 Igloo 212 million users? Unlikely
12:42 🔗 JAA But it should be fairly simple to scrape them from https://www.saintsrow.com/community/characters/mostrecent
12:42 🔗 JAA The question is, how do we get the actual characters (not just the images)?
12:42 🔗 Smiley is there a log of what has gone before?
12:42 🔗 Smiley I have a 'archiveteam' account registered along with my personal one
12:42 🔗 Smiley not sure why
12:42 🔗 Smiley maybe i just suggested scraping for SR3
12:46 🔗 Igloo https://www.saintsrow.com/users/show/212300001 appears to be the lowest. https://www.saintsrow.com/users/show/213056573 latest
12:46 🔗 Igloo 2300001 3056573 ~705,000 user profiles?
12:47 🔗 Igloo Those are easy
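A quick sketch of the iteration Igloo suggests, using the two IDs observed above; the exact count falls out of the subtraction:

    first, last = 212300001, 213056573    # lowest / latest profile IDs seen
    print(last - first + 1)               # 756573 candidate profile pages
    for uid in range(first, last + 1):
        url = 'https://www.saintsrow.com/users/show/%d' % uid
        # fetch url here, politely rate-limited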
13:22 🔗 jrwr SketchCow: email fixed
13:22 🔗 jrwr confirmed working with password resets being sent to a gmail account
13:23 🔗 SketchCow Great
13:32 🔗 jrwr https://usercontent.irccloud-cdn.com/file/brVInVWJ/image.png
13:32 🔗 jrwr you can see when I dropped in the php changes
13:52 🔗 jrwr joepie91: it doesnt work like that
13:52 🔗 jrwr this whole box is from 2011
13:53 🔗 joepie91 ah, just an old cpanel then that doesn't support it, or?
14:02 🔗 jrwr ya
14:02 🔗 jrwr I have methods and apis
14:02 🔗 jrwr im patching it in
14:05 🔗 icedice has joined #archiveteam-bs
14:13 🔗 icedice has quit IRC (Ping timeout: 250 seconds)
14:13 🔗 JAA jrwr: LOL, that graph is beautiful!
14:28 🔗 jrwr Thanks
14:28 🔗 jrwr its going up on my wall
15:18 🔗 jrwr JAA: Igloo
15:18 🔗 jrwr guess what
15:18 🔗 jrwr SSL BITCHES
15:19 🔗 JAA Yiss
15:19 🔗 jrwr https://www.ssllabs.com/ssltest/analyze.html?d=archiveteam.org
15:19 🔗 jrwr fucking A rating!
15:20 🔗 MrRadar2 :D :D :D https://i.imgur.com/CloHYLR.png
15:20 🔗 jrwr with Strict Transport Security (HSTS) on (left it pretty short just in case)
15:20 🔗 Igloo Just need a redirect now ;-)
15:20 🔗 JAA ^
15:21 🔗 jrwr na, not going to enforce it
15:21 🔗 jrwr HSTS is enough
15:23 🔗 zgrant has joined #archiveteam-bs
15:24 🔗 jrwr fuck it
15:24 🔗 jrwr done
15:24 🔗 jrwr SketchCow: SSL is now installed
15:24 🔗 jrwr anything else?
15:24 🔗 Igloo :)
15:24 🔗 Igloo hehe, all my home stuff with LE gets A rating too
15:24 🔗 Igloo Which is bonza
15:26 🔗 SketchCow I think that's all I can think of
15:26 🔗 SketchCow Someone proposed some sort of theme upgrade
15:27 🔗 SketchCow But it all seems just fine to me now.
15:31 🔗 jrwr ah
15:31 🔗 jrwr its fine
15:32 🔗 jrwr I /might/ get bored and add in a new editor but the new editor requires all kinds of crazy
15:32 🔗 SketchCow If people come up with things, we'll consider them now that it's possible
15:32 🔗 SketchCow Generally, someone complaining they can't work on the Wiki because they miss a gewgaw is focused on the wrong things.
15:33 🔗 jrwr Ya
15:33 🔗 jrwr I am using the file based cache built into mw
15:33 🔗 jrwr so bots and stuff all get served static pages
15:37 🔗 jrwr I feel like I just refurbed my 1984 Chrysler lebaron convertible (I own one) https://drive.google.com/file/d/1AQqXNiluKTk5xuCYStfVexiH1LLUOYaLLQ/view?usp=sharing
15:43 🔗 Igloo Nice car
15:45 🔗 jrwr 900$
15:45 🔗 jrwr runs great, and talks to you
15:46 🔗 jrwr https://www.youtube.com/watch?v=nGuRS-L2BN0
15:53 🔗 jrwr I love the old DEC speech Synths
15:53 🔗 jrwr they sound better than software ones
16:28 🔗 godane so another box of tapes i bought is shipped
16:50 🔗 dd0a13f37 has joined #archiveteam-bs
16:53 🔗 godane so this happened http://mashable.com/2017/12/20/sesame-street-irc-macarthur-grant-refugee-middle-east/#fIS9la5_bSq7
17:18 🔗 schbirid has joined #archiveteam-bs
17:21 🔗 jacket has quit IRC (Read error: Connection reset by peer)
17:22 🔗 jacket has joined #archiveteam-bs
17:23 🔗 dd0a13f37 aria2c is a mystery
17:23 🔗 dd0a13f37 if I have it use 1 connection or 10, I still get about 2r/s
17:24 🔗 dd0a13f37 if I split it up across 6 command windows, 12r/s
17:24 🔗 dd0a13f37 might have to do with the fact that it's split across multiple IPs though
17:25 🔗 dd0a13f37 Anyone know a good tool to do this automatically? Split up http requests over multiple proxies?
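A hedged sketch of splitting requests over several proxies by hand with requests (the ports are assumptions, e.g. several local Tor instances; SOCKS support needs the requests[socks] extra installed):

    import itertools
    import requests

    ports = itertools.cycle([9050, 9052, 9054])

    def get(url):
        proxy = 'socks5h://127.0.0.1:%d' % next(ports)  # socks5h: DNS via proxy
        return requests.get(url, proxies={'http': proxy, 'https': proxy},
                            timeout=30)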
17:59 🔗 bithippo has joined #archiveteam-bs
18:42 🔗 ola_norsk is there some sort of code available to look at on how IA get urls from warcs?
18:43 🔗 ola_norsk or does it convert first, somehow, then do _that_ ?
18:45 🔗 ola_norsk i was kind of expecting warcs to be a kind of archive with an index, not all data being a single file :/
18:46 🔗 ola_norsk e.g containing something i could open in gedit etc..
18:50 🔗 dd0a13f37 iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/index.html
18:51 🔗 bithippo https://github.com/recrm/ArchiveTools/wiki/warc-extractor
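On ola_norsk's question: the index lives outside the WARC; the Wayback Machine builds separate CDX indexes over its WARCs. A minimal warcio sketch (filename made up) that pulls out the URLs, plus byte offsets usable for such an index:

    from warcio.archiveiterator import ArchiveIterator

    with open('example.warc.gz', 'rb') as stream:
        it = ArchiveIterator(stream)
        for record in it:
            if record.rec_type == 'response':
                # URL + offset is the core of a CDX-style index line
                print(it.get_record_offset(),
                      record.rec_headers.get_header('WARC-Target-URI'))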
18:52 🔗 dd0a13f37 Can someone help me?
18:53 🔗 dd0a13f37 I'm trying to archive a site with aria2c. When using multiple concurrent connections, I get about 2r/s. The speed is around 50 kbit, despite my internet connection being much faster.
18:53 🔗 dd0a13f37 When using multiple instances over multiple IPs, it's not much better. The individual speeds for some drop down to 5kbit.
18:54 🔗 bithippo Recommend using https://github.com/ludios/grab-site to archive instead
18:54 🔗 dd0a13f37 I am currently running 50 instances of aria2 with 20 concurrent connections each. I get 5r/s total.
18:54 🔗 bithippo If you need a WARC file, request/response headers, etc.
18:54 🔗 dd0a13f37 This is abysmally low, what gives?
18:54 🔗 bithippo You are most likely being throttled by IP
18:54 🔗 dd0a13f37 But I'm spreading it over 50 different IPs.
18:54 🔗 bithippo (even if distributing across multiple IPs)
18:55 🔗 bithippo Sliding window of bytes in the webserver config. Initial requests are fast, subsequent requests slow down if you try to firehose
18:55 🔗 bithippo What's the hostname?
18:55 🔗 dd0a13f37 ratsit.se
18:55 🔗 dd0a13f37 Or my hostname?
18:55 🔗 bithippo Nope, site hostname. Checking something.
18:55 🔗 dd0a13f37 So how can it throttle them to 2-5 KiB/s when using 50 different IPs, but 50 KiB/s when using 1?
18:57 🔗 bithippo Are these all anonymous web requests? Or are you signed in/setting a cookie to be logged in to fetch data?
18:58 🔗 dd0a13f37 These are all anonymous requests from tor exit nodes. No cookies are stored.
18:58 🔗 bithippo Could be throttling by tor IPs. I did that at my last gig on our Nginx servers.
18:58 🔗 bithippo (Tor requests were notoriously bad scraping actors in our case)
18:58 🔗 dd0a13f37 But I used tor IPs before too. And those were at 50KiB/s, not 2-5KiB/s.
18:59 🔗 bithippo I don't have a good answer unfortunately :/ Lots of variables that could be causing it. What's the purpose of using Tor to perform the requests?
18:59 🔗 dd0a13f37 The only logical explanation is my connection being the bottleneck, but that would put it at around 2 mbit, which is way too slow
19:00 🔗 dd0a13f37 Because I don't want to get in any trouble for the scraping, and they could ban my IP
19:01 🔗 dd0a13f37 Now it jumped up to 12 resp/s, which was my previous peak when using 6 different IPs.
19:01 🔗 bithippo Is a cloud provider VM out of the question with a slow concurrency rate?
19:01 🔗 bithippo 2 requests per second, say.
19:01 🔗 dd0a13f37 It could be on their end too.
19:02 🔗 dd0a13f37 I could just leave the computer on over night, but I would prefer not to pay any money.
19:02 🔗 dd0a13f37 Could it be they just serve 12 connections at the same time?
19:03 🔗 dd0a13f37 Now down to 7 again. Sure is a mystery what is going on...
19:06 🔗 dd0a13f37 And now back up to 14. At this speed, it will take 6 hours, which is slow but acceptable.
19:07 🔗 bithippo I can rip it for you and provide a torrent file when I'm done.
19:08 🔗 dd0a13f37 It went up to 34 now.
19:08 🔗 dd0a13f37 How? With grab-site?
19:08 🔗 dd0a13f37 They might block the IP being used in that case
19:09 🔗 bithippo 10 second wait between requests
19:09 🔗 bithippo It'll take a while, but it'll finish eventually.
19:09 🔗 bithippo Have to head out, leave me a note here if that's a plan
19:11 🔗 dd0a13f37 10 seconds would take half a year for the whole site, and it would change during the time, so I don't think that's a good idea
19:12 🔗 dd0a13f37 But if it continues to be this fast then it should be done in a few hours, which is good.
19:21 🔗 dd0a13f37 Well, the only logical explanation is some advanced throttling algorithm in place. I can't find any other explanation for why it's so slow.
19:27 🔗 dd0a13f37 https://pastebin.com/VkCS1yJ1 It apparently got faster over time, with a peak of 46 resp/s, before slowing down.
19:45 🔗 ola_norsk any sqlite geniuses savvy to very basic sqlite reational database structure in here who wouldn't mind if i ask some questions?
19:45 🔗 ola_norsk relational*
19:46 🔗 dd0a13f37 I have a basic knowledge, shoot
19:47 🔗 ola_norsk dd0a13f37: thanks. if you please take a look at the SQL here, (just page search for 'sqlqueries') https://github.com/DuckHP/twario-warrior-tool/blob/master/src/twario/sqlitetwario.py
19:48 🔗 ola_norsk i'm sure that sql could be done better, and i think you'll agree. Sadly my sql is quite shit
19:49 🔗 ola_norsk to optimize storage etc..i mean
19:49 🔗 ola_norsk and speed etc
19:50 🔗 dd0a13f37 The schema?
19:50 🔗 ola_norsk aye
19:50 🔗 dd0a13f37 You can add a constraint for TweetUserId so it has to have a corresponding entry in Users
19:51 🔗 dd0a13f37 And Users should either have id INTEGER PRIMARY KEY, or TweetUserID should be a username
19:51 🔗 ola_norsk yeah been thinking that so i made a 'users' table
19:51 🔗 ola_norsk ok
19:52 🔗 dd0a13f37 Display name isn't stable, but it might be overkill to provision for that
19:52 🔗 dd0a13f37 search for foreign key constraint
19:52 🔗 ola_norsk aye, i'm not even sure yet if 'tweep' reads display names
19:52 🔗 dd0a13f37 https://sqlite.org/foreignkeys.html http://www.sqlitetutorial.net/sqlite-foreign-key/
19:53 🔗 dd0a13f37 Well, if you want to archive avatars etc it might be neat to have. You could have three tables, but it might be overkill
19:53 🔗 dd0a13f37 tweets - tweet text, date, username
19:53 🔗 dd0a13f37 users - username (not unique), date, avatar, displayname
19:53 🔗 dd0a13f37 or wait, that makes two
19:54 🔗 Valentine has quit IRC (Read error: Connection reset by peer)
19:54 🔗 ola_norsk avatar might be doable
19:54 🔗 dd0a13f37 And then just do SELECT * FROM users WHERE username = ... LIMIT 1
19:54 🔗 ola_norsk ty
19:55 🔗 dd0a13f37 not sure about the syntax
19:58 🔗 ola_norsk the requests sql i can figure out i think, but i suck at schema/structure :/
19:58 🔗 BnAboyZ has quit IRC (Quit: The Lounge - https://thelounge.github.io)
19:59 🔗 ola_norsk i'll check out that link you posted. thanks
20:00 🔗 dd0a13f37 Well, have one users table that for each username can have multiple entries (e.g. if they change their avatar you get a new entry with same username)
20:00 🔗 dd0a13f37 And one tweets table, since they are immutable
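A minimal sqlite3 sketch of the two-table layout dd0a13f37 describes (table and column names are illustrative):

    import sqlite3

    db = sqlite3.connect('twario.db')
    db.execute('''CREATE TABLE IF NOT EXISTS tweets (
                      id INTEGER PRIMARY KEY,  -- tweet ID; tweets are immutable
                      username TEXT NOT NULL,
                      text TEXT,
                      date TEXT)''')
    db.execute('''CREATE TABLE IF NOT EXISTS users (
                      username TEXT NOT NULL,  -- not unique: one row per observed state
                      displayname TEXT,
                      avatar TEXT,
                      seen TEXT)''')
    # newest known profile state for one user:
    row = db.execute('SELECT * FROM users WHERE username = ? '
                     'ORDER BY seen DESC LIMIT 1', ('jack',)).fetchone()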
20:00 🔗 Valentine has joined #archiveteam-bs
20:03 🔗 ola_norsk e.g if a tweet is identical, just have a 'content' table perhaps, and reference that?
20:03 🔗 dd0a13f37 If you're ever building your own scraper, it seems like mobile.twitter.com is more pleasant to work with
20:03 🔗 dd0a13f37 view-source:https://twitter.com/jack view-source:https://mobile.twitter.com/jack
20:03 🔗 ola_norsk i'm just "re-doing" a tool called 'tweep'
20:04 🔗 dd0a13f37 As in, modifying it?
20:05 🔗 ola_norsk aye, this: https://github.com/haccer/tweep ..it seems to work quite well, but could use some tweaking
20:05 🔗 dd0a13f37 If you want a complete archive, you could probably crawl pretty nicely. Start off by the timeline, then see what accounts and hashtags you find. Then traverse those accounts and hashtags, see what accounts and hashtags you find.
20:06 🔗 dd0a13f37 >The --fruit feature will display Tweets that might contain sensitive info
20:06 🔗 dd0a13f37 uh
20:07 🔗 ola_norsk aye, have not tested that yet, but i've been thinking of removing it
20:07 🔗 ola_norsk basically 'user' and 'search words' is my focus
20:07 🔗 ola_norsk not exactly too keen on archiving 'doxing' tweets
20:08 🔗 dd0a13f37 Well, why bother taking it out? Just don't use it, or remove all documentation references to it if you're really concerned.
20:08 🔗 ola_norsk aye
20:09 🔗 dd0a13f37 I think mobile.twitter.com is better. It shows 30 tweets/page instead of 20, and the pages are faster to download
20:10 🔗 hook54321 JAA: Are you still grabbing the Catalonia cameras that update every 5 minutes or so?
20:10 🔗 ola_norsk dd0a13f37: it seems to require a signin/account
20:10 🔗 JAA hook54321: Yeah
20:10 🔗 JAA I think so, at least.
20:10 🔗 JAA Let me check.
20:10 🔗 hook54321 lol
20:10 🔗 ola_norsk dd0a13f37: i deliberately made myself banned from twitter :/
20:10 🔗 hook54321 I need to start recording the cameras I was recording again
20:11 🔗 JAA Yep, it's still grabbing... something.
20:11 🔗 dd0a13f37 mobile.twitter.com doesn't need an account
20:11 🔗 JAA Haven't looked at the content in a long time though.
20:11 🔗 dd0a13f37 https://mobile.twitter.com/jack?max_id=938593014343024639 works just fine for me
20:12 🔗 ola_norsk dd0a13f37: so it's just because i'm using desktop browser then?
20:12 🔗 ola_norsk dd0a13f37: that link worked btw
20:13 🔗 ola_norsk dd0a13f37: doh, i got a "join today to see it all" when scrolling
20:13 🔗 hook54321 JAA: Should I grab this whole youtube channel? https://www.youtube.com/user/gencat
20:14 🔗 dd0a13f37 I am using tor browser with JS disabled.
20:16 🔗 ola_norsk dd0a13f37: i think if 'twario/tweep' is made a bit less aggressive, it wouldn't need to be 'torified'
20:16 🔗 dd0a13f37 https://mobile.twitter.com/jack?max_id=743833014343024639 I can go quite a bit back
20:17 🔗 dd0a13f37 Why castrate your perfectly working tweet scraping tool? Requests can use proxies, or multiple.
20:17 🔗 ola_norsk dd0a13f37: with original 'tweep' it seemed to stop at half a year or so back in time on a search word
20:18 🔗 ola_norsk they could, but it would eventually get noticed i think if it's running continuously :/
20:18 🔗 dd0a13f37 different users go differently far back https://mobile.twitter.com/realDonaldTrump?max_id=793833014343024639
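A hedged sketch of the pagination pattern these links show (the ID is taken from the earlier example; Twitter may change or rate-limit this at any time):

    import requests

    # one page of older tweets, starting from a known tweet ID
    r = requests.get('https://mobile.twitter.com/jack',
                     params={'max_id': 938593014343024639})
    # to walk further back: find the smallest tweet ID in r.text and
    # request again with max_id set just below it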
20:19 🔗 ola_norsk i don't mean users, but e.g one word
20:19 🔗 dd0a13f37 There are many tor exit nodes.
20:21 🔗 ola_norsk how could a python script be _fully_ torified? If it could be done without using a virtual machine, that would be cool :D
20:21 🔗 dd0a13f37 torsocks python ./myscript
20:21 🔗 ola_norsk ty
20:22 🔗 dd0a13f37 Or you can just have requests use a proxy
20:22 🔗 dd0a13f37 torsocks -i for guaranteed fresh ip
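The requests-proxy variant of the same idea, pointed at a local Tor SOCKS port (9050 by default; needs the requests[socks] extra):

    import requests

    tor = {'http': 'socks5h://127.0.0.1:9050',   # socks5h resolves DNS via Tor too
           'https': 'socks5h://127.0.0.1:9050'}
    print(requests.get('https://check.torproject.org/', proxies=tor).status_code)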
20:24 🔗 BnAboyZ has joined #archiveteam-bs
20:27 🔗 hook54321 What collection should I upload that channel to? There's like 400 videos....
20:28 🔗 ola_norsk dd0a13f37: will definitely test that. And i'm guessing just the tiny bit of extra time storing to an sqlite db counts as a tiny bit of it being nice-ifyed :D
20:29 🔗 dd0a13f37 The time the request takes will, unless you're using twisted/multithreading
20:30 🔗 ola_norsk dd0a13f37: the reason 'local capture time' column is in tweets i think i put in for exactly that purpose, since JAA pointed out that 'tweep' itself does not seem to be correct at keeping times
20:30 🔗 ola_norsk aye
20:31 🔗 dd0a13f37 The mobile search url query string is ... interesting...
20:31 🔗 dd0a13f37 https://mobile.twitter.com/hashtag/EU?src=hash
20:31 🔗 dd0a13f37 https://mobile.twitter.com/search?q=EU&next_cursor=TWEET-943937901217370114-943937901217370114-BD1UO2FFu9QAAAAAAAAVfAAAAAcAAABWQABAAAAIAAAAAAAAQgAAAAAAAJAAAAAAAAAAABAAAAQAAAAAAAAAAiAAAQAAAAAAABAAAAAAAAAACBAAIAAAAAQAAAAAAIAAAAAACAAIAAAAAAAAAAAAAAACAAAhAAAAAAAAACAgAAAAAAAAAAAIAAAAAAAAAAAAAAAAgAAIAAAAAAFAAIIAAACCAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAEAAAAAACgCAAAAgAABwAAABAAAAAAAAAAIAAAAARAAEAAAAAAAA
20:31 🔗 dd0a13f37 AAAAAAIAAAAgAAAAAAAAAAAAAAAAACAAAABAAAAABAAAAAAAAQAAAAQAAEAABAAAEAAEAQAAAAAgAAAAAAAAAAAAAwACAAAAAAAAAAAAAAAAABQAAAAAAAAAAAAAAAAAACQAACAAAAAAAAAIAAQACAAAAFABAAAAAAAQkAAAEAAAAAAAAoAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAIAAAAAICAAAAAAAAAAAAAAEAAAAAAEAAACAAAAAAAAAEAAAAAAAAAgAAAAAAQAEAAQAAAAAAAAABAUAAAEAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAAAAAAAIQAAAACACAQAAQAAIAAAAIAAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQBAAAAwAAAAAAAAAAAAAAAAAA
20:31 🔗 dd0a13f37 AAAAAQAAAAAAAAAAAgAAAAEAAAAAAACABAAAAAAAAAAAAAAEAAQAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAIAAAQAAAAAAAAAAEAAAAAAAAAQAAAAAABABAAAAAAACAAAQAAAAAAAAAAAAAAAAAgAAAAAIAABACIAAAAAAAAAIAAAQAAAAAAAAA%3D%3D-R-0
20:31 🔗 dd0a13f37 (that was one link)
20:31 🔗 hook54321 oh dear
20:31 🔗 dd0a13f37 the base64 encoded part is some kind of bitmask
20:31 🔗 ola_norsk my eyes!
20:32 🔗 ola_norsk i think i've seen that garbled shit before, at tweep crashing :/
20:33 🔗 ola_norsk when adding 'loggin' module, that looks exactly like the output given on the line where it stopped
20:33 🔗 ola_norsk logging*
20:34 🔗 dd0a13f37 hmm, strange
20:34 🔗 dd0a13f37 because there is no base64 encoding or anything of the sort in tweep
20:34 🔗 ola_norsk maybe i have the output..one sec
20:36 🔗 ola_norsk seems i've deleted the log, I'll risk trying to run the same command one more time. brb
20:38 🔗 jrwr good news
20:39 🔗 jrwr I enabled the new editor toolbar in the wiki (cc SketchCow )
20:40 🔗 ola_norsk dd0a13f37: could it be compressed stuff, like in 'header: gz' crap?
20:41 🔗 dd0a13f37 No, it's base64
20:42 🔗 dd0a13f37 run base64 -d | xxd, then paste it in
20:42 🔗 dd0a13f37 You'll see most of the bytes only have one bit set
20:42 🔗 SketchCow Hurrah
20:42 🔗 dd0a13f37 Since it doesn't do anything if you change the numbers at the beginning (max id), the max_id parameter is in there too
20:42 🔗 ola_norsk dd0a13f37: all i know is that mess of "AAAAAAAA" was the end of the log line when i last tested tweep. And also where it apparently failed.
20:43 🔗 dd0a13f37 Not at the beginning, since that's the same across requests
20:43 🔗 ola_norsk i'm running the same command now, and will pastebin (when) it fails
20:44 🔗 jrwr SketchCow: its snazzy
20:45 🔗 ola_norsk dd0a13f37: for all i know it might've been some nasty character(s) that did it
20:46 🔗 jrwr makes it a /little/ simpler to edit pages
20:48 🔗 bithippo has quit IRC (Quit: Page closed)
20:54 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
20:59 🔗 dd0a13f37 Oh, regular twitter has that same AAAAAAAAAAA mess, just not as the requested URL
21:01 🔗 JAA I think it does. Load a page in your browser (with JS enabled), enable dev console, scroll to the bottom, check out the requests that happen in the background.
21:02 🔗 JAA I think tweep just tries to imitate what the browser would do.
21:02 🔗 ola_norsk hmm
21:04 🔗 ola_norsk it has not crashed yet here like last time, but it seems like a lot of people love to tweet 'netneutrality' these days. So it's not even done with this month. I think last time it crashed at about this year's May tweets
21:05 🔗 ola_norsk holy shit people have been tweeting 'netneutrality' lol
21:05 🔗 dd0a13f37 Twitter gets 6k tweets/sec, with 20 tweets/request archiving this is in the realm of possibility
21:06 🔗 ola_norsk dd0a13f37: aye :D and using webarchive.io, or wget/curl with requests to web.archive.org/save/ is quite futile :D
21:07 🔗 dd0a13f37 You would need to do a few hundred requests per second. The problem is archiving all those avatars, if you saturate a 1gbit/s line you can afford to archive 20kbit avatars assuming no overhead or IP bans
21:07 🔗 dd0a13f37 But the avatars aren't very important, are they?
21:07 🔗 ola_norsk reconstructing the links to the tweets is more important
21:08 🔗 dd0a13f37 That's possible too, all the info is in the HTML
21:08 🔗 ola_norsk and 'tweep' captures the id
21:08 🔗 ola_norsk aye
21:08 🔗 dd0a13f37 Does tweep have a mode where it can just show you all the tweets being done?
21:09 🔗 ola_norsk it does by default
21:09 🔗 dd0a13f37 Without narrowing down to a hashtag? Does it get 100%?
21:10 🔗 ola_norsk i don't know. It does a fuck of a lot of tweets though :D
21:11 🔗 ola_norsk if why it stopped could be worked out, i bet it could do 100%
21:12 🔗 jrwr for the wikidump nerds: at 04:00 on Friday, a copy of the wiki's XML + images is uploaded to the IA
21:12 🔗 jrwr for good measure
21:12 🔗 ola_norsk right now i'm just doing "python tweep -s 'netneutrality' > tweets.txt" ..to see if it eventually stops like last time. For all i know, piping to a textfile is what did it.
21:13 🔗 dd0a13f37 But can you just run python tweep > t.txt?
21:14 🔗 SketchCow AND THE WINNER OF THE "DO YOU WANT THIS" SWEEPSTAKES FOR DECEMBER 21 IS
21:14 🔗 ola_norsk with '-s <search word>' , yes
21:14 🔗 SketchCow ...hundreds of gigs of funeral recordings in mp3
21:14 🔗 BartoCH has joined #archiveteam-bs
21:14 🔗 dd0a13f37 But without -s parameter?
21:15 🔗 dd0a13f37 You could cheat and just use the X most common words, but that's not a nice solution
21:15 🔗 ola_norsk then it asks for parameters i think .. it's either '-u (user)' or '-s (word(s))' possible
21:16 🔗 ola_norsk either one of those are required.. _i think_
21:16 🔗 dd0a13f37 Then a full scrape is difficult, or at least harder
21:16 🔗 jrwr SketchCow: your collection never ceases to amaze me
21:16 🔗 JAA jrwr: Yay, finally. The last such dumps were uploaded in 2014 or 15.
21:17 🔗 jrwr they get dumped here after processing https://www.archiveteam.org/dumps/
21:17 🔗 ola_norsk dd0a13f37: with a 'users' table i could be easier though perhaps..or :D
21:17 🔗 jrwr only keeps one
21:17 🔗 ola_norsk dd0a13f37: it*
21:17 🔗 jrwr the backup log for it is in https://www.archiveteam.org/backup.log
21:18 🔗 dd0a13f37 Well, there's still a few users who are never mentioned by others, never use certain hashtags, and never use certain words
21:18 🔗 ola_norsk dd0a13f37: yeah
21:18 🔗 JAA Neat
21:19 🔗 ola_norsk dd0a13f37: not to mention banned, yet mentioned, and private..etc. i guess
21:20 🔗 ola_norsk dd0a13f37: i don't have much experience using tweep, so i don't even know how it behaves on finding disabled accounts, or banned users :/
21:22 🔗 dd0a13f37 If you're scraping in realtime, that doesn't matter.. it would be one hell of a tweet to get banned in under 5 milliseconds
21:22 🔗 ola_norsk dd0a13f37: i think it just goes back in time from point of start
21:23 🔗 dd0a13f37 You'll never keep up, better to go in realtime
21:23 🔗 ola_norsk dd0a13f37: that would need some genius at python threads i think..and perhaps faster bandwitch than mine :D
21:24 🔗 ola_norsk bandwidth*
21:24 🔗 dd0a13f37 twisted-http is fast, no?
21:24 🔗 jacketcha has joined #archiveteam-bs
21:25 🔗 pizzaiolo has quit IRC (Read error: Operation timed out)
21:25 🔗 ola_norsk dd0a13f37: i do not know..tweep uses 'request' / 'urllib(3?)' i think
21:26 🔗 dd0a13f37 one results page is 8.5k gzipped, contains 20 tweets, at 6k tweets/sec this gives 20 mbit/s
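The arithmetic behind that estimate:

    tweets_per_sec = 6000
    tweets_per_page = 20
    page_kb = 8.5                              # gzipped size of one results page
    reqs = tweets_per_sec / tweets_per_page    # 300 requests/second
    print(reqs * page_kb * 8 / 1000.0)         # ~20.4 Mbit/s sustained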
21:26 🔗 ola_norsk dd0a13f37: it wouldn't surprise me if in certain scenarios a hashtag were quicker than i could process
21:26 🔗 dd0a13f37 Yeah, requests without threading.
21:27 🔗 jacket has quit IRC (Ping timeout: 248 seconds)
21:27 🔗 dd0a13f37 But I think caching will make such attempts impossible, if you do the same query multiple times you'll get the same result
21:27 🔗 ola_norsk when using crontab wget, i had to cut time from 5 minutes to 3 minutes between each web.archive.org/save/ request..just to have a chance
21:29 🔗 dd0a13f37 You can do those in the background. Fire and forget. But IA won't like it
21:29 🔗 ola_norsk dd0a13f37: going "upwards" in time in a twitter feed is most likely the best solution. But my grasp of how to do that..is weak :D
21:30 🔗 dd0a13f37 I think archiving twitter is an insanity project anyway, better to just wait for library of congress to get their shit together
21:30 🔗 ola_norsk dd0a13f37: i just focus on hashtags, like netneutrality :D
21:30 🔗 dd0a13f37 that's probably possible
21:31 🔗 ola_norsk dd0a13f37: entire twitter, or twitter by even years or months..yeah, some congress would have to do that :D
21:31 🔗 jrwr and now I rest from poking the wiki really hard over the last 24hr
21:33 🔗 jacketcha has quit IRC (Read error: Operation timed out)
21:33 🔗 ola_norsk dd0a13f37: tweets containing 'netneutrality' been scrolling on my screen for 'since i said i started the command' , and i'm still on 2017-12-19 :/
21:34 🔗 ola_norsk dd0a13f37: though i expect it will speed up when getting past the 14th a bit
21:34 🔗 dd0a13f37 You could archive faster if you modify it to use twisted
21:35 🔗 ola_norsk just by the protocol stuff or using threading?
21:36 🔗 dd0a13f37 What?
21:36 🔗 dd0a13f37 >Twisted is an event-driven networking engine
21:37 🔗 dd0a13f37 https://twistedmatrix.com/documents/current/api/twisted.web.client.html
21:37 🔗 ola_norsk so it's beautifulsoup that's the bottleneck, or?
21:38 🔗 dd0a13f37 No, requests
21:38 🔗 dd0a13f37 and that it's not using requests with threads
21:40 🔗 ola_norsk could it "re-use" already established connections? because that is one thing that pisses me off about tweep. It seems to do one connection per damn tweet
21:40 🔗 dd0a13f37 yeah
21:40 🔗 ola_norsk ..or at least try, like wget
21:40 🔗 ola_norsk ty
21:41 🔗 jrwr anyway JAA I figured once a week is a good backup for a low traffic wiki
21:42 🔗 dd0a13f37 or apparently asyncio is recommended
21:42 🔗 JAA Yeah, sounds reasonable.
21:43 🔗 JAA ola_norsk, dd0a13f37: It would probably be easiest to reimplement the whole thing based on aiohttp or similar.
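A minimal aiohttp sketch of that approach; a single ClientSession keeps a keep-alive connection pool, which also addresses the one-connection-per-tweet complaint above (the URL is only an example):

    import asyncio
    import aiohttp

    async def fetch(session, url):
        async with session.get(url) as resp:
            return await resp.text()

    async def main(urls):
        # one session = one reused connection pool
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(*(fetch(session, u) for u in urls))

    pages = asyncio.get_event_loop().run_until_complete(
        main(['https://mobile.twitter.com/jack']))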
21:44 🔗 * ola_norsk taking notes of all :D
21:44 🔗 JAA I've written scrapers with aiohttp before, it's really nice.
21:45 🔗 ola_norsk got git? :D
21:45 🔗 JAA HTTP/2 support would be even better.
21:45 🔗 JAA No, haven't shared it yet.
21:45 🔗 JAA It's on my list for the holidays, uploading all my grabs and the corresponding code.
21:46 🔗 ola_norsk JAA: feel free to punch in some stuff :) https://github.com/DuckHP/twario-warrior-tool
21:46 🔗 ola_norsk i have to get the database thingy working first i guess, before i do anything else :/
21:47 🔗 ola_norsk (that, and making sure it doesn't freeze)
21:47 🔗 dd0a13f37 http://www.sqlalchemy.org/
21:48 🔗 JAA What did you change so far?
21:48 🔗 JAA Also, port to Python 3 please.
21:48 🔗 ola_norsk JAA: i've barely (not really) touched tweep itself so far :/
21:49 🔗 JAA Ah ok
21:49 🔗 ola_norsk bah...I've not python'ed in years..2.7 is new to me :D
21:50 🔗 JAA Oh please, Python 3 was released in 2008. :-P
21:51 🔗 ola_norsk why is there not a python script that converts e.g 'print "shit"' ?
21:51 🔗 dd0a13f37 2to3?
21:51 🔗 JAA 2to3
21:51 🔗 JAA That should handle the most obvious stuff.
21:52 🔗 ola_norsk good
21:53 🔗 icedice has joined #archiveteam-bs
21:54 🔗 ola_norsk the need to port an interpreted language...that's a travesty by itself :(
21:54 🔗 JAA Well, it's necessary because they cleaned up a ton of poorly designed stuff in Python 3.
21:55 🔗 dd0a13f37 it truly boggles the mind
21:55 🔗 dd0a13f37 yet c can remain source compatible for 28 years and counting
21:55 🔗 JAA Yeah, let's compare C to Python...
21:56 🔗 JAA And I doubt that C was as stable in the early stages of development.
21:56 🔗 JAA Let's discuss that again when Python is 45 years old.
21:56 🔗 ola_norsk :D
21:56 🔗 dd0a13f37 But python is old by now. The 2 to 3 migration was a complete catastrophe.
21:57 🔗 JAA Well yeah, many of those things (e.g. string vs. unicode distinction) should've been fixed earlier.
21:57 🔗 JAA But they waited and accumulated all those things and then made one big backwards-incompatible release.
21:58 🔗 JAA Which makes sense, otherwise you'd have to keep changing the code all the time.
21:58 🔗 hook54321 Should these videos be uploaded to community video, or a different collection? https://www.youtube.com/user/gencat
21:58 🔗 JAA Anyway, this is getting way too offtopic for this channel.
21:58 🔗 dd0a13f37 C was standardized in 1989, and k&r was released in 1978 - 11 years
21:58 🔗 ola_norsk :D
21:58 🔗 dd0a13f37 python was released in 1991
21:58 🔗 dd0a13f37 it was not standardized by 2002
21:59 🔗 JAA changes topic to: Lengthy Archive Team and archive discussions here | Offtopic: #archiveteam-ot | <godane> SketchCow: your porn tapes are getting digitized right now
21:59 🔗 schbirid has quit IRC (Quit: Leaving)
21:59 🔗 dd0a13f37 Or, to be fair, python2 was released in 2000, and it wasn't standardized by 2011
22:00 🔗 JAA -> #archiveteam-ot
22:01 🔗 dd0a13f37 I didn't know we had an offtopic channel for the offtopic channel
22:01 🔗 JAA This isn't the offtopic channel, it was always about lengthy discussions (because #archiveteam is limited to announcements).
22:01 🔗 JAA And -ot is new, just opened last week I think.
22:02 🔗 DFJustin oh great another channel
22:02 🔗 hook54321 lol
22:04 🔗 dd0a13f37 #archiveteam-ot-bs when?
22:06 🔗 icedice2 has joined #archiveteam-bs
22:08 🔗 JAA hook54321: Community video sounds reasonable to me. Are you uploading each video as its own item? If so, you should probably ask info@ to create a collection of all of them in the end.
22:08 🔗 icedice has quit IRC (Ping timeout: 250 seconds)
22:09 🔗 hook54321 Each of them their own item, yeah. I'm using tubeup to do it. I'll email info@ when it's done I guess.
22:10 🔗 ola_norsk has quit IRC (R.I.P dear known Python :( https://youtu.be/uy9Mc_ozoP4)
22:12 🔗 Smiley is -bs not the off topic channel tho?!
22:12 🔗 * Smiley so confuse. fuck that.
22:14 🔗 dd0a13f37 When do Igloo's pipelines upload? As part of
22:14 🔗 dd0a13f37 Archiveteam: Archivebot GO Pack?
22:15 🔗 JAA Yes
22:15 🔗 JAA All pipelines do, except astrid's and FalconK's.
22:16 🔗 dd0a13f37 So why can't I find a certain !ao job in it?
22:16 🔗 JAA Let's go to #archivebot.
22:19 🔗 pizzaiolo has joined #archiveteam-bs
22:46 🔗 icedice has joined #archiveteam-bs
22:48 🔗 icedice2 has quit IRC (Ping timeout: 245 seconds)
22:51 🔗 icedice2 has joined #archiveteam-bs
22:53 🔗 icedice has quit IRC (Ping timeout: 245 seconds)
22:55 🔗 icedice2 has quit IRC (Client Quit)
22:55 🔗 kristian_ has joined #archiveteam-bs
22:56 🔗 icedice has joined #archiveteam-bs
23:06 🔗 icedice2 has joined #archiveteam-bs
23:08 🔗 icedice has quit IRC (Ping timeout: 245 seconds)
23:48 🔗 jacketcha has joined #archiveteam-bs
23:56 🔗 jacketcha Hey, does anybody know if there is a node.js implementation of the Warrior program?
