#archiveteam-bs 2017-12-02,Sat


***Stilett0 has joined #archiveteam-bs [00:03]
ola_norskJAA: aye..trying to archive a twitter hashtag has taught me that :/ "There was a problem loading..(retry button)" [00:04]
JAAYeah, Twitter's also pretty good at not letting you grab everything.
Reddit as well.
(We were having a discussion about that earlier in #archivebot.)
At least you can iterate over all thread IDs in a reasonable amount of time on Reddit though.
So it appears that you can get 10k results from the vid.me API.
[00:05]
ola_norski feel naughty doing curl requests to https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets , currently every 3rd minute :/ [00:07]
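A rough sketch of the crontab entry that schedule implies, reusing the curl options ola_norsk shares later in the log (the hashtag URL is just the one mentioned here):

  # ask the Wayback Machine to snapshot the hashtag page every 3 minutes
  */3 * * * * curl --silent --max-time 120 --connect-timeout 30 'https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets' > /dev/null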
JAAYou can do that for different categories, new/hot, and probably search terms (didn't try).
There are 17 categories plus hot, new, and team picks. In the ideal case, that means 20 sections times 10k results, which is still only about 1/7th of the whole site.
This is only about how to gather lists of videos and their metadata (uploader, description, etc.), not the actual videos.
(Videos are available as Dash and HLS streams.)
There are also tags, and of course you can retrieve all (?) of an uploader's videos.
[00:07]
ola_norskJAA: As for twitter, i think one problem is that they would easily present an archive of ANYTHING, as long as they get paid for it. [00:10]
JAAFor each tag, you get hot, new, and top videos.
ola_norsk: Yeah, probably.
[00:10]
ola_norskJAA: most definitely [00:11]
JAAThere's a "random video" link. We could hammer that to get videos. I don't want to do the math on how many times we need to retrieve it to discover the vast majority of all videos right now though. [00:14]
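(For scale, a back-of-the-envelope coupon-collector estimate, assuming roughly 1.4 million videos served uniformly at random: hitting every video at least once takes about n*ln(n) ≈ 1.4M × 14.2 ≈ 20 million requests on average, and even covering 99% of them takes about n*ln(100) ≈ 6.4 million, so the random link is far less efficient than listing.)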
ola_norskJAA: for a legal warrant, or a lump of money, they could present all tweets with any hashtag, since the dawn of ti..twitter [00:14]
JAAAh, I thought you were talking about vid.me now.
Yeah, there is a company which has an entire archive of Twitter, I believe.
[00:14]
ola_norskah, sorry, that was just a link regarding GOG Connect [00:15]
JAAAh, you're not in #archiveteam. vid.me is shutting down on Dec 14.
That's why I'm looking into them.
[00:16]
ola_norskreally? that soon? [00:16]
JAAhttps://medium.com/vidme/goodbye-for-now-120b40becafa [00:16]
ola_norskwow, that's going to piss off a lot of germans :D [00:16]
JAAola_norsk: I was thinking about Gnip, by the way. Looks like Twitter bought them a few years ago. [00:17]
ola_norsk"We’re building something new." ..
a.k.a "Trust us, we're not completely destroying this shit..We're building something new!"..
[00:18]
Froggingfree image/video host "couldn't find a path to sustainability" [00:20]
ola_norskman, i actually thought vid.me had something good going [00:20]
Froggingwhat a surprise :p [00:20]
ola_norskhttps://archive.org/details/jscott_geocities
wow, there's actually people who cancelled their youtube accounts after having used vid.me's easy export solution
and as far as i know, that shit might not be so easy to export back, since i don't think YT does import by url..
oh well
[00:21]
omglolbahwhy not upload to both? <.< [00:28]
ola_norskaye
omglolbah: according to "SidAlpha", if you know that youtuber, he wouldn't because it would mean he'd have to interact on several platforms..
[00:28]
omglolbahIf only he had moved to vidme [00:30]
ola_norskthat was his response to the request for that; not to move, but to upload there as well [00:31]
omglolbahno, I'm saying I wished he had moved so that he would be gone :p [00:31]
ola_norskoh
where does shit go if Youtube goes though? I mean, Google Video went to Youtube..
Where did Yahoo Video go?
Justin.tv became Twitch right?
[00:31]
zinoJustin.tv created Twitch and then closed down. Nothing was automatically moved. I don't know if Justin had vods though. [00:37]
ola_norskaye [00:37]
JAAI was wrong about the vid.me API not returning all results.
The actual API does return everything, or at least nearly everything.
The "API" used by the website doesn't.
I just didn't find the real API docs previously.
https://docs.vid.me/#api-Videos-List
No auth required either.
You can get chunks of 100 videos per request.
[00:38]
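A minimal sketch of paging that endpoint, assuming the offset/limit parameters behave as the docs suggest and that the response carries a "videos" array (the field name is unverified):

  # pull the video list 100 at a time and keep every raw JSON response
  offset=0
  while true; do
      out="videos-list-$offset.json"
      curl -s "https://api.vid.me/videos/list?limit=100&offset=$offset" -o "$out"
      # stop once a page comes back empty; ".videos" is an assumed field name
      count=$(jq '.videos | length' "$out" 2>/dev/null || echo 0)
      [ "$count" -gt 0 ] || break
      offset=$((offset + 100))
      sleep 1   # no documented rate limit was found, so go easy
  done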
zino\o/
Do we have a death date?
[00:38]
JAAIt gets quite slow for large offsets, indicating that they don't know how to use offsets.
14 Dec
[00:39]
zino:-/ [00:39]
JAAhow to use indices* [00:39]
zinoindexes? [00:39]
Froggingwhere would youtube go? nowhere. it's too big :p [00:40]
JAAI never know which plural's correct. [00:40]
ola_norskFrogging; aye [00:40]
bithippoFrogging: we'll just show up with a tractor trailer "Load it all in back y'all" [00:40]
JAAThe real API returns a bit more videos, by the way: 1360532.
(About 11k more, specifically.)
Might be the NSFW/unmoderated/private filter stuff.
bithippo: YouTube is around 1 exabyte. Have fun with that.
Well, at least that order of magnitude.
[00:40]
bithippoI used to manage hundreds of petabytes :-P [00:41]
ola_norskola_norsk shoves in his usb stick and applies youtube-dl ! [00:41]
bithippolol [00:41]
JAAI'm sure someone from China will sell you a 1 EB USB stick if you ask them.
Well, "1 EB".
[00:42]
bithippoWhich will quickly err out once a few GB have been written .... :( [00:42]
JAAYep [00:42]
ola_norski'll just save it all in /dev/null [00:42]
JAAOr not error out, just overwrite the previous data etc. [00:42]
zinoDepends. Some of them are cyclical, so you can write all you want as long as you don't try to read it. :) [00:43]
JAAYep
I'm a fan of S4.
The Super Simple Storage Service.
http://www.supersimplestorageservice.com/
[00:43]
bithippoThat pricing is a bargain. [00:44]
zinobithippo: Interesting. What did you work with that included 100s of PiBs? I deal in 10s of them. [00:44]
bithippoData taking for LHC detector [00:44]
JAAOoh, nice! [00:44]
bithippoOnly a couple hundred TB of spinning disk on storage arrays, the rest were tape archive libraries. [00:45]
zinobithippo: Ah. I sort of do that on the sly. Part of our storage is for the Nordic LHC grid. [00:45]
bithippo#TeamCMS [00:45]
zinoI deal mostly with climate data though. Have a few petabytes of that. [00:46]
bithippoThat's awesome.
I <3 big data sets
[00:46]
zinoIndeed :) [00:47]
dashcloud@ola_norsk If you're interested in how to make something be emulated on IA, here's some pages that lay it out for you- http://digitize.archiveteam.org/index.php/Internet_Archive_Emulation http://digitize.archiveteam.org/index.php/Making_Software_Emulate_on_IA [00:47]
ola_norskdashcloud: ty, i'm thinking there must be ways. If there's dosbox, there's e.g Frodo that could run in that.. [00:49]
dashcloudI've done a bunch of DOSBOX games, and there's a whole collection of emulated DOS/Win31/Mac Classic stuff up [00:50]
zinoola_norsk: What, the C64 emu? [00:51]
ola_norskyes [00:51]
zinoNo, nonononono. Go help the jsmess people get Vice running instead. [00:51]
ola_norski was hoping that was already done [00:52]
zinoI know it's started. [00:52]
ola_norskgood stuff [00:52]
zinoBut it might be stalled forever for all I know. [00:52]
ola_norski have no idea about these things, but it would be cool to see C64 on Internet Arcade [00:54]
zinoJAA: I'll have very little time to do anything before the 9th, and probably not much after either, but ping me if storage is needed for vid.me. [00:54]
ola_norskdashcloud, i'll try to make an item for that, using dosbox
ty for info
[00:55]
JAAzino: Will do. I'll set up a scrape of the API first to get all the relevant information about the platform. Then we'll see. [00:56]
dashcloudif your software needs installation or configuration before the first run, you'll want to do that ahead of time [00:57]
JAAscrape/archive, whatever. That's the information we can save for sure.
Unless they ban us...
Using minVideoId and maxVideoId might be faster than the offset/limit method, especially for the later pages.
Current video IDs are slightly above 19 million, so that's around 190k requests (to be sure no videos are missed).
[00:57]
jrwrattending my first 2600 meeting [01:03]
wp494so the thing with vidme, there's a bunch of original stuff
there's a little bit of lewd stuff (they ban outright porn, but they do permit "artistic" nsfw)
and then there's a bit of it that consists of reuploads of copyrighted stuff
[01:05]
jrwrOK [01:06]
wp494not that I think it'll be a big deal since IA can just dark the affected stuff if someone does come yelling, but something to keep in mind [01:06]
JAASounds more or less what I'd expect.
like*
I can't find any information about API rate limits, except this Reddit thread: https://redd.it/6acvg5
[01:07]
***icedice has quit IRC (Quit: Leaving) [01:11]
Ceryn has quit IRC (Connection closed) [01:18]
ola_norsk"The Internet is Living on Borrowed Time" .. https://vid.me/1LriY (ironically on vid.me) ..That's pretty dark title, for being Lunduke :d [01:26]
JAATo be fair, it's also available on YouTube: https://www.youtube.com/watch?v=1VD_pJOFnZ0 [01:33]
..... (idle for 21mn)
ola_norskthats not fair :D
i think most of his vids are also on IA :d
but yeah
seriously though. I imagine there's a shitload of german vidme'ers currently bewildered as to what to do..
a lot of people used the url importing at vidme, thinking they would simply move their entire channels..
from what i've heard tales of, germany youtube is not the same youtube as everywhere elsetube
[01:54]
ranmaGEMA blocks a fuckton of music there [02:07]
ola_norskaye
ranma: is that the only reason though? There were so many germans coming to vid.me that a video was made about it..
[02:07]
ranmaJAA: how do you get your data OUT of S4?
and what are the costs?
[02:12]
CoolCanuks4? [02:12]
ola_norskranma: "German INVASION"...100k creators..https://vid.me/JjNaH [02:13]
ranmaoh, it's a joke :'( [02:13]
CoolCanuki hate slow internet. ml
*fml
[02:14]
***phuzion has joined #archiveteam-bs [02:14]
ola_norskranma: does it simply block ALL music? i can't see any other reason for such a noticeable influx and flight of users
ranma: It's actually hard to browse vidme because of it at times, since often 1 in 2 videos on the feed is german
[02:18]
ranmakinda wish some site could ZIP/7z another site
just noticed archivebot slurped down https://ftp.modland.com/
[02:22]
CoolCanukdid it *completely* slurp modland? [02:23]
ola_norskdd -i http://google.com -o http://bing.com [02:23]
ranma<Major> Muad-Dib: Your job for https://ftp.modland.com/ has finished.
actually, not that i'd have the space for it, tho
[02:23]
***ola_norsk has quit IRC (its the beer talking) [02:25]
CoolCanukdoes anyone have a great upload script for ia? their docs are too much for me to understand and uploading 1 by 1 is painful [02:26]
ezfor anyone wanting to mirror vid.me, it's possible to page everything there: https://api.vid.me/videos/list?minVideoId=100&maxVideoId=1000
just step the min/max (it's easier on the db).
[02:34]
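Something along these lines would step the ID space the way ez suggests (a sketch: the block size, the ~19M ceiling, and whether the bounds are inclusive all come from the discussion above and would need checking):

  # walk video IDs in blocks of 100, saving each raw response
  max_id=19500000   # a little above the current highest ID mentioned earlier
  step=100
  for ((lo=1; lo<=max_id; lo+=step)); do
      hi=$((lo + step - 1))
      curl -s "https://api.vid.me/videos/list?minVideoId=$lo&maxVideoId=$hi&limit=$step" \
          -o "videos-$lo-$hi.json"
  done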
CoolCanuk..... https://usercontent.irccloud-cdn.com/file/PZalOsZ6/image.png [02:34]
ezJAA: ^ [02:35]
CoolCanukour wiki is more stable than this beta-like system :P [02:35]
bithippoCoolCanuk: What do you mean by "upload script for ia"?
Such as https://github.com/jjjake/internetarchive ?
[02:39]
CoolCanukan easier way
eg I can loop it for 100s of files in a folder, but upload them as 100 items.
[02:40]
bithippoThat repo is your best bet for that sort of operation.
What sort of files and metadata?
[02:41]
CoolCanukpdf
currently, newspapers
and sears crap
[02:42]
bithippoHmm
The two routes would be "web interface", which gives you a nice interface and shouldn't be too painful if you're putting up each folder as an item (with all of the files contained within that folder attributed to the item). Failing that, you'd need some light python or bash scripting skills to pick up files per item, associate metadata with each item, and upload.
I could be wrong of course! But that's my interpretation based on working with the IA interfaces.
[02:44]
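For the folder-per-item case, a rough sketch using the `ia` command-line tool that ships with the internetarchive package bithippo linked (the directory layout, identifiers, and metadata values are made-up placeholders):

  # one IA item per folder, with every PDF inside attached to that item
  # needs `pip install internetarchive` and a one-time `ia configure`
  for dir in newspapers/*/; do
      id="$(basename "$dir")"   # hypothetical identifier scheme
      ia upload "$id" "$dir"*.pdf \
          --metadata="mediatype:texts" \
          --metadata="title:$id"
  done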
eztbh, IA interface is just plain atrocious to use [02:47]
bithippoIndeed. [02:47]
ezi suppose that's an artificial barrier to entry on purpose, to avoid people uploading crap [02:47]
CoolCanukI guess
I'm uploading stuff I know will probably not be found anywhere else
[02:48]
ezyea, the commitment to jump the hoops is paired with commitment to curate content [02:49]
CoolCanukOnly thing I'm worried about is repetitive strain injury [02:50]
***wp494_ has joined #archiveteam-bs [02:52]
wp494 has quit IRC (Read error: Operation timed out)
ld1 has quit IRC (Quit: ~)
ld1 has joined #archiveteam-bs
[02:59]
CoolCanukwhy does IA have a difficult time using the FIRST page of a pdf as the COVER >:( [03:19]
..... (idle for 22mn)
***wp494_ is now known as wp494 [03:41]
MrRadarX-posting from #archiveteam: if you're using youtube-dl to grab vid.me content, be aware of this issue: https://github.com/rg3/youtube-dl/issues/14199
tl;dr: their HLS streams return a data format youtube-dl doesn't fully handle, resulting in corrupted output files
Use the workaround in the 2nd-to-last comment to force youtube-dl to grab from the DASH endpoints instead
[03:47]
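The authoritative fix is the one in the linked issue; as a generic, hedged alternative, youtube-dl's format filters can be told to avoid HLS (m3u8) formats entirely, which should push it toward the DASH/progressive endpoints (the vid.me URL below is a placeholder):

  youtube-dl -f 'bestvideo[protocol!*=m3u8]+bestaudio[protocol!*=m3u8]/best[protocol!*=m3u8]' \
      'https://vid.me/XXXXX'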
wp494posting highlights of https://www.youtube.com/watch?v=KMaWSinw4MI&t=41m33s here
first one being that linus has significant disagreements with senior management, and especially NCIX's owner
which seems to be a very common theme
he also left NCIX because the people he mentored departed
says he thinks some were forced out because of extraordinarily poor management decisions
(in his opinion)
[03:54]
FroggingI'm reading this https://np.reddit.com/r/bapcsalescanada/comments/77h771/for_anyone_that_purchased_a_8700k_from_ncix/domm2ca/?context=3 [03:57]
***josho493 has joined #archiveteam-bs [04:09]
wp494linus pitched what sounded like a pretty good idea, try and get bought, but how? his solution was to open "NCIX Lite"s across the country which would be really small pickup places that you could ship to since shipping direct to your home sometimes killed the deal
said that the writing was on the wall as early as 7 years ago (before Amazon was doing pickup) to anyone actively paying attention, so the idea would've been to attract someone similar to Amazon if not Amazon themselves using that infrastructure when they wanted to gobble someone Whole Foods style
linus said when management didn't do that, it became obvious that he had to GTFO
he says he hasn't been screwed over personally by Steve (the owner) and his wife unlike some of the other horror stories going out
he wound up signing a non-compete for 2 years (which got extended by 1)
when he left he took the LTT assets, and did it on paper (and was glad he did), because even though he wouldn't think Steve would do anything untoward to him, creditors are sharks looking for their next kill
and that's about it
[04:09]
ezwhy do people bother with fairly standard eshop drama, was ncix the canadian amazon or something? [04:27]
MrRadarMore like Canadian Newegg... before Newegg moved into Canada
It was *the* place to go for computer parts online, from what I understand
[04:27]
ezah
razor thin margins, yea we have that locally too
all with fake "in stock" stickers where you wait 2 weeks and everything
[04:28]
***qw3rty115 has joined #archiveteam-bs
qw3rty114 has quit IRC (Read error: Operation timed out)
[04:30]
Froggingthey had a location here in Ottawa. I used to shop there until they closed it [04:38]
...... (idle for 29mn)
***josho493 has quit IRC (Quit: Page closed) [05:07]
CoolCanukdefunct as of today :o
*yesterday
[05:09]
***Mateon1 has quit IRC (Ping timeout: 245 seconds)
Mateon1 has joined #archiveteam-bs
[05:11]
CoolCanukam I the only one who doesnt really see the big deal of google home/mini or amazon alexa? [05:15]
***ranavalon has quit IRC (Read error: Connection reset by peer) [05:29]
Froggingthere's a big deal? [05:36]
***shindakun has joined #archiveteam-bs
Jcc10 has joined #archiveteam-bs
[05:38]
ezCoolCanuk: we're all waiting for amazon to give access to alexa transcripts to app devs
so we can start archiving every little embarrassing thing anyone has ever said
[05:42]
wp494which of vidme's logos should I use for the article, the wordmark or their "astro" mascot
https://vid.me/media
[05:42]
CoolCanukthe one on the main page (red) [05:48]
wp494wordmark it is [05:48]
CoolCanuksadly cant be eps or svg :( [05:48]
wp494gonna resize it a little otherwise it'll appear about as big in a warrior project [05:49]
CoolCanukor we could fix the template
wait what do you mean
[05:49]
wp494lemme go dig through the spuf logs to show you
(come to think of it I'm not even sure if I took an image, I might have just pull requested and moved on)
[05:50]
CoolCanukour {{Template project}} should be fixed to a larger logo size
using it online is not an issue, because we can dynamically resize
http://tracker.archiveteam.org/
[05:51]
wp494yeah there it could benefit from being a touch bigger at least for logos that are rectangles instead of squares [05:52]
CoolCanukapparently we can't... :| [05:52]
wp494(it seems to like squares the best) [05:52]
CoolCanuk"benefit"?
distortion?
[05:52]
wp494and yeah, I was about to say, our copy of mediawiki isn't quite as flexible as wikimedia's where you can stuff in any number and it'll spit it out for you
even ridiculously large ones like 10000px
[05:52]
CoolCanukI just noticed that. that's too bad
another reason to use SVG.
[05:52]
wp494even SVGs too [05:54]
CoolCanukno. SVGs are not raster
you can blow them up to 1000000000px and it will never distort unless you have embedded rasters
https://upload.wikimedia.org/wikipedia/commons/3/35/Tux.svg
[05:54]
wp494ok I was gonna recreate an example with SPUF but there's a live one that I can get you right now
see how the miiverse logo goes a bit out of its bounds and pushes content downwards: https://i.imgur.com/P3Wcfbp.png
[06:02]
CoolCanukew
logo should be within that white div, not yellow
(within, not overlaid)
[06:03]
wp494now take the version of the steam icon we had stored on the wiki and stuffed into the project code (http://www.archiveteam.org/images/4/48/Steam_Icon_2014.png) and it wound up being a bit worse than that example
luckily a 100px version that mediawiki gracefully generated more or less solved things: https://github.com/ArchiveTeam/spuf-grab/pull/2/commits/1c319d3d144cc13599f1fe571e699ca8b3d79e60
[06:04]
CoolCanuknot the image's fault, it's the tracker ;) [06:04]
wp494afaik tracker main page was ok [06:05]
CoolCanukhow could it be ok [06:05]
wp494note how it looks like it's fine on http://tracker.archiveteam.org/ [06:05]
CoolCanuksimply use max-width for img in css
*height
[06:05]
wp494but with that said scroll bars do appear [06:06]
CoolCanukthen you need to
overflow: hidden
[06:06]
wp494but it's nothing near as annoying as the in-warrior example, though still a nuisance albeit very minor [06:06]
CoolCanukI will fix it [06:07]
wp494k so a 600 x 148 version will go up on the wiki
and then if it causes problems we can grab a 100px url
for project code
[06:08]
CoolCanukwe have or
**or
just use max-height: 100px
;)
[06:09]
wp494ok project page is going up [06:09]
CoolCanuklol how did it let you upload a file name with a space :P
it makes me use _
[06:09]
wp494it does insert a _
the recent changes bot treats it as a space though
but for actually using the filename you're going to need to use underscores
[06:10]
CoolCanuko [06:11]
wp494aw crap I'm getting spam filtered and I don't even get a prompt to put in the secret phrase
oh well let's see if this workaround of inserting a space in the url works
[06:11]
CoolCanukheh
SHHHH that's supposed to be a secret :x
[06:12]
wp494ok wow that apparently worked [06:13]
CoolCanuki'll fix it for ya [06:15]
wp494gl with the filter [06:15]
CoolCanukoh you fixed it [06:15]
wp494I was surprised I was even able to toss such a tiny little stone at that goliath
ok that's a solid foundation I think
[06:15]
CoolCanukhuh
I have a workaround :P
[06:19]
***slyphic has quit IRC (Read error: Operation timed out) [06:21]
Odd0002I got a 508 clicking that purplebot link [06:21]
SketchCowgodane: What does "WOC" mean with the MPGs? [06:21]
Odd0002resource limit reached [06:21]
CoolCanukthis 208 error will be the death of me
*508
[06:22]
Odd0002connection timed out now... [06:23]
CoolCanuksame here ughhhh
ffff
impossible to edit
[06:23]
Odd0002oh finally [06:25]
CoolCanukthere must be more than just "shared hosting" being the problem
can the topic in #archiveteam changed from Compuserve to vidme? lmfao
*be changed
[06:25]
wp494if it gets pointed out a few times like with compuserve then someone will probably do it
if it's just once or twice more then it's no big deal just say "yeah we're on it"
[06:32]
CoolCanukfair [06:33]
....... (idle for 33mn)
making up a tag for vidme is going to be tricky. it's so short.. hard to come up with a spinoff [07:06]
........ (idle for 36mn)
***Pixi has quit IRC (Ping timeout: 255 seconds)
Pixi has joined #archiveteam-bs
BlueMaxim has quit IRC (Ping timeout: 633 seconds)
BlueMaxim has joined #archiveteam-bs
[07:42]
................. (idle for 1h20mn)
Dimtree has quit IRC (Peace) [09:05]
Dimtree has joined #archiveteam-bs [09:11]
......... (idle for 44mn)
fie has quit IRC (Ping timeout: 245 seconds) [09:55]
.... (idle for 16mn)
fie has joined #archiveteam-bs [10:11]
CoolCanuk has quit IRC (Quit: Connection closed for inactivity) [10:16]
BlueMaxim has quit IRC (Read error: Connection reset by peer) [10:27]
schbirid has joined #archiveteam-bs [10:35]
....... (idle for 33mn)
fie has quit IRC (Ping timeout: 246 seconds) [11:08]
fie has joined #archiveteam-bs [11:21]
bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…) [11:33]
jschwart has joined #archiveteam-bs [11:44]
...... (idle for 27mn)
JAAez: Yep, that's what I came up with yesterday as well. You can either iterate min/maxVideoId in blocks of 100 with limit=100 or implement pagination. I'd probably go for the former, i.e. retrieve video IDs 1 to 100, 101 to 200, etc. (need to figure out whether these parameters are exclusive or not though). [12:11]
***MangoTec has joined #archiveteam-bs [12:17]
..... (idle for 24mn)
jrwrmy god
the best thing I've ever heard just got tweeted
@ElonMusk: Payload will be my midnight cherry Tesla Roadster playing Space Oddity. Destination is Mars orbit. Will be in deep space for a billion years or so if it doesn’t blow up on ascent.
[12:41]
.... (idle for 16mn)
zinoElon knows how to put on a show. [12:57]
jrwrYep
I mean, he thinks it's going to blow, they didn't want to make a real payload... so fuck it, send a Car
[12:57]
zinoAt this time I'll recommand the old Top Gear episode where they convert a car to a space shuttle and blast it off with rockets.
recommend*
[13:01]
..... (idle for 23mn)
***MangoTec has quit IRC (Quit: Page closed) [13:24]
.... (idle for 19mn)
schbiridhetzner's auctions seem to have dropped in price a lot, 1/3 aka -10€ for what i have
https://www.hetzner.com/sb
nvm, had fucking US version without VAT =(
[13:43]
odemghttps://medium.com/vidme/goodbye-for-now-120b40becafa
https://medium.com/vidme/goodbye-for-now-120b40becafa
https://medium.com/vidme/goodbye-for-now-120b40becafa
What the fuck!!
Okay you know about it
but what the actual fuck!
[13:49]
..... (idle for 21mn)
jrwrpeople are finding out it's REALLY hard to make a video website [14:10]
odemgIt's easy to make a video site, it's just hard to monetise it, mediacru.sh was the best in terms of technology in my opinion but they didn't manage to monetise either. [14:16]
***ranavalon has joined #archiveteam-bs
ranavalon has quit IRC (Remote host closed the connection)
[14:16]
odemgI'm collecting video ids from reddit anyways, heads up the bulk of the older urls (and possibly new ones) are going to be reddit porn related. [14:17]
***ranavalon has joined #archiveteam-bs [14:17]
schbiridwait, youtube is still operating at a loss
why the FUCK are people making so much money on their ad share then?
*?
[14:18]
***voidsta has joined #archiveteam-bs [14:21]
odemgGoogle isn't operating at a loss, so they can keep YouTube afloat and keep trying new things to pump up their bottom line, which is why we see a new yt related shit storm every other week, yt may as well be called YouTube[beta] or YouTube[this is an experiment] [14:22]
schbiridYouTube[incredible journey] [14:23]
odemgThough because it's Google, and because there is no real competition making any real headway against them, we can talk like yt is 'never' going to close doors, or turn their service off, but it'll come, maybe not today, maybe not in 5 years, but it'll come when we're 'what the fucking' at a Google blog post announcing their coming plans to phase out YouTube or just turn it off.
Hopefully that comes at a time when 500PB* is nothing and something we can grab in a few months
[14:24]
JAA... except YouTube will be 10 EB by then. [14:26]
Kazwait what
vimeo is dead?
[14:27]
jrwrvid.me I thought [14:28]
Kazvid.me
ffs it looked very close to vimeo
[14:28]
odemgIt's an odd time we're living in when we first started 10TB was insane to think we could get, now we're doing sites nearing 300TB without a great deal of thought, we're scaling pretty well with the times I suppose, but how long before ia close doors and we have to find somewhere to put that? (I know we're talking about it...) [14:28]
jrwrUgh
if IA ever goes bust
[14:29]
Kazso
we have 2 weeks for vidme
[14:29]
odemgyup [14:29]
JAAI'm setting up an API scrape right now. [14:29]
Kazprobably needs a channel, not sure how big it is [14:30]
odemg#vidmeh [14:30]
JAA1.3x million videos [14:30]
schbiridvidwithoutme, vidnee, vidmeh [14:30]
Kazvidmeh will do [14:31]
zinoThis will almost need to be a warrior project. We can probably fix storage, but there is no way we can download this in time using a script-solution unless someone buys up Amazon nodes to do it.
JAA: Any idea what the average size of a video is?
[14:31]
JAAzino: I haven't looked at the videos themselves at all yet, only the metadata.
The API returns a link to download the videos as an MP4, by the way.
The website uses Dash/HLS.
Those MP4s are hosted on CloudFront, by the way, i.e. Amazon. That could be annoying.
[14:32]
.... (idle for 16mn)
jrwrwiki is slow as balls [14:51]
.... (idle for 16mn)
***voidsta has left [15:07]
MrRadarzino: I've been scraping a few channels and here's what I've seen so far. Their highest quality is 2 mbps video (at 1080p or 720p depending on the original resolution) with audio between 128kbps and 320 kbps(!)
SD-quality video is around 1200 kbps
[15:08]
jrwrUgh
thats not too bad overall
[15:09]
MrRadarAnd I'm grabbing with youtube-dl's "bestvideo+bestaudio" option, if storage/bandwidth becomes an issue they have lower-quality versions we could grab instead [15:10]
jrwrNa
We have da powerrrr
right now I'm working on the grabber, mostly just going to mod eroshare-grab
[15:11]
MrRadarSome files are randomly capped at 150 KB/s download while others will saturate my 50 mbit connection [15:12]
jrwrthe channel pages are going to be interesting since they scroll load type [15:12]
MrRadarAs long as the URLs for those follow a pattern that shouldn't be too hard
Oh, I just noticed there's a channel, #vidmeh
[15:12]
jrwrya [15:13]
HCross2Nothing is bloody working [15:23]
jrwrfor what [15:24]
HCross2I've spent all day trying to get my proxmox cluster sorted [15:24]
jrwrdat CDN [15:30]
***Jcc10 has quit IRC (Ping timeout: 260 seconds) [15:35]
jrwrhey JAA you're pulling all the APIs, are you saving all the reposes so we can get the raw URL for the videos?
responses*
[15:39]
..... (idle for 22mn)
JAAYeah, of course I save them. To WARC, specifically. [16:01]
***kristian_ has joined #archiveteam-bs [16:04]
...... (idle for 25mn)
CoolCanuk has joined #archiveteam-bs
fie has quit IRC (Ping timeout: 360 seconds)
[16:29]
shin has joined #archiveteam-bs
fie has joined #archiveteam-bs
[16:43]
..... (idle for 22mn)
shindakundon't know if it will help but i made a brute force video/metadata downloader for vidme https://github.com/shindakun/vidme i don't really have the bandwidth or storage to let it run though
you guys already have a lot of tooling though
[17:06]
JAANo need to bruteforce, we can get a list of all videos through their API. [17:07]
PurpleSymshindakun: /join #vidmeh [17:07]
JAA(I'm doing that currently.) [17:07]
shindakunthat's basically what it does sort of... i found some seemed to be unlisted so i request details for every videoid
off to vidmeh lol
[17:08]
JAARight. There's an API endpoint for getting lists of videos though, so you don't have to run through all ~19M IDs.
You can do it with 190k requests. With further optimisation, it might be possible to decrease that even further, but that's a bit more complex.
[17:08]
***ola_norsk has joined #archiveteam-bs [17:09]
ola_norskmade a test C64/dosbox emulator item (https://archive.org/details/iaCSS64_test) , but it seems very slow. At least on my potato pc.
unfortunately i'm no ms-dos guru. But might there be a way to optimize speed through some dos utilities/settings that could reside in the zip file?
[17:11]
zinoYou are emulating in two layers. It's not going to be fast, or accurate. [17:14]
ola_norskyeah it's kind of emu-inception :d But, could fastER be done perhaps?
i did try it in Brave browser as well as Chromium, and Brave seemed to run it a bit better.
and my pc is kind of shit
[17:16]
Igloo/join #vidmeh
ahem
[17:21]
***Stilett0 has quit IRC (Ping timeout: 246 seconds) [17:27]
CoolCanukahhhhh. CLEVER [17:29]
***Pixi has quit IRC (Quit: Pixi) [17:32]
kristian_ has quit IRC (Quit: Leaving) [17:38]
mundus201 is now known as mundus
Pixi has joined #archiveteam-bs
[17:45]
..... (idle for 22mn)
hook54321How can I automatically save links from an RSS feed onto the wayback machine? [18:08]
***pizzaiolo has joined #archiveteam-bs [18:08]
CoolCanuki'd use something like this http://xmlgrid.net/xml2text.html . then get rid of the non urls in excel/google sheets. [18:14]
JAAEw [18:15]
CoolCanukthen upload your list of urls to pastebin, get the raw link. in #archivebot , use !ao < PASTEBINrawLINK
you got a better idea, JAA ? :P
[18:15]
ola_norskif you have the links in a list; curl --silent --max-time 120 --connect-timeout 30 'https://web.archive.org/save/THE_LINK_TO_SAVE' > /dev/null , is a way to save them i think [18:15]
JAAGrab the feed, extract the links (by parsing the XML), throw them into wpull, upload WARC to IA. Throw everything into a cronjob, done. [18:15]
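A minimal sketch of that pipeline, assuming xmlstarlet and wpull are installed and the feed is ordinary RSS with <item><link> elements (the feed URL is a placeholder):

  #!/bin/bash
  # fetch the feed, pull out the item links, archive them into a WARC
  feed='http://example.com/feed.xml'
  curl -s "$feed" -o feed.xml
  xmlstarlet sel -t -m '//item' -v 'link' -n feed.xml > links.txt
  wpull --input-file links.txt --page-requisites --no-robots \
        --warc-file "rss-$(date +%Y%m%d)"
  # the resulting rss-*.warc.gz can then be uploaded to IA, and the whole script cronjob'd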
CoolCanuko ok [18:16]
JAAI suspect he's looking for something that doesn't require writing code though. [18:16]
CoolCanukmost users are :P
also why curl? can't we just use HTTP GET?
[18:16]
astridthat's what curl does [18:17]
JAAThat's what curl does. You could also use wget, wpull, or whatever else.
Hell, you could do it with openssl s_client if you really wanted to.
And yeah, you can obviously replace the "throw them into wpull, upload WARC to IA" with that.
[18:17]
CoolCanukoh.. I thought curl downloads the web.archive.org page as well [18:18]
JAAIt wouldn't grab the requisites though, I think.
CoolCanuk: That's exactly what it does, and it triggers a server-side archiving.
[18:18]
CoolCanukunhelpful if you have a bad internet connection and don't want to download the archive.org page every request :P [18:19]
ola_norskidk :d i just use that as cronjobs to save tweets https://pastebin.com/raw/ZE4udKTi [18:19]
arkiverno page requisites are saved when you use /save/ like that
only the one URL you have after /save/
no images, or other stuff from the page is saved
[18:20]
ola_norskdoh [18:20]
CoolCanuk(which is probably fine for net neutrality.. it should mostly be text/links to other sites)
if there are any images, they've likely already been posted before
[18:21]
arkiveryou can't see what picture is on a page if it's not saved
no matter how many times the picture might have been saved in other places across the web
[18:22]
CoolCanukyou can't see pictures that are still online? [18:23]
ola_norsktwitter also uses their damn t.co url shortening [18:23]
arkiverI think we save things in case they go offline [18:23]
astrid<3 [18:24]
CoolCanuk(I hope that wasn't passive aggressive) :( [18:25]
arkiverarkiver isn't an aggressive person :) [18:26]
CoolCanukaggressive at archiving :P
hehe
[18:27]
arkiver:) [18:27]
ola_norski've been running those cronjobs since the 26th (i think). Should i perhaps just halt that idea then, or might it be useful data for someone else to dig through? At least the text and links are there i guess..
was planning to run them until the netneutrality voting stuff is over on the 14th(?)
[18:28]
arkivertext is always useful [18:29]
JAADefinitely better than nothing. [18:30]
arkiverI believe the data from Alexa on IA also does not include pictures
but I'm not totally sure about that
[18:30]
ola_norski'm just going to let it run then [18:34]
JAAWhat does the /save/ URL return exactly? Are the URLs for page requisites also replaced with /save/ URLs?
If so, it might be possible to use wget --page-requisites to grab them.
[18:34]
ola_norskone sec
https://pastebin.com/raw/dJrVbnpr
that's what i get when running: curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.89 Chrome/62.0.3202.89 Safari/537.36" --silent --max-time 120 --connect-timeout 30 'https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets'
[18:35]
JAAYep, everything there is also replaced with /save/ URLs.
So give wget --page-requisites a shot if you want.
(Plus a bunch of other options, obviously.)
[18:40]
arkiverJAA: yes [18:40]
ola_norskok [18:41]
arkiverI believe embeds are replaced with a /save/_embed/ URL and links with a /save/ URL [18:41]
JAAYep [18:41]
ola_norskby 'other options' do you mean just to make it run quiet? [18:42]
JAAYeah, and making it not write the files to disk. [18:44]
ola_norskok [18:44]
JAANot sure what else you'd need for this. [18:45]
ola_norskme neither unfortunately, i had to browse a bit just to learn that much curl :d
but i'll check it out
[18:45]
i did ask info@archive.org if it's ok to do the curl commands so frequently (every 3-5 minutes), but no response back yet.
i just hope they won't suddenly go 'wtf is this!?' and block me :d
[18:52]
***ZexaronS has joined #archiveteam-bs [19:04]
arkiverno
it's just one URL that's saved per curl command
https://archive.org/details/liveweb?sort=-publicdate
the number of URLs per item in there is a lot higher than how many you are saving in a day
[19:08]
ola_norskas long it's fine with IA i'm good
arkiver: could there be a way to 'retro-crawl' the tweets i've already saved?
to get the images to load into the saves, i mean
[19:12]
***Stilett0 has joined #archiveteam-bs [19:14]
ola_norskthis is the mail i wrote on the 27th btw: https://pastebin.com/AV1vbKUr [19:15]
arkiverI'm sure they're fine with it [19:16]
ola_norskgood stuff [19:16]
arkiverlet me know if anything goes wrong [19:16]
ola_norskok [19:16]
arkiverwith the 'retro-crawl', I guess you could get the older captures, get the URLs for the pictures from those and save those
but you can't really /save/ an old page again
or continue a /save/ or something
[19:17]
ola_norskok. I'm guessing at least some number of the tweets are bound to have become deleted by the users themselves (or banned user accounts). [19:19]
JAAIf you visit the pages, it should grab any images that aren't in the archives already.
So I guess you could make your browser go through all those old crawls.
[19:19]
ola_norskouch
but yeah, that is what i meant :d
[19:20]
JAAOr perhaps it would work with wget --page-requisites as well, not sure. [19:20]
ola_norski'll rather try that than sit scrolling in my browser :D [19:21]
opening a capture in the browser does not seem to work to pull the images https://web.archive.org/web/20171130120002/https:/twitter.com/hashtag/netneutrality?f=tweets
only user avatars etc seem to be present
and those f*cking t.co links...pissing me off :/
[19:32]
.... (idle for 15mn)
***dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[19:51]
CoolCanukI'm not American and articles aren't helping... how fast is Cumulus Media declining ?
This looks like quite the "portfolio" https://en.wikipedia.org/wiki/List_of_radio_stations_owned_by_Cumulus_Media
[19:58]
schbiriddoes gdrive use some kind of incremental throttling for uploads? i am down to 1.5MB/s now :(
and it seems quite linear over time
[20:01]
***bitspill has joined #archiveteam-bs [20:01]
ola_norskCoolCanuk: https://www.marketwatch.com/investing/stock/cmlsq ..Not sure if it's really indicative though [20:02]
CoolCanukomg
0.095?!
iHeartRadio also seems troubled
however, iHeartRadio in Canada is likely not impacted, since I'm pretty sure Bell purchased rights to use it and it's a crappy radio streaming app for Bell Media radio stations- not true iHeartRadio
[20:02]
ola_norskCoolCanuk: All i see is the slope going down :d https://www.marketwatch.com/investing/stock/cmlsq/charts That's basically the max of my knowledge about stocks and shit :d [20:06]
CoolCanuksame here [20:07]
***SimpBrain has quit IRC (Remote host closed the connection) [20:14]
ola_norskCoolCanuk: a friend of mine who unfortunately passed away in 2015 once showed me some daytrading software. If i remember correctly, the only thing that differed from the free API testing was that all the data was delayed
CoolCanuk: it wouldn't be useful for trading, but perhaps for alerting about online services going to hell
[20:18]
Kazschbirid: i think there's a limit of 750GB/day uploaded?
if you're close to that, could explain things
[20:28]
schbiridah, maybe
nope... today is just at "Transferred: 104.014 GBytes (1.540 MBytes/s)"
[20:29]
zinoschbirid, any packet loss? [20:31]
schbiridno idea, how do i check? [20:31]
Froggingping maybe [20:31]
zinoWell, step one: Be on linux (and run the upload from the same machine), step two: run "mtr hostname.here" [20:32]
schbiridno idea what the hostnames for gdrive are [20:32]
Froggingoh yeah mtr that's better [20:32]
schbiridmtr rules [20:32]
zinoStep 0: Install iftop and check what address all your data is going too. :)
to*
[20:32]
schbiridduh, i feel dumb [20:33]
zinoDon't. There are many ways to do this, and today you learned a new one. [20:35]
schbiridrelearned [20:35]
zinoThere will be a test on what all flags to tar are and what they do tomorrow! [20:36]
schbiridi use longform
:P
tar is easy
looks like there is no traffic at all and rclone is doing some crap instead. makes sense to have the "speed" die down linearly then
[20:38]
.... (idle for 16mn)
***dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
SmileyG has joined #archiveteam-bs
Smiley has quit IRC (Read error: Operation timed out)
[20:54]
ola_norskCoolCanuk: that cumulus media thing made my brain conjure up some silly idea https://pastebin.com/raw/32k6st0E [20:57]
***SmileyG has quit IRC (Ping timeout: 260 seconds)
dashcloud has joined #archiveteam-bs
Smiley has joined #archiveteam-bs
[21:04]
Kazschbirid: any cpu activity from rclone? [21:19]
***BlueMaxim has joined #archiveteam-bs [21:20]
schbiridi just straced it and it has connection time outs all over [21:20]
........ (idle for 37mn)
***schbirid has quit IRC (Quit: Leaving) [21:57]
CoolCanukshould Wikia be moved to Fandom, or is it okay to redirect Fandom to Wikia? [21:58]
ola_norskJAA: i tried this wget command, wget -O /dev/null --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" --quiet --page-requisites "https://web.archive.org/save/https://twitter.com/hashtag/bogus?f=tweets" ..it's 100% quiet, though it doesn't seem to return more than using curl did.
JAA: i won't know until the captures show up on wayback though
[22:00]
JAAola_norsk: You might want to write a log file to figure out what it's doing exactly. -o is the option, I think. [22:09]
ola_norskJAA: without -O it does make a directory structure. but it doesn't seem to contain image data
JAA: It seems to be just the same data, only then in e.g web.archive.org/save/https\:/twitter.com/hashtag/bogus\?f=tweets
in folders, i mean, instead of the (same?) data going to -O
[22:13]
JAAHm [22:17]
ola_norskJAA: https://pastebin.com/FKu3mHbh this showes the structure of what it does
JAA: the 'hashtag/bogus?f=tweets' is the only file apart from robots.txt
[22:19]
JAARight [22:20]
***noirscape has joined #archiveteam-bs [22:20]
ola_norskcould Lynx browser be tricked into acting like a 'real' browser perhaps? [22:21]
***noirscape has quit IRC (Client Quit) [22:22]
JAAI doubt it. [22:22]
***fie has quit IRC (Ping timeout: 633 seconds) [22:23]
JAANot sure why your command doesn't work.
But yeah, a log file would help.
Maybe with -v or -d even.
[22:23]
ola_norskone sec [22:23]
***MrDignity has joined #archiveteam-bs [22:25]
ola_norskJAA: 'default output is verbose.' ..and there's quite little there i'm afraid :/
i'll see if there are some options that give better output
[22:27]
***fie has joined #archiveteam-bs [22:28]
JAAola_norsk: "Not following https://web.archive.org/save/_embed/https://pbs.twimg.com/profile_images/848200666199629824/ZwvxQIzP_bigger.jpg because robots.txt forbids it."
Fucking robots.txt
It breaks everything. :-P
[22:30]
ola_norsktry setting --user-agent [22:30]
JAAI did. [22:31]
ola_norskmaybe it's a javascript thingy, that loads all the shit? :/ [22:31]
JAAI used your exact command.
-e robots=off
[22:31]
ola_norskhmm [22:31]
***shin has quit IRC (Quit: Connection closed for inactivity) [22:35]
ola_norskJAA: here is output from me running the command (Note, it's in norwegian :/ ) https://pastebin.com/awJ9j4D8 [22:36]
CoolCanuk(please correct me if i'm wrong) [22:37]
ola_norskJAA: could it be i'm using older wget or something? [22:37]
JAAola_norsk: With -e robots=off?
Maybe, what version are you using?
I'm on 1.18.
I don't think it should matter too much though.
[22:37]
ola_norskJAA: GNU Wget 1.17.1
sry, didn't notice the robots=off
[22:39]
JAAHmm
It seems that it doesn't work with -O /dev/null, interesting.
[22:40]
ola_norskrobots=off did something else indeed, but i'm guessing it didn't do much better than when you ran it
a slew of 404 errors appeared
[22:41]
JAAYeah, I got a bunch of 404s as well, but not all requests were 404s. [22:42]
ola_norsk--2017-12-02 23:41:17-- https://web.archive.org/save/_embed/https://abs.twimg.com/a/1512085154/css/t1/images/ui-icons_2e83ff_256x240.png
Connecting to web.archive.org (web.archive.org)|207.241.225.186|:443 … connected.
HTTP request sent, awaiting response … 404 Not Found
2017-12-02 23:41:18 ERROR 404: Not Found.
is one png
[22:44]
JAAYeah, that doesn't exist.
But my command earlier grabbed https://pbs.twimg.com/profile_images/848200666199629824/ZwvxQIzP_bigger.jpg for example.
[22:44]
ola_norskso, it's robots.txt on the endpoints that causes the failures? [22:45]
JAArobots.txt at web.archive.org, yes.
Ah, no.
That's what causes wget not to retrieve the page requisites without -e robots=off.
[22:45]
ola_norskno, i mean at e.g : abs.twimg.com ? [22:46]
JAAThose 404s, not sure. Might just be broken links or misparsing. [22:46]
ola_norskdamn internet, it's a broken big fat mess
cloudflare and shit
[22:47]
CoolCanukwhich website are you trying to access that cloudflare wont let you
I can possibly help get the true IP
[22:48]
ola_norskit's to get waybackmachine to capture webpages, including images, with doing just request [22:49]
CoolCanukoh :/ [22:50]
JAAHTML is a huge clusterfuck. Well, to be precise, HTML is fine, but the parsing engines' forgiveness is awful.
And don't get me started on JavaScript.
[22:51]
ola_norskCoolCanuk: i've messed up, thinking it would actually do captures by doing just that with automatic requests..but turns out it wasn't that easy :/
JAA: aye. Is it possible that twitter uses javascript to put in the images, AFTER the page is loaded?
[22:51]
JAADefinitely possible. [22:52]
ola_norskJAA: if so, i'm giving up even trying :d [22:52]
JAABut at least part of it is not scripted.
My test earlier grabbed https://pbs.twimg.com/media/DQDHMryX4AEseEo.jpg for example, which is an image from a post most likely (though I'm not going to try and figure out which one).
[22:53]
ola_norskI think i'll just let the curl stuff run until the 14th, and let someone brighter than me figure it out in the future. [22:55]
JAASometimes, I hate the WM interface. "3 captures" *click* only lists one. [22:55]
ola_norskone thing is images, but another is that basically all links on twitter are shorterened links [22:56]
JAAYeah, but if you want to follow those, you'll definitely need more than that.
I mean, it might work with --recursive and --level 1 or something like that.
But it would really be better to just write WARCs locally and upload those to IA.
[22:57]
ola_norskthe t.co links do come with the actual link the ALT= tag i think , not sure though
<a alt=> property i mean
[22:57]
JAANever looked into them.
What you're describing is more or less what I'm doing from time to time with webcams.
I did that during the eclipse in the US in August, and I'm currently retrieving images from cams across Catalonia every 5 minutes.
It's just a script which runs wpull in the background + sleep 300 in a loop.
A cronjob might be cleaner, but whatever.
[22:58]
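Roughly, the loop JAA describes looks like this (a sketch; the camera URL is a placeholder):

  # grab a webcam page and its requisites every 5 minutes, one WARC per pass
  cam='http://example.com/webcam.html'
  while true; do
      wpull --page-requisites --no-robots --warc-file "cam-$(date +%Y%m%d-%H%M%S)" "$cam" &
      sleep 300
  done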
ola_norskwith --recursive it does seem to take a hell of a lot longer..
and that's maybe a good sign
[23:02]
JAAYeah, it's now retrieving all of Twitter.
Well, maybe not all of it, but a ton.
[23:02]
ola_norskola_norsk suddenly archive all of internets [23:03]
JAASolving IA's problems. Genius! [23:03]
ola_norskaye
maybe that level thing is not a bad idea :d
[23:03]
JAA:-P [23:05]
ola_norskany way i could limit it to let's say 1-2 "hops" away from twitter? :D
...seriously, it's still going
it went from #bogus hashtag to shotting #MAGA..
[23:05]
JAAYep, and it'll retrieve every other hashtag it can find. [23:07]
ola_norskaye [23:07]
JAAIt's the best recursion. Believe me! [23:07]
ola_norsk'recurse all the things!' lol
at the very least i think it needs some pause between these requests :d
[23:07]
zinostack exhausted, core dumped. [23:09]
ola_norskit's doing bloody mobile.twitter.com now ..
nobody needs that
it's brilliant though :D , i just hope it did the images :D
[23:10]
JAAIt did exactly what you told it to. :-P [23:12]
ola_norskthat just proves computers are stupid :d [23:13]
JAAYeah, that or... :-P [23:13]
ola_norskthe Illuminati did it
but, i'm thinking if was limited to just 1-2 hops, even 1, that would be enough to get most images. Or?
[23:14]
JAA--page-requisites gets the images already.
(But apparently only if you actually write the files to disk. My tests with -O /dev/null did not work.)
You only need recursion with a level limit if you also want to follow links on the page.
Which might make sense, retrieving the individual tweets for example.
[23:17]
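So the one-hop variant discussed here would look roughly like this (untested against /save/, and it will still wander into support/ToS pages unless reject rules are added):

  # follow links one hop out from the hashtag page, grabbing page requisites along the way
  wget --recursive --level=1 --page-requisites -e robots=off \
       'https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets'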
ola_norskcould you pastebin the command you did that does image capture? [23:18]
JAABut if you want to have any control over what it grabs (for example, not 100 copies of the support and ToS sites), it'll get complex...
Uh
Closed the window already, hold on.
[23:18]
ola_norskthe --recursion is violent :d [23:19]
JAAIt's awesome, you just need to know how to control it. :-) [23:19]
***jschwart has quit IRC (Quit: Konversation terminated!) [23:19]
ola_norskaye
as for any output, if i can't put it in /dev/null it'll go in a ramdisk that's cleared quickly
[23:20]
JAAUhm, dafuq? https://web.archive.org/web/20171202231923/https:/twitter.com/hashtag/bogus?f=tweets
That's my grab from a few minutes ago.
Well, it did grab the CSS etc.
I didn't specify the UA though. That might have something to do with it.
[23:22]
ola_norski'm not sure how they distribute the requests between 'nodes' [23:24]
JAAThe command was wget --page-requisites -e robots=off 'https://web.archive.org/save/https://twitter.com/hashtag/bogus?f=tweets' [23:24]
ola_norskty [23:24]
JAARegarding the temporary files: mktemp -d, then cd into it, run wget, cd out, rm -rf the directory.
Five-line bash script. :-)
[23:24]
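Spelled out, that five-liner is something like the following (the hashtag URL is just the earlier example):

  #!/bin/bash
  tmp=$(mktemp -d)    # throwaway working directory
  cd "$tmp"
  wget --page-requisites -e robots=off 'https://web.archive.org/save/https://twitter.com/hashtag/netneutrality?f=tweets'
  cd / && rm -rf "$tmp"   # discard the local copies; the archiving already happened server-side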
ola_norskJAA: sometimes i notice twitter.com requires login for anything. Maybe it varies by country. I'm not sure.
ty, gold stuff
[23:28]
JAAYeah, Twitter's quite annoying to do anything with it at all.
We still don't have a solution for archiving an entire account or hashtag.
[23:29]
ola_norskthey make money of off doing that
so they will not make it easy
if you're from a research institution, they would easily hand over a hashtag archive from day 0. For a lump of money, of course
[23:30]
JAAYeah [23:32]
CoolCanukis there a mirror of the wiki we can use until it's stable? [23:36]
JAANo, I don't think so.
There's a snapshot from a few months ago in the Wayback Machine, I believe.
[23:36]
ola_norskthat command entails 1.7 megabytes of data :D what is the internet coming to?? lol
mankind doesn't deserve it :d
[23:46]
JAA"The average website is now larger than the original DOOM." was a headline a few years ago...
web page* I guess
[23:49]
ola_norskaye, i think just the fucking front page of my online bank is ~10MB :/
no wonder dolphins are dying from space radiation and ozone
[23:49]
