Time |
Nickname |
Message |
00:00
🔗
|
godane |
so lifehacker.com and deadspin.com sitemap urls are all updated |
00:07
🔗
|
godane |
tay.kotaku.com redirects to tay.kinja.com |
00:11
🔗
|
|
JesseW has joined #archiveteam-bs |
00:12
🔗
|
|
BartoCH has joined #archiveteam-bs |
00:15
🔗
|
JesseW |
Suggestions for scripts for owners of a private github repo to use to extract the issues and publish them? https://github.com/npm/www/issues/9 |
00:15
🔗
|
JesseW |
I suggested joeyh's github-backup -- but other suggestions would also be very welcome. |
00:25
🔗
|
|
RichardG_ is now known as RichardG |
00:27
🔗
|
godane |
https://archive.org/details/Making_of_Antarctica |
00:27
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
00:27
🔗
|
godane |
https://archive.org/details/Otaku_JJ_Beineix |
00:28
🔗
|
godane |
https://archive.org/details/Raid_1954 |
00:28
🔗
|
godane |
https://archive.org/details/Audience_of_One_-_2007 |
00:29
🔗
|
godane |
https://archive.org/details/Showdown_in_Little_Tokyo_Uncut_CG |
00:29
🔗
|
godane |
https://archive.org/details/Hollywood_Mavericks.1990.Florence_Dauman.Dale_Ann_Steiber.mkv |
00:30
🔗
|
godane |
thats all of the Cinemageddon videos i uploaded to FOS |
00:30
🔗
|
godane |
i figure people here would want them |
00:37
🔗
|
hook54321 |
I'm pretty sure lycos ignores robots.txt. At least partially... |
00:39
🔗
|
hook54321 |
Does Lycos have any search operators? |
00:40
🔗
|
alembic |
looks like they lost most of them in 2004 when they switched to Yahoo! DB? |
00:40
🔗
|
alembic |
http://www.searchengineshowdown.com/features/lycos/ |
00:45
🔗
|
hook54321 |
the advanced page search doesn't seem to exist anymore :/ |
00:47
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
00:57
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
01:02
🔗
|
|
username1 has joined #archiveteam-bs |
01:05
🔗
|
|
schbirid2 has quit IRC (Read error: Operation timed out) |
01:59
🔗
|
|
schbirid2 has joined #archiveteam-bs |
02:03
🔗
|
|
username1 has quit IRC (Read error: Operation timed out) |
02:06
🔗
|
|
Aranje has joined #archiveteam-bs |
02:19
🔗
|
|
hook54321 has left |
03:05
🔗
|
|
hook54321 has joined #archiveteam-bs |
03:13
🔗
|
hook54321 |
can someone re-op me in #archivebot? svchfoo disappeared |
03:28
🔗
|
|
mutoso_ has joined #archiveteam-bs |
03:29
🔗
|
|
Smiley has quit IRC (Read error: Operation timed out) |
03:29
🔗
|
|
beardicus has quit IRC (Read error: Operation timed out) |
03:31
🔗
|
|
Whopper_ has joined #archiveteam-bs |
03:31
🔗
|
|
mutoso has quit IRC (Read error: Operation timed out) |
03:31
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
03:32
🔗
|
|
closure has quit IRC (Read error: Operation timed out) |
03:34
🔗
|
|
Smiley has joined #archiveteam-bs |
03:37
🔗
|
|
Whopper has quit IRC (Read error: Operation timed out) |
03:47
🔗
|
|
closure has joined #archiveteam-bs |
04:20
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:27
🔗
|
|
Sk1d has joined #archiveteam-bs |
04:28
🔗
|
|
Aranje has quit IRC (Ping timeout: 260 seconds) |
04:36
🔗
|
|
beardicus has joined #archiveteam-bs |
04:49
🔗
|
godane |
i'm uploading an adland.tv web archive from 2014-09-01 |
04:50
🔗
|
godane |
just know it's incomplete, meaning it stopped before being completed |
04:50
🔗
|
godane |
but its +500M of it |
04:51
🔗
|
SketchCow |
Star Trek Beyond |
04:52
🔗
|
SketchCow |
a-ok |
04:52
🔗
|
godane |
i watched that on my birthday |
04:52
🔗
|
godane |
i also went to five guys |
04:53
🔗
|
godane |
https://archive.org/details/adland.tv-20140901 |
05:04
🔗
|
|
brayden has joined #archiveteam-bs |
05:04
🔗
|
|
swebb sets mode: +o brayden |
05:09
🔗
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
05:36
🔗
|
|
tomwsmf has quit IRC (Ping timeout: 255 seconds) |
06:09
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:12
🔗
|
ranma |
wow |
06:12
🔗
|
ranma |
The Cuban CDN http://hn.premii.com/#/article/12319063 |
06:13
🔗
|
|
dashcloud has joined #archiveteam-bs |
07:14
🔗
|
|
JesseW has joined #archiveteam-bs |
07:27
🔗
|
|
acridAxid has quit IRC (marauder) |
07:28
🔗
|
|
acridAxid has joined #archiveteam-bs |
07:51
🔗
|
|
Honno has joined #archiveteam-bs |
07:59
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
08:29
🔗
|
|
GE has joined #archiveteam-bs |
08:35
🔗
|
|
BartoCH has joined #archiveteam-bs |
10:51
🔗
|
|
GE_ has joined #archiveteam-bs |
10:54
🔗
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
10:54
🔗
|
|
GE_ is now known as GE |
11:47
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
11:48
🔗
|
|
BartoCH has joined #archiveteam-bs |
12:03
🔗
|
|
BartoCH has quit IRC (Quit: WeeChat 1.5) |
12:05
🔗
|
|
BartoCH has joined #archiveteam-bs |
12:57
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
13:02
🔗
|
|
BartoCH has joined #archiveteam-bs |
13:08
🔗
|
|
luckcolor has quit IRC (Read error: Operation timed out) |
13:08
🔗
|
|
luckcolor has joined #archiveteam-bs |
13:21
🔗
|
|
davidar has quit IRC (Quit: Connection closed for inactivity) |
13:26
🔗
|
|
GE has quit IRC (Quit: zzz) |
14:08
🔗
|
|
atrocity has joined #archiveteam-bs |
14:08
🔗
|
atrocity |
oh 90%, why was I so young... |
14:08
🔗
|
atrocity |
90's... |
14:08
🔗
|
atrocity |
https://www.youtube.com/watch?v=IY2j_GPIqRA |
14:41
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
15:04
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
15:06
🔗
|
|
RichardG has joined #archiveteam-bs |
15:19
🔗
|
|
GE has joined #archiveteam-bs |
16:26
🔗
|
|
GE_ has joined #archiveteam-bs |
16:28
🔗
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
16:28
🔗
|
|
GE_ is now known as GE |
17:38
🔗
|
|
JesseW has joined #archiveteam-bs |
17:59
🔗
|
godane |
i'm uploading more pdfs from the Sky and Telescope |
18:59
🔗
|
|
GE_ has joined #archiveteam-bs |
19:00
🔗
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
19:00
🔗
|
|
GE_ is now known as GE |
19:27
🔗
|
Igloo^ |
Is that the one I tried yesterday HCross |
19:27
🔗
|
Igloo^ |
4 million items? |
19:28
🔗
|
HCross |
is it the NASA funded docs? |
19:28
🔗
|
Igloo^ |
Yis |
19:28
🔗
|
Igloo^ |
(the NASA items aren't that big tho) |
19:29
🔗
|
HCross |
ah I was looking at it too |
19:29
🔗
|
HCross |
ArchiveBot prob wont get it all |
19:29
🔗
|
HCross |
ive exported it as XML and am looking at it |
19:29
🔗
|
Igloo^ |
Archivebot went a bit mad. |
19:29
🔗
|
Igloo^ |
I was working through it as individual items |
19:32
🔗
|
HCross |
Igloo^, doing some testing, but it may be easier to create a large list |
19:36
🔗
|
|
RichardG has quit IRC (Ping timeout: 250 seconds) |
19:37
🔗
|
Igloo^ |
Yeah, Create a list and whack it through AB |
19:42
🔗
|
godane |
those nasa docs i uploaded are around 90k |
19:42
🔗
|
|
RichardG has joined #archiveteam-bs |
19:43
🔗
|
|
tomwsmf has joined #archiveteam-bs |
19:44
🔗
|
HCross |
Igloo^, they let you export the ID #'s but it's per page. If I list it by 100 then it's only 9 pages. I'll then write something that will generate the URLs |
19:45
🔗
|
Igloo^ |
!ao works |
19:45
🔗
|
Igloo^ |
However, Doesn't get the sub images |
19:45
🔗
|
HCross |
or not. Managed to download a complete list |
19:45
🔗
|
HCross |
yea |
19:45
🔗
|
Igloo^ |
Which is a bit ropey. |
19:46
🔗
|
HCross |
!ao gets too much other stuff |
19:46
🔗
|
Igloo^ |
ao gets the full site for the waybackmachine |
19:46
🔗
|
HCross |
it may be that its better if I do a grab-site instance with some custom ignores etc |
19:46
🔗
|
Igloo^ |
I was thinking of doing a custom Heritrix run |
19:46
🔗
|
Igloo^ |
BUT I don't think that'll work |
19:46
🔗
|
Igloo^ |
as it'll be huge |
19:46
🔗
|
HCross |
want me to generate a full list of URLs anyway |
19:47
🔗
|
Igloo^ |
Sure |
20:07
🔗
|
HCross |
Igloo^, www.ncbi.nlm.nih.gov/pmc/articles/PMC4973959 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4980455 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4971634 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4964660 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4971156 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4939048 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4937211 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4917110 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4934352 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4926486 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4932956 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4872529 |
20:07
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4870578 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4896262 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4919777 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4848480 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4831017 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4846461 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4820435 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4814050 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4797119 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4794207 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4808930 |
20:08
🔗
|
Igloo^ |
Fucking pastebin it or something |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4866469 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4771323 |
20:08
🔗
|
Igloo^ |
Instead of several hundred lines |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4751316 |
20:08
🔗
|
Igloo^ |
:P |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4750446 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4760178 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4810239 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4738353 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4829277 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4770934 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4731148 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4729913 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4728390 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4727388 |
20:08
🔗
|
HCross |
www.ncbi.nlm.nih.gov/pmc/articles/PMC4718941 |
20:08
🔗
|
|
HCross was kicked by Frogging (HCross) |
20:08
🔗
|
Igloo^ |
Thank you Frogging |
20:10
🔗
|
Frogging |
I wonder if he meant to paste the pastebin link but still had the list in his clipboard :po |
20:10
🔗
|
Frogging |
:p * |
20:10
🔗
|
Igloo^ |
I think that's what he meant to do :P |
20:10
🔗
|
Igloo^ |
But yaknow, noob etc |
20:12
🔗
|
HCross2 |
Now waiting while my hexchat stops having a meltdown over that, sorry |
20:12
🔗
|
Frogging |
no problem :p |
20:12
🔗
|
|
Frogging sets mode: +o HCross2 |
20:13
🔗
|
|
HCross has joined #archiveteam-bs |
20:13
🔗
|
HCross |
there we go |
20:13
🔗
|
HCross |
http://paste.nerds.io/axorogoxif.avrasm |
20:13
🔗
|
|
Frogging sets mode: +o HCross |
20:13
🔗
|
HCross |
thanks |
20:15
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
20:15
🔗
|
|
JesseW has joined #archiveteam-bs |
20:16
🔗
|
|
kristian_ has joined #archiveteam-bs |
20:26
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
20:29
🔗
|
arkiver |
is it 4 million docs? |
20:29
🔗
|
HCross |
the new ones that they released are just 900 odd |
20:34
🔗
|
Igloo^ |
It would be a good warrior job. |
20:41
🔗
|
godane |
http://www.ncbi.nlm.nih.gov/pmc/journals/1978/ |
20:41
🔗
|
godane |
you grab by journal number |
20:41
🔗
|
godane |
then grab the links from those pages |
20:41
🔗
|
godane |
the pdfs are linked there |
20:42
🔗
|
godane |
http://www.ncbi.nlm.nih.gov/pmc/issues/218561/ |
21:17
🔗
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
21:45
🔗
|
|
GE has quit IRC (Quit: zzz) |
21:56
🔗
|
|
GE has joined #archiveteam-bs |
22:17
🔗
|
|
GE_ has joined #archiveteam-bs |
22:20
🔗
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
22:20
🔗
|
|
GE_ is now known as GE |
22:34
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
22:34
🔗
|
|
Start has joined #archiveteam-bs |
22:45
🔗
|
|
Coderjoe has joined #archiveteam-bs |
22:59
🔗
|
|
GE has quit IRC (Remote host closed the connection) |
23:13
🔗
|
|
kristian_ has quit IRC (Leaving) |
23:14
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
23:14
🔗
|
|
tomwsmf has quit IRC (Read error: Operation timed out) |
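
Editor's sketch, not part of the log: HCross mentions exporting the ID numbers and then writing something to generate the URLs. A minimal version of that step might look like the following. The function name `pmc_urls`, the `scheme` parameter, and the assumption that the export yields one ID per entry (with or without a `PMC` prefix) are all hypothetical; only the `www.ncbi.nlm.nih.gov/pmc/articles/PMC…` pattern comes from the URLs pasted in the channel.

```python
import re

# URL pattern as it appears in the pasted list; the scheme is added separately.
BASE = "www.ncbi.nlm.nih.gov/pmc/articles/"


def pmc_urls(raw_ids, scheme="http"):
    """Turn exported IDs into article URLs, skipping blanks, junk, and duplicates.

    Assumes (hypothetically) that each entry is either a bare number or a
    number prefixed with "PMC"; anything else is ignored.
    """
    seen = set()
    urls = []
    for raw in raw_ids:
        match = re.fullmatch(r"\s*(?:PMC)?(\d+)\s*", raw)
        if match is None:
            continue  # not an ID we recognise
        pmcid = "PMC" + match.group(1)
        if pmcid in seen:
            continue  # keep only the first occurrence
        seen.add(pmcid)
        urls.append(f"{scheme}://{BASE}{pmcid}")
    return urls


if __name__ == "__main__":
    # Two IDs taken from the list pasted in the channel.
    for url in pmc_urls(["PMC4973959", "4980455"]):
        print(url)
```

Writing the result to a file, one URL per line, would give the kind of list that can be put in a pastebin or fed to a crawler rather than pasted into the channel.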